[Beowulf] Problems with Dell M620 and CPU power throttling

Bill Wichser bill at princeton.edu
Tue Sep 3 07:45:49 PDT 2013


Yes, especially when you do not see the "power limit normal" message 
following it.  That says the cores are limited (by power) but never 
actually receive enough power to report normal again.

We'd see both messages on the ramp up to the C0 state: limited then 
normal; limited then normal.
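
For what it's worth, on kernels that expose the power-limit event
counters (the CPU needs the PLN feature; treat the path below as an
assumption for your particular kernel), these events can be tallied per
core without grepping dmesg:

     # per-core count of power limit events, if the kernel exposes it
     grep . /sys/devices/system/cpu/cpu*/thermal_throttle/core_power_limit_count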

Bill

On 09/03/2013 10:40 AM, Brice Goglin wrote:
> Hello,
> I am seeing messages like this quite often on our R720s in dmesg:
> 	CPU16: Core power limit notification (total events = 1)
> Do you think that's related to your problem?
> Brice
>
>
>
>
> On 03/09/2013 14:44, Bill Wichser wrote:
>> The solution appears to be BIOS configuration.
>>
>> We had:
>> < SysProfile=perfoptimized
>> < ;ProcPwrPerf=maxperf
>> < ;ProcC1E=disable
>> < ;ProcCStates=disable
>>
>> And changed to:
>> ---
>>   > SysProfile=custom
>>   > ProcPwrPerf=osdbpm
>>   > ProcC1E=enable
>>   > ProcCStates=enable
>>
>> Then we added
>>
>>      modprobe acpi_cpufreq
>>
>> And with that we moved from BIOS-directed power control to OS-enabled
>> power control.  In the old mode we could set the processor states but
>> were unable to see some of the hooks Don had suggested here.
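>>
>> A quick sanity check that the OS is now driving frequency scaling (a
>> sketch; the exact driver and governor names depend on your kernel):
>>
>>      # should report acpi_cpufreq once the module is loaded
>>      cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
>>      # and the governor the OS has selected
>>      cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor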
>>
>> Initial results look good.  Using the cpupower command we now have a
>> much better view of what the cores are actually doing, information we
>> could not fully obtain without this BIOS change.
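>>
>> For example (assuming the cpupower utility is installed; it ships in
>> different packages per distro, and output format varies by version):
>>
>>      cpupower frequency-info    # driver, governor, frequency limits
>>      cpupower monitor           # per-core frequency and idle-state residency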
>>
>> I'm not sure about the C1E state being enabled though and will
>> experiment further.
>>
>> Thanks to everyone who offered suggestions.  An extra thanks to Don
>> Holmgren who pointed us down this path.
>>
>> Bill
>>
>>
>> On 08/30/2013 11:23 AM, Don Holmgren wrote:
>>> It might be worth fooling a bit with the cpufreq settings, down in
>>>
>>>       /sys/devices/system/cpu/cpuX/cpufreq
>>>
>>> (where X is the CPU number, one per core).  To prevent non-thermal
>>> throttling, you can do the following for each core:
>>>
>>>       echo userspace > scaling_governor
>>>       cat scaling_max_freq
>>>       echo 2400000 > scaling_setspeed
>>>
>>> (substitute the max_freq reported by scaling_max_freq for the 2400000).
>>> For this to work you need the specific cpufreq driver for your processor
>>> loaded.  For our (non-Dell) SB servers it's acpi_cpufreq.  In Red Hat,
>>> the cpuspeed service loads the relevant drivers; I'm not sure whether
>>> there is a similar service in other distros.
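>>>
>>> To apply that to every core in one shot, a bash sketch along these
>>> lines works (it assumes the cpufreq driver is already loaded):
>>>
>>>       for d in /sys/devices/system/cpu/cpu[0-9]*/cpufreq; do
>>>           echo userspace > $d/scaling_governor
>>>           # pin each core at its reported maximum frequency
>>>           cat $d/scaling_max_freq > $d/scaling_setspeed
>>>       done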
>>>
>>> The above will lock the cores at the max_freq, although if they get
>>> too hot they will still throttle down in speed.  There are statistics
>>> available on frequency changes from thermal throttling in
>>>
>>>       /sys/devices/system/cpu/cpu0/thermal_throttle/
>>>
>>> although I haven't used them, so I'm not sure about their functionality.
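>>>
>>> Reading the counters is harmless if you want a look (a sketch; exact
>>> file names vary a little between kernel versions):
>>>
>>>       # snapshot before and after a job; any increase means the core
>>>       # actually hit its thermal limit
>>>       grep . /sys/devices/system/cpu/cpu*/thermal_throttle/core_throttle_count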
>>>
>>> If you do a
>>>
>>>        modprobe cpufreq_stats
>>>
>>> then a new directory
>>>
>>>        /sys/devices/system/cpu/cpu0/cpufreq/stats
>>>
>>> will show up that has statistics about CPU speed changes.  I'm not sure
>>> whether thermal throttling changes will also show up here.  On one of
>>> our large Opteron clusters, we had a handful of nodes with slowdown
>>> problems somewhat similar to what you are seeing on your SB's.  We now
>>> lock their frequencies, and we monitor
>>> /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans (which gives the
>>> total number of speed changes), alarming when total_trans is non-zero.
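>>>
>>> A minimal version of that check, suitable for cron or a monitoring
>>> plugin (a sketch; the alarm action here is just a message and an exit
>>> code):
>>>
>>>       modprobe cpufreq_stats
>>>       total=0
>>>       for f in /sys/devices/system/cpu/cpu*/cpufreq/stats/total_trans; do
>>>           total=$((total + $(cat $f)))
>>>       done
>>>       # any transition means some core changed speed; flag the node
>>>       [ "$total" -eq 0 ] || { echo "WARN: $total freq transitions"; exit 1; }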
>>>
>>> Don Holmgren
>>> Fermilab
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 30 Aug 2013, Bill Wichser wrote:
>>>
>>>> Since January, when we installed an M620 Sandy Bridge cluster from
>>>> Dell, we have had issues with power and performance on compute nodes.
>>>> Dell apparently continues to look into the problem, but the usual
>>>> responses have provided no solution.  Firmware, BIOS, and OS updates
>>>> have all been fruitless.
>>>>
>>>> The problem is that the node/CPU is power capped.  We first detected
>>>> this with a quick run of the STREAM benchmark, which shows memory
>>>> bandwidth around 2000 MB/s instead of the normal 13000 MB/s.  When the
>>>> CPU is in the C0 state, this drops to around 600 MB/s.
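>>>>
>>>> A check like that is easy to script; e.g. this sketch (it assumes a
>>>> compiled STREAM binary named ./stream and the ~13000 MB/s healthy
>>>> baseline):
>>>>
>>>>        # flag the node if Copy bandwidth falls below half of normal
>>>>        bw=$(./stream | awk '/^Copy:/ {print int($2)}')
>>>>        [ "$bw" -ge 6500 ] || echo "possibly capped: Copy = ${bw} MB/s"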
>>>>
>>>> The effect appears randomly across the entire cluster, with 5-10% of
>>>> the nodes demonstrating the slower performance.  We don't know what
>>>> triggers this.  Using "turbostat" we can see that the core frequency
>>>> is >= 1 GHz in most cases, dropping to about 0.2 GHz in some of the
>>>> worst cases.  Looking at the power consumption via either the chassis
>>>> GUI or "ipmitool sdr list", we see that only about 80 watts are being
>>>> used.
>>>>
>>>> We run the RH 6.x release and are up to date with kernel/OS patches.
>>>> All firmware is up to date.  Chassis power is configured as
>>>> non-redundant.  tuned is set for performance.  Turbo mode is on,
>>>> hyperthreading is off, and performance mode is set in the BIOS.
>>>>
>>>> A reboot does not clear the problem, but a power cycle returns the
>>>> compute node to normal.  Again, we do not know what triggers this
>>>> event.  We are not overheating the nodes.  But while applications are
>>>> running, something triggers an event after which this power capping
>>>> takes effect.
>>>>
>>>> At this point we remain clueless about what is causing this to happen.
>>>> We can detect the condition now and have been power cycling the nodes
>>>> to reset them.
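>>>>
>>>> For reference, the power cycle can be driven remotely over IPMI along
>>>> these lines (the BMC hostname and credentials are placeholders):
>>>>
>>>>        # read the power sensors, then hard power cycle a capped node
>>>>        ipmitool -H node-bmc -U root -P secret sdr list | grep -iE 'pwr|watt'
>>>>        ipmitool -H node-bmc -U root -P secret chassis power cycle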
>>>>
>>>> If anyone has a clue, or better yet, solved the issue, we'd love to hear
>>>> the solution!
>>>>
>>>> Thanks,
>>>> Bill
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>


