[Beowulf] Problems with Dell M620 and CPU power throttling

Tue Sep 3 07:40:37 PDT 2013

Hello,
I am seeing messages like this quite often on our R720s in dmesg:
	CPU16: Core power limit notification (total events = 1)
Do you think that's related to your problem?
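
To see how often these show up on a node, something like the following might help. It is only a sketch: the function name count_power_limit_events is made up for illustration, and it assumes the kernel log wording matches the line quoted above.

```shell
# Count kernel log lines that report a core or package power limit
# notification. Reads log text on stdin, so it can be fed from dmesg or
# from a saved log file. (Function name is hypothetical.)
count_power_limit_events() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the exit status clean for callers using "set -e".
    grep -c -i 'power limit notification' || true
}
```

Typical use would be `dmesg | count_power_limit_events`, run on each node and compared across the cluster.
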
Brice


On 03/09/2013 14:44, Bill Wichser wrote:
> The solution appears to be BIOS configuration.
>
> We had:
> < SysProfile=perfoptimized
> < ;ProcPwrPerf=maxperf
> < ;ProcC1E=disable
> < ;ProcCStates=disable
>
> And changed to:
> ---
>  > SysProfile=custom
>  > ProcPwrPerf=osdbpm
>  > ProcC1E=enable
>  > ProcCStates=enable
>
> Then added
> modprobe acpi_cpufreq
>
> This moves us from BIOS-directed power control to OS-directed power
> control.  In the old mode we could set the processor states but were
> unable to see some of the hooks Don had suggested here.
>
> Initial results look good.  We have a much better view of what the cores 
> are actually doing using the cpupower command, info we were unable to 
> obtain completely without this BIOS change.
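
One way to confirm that the OS-side driver actually took over after such a BIOS change is to read the driver and governor names from sysfs. A rough sketch, assuming the standard cpufreq sysfs layout; the function name show_cpufreq_drivers and its optional directory argument are made up for illustration:

```shell
# Print "cpuN <driver> <governor>" for every core that exposes a cpufreq
# directory. The optional argument overrides the sysfs location, which is
# handy for trying the function against a copy of the tree.
show_cpufreq_drivers() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip cores without a loaded cpufreq driver (or an unmatched glob).
        [ -r "$dir/scaling_driver" ] || continue
        cpu="${dir%/cpufreq}"
        printf '%s %s %s\n' "${cpu##*/}" \
            "$(cat "$dir/scaling_driver")" \
            "$(cat "$dir/scaling_governor" 2>/dev/null)"
    done
}
```

If this prints nothing, no cpufreq driver is bound to the cores; `cpupower frequency-info` reports the same information in more detail.
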
>
> I'm not sure about the C1E state being enabled though and will 
> experiment further.
>
> Thanks to everyone who offered suggestions.  An extra thanks to Don 
> Holmgren who pointed us down this path.
>
> Bill
>
>
> On 08/30/2013 11:23 AM, Don Holmgren wrote:
>> It might be worth fooling a bit with the cpufreq settings, down in
>>
>>      /sys/devices/system/cpu/cpuX/cpufreq
>>
>> (where X=cpu#, one per core).  To prevent non-thermal throttling, you can do
>> the following for each core:
>>
>>      echo userspace > scaling_governor
>>      cat scaling_max_freq
>>      echo 2400000 > scaling_setspeed
>>
>> (substituting the max_freq reported above for the 2400000).  For this to
>> work you need the specific cpufreq driver for your processor loaded.
>> For our (non-Dell) SB servers it's acpi_cpufreq.  In RedHat, the
>> cpuspeed service loads the relevant drivers; I'm not sure if there is a
>> similar service in other distros.
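
The per-core steps above could be wrapped in a loop along these lines. This is a minimal sketch, not a tested site script: the function name lock_cores_at_max and its optional directory argument are made up for illustration, it must run as root against the real sysfs tree, and it assumes the loaded cpufreq driver offers the userspace governor.

```shell
# Pin every core to its reported scaling_max_freq using the userspace
# governor. The optional argument overrides the sysfs location so the
# function can be dry-run against a copy of the tree.
lock_cores_at_max() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip cores whose governor file is missing or not writable.
        [ -w "$dir/scaling_governor" ] || continue
        echo userspace > "$dir/scaling_governor"
        # Use whatever maximum this core reports rather than a fixed value.
        max_freq="$(cat "$dir/scaling_max_freq")"
        echo "$max_freq" > "$dir/scaling_setspeed"
    done
}
```

Calling `lock_cores_at_max` with no argument acts on the live system; reading scaling_max_freq per core avoids hard-coding 2400000.
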
>>
>> The above will lock the cores at the max_freq, although if they get
>> too hot they will still throttle down in speed.  There are statistics
>> available on frequency changes from thermal throttling in
>>
>>      /sys/devices/system/cpu/cpu0/thermal_throttle/
>>
>> although I haven't used them, so I'm not sure about their functionality.
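
For anyone who wants to look at those counters, a small sketch that dumps them for all cores is below. It assumes the counter files in that directory have names ending in _count (for example core_throttle_count); the function name show_throttle_counts and its optional directory argument are made up for illustration.

```shell
# Print "cpuN <counter-name> <value>" for every readable *_count file in
# each core's thermal_throttle directory. The optional argument overrides
# the sysfs location for testing against a copy of the tree.
show_throttle_counts() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for file in "$cpu_root"/cpu[0-9]*/thermal_throttle/*_count; do
        # Skip unmatched globs and unreadable files.
        [ -r "$file" ] || continue
        cpu="${file%/thermal_throttle/*}"
        echo "${cpu##*/} ${file##*/} $(cat "$file")"
    done
}
```

Comparing the output of `show_throttle_counts` between a healthy node and a slow one may show whether the kernel is recording throttle events at all.
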
>>
>> If you do a
>>
>>       modprobe cpufreq_stats
>>
>> then a new directory
>>
>>       /sys/devices/system/cpu/cpu0/cpufreq/stats
>>
>> will show up that has statistics about cpu speed changes.  I'm not sure
>> whether thermal throttling changes will also show here or not.    On one
>> of our large Opteron clusters, we had a handful of nodes with somewhat
>> similar slowdown problems as you are seeing on your SB's.   We now lock
>> their frequencies, and we monitor
>> /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans (which gives the total
>> number of speed changes), alarming when total_trans is non-zero.
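
A check of this kind might look roughly like the sketch below. It is not the actual Fermilab monitoring script: the function name report_freq_transitions and its optional directory argument are made up for illustration, it looks at every core rather than only cpu0, and it assumes cpufreq_stats is loaded so that the stats directory exists.

```shell
# Report every core whose total_trans counter is non-zero and return a
# non-zero status if any such core was found, so the function can drive
# an alarm. The optional argument overrides the sysfs location.
report_freq_transitions() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    status=0
    for file in "$cpu_root"/cpu[0-9]*/cpufreq/stats/total_trans; do
        # Skip unmatched globs and cores without statistics.
        [ -r "$file" ] || continue
        count="$(cat "$file")"
        if [ "$count" -ne 0 ]; then
            cpu="${file%/cpufreq/stats/total_trans}"
            echo "${cpu##*/}: $count frequency transitions"
            status=1
        fi
    done
    return "$status"
}
```

Something like `report_freq_transitions || echo "frequency changed on $(hostname)"` could then be run from cron or a monitoring agent on nodes whose frequencies are supposed to be locked.
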
>>
>> Don Holmgren
>> Fermilab
>>
>>
>>
>>
>>
>> On Fri, 30 Aug 2013, Bill Wichser wrote:
>>
>>> Since January, when we installed an M620 Sandy Bridge cluster from Dell,
>>> we have had issues with power and performance to compute nodes.  Dell
>>> apparently continues to look into the problem but the usual responses
>>> have provided no solution.  Firmware, BIOS, OS updates all are fruitless.
>>>
>>> The problem is that the node/CPU is power capped.  We first detected
>>> this with a quick run of the STREAM benchmark, which shows memory
>>> bandwidth around 2000 MB/s instead of the normal 13000 MB/s.  When the
>>> CPU is in the C0 state, this drops to around 600 MB/s.
>>>
>>> The effect appears randomly across the entire cluster with 5-10% of the
>>> nodes demonstrating some slower performance.  We don't know what
>>> triggers this.  Using "turbostat" we can see that the GHz of the cores
>>> is >= 1 in most cases, dropping to about 0.2 in some of the worst cases.
>>> Looking at the power consumption by either the chassis GUI or using
>>> "ipmitool sdr list" we see that there is only about 80 watts being used.
>>>
>>> We run the RH 6.x release and are up to date with kernel/OS patches.
>>> All firmware is up to date.  Chassis power is configured as
>>> non-redundant.  tuned is set for performance.  Turbo mode is
>>> on/hyperthreading is off/performance mode is set in BIOS.
>>>
>>> A reboot does not change this problem.  But a power cycle returns the
>>> compute node to normal again.  Again, we do not know what triggers this
>>> event.  We are not overheating the nodes.  But while applications are
>>> running, something triggers an event where this power capping takes
>>> effect.
>>>
>>> At this point we remain clueless about what is causing this to happen.
>>> We can detect the condition now and have been power cycling the nodes in
>>> order to reset.
>>>
>>> If anyone has a clue, or better yet, solved the issue, we'd love to hear
>>> the solution!
>>>
>>> Thanks,
>>> Bill