[Beowulf] Problems with Dell M620 and CPU power throttling

Bill Wichser bill at princeton.edu
Thu Feb 6 16:28:13 PST 2014


On 2/6/2014 9:30 AM, Aaron Knister wrote:
> Bill Wichser <bill <at> princeton.edu> writes:
>
>> We have tested using c1 instead of c0 but no difference.  We don't use
>> logical processors at all.  When the problem happens, it doesn't matter
>> whether you set the cores to C1 or C0, they never get up to speed again
>> without a power cycle/reseat.  We believe this to be something related
>> to power.  Maybe current limiting.
>>
>> As I stated yesterday, after a complete chassis power cycle on Tuesday
>> Sept 10, the entire 37 chassis have been outperforming their 2.6GHz
>> ratings flawlessly.  I don't know if this is going to be the solution we
>> have been searching for, but it has certainly been a week and a half
>> of some very happy researchers!
>>
>> Thanks,
>> Bill
>>
>> On 09/19/2013 11:32 AM, Christopher Samuel wrote:
>>>
>>> On 18/09/13 10:49, Douglas O'Flaherty wrote:
>>>
>>>> "Run in C1. C0 over commits unpredictably, then throttles."
>>> I've seen a recommendation in a public Mellanox document of using C1
>>> not C0 when using hyperthreading/SMT, could be related to this..
>>>
>>> -- 
>>>    Christopher Samuel        Senior Systems Administrator
>>>    VLSCI - Victorian Life Sciences Computation Initiative
>>>    Email: samuel <at> unimelb.edu.au Phone: +61 (0)3 903 55545
>>>    http://www.vlsci.org.au/      http://twitter.com/vlsci
>>>
> Hi Bill,
>
> I'm wondering if this issue has resurfaced for you after the firmware
> updates and chassis power cycles?
>
> I'm having what sounds like the same issue but with R320s. So far
> BIOS/iDRAC/Lifecycle Controller updates haven't helped, but I haven't tried
> physically removing power from the node. I have been using the "ipmitool
> power cycle" command to reboot the nodes and get them out of their funk
> (running at 0.2GHz) but that, of course, still leaves part of the chassis
> energized.
>
> Thanks!
>
> -Aaron
>

Aaron,

The problem has not resurfaced after the updates and power cycling of 
the chassis themselves.  Power cycling just the nodes never helped, 
because it was the firmware in the chassis itself that needed the power 
cycle.

Bill
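
For anyone wanting to spot nodes stuck in the throttled state described
in this thread (cores reporting around 0.2GHz), here is a minimal sketch
that scans the text of /proc/cpuinfo for slow cores. It is not something
used in the thread itself. The function name throttled_cores and the
1000 MHz default threshold are assumptions chosen for illustration, and
the threshold should be set below the normal idle frequency of the CPUs
in question.

```python
import re


def throttled_cores(cpuinfo_text, threshold_mhz=1000.0):
    """Return (index, mhz) pairs for cores reporting less than threshold_mhz.

    cpuinfo_text is the content of /proc/cpuinfo on a Linux x86 node.
    The index is the order in which "cpu MHz" lines appear, which normally
    matches the "processor" numbering but is not parsed from it.
    threshold_mhz is a hypothetical cutoff, not a value from the thread.
    """
    speeds = [
        float(value)
        for value in re.findall(
            r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, re.MULTILINE
        )
    ]
    return [(i, mhz) for i, mhz in enumerate(speeds) if mhz < threshold_mhz]
```

To use it on a node, read /proc/cpuinfo into a string and pass it in. An
empty result means no core was below the threshold at that instant. With
frequency scaling enabled, idle cores legitimately report reduced clocks,
so the check is most meaningful while the node is under load.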
