[Beowulf] Problems with Dell M620 and CPU power throttling
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Fri Aug 30 10:55:43 PDT 2013
Note that the time constant between the on-chip sensor and the on-chip throttle might be pretty short. You could see intermittent throttling that wouldn't show up in external temperature measurements, or even in readings of the on-chip sensor, taken every 10 seconds.
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Mark Hahn
Sent: Friday, August 30, 2013 10:48 AM
To: Bill Wichser
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Problems with Dell M620 and CPU power throttling
> [root@r2c3n4 thermal_throttle]# ls
> core_power_limit_count core_throttle_count package_power_limit_count
> [root@r2c3n4 thermal_throttle]# cat *
> This was what led us to how the chassis was limiting power. We had
I don't mean to be pedantic, but to me, this is the CPU throttling itself, based on its own temperature readings and power rating. The coretemp module, judging from its modinfo, seems to be purely on-chip.
/sys/bus/platform/devices/coretemp.0 probably contains some other stuff which might be interesting - for instance, what your *_max values are.
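For anyone who wants to dump those values in one go, here is a rough sketch. It assumes the usual hwmon sysfs naming (tempN_input, tempN_max, tempN_crit, tempN_label, all in millidegrees C) and that the files sit either directly in coretemp.0 or under hwmon/hwmonN/ depending on kernel version. The function name is made up for illustration.

```python
from pathlib import Path

# Path taken from the message above; the file layout below is an assumption.
CORETEMP_DIR = "/sys/bus/platform/devices/coretemp.0"


def read_coretemp_limits(base=CORETEMP_DIR):
    """Return {"temp1": {"input": 45.0, "max": 87.0, ...}, ...} in degrees C."""
    base = Path(base)
    # Older kernels put the files directly in coretemp.0, newer ones
    # under hwmon/hwmonN/, so look in both places.
    max_files = list(base.glob("temp*_max"))
    max_files += list(base.glob("hwmon/hwmon*/temp*_max"))

    limits = {}
    for max_file in sorted(max_files):
        prefix = max_file.name[: -len("_max")]  # e.g. "temp1"
        entry = {}
        for field in ("input", "max", "crit"):
            f = max_file.with_name(f"{prefix}_{field}")
            if f.exists():
                # hwmon reports temperatures in millidegrees Celsius
                entry[field] = int(f.read_text().strip()) / 1000.0
        label = max_file.with_name(f"{prefix}_label")
        if label.exists():
            entry["label"] = label.read_text().strip()
        limits[prefix] = entry
    return limits
```

Calling read_coretemp_limits() on a node and printing the result should show, per sensor, the current reading next to the thresholds the driver reports. Note that these thresholds are whatever the driver exposes and need not match the Tcase figure on the spec sheet.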
> using redundancy and switched to non-redundant to try and eliminate.
> We believe that we see these messages when the CPU is throttling up in power.
I read the *_limit_count as meaning "the core was down-clocked 18781048 times because it exceeded power limits", i.e., not "throttling up", though I suppose these things are almost symmetric...
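Since these counters are cumulative, the difference between two reads tells you how many events happened in between, however short each event was. A minimal sketch of that, assuming the counters live under /sys/devices/system/cpu/cpuN/thermal_throttle/ (the directory name matches the prompt above, the full path is an assumption) and using made-up function names:

```python
from pathlib import Path

# Assumed location of the per-CPU thermal_throttle directories.
CPU_ROOT = "/sys/devices/system/cpu"


def read_throttle_counts(root=CPU_ROOT):
    """Return {(cpu, counter_name): value} for every *_count file found."""
    counts = {}
    for f in Path(root).glob("cpu[0-9]*/thermal_throttle/*_count"):
        cpu = f.parent.parent.name  # e.g. "cpu0"
        counts[(cpu, f.name)] = int(f.read_text().strip())
    return counts


def count_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    return {
        key: after[key] - before[key]
        for key in after
        if key in before and after[key] != before[key]
    }
```

Take one snapshot with read_throttle_counts(), run the job (or just sleep for a while), take a second snapshot, and pass both to count_deltas(). Any nonzero entry means that core or package hit a limit during the interval.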
> These are E5-2670 0 @ 2.60GHz. Two per node.
So the spec is 115 W and Tcase max is 80C. That's not as low a threshold as some chips (67C seems pretty low, for instance).