[Beowulf] 96 cores in silent and small enclosure

Jon Tegner tegner at renget.se
Tue Apr 13 11:40:11 PDT 2010


Mark Hahn wrote:
>> I find this rather large temp range strange, and 55 seems very
>> low in my experience. Could the numbers possibly stand for something
>> else? I did not find any description of them anywhere at that
>> address.
>
> I think you should always worry about any temperature measured on a 
> system that's in the >= 65C range.  as Jim mentioned, the temps
> that matter are actually on-chip and not really accessible - and it's 
> unknown to us what they should be anyway, or how long they can 
> tolerate particular temps.  and whether over-temp failure
> modes would be transient (conductivity in semiconductors changes 
> rapidly as a function of temperature) or gradual (electromigration
> or perhaps the solder-ball problems nvidia had)...
>
> the original question was about whether 60-65C is a safe operating
> temperature.  I think it's pretty clearly high - whether it's critical
> depends on how it's measured, the specific chip's specs, etc.
> but it's not the sort of operating range I'd be aiming for.
But it should be possible to save money by running hotter. Suppose
you could accept temps 10 degrees higher; then you would not have to run
the AC in the room as hard (and AC represents a significant part of the
operating cost). If the price you pay is that your CPUs only last
4 years instead of 10 (I'm just speculating here, and for the moment only
considering the CPU), running hotter would probably be the much better
option economically.
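A minimal sketch of the break-even arithmetic behind this argument, in
Python. Every figure in it (the CPU replacement cost, the 4- and 10-year
lifetimes, the yearly AC costs) is a hypothetical placeholder, not a
measurement. Like the argument above, it considers only the CPUs and
ignores other components, downtime and the time value of money.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One operating point. All inputs are hypothetical assumptions."""
    name: str
    cpu_lifetime_years: float     # assumed years until the CPUs must be replaced
    cooling_cost_per_year: float  # assumed yearly AC cost for the room


def annual_cost(s: Scenario, cpu_replacement_cost: float) -> float:
    """Spread the CPU replacement cost over its lifetime and add yearly cooling."""
    return cpu_replacement_cost / s.cpu_lifetime_years + s.cooling_cost_per_year


def break_even_cooling_savings(cpu_replacement_cost: float,
                               life_cool: float, life_hot: float) -> float:
    """Yearly AC savings needed before a shorter (hotter) CPU life pays off."""
    return cpu_replacement_cost / life_hot - cpu_replacement_cost / life_cool


if __name__ == "__main__":
    cost = 20000.0  # hypothetical cost of replacing all CPUs
    cool = Scenario("cool", cpu_lifetime_years=10, cooling_cost_per_year=8000.0)
    hot = Scenario("hot", cpu_lifetime_years=4, cooling_cost_per_year=4000.0)
    for s in (cool, hot):
        print(s.name, annual_cost(s, cost))
    print("break-even AC savings per year:",
          break_even_cooling_savings(cost, life_cool=10, life_hot=4))
```

With these made-up numbers the hotter room costs 9000 per year against
10000 for the cooler one; in general, running hotter only wins if the AC
savings exceed the extra yearly CPU amortization (3000 per year here).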
