[Beowulf] [External] Power Cycling Question

Lux, Jim (US 7140) james.p.lux at jpl.nasa.gov
Mon Jul 19 18:08:32 UTC 2021

On 7/19/21, 9:12 AM, "Beowulf on behalf of Prentice Bisbal via Beowulf" <beowulf-bounces at beowulf.org on behalf of beowulf at beowulf.org> wrote:



    I know there is a direct relationship between system failure and 
    operating temperature, but I don't know if that applies to all 
    components, or just those with moving parts. Someone somewhere must 
    have done research on this. I know Google did research on hard drive 
    failure that was pretty popular. I would imagine they would have 
    researched this, too.

In general, it follows the Arrhenius relationship with some TBD exponent (the activation energy depends on the failure mechanism).  A common rule of thumb is that a 10 C rise makes things age twice as fast.
There's all sorts of background physics to this - drift of metallization and doping, radiation accumulation, etc.
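
To put rough numbers on that, here is a minimal Python sketch comparing the Arrhenius acceleration factor with the "doubles every 10 C" rule of thumb. The function names and the activation energies tried (0.5, 0.7 and 1.0 eV) are illustrative assumptions, not measured values for any particular part; real activation energies vary by failure mechanism.

```python
import math

# Boltzmann constant in eV per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_ref_c, t_hot_c, activation_energy_ev):
    """Aging-rate ratio at t_hot_c relative to t_ref_c (both in deg C).

    Uses AF = exp((Ea / k) * (1/T_ref - 1/T_hot)) with temperatures in kelvin.
    activation_energy_ev is an assumed, mechanism-dependent value.
    """
    t_ref_k = t_ref_c + 273.15
    t_hot_k = t_hot_c + 273.15
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_hot_k)
    )


def ten_degree_rule(t_ref_c, t_hot_c):
    """Rule of thumb: aging rate doubles for every 10 C rise."""
    return 2.0 ** ((t_hot_c - t_ref_c) / 10.0)


if __name__ == "__main__":
    # Hypothetical operating points: 50 C reference, 60 C hot.
    for ea in (0.5, 0.7, 1.0):  # assumed activation energies in eV
        af = arrhenius_acceleration(50.0, 60.0, ea)
        print(f"Ea={ea:.1f} eV: Arrhenius factor {af:.2f}, "
              f"rule of thumb {ten_degree_rule(50.0, 60.0):.2f}")
```

With an assumed activation energy of 0.7 eV, the Arrhenius factor for a 50 C to 60 C step comes out a little over 2, which is roughly where the rule of thumb comes from. Other activation energies or temperature ranges will move it away from 2.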

Cycling is a different failure mechanism: there it's propagation of microscopic defects with each cycle, as well as the more obvious "cracks in solder/PWB trace" kind of thing.  One of the big issues today is the difference in CTE (coefficient of thermal expansion) between the chips (or their packages) and the PWB.  Column and grid arrays that are soldered in have an issue with the corner pins/balls/columns being stressed more than the ones along the sides, and any time you have cyclic stress, you have the prospect of work hardening and microcrack propagation.  Sockets with interposers do help with this, because they tolerate the changing misalignment without failure.  OTOH, now you have a socket and interposer, which can fail.
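
A first-order way to see why the corners get the worst of it: the relative in-plane movement between package and board at a joint scales with the CTE difference, the temperature swing, and the joint's distance from the center (neutral point) of the array. The Python sketch below works that out for a square array. The function names, the CTE values (about 17 ppm/C for the board, about 7 ppm/C for a ceramic package), the 1 mm pitch and the 50 C swing are all illustrative assumptions. It ignores warpage, joint height and solder compliance, so treat it as geometry only, not a life prediction.

```python
import math


def joint_dnp_mm(row, col, n, pitch_mm):
    """Distance (mm) of joint (row, col) from the center of an n x n array."""
    center = (n - 1) / 2.0
    return pitch_mm * math.hypot(row - center, col - center)


def shear_displacement_um(dnp_mm, cte_board_ppm, cte_package_ppm, delta_t_c):
    """Relative in-plane displacement (micrometres) imposed on a joint.

    First-order estimate: displacement = DNP * |delta CTE| * delta T.
    CTE values are in ppm per deg C; all inputs here are assumed values.
    """
    delta_cte_per_c = abs(cte_board_ppm - cte_package_ppm) * 1e-6
    return dnp_mm * 1000.0 * delta_cte_per_c * delta_t_c


if __name__ == "__main__":
    n, pitch = 21, 1.0            # hypothetical 21 x 21 array, 1 mm pitch
    board, package = 17.0, 7.0    # assumed CTEs in ppm/C
    swing = 50.0                  # assumed temperature swing per power cycle, C

    corner = joint_dnp_mm(0, 0, n, pitch)
    mid_side = joint_dnp_mm(0, (n - 1) // 2, n, pitch)
    for name, dnp in (("corner", corner), ("mid-side", mid_side)):
        disp = shear_displacement_um(dnp, board, package, swing)
        print(f"{name}: DNP {dnp:.2f} mm, displacement {disp:.2f} um per cycle")
```

For a square array the corner joint sits sqrt(2) times farther from the center than the joint in the middle of a side, so under these assumptions it sees about 41% more imposed displacement on every cycle.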

