[Beowulf] looking for a reference on failure rates
James.P.Lux at jpl.nasa.gov
Mon Mar 7 15:26:17 PST 2005
At 12:03 PM 3/7/2005, Joe Landman wrote:
> Something I can refer to for primary literature for a paper. If it is
> anecdotal, that may be fine as well, though I will have to treat it
> This is largely for microprocessors, disks, networks, etc. General
> digital equipment, with a focus on computers in clusters.
One of our reliability guys recommended this:
E.A. Amerasekera & F.N. Najm, "Failure Mechanisms in Semiconductor
Devices", 2nd Ed., Wiley, NY, 1997
You might also take a look at MIL-HDBK-217F, which provides reliability
math models for just about everything electronic. There might be some
argument about the applicability of this in some instances, but it's
certainly a commonly used document. Chapter 5 talks about
microcircuits. Section 5.8 has the temperature factors (Ea in eV) for
various logic families... CMOS looks like 0.35, BiCMOS and LSTTL are 0.5,
linears are 0.65.
(Converting to life effects, it looks like a 20C rise in temperature roughly
doubles the failure rate for CMOS, and a 20C rise is a factor of 4.9 increase
for linears (a 10C rise is a factor of 2.3).) The actual assembly failure rate
will depend on how many of each kind of part, what temperature they're at, etc.
Something like a doubling per 10C is probably not too far from the overall
effect. It's not going to be 10 times and it's not going to be 10%.
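For anyone who wants to redo the arithmetic, here is a rough sketch of how
ratios like the ones above can be derived from an activation energy using the
standard Arrhenius relation. The function name, the 25C reference temperature,
and the choice of 273.15 for the Celsius-to-kelvin offset are assumptions made
for illustration; they are not taken from the handbook, and the result shifts
somewhat with the starting temperature you pick.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617e-5


def arrhenius_ratio(ea_ev, t_low_c, t_high_c):
    """Failure-rate ratio between two temperatures (Arrhenius model).

    ea_ev    -- activation energy in eV (e.g. 0.35 for CMOS, 0.65 for linears)
    t_low_c  -- lower temperature in degrees C (assumed reference point)
    t_high_c -- higher temperature in degrees C
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Assumed 25C starting point; other starting points give other ratios.
    for label, ea in (("CMOS", 0.35), ("BiCMOS/LSTTL", 0.5), ("linear", 0.65)):
        r10 = arrhenius_ratio(ea, 25.0, 35.0)
        r20 = arrhenius_ratio(ea, 25.0, 45.0)
        print(f"{label:13s} Ea={ea:.2f} eV  +10C: x{r10:.2f}  +20C: x{r20:.2f}")
```

Starting from 25C this gives a bit more than 2x for CMOS over a 20C rise, and
about 4.9x (20C) and 2.3x (10C) for linears. Starting hotter gives smaller
ratios for the same rise.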
There's also a Bellcore/Telcordia model which apparently takes into account
burn-in and testing. It might be more relevant, depending on the environment.
James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109