Frequency of upsets was Re: [Beowulf] ECC support on motherboards?
James.P.Lux at jpl.nasa.gov
Wed May 14 09:37:38 PDT 2008
At 03:38 PM 5/13/2008, Greg Lindahl wrote:
>On Tue, May 13, 2008 at 03:27:11PM -0700, Jim Lux wrote:
> > Some data from Fermilab with 160 Gbit of DRAM
> > showed 2.5 upset/day. Extrapolating (always
> > dangerous with these kinds of radiation effects
> > data, but I'll plunge in regardless).. that means
> > a workstation with 4-8 Gbyte of DRAM might see an upset per day.
>You can't extrapolate to devices of a different density or made
>with a different process, right?
You can and you can't.
In general, you are combining the overall flux through the device
with the cross-section of the devices. So, if you make the device
with half-sized geometry, you get 4 times as many bits in the same
sized die. The odds of a particle hitting a specific bit have been
cut to 1/4, but there are 4 times as many bits. So, the
"upsets/device/unit time" will probably stay about the same.
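
A rough sketch of that arithmetic, in Python. This is only an illustration, not anything from the Fermilab data set itself: the function names are made up, and it assumes that upsets scale linearly with bit count and that the per-bit cross-section scales with cell area. The flux and cross-section values are arbitrary placeholder units.

```python
# Illustrative sketch only. Assumptions (not from the original data):
#   - upset rate scales linearly with the number of bits
#   - per-bit cross-section scales with cell area
# All function names here are hypothetical.

GBIT_PER_GBYTE = 8


def per_gbit_rate(upsets_per_day, gbits):
    """Observed upset rate normalised to one Gbit of DRAM."""
    return upsets_per_day / gbits


def extrapolate(upsets_per_day, ref_gbits, target_gbytes):
    """Linearly extrapolate an observed rate to a different memory size."""
    return per_gbit_rate(upsets_per_day, ref_gbits) * target_gbytes * GBIT_PER_GBYTE


def device_rate(flux, sigma_bit, n_bits):
    """Upsets per device per unit time = flux * per-bit cross-section * bits."""
    return flux * sigma_bit * n_bits


# Figures quoted above: 160 Gbit of DRAM, 2.5 upsets/day.
low = extrapolate(2.5, 160, 4)    # 4 GByte workstation
high = extrapolate(2.5, 160, 8)   # 8 GByte workstation

# Halving the feature size: per-bit area drops to 1/4, bit count goes up 4x.
flux, sigma, bits = 1.0, 1.0, 1.0  # arbitrary placeholder units
before = device_rate(flux, sigma, bits)
after = device_rate(flux, sigma / 4, bits * 4)

print(low, high)       # 0.5 1.0
print(before, after)   # 1.0 1.0
```

Under those assumptions a 4-8 GByte machine comes out at roughly 0.5 to 1 upset per day, and the per-device rate is unchanged by the shrink. Nothing in this sketch captures multi-bit events or energy-dependent sensitivity.
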
But there are other factors too... smaller geometries mean more devices
(cells) might get affected in one event.
Different geometries have different sensitivities to particles of a
particular energy. (Consider the neutrino.. lots o' energy, small
cross section for interactions.) Big slow heavy ions are very
different from zippy little protons.
However, if you're looking at rough orders of magnitude, and the year
of technology is similar, extrapolating is safe(r); i.e. parts
built with 2002 technology tend to have similar processes
and feature sizes. Be aware that in the space biz, we build stuff
from old parts all the time. For instance, the Phoenix spacecraft
that will land on Mars next week was actually a spare from a 2001
mission, which in turn was built from spares from the 1998 missions. So
if you see a paper in, say, 2010, talking about the upset behavior of
the Phoenix flight computer, you're talking about parts that were
probably bought in 1995, and based on technology that matured in
1991 or 1992.
(Here at JPL, we keep those old databooks around.. hiding them from
the office neatness police, of course: "why do you need those dusty
old books, everything is on line, isn't it?" Uh, no, not for parts
made in 1985, so we keep that ancient National Semiconductor databook
printed on the grubby newsprint that is decaying as you read
this.) My 1977 National Semiconductor CMOS Databook, with all the
data for the CD4000 and 74C series logic, is invaluable,
notwithstanding that it was printed before many of the engineers
here were born. The old 4000 series CMOS is quite radiation tough
(giant feature sizes!), and, although ESD sensitive, can tolerate
huge voltage ranges. And, they still make it... probably some guy
with an old 3" fab line in a warehouse or something..