[Beowulf] Acceptable rad limits for cluster rooms?
Jim Lux James.P.Lux at jpl.nasa.gov
Mon Jun 19 06:02:45 PDT 2006
At 02:21 PM 6/16/2006, Brian Oborn wrote:
>The cluster for our Physics department is next to a room that, at the time
>of installation, was an empty accelerator hall. However, a new electron
>accelerator has been installed and the cluster room is now a mild
>radiation area. Before we start considering shielding options, I was
>wondering if anybody on this list could offer insight into "acceptable"
>radiation limits for normal, rackmount cluster nodes with ECC memory, and
>if there are energy thresholds in beta and gamma radiation that might be
>significant.

You've got a couple of issues to worry about.

First off, what is "mild"? I would think that if you're in a 5 mrem/yr kind of environment (or whatever that is in SI units), you probably don't have much to worry about. That is, if people can be in there, then the electronics can probably tolerate it.

OK, now to the gory details.

1) Total dose effects - Most ICs suffer some sort of change in properties with dose. Optoelectronics are notorious for displacement damage effects. However, the doses where this kind of thing becomes an issue are up in the kilorad range. There's also a particularly annoying property called "Enhanced Low Dose Rate" effects, which essentially means that for some parts, the cumulative effects of a low dose rate over some time are greater than those of getting the whole dose all at once. This, of course, makes testing a bit tricky, since you want to zap your parts in the test fixture all at once.

2) Single Event Effects - There are a lot of flavors of this, upsets (bit flips) being but one. There's also "latchup" and "single event gate rupture", etc. These all happen because some charged particle hits the IC and deposits charge in the structure, causing some sort of trouble.
(A high-energy photon could also ionize the silicon as it passes through, too.) It might be just changing the state of a stored bit, but it can also be something as catastrophic as upsetting the relative biasing of P and N layers, causing large currents to flow where normally they wouldn't. In this world the usual specification is whether the part is immune at a Linear Energy Transfer (LET) of X MeV/cm, where X is a number greater than some tens.

ECC RAM is a way to mitigate just one kind of SEE, the upset (SEU) in a memory area, where a) it's easy to do, b) there's lots of target area to get hit, and c) the data sits there a long time. Bear in mind, though, that the processor itself is probably pretty susceptible to SEU, and doesn't have ECC internally. Nor are the data and address buses protected. The same applies to most of the peripheral chips.

DRAM is considered particularly susceptible, because the storage mechanism is a tiny amount of charge, and there isn't a huge amount of margin in the decision about whether it's a one or a zero. Compare this to a big old flip-flop in the processor, or a static RAM cell, where the information storage mechanism isn't a packet of charge, but is something like a couple of cross-coupled transistors, one On and the other Off, with huge energy margins.

There's also some historical data: there were some notorious DRAM packages where the plastic itself was slightly radioactive; DRAM has always skated near the edge of functioning because the manufacturers push the density; and some of the early studies of radiation effects on computers focused on DRAMs, because they were sensitive, and it was easy to tell if there was a problem (i.e. you can read the data back and see if it's changed).

>http://www.beowulf.org/mailman/listinfo/beowulf

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875
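
As a rough illustration of the ECC point in the message above (a single upset in a stored word can be found and fixed on readback), here is a minimal sketch using a toy Hamming(7,4) code. This is an assumption chosen for brevity, not what any particular memory controller does: real ECC memory typically uses a wider single-error-correct, double-error-detect code (commonly 8 check bits per 64 data bits). The function names hamming74_encode, hamming74_decode, and flip are made up for this example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) as a 7-bit codeword (list, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    code = [0] * 8                              # index 0 unused; indices = positions
    code[3], code[5], code[6], code[7] = d      # data bits live at non-power-of-2 slots
    code[1] = code[3] ^ code[5] ^ code[7]       # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]       # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]       # parity over positions with bit 2 set
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome). A nonzero syndrome is the position that was flipped back."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                     # XOR of set positions is 0 for a clean word
    if syndrome:
        code[syndrome] ^= 1                     # assume a single upset and correct it
    data = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome


def flip(bits, position):
    """Simulate a single event upset at a 1-based bit position."""
    out = list(bits)
    out[position - 1] ^= 1
    return out


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    upset = flip(stored, 6)                     # one particle strike, one flipped bit
    value, where = hamming74_decode(upset)
    print("recovered", bin(value), "after correcting position", where)
```

Two flips in the same word defeat this toy code: the decoder "corrects" a third, innocent bit and returns wrong data. That is why practical memory ECC adds an overall parity bit so that double errors are at least detected, and it is also why ECC says nothing about upsets in unprotected logic such as the processor or the buses.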
