
[Beowulf] Memory errors poll


Greg Lindahl lindahl at pbm.com
Mon Mar 30 17:48:26 PDT 2009


On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote:
>> Could those of you running ECC memory give me an updated figure on the
>> number of errors detected/corrected per day per system?
>
> we replace dimms which show > 1000 corrected ECCs per day
> (or any overflows, for which counts are inaccurate, or any uncorrectable 
> errors.)
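The replacement rule quoted above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and it presumes you already have a per-day corrected-error rate, an overflow flag, and an uncorrectable-error count for each DIMM from whatever monitoring you run.

```python
# Hypothetical threshold taken from the quoted policy:
# replace when a DIMM shows more than 1000 corrected errors per day.
CE_PER_DAY_LIMIT = 1000

def should_replace_dimm(corrected_per_day, counter_overflowed, uncorrectable):
    """Apply the quoted replacement rule to one DIMM's error statistics."""
    if uncorrectable > 0:
        return True   # any uncorrectable error means replace
    if counter_overflowed:
        return True   # an overflowed counter makes the count unreliable
    return corrected_per_day > CE_PER_DAY_LIMIT
```

Note the strict "greater than": a DIMM at exactly 1000 corrected errors per day is kept.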

These systems are a couple of generations old, right?

I think I have Linux set up to record single-bit errors, and the rate
I get is basically zero on, uh, 5 terabytes of modern RAM, at sea
level.
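One way to get such a rate on Linux is to sample the EDAC memory-controller counters twice and scale the difference to a day. The sketch below assumes an EDAC driver for your chipset is loaded, so that per-controller `ce_count` (corrected) and `ue_count` (uncorrected) files exist under `/sys/devices/system/edac/mc/mcN/`; the helper names are hypothetical, and the root path is a parameter so it can be pointed elsewhere.

```python
import glob
import os

# Assumed default location of the Linux EDAC memory-controller counters;
# it only exists when an EDAC driver for the chipset is loaded.
EDAC_ROOT = "/sys/devices/system/edac/mc"

def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs files."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        with open(os.path.join(mc, "ce_count")) as f:
            ce = int(f.read().strip())
        with open(os.path.join(mc, "ue_count")) as f:
            ue = int(f.read().strip())
        counts[os.path.basename(mc)] = (ce, ue)
    return counts

def corrected_per_day(before, after, elapsed_seconds):
    """Scale the growth in corrected-error counts between two snapshots to a per-day rate."""
    rates = {}
    for mc, (ce_after, _) in after.items():
        ce_before = before.get(mc, (0, 0))[0]
        rates[mc] = (ce_after - ce_before) * 86400.0 / elapsed_seconds
    return rates
```

Usage: call `read_edac_counts()` once, wait some hours, call it again, and pass both snapshots with the elapsed time in seconds to `corrected_per_day`. Counters reset on reboot or driver reload, so snapshots spanning one will give misleading rates.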

When I installed some new memory I had a few systems with modest
numbers of single-bit upsets, and the vendor was happy to swap DIMMs
until the problem went away. I think he also does that during his
factory burn-in.

-- greg




