[Beowulf] Memory errors poll
Greg Lindahl lindahl at pbm.com
Mon Mar 30 17:48:26 PDT 2009
On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote:

>> Could those of you running ECC memory give me an updated figure on the
>> number of errors detected/corrected per day per system?
>
> we replace dimms which show > 1000 corrected ECCs per day
> (or any overflows, for which counts are inaccurate, or any uncorrectable
> errors.)

These systems are a couple of generations old, right? I think I have
Linux set up to record single-bit errors, and the rate I get is
basically zero on, uh, 5 terabytes of modern RAM, at sea level.

When I installed some new memory I had a few systems with modest
numbers of single-bit upsets, and the vendor was happy to swap DIMMs
until the problem went away. I think he also does that during his
factory burn-in.

-- greg
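For anyone wanting to collect comparable numbers, here is a minimal sketch of one way to do it, assuming the kernel's EDAC driver is loaded and exposes per-controller counters under /sys/devices/system/edac/mc. Nobody in the thread says this is how they gather their counts. The function names are made up for illustration, and the replacement rule simply encodes the policy quoted above (more than 1000 corrected errors per day, any overflow, or any uncorrectable error).

```python
from pathlib import Path

# Threshold quoted in the thread: replace above 1000 corrected errors per day.
CE_PER_DAY_LIMIT = 1000


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrectable)} from EDAC sysfs.

    Assumes the usual layout of mc0, mc1, ... directories, each holding
    ce_count and ue_count files. Controllers that cannot be read are skipped.
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue
        counts[mc.name] = (ce, ue)
    return counts


def should_replace(ce_delta, ue_count, days=1.0, overflowed=False):
    """Apply the quoted policy to the corrected errors seen over `days`.

    ce_delta is the increase in the corrected-error counter over the
    interval, not the raw counter, which is cumulative.
    """
    if overflowed or ue_count > 0:
        return True
    return ce_delta / days > CE_PER_DAY_LIMIT


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrectable={ue}")
```

The counters are cumulative, so a per-day rate needs two samples taken a known interval apart, with the difference passed in as ce_delta.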
