[Beowulf] Not quite Walmart, or, living without ECC?
coutinho at dcc.ufmg.br
Mon Nov 26 13:15:06 PST 2007
I heard that the major source of memory corruption in servers is the memory bus.
And this becomes worse as you add memory sticks.
With 8 memory sticks that have 8 chips on each side, you have 128 chips.
So the main purpose of ECC is correcting bus errors.
2007/11/26, David Mathog <mathog at caltech.edu>:
> I ran a little test over the Thanksgiving holiday to see how common
> random errors in nonECC memory are. I used the memtest86+ bit fade test
> mode, which writes all 1s, waits 90 minutes, checks the result, then
> does the same thing for all 0s. Anyway, this was the best test I could
> find for detecting the occasional gamma ray type data loss event. The
> result: no errors logged in 5 solid days of testing. So this class of
> error (the type ECC would detect and probably fix) apparently occurs
> on these machines at a rate of less than 1 per 840 Gigabyte-hours.
> Possibly the upper limit is half that if data can only be lost
> on 1 -> 0 transition, or vice versa. This assumes the bit fade test
> works, which cannot be independently verified from these results.
> On the web there are references to an IBM study which found 1 bit
> error/256 MB/month, which would be (0.25 GB * 30 * 24 h) =
> 1 per 180 Gigabyte-hours. If IBM's numbers held for my hardware
> I should have seen 4 or 5 errors in total. Mine are in a basement
> in a concrete building, perhaps that provided some shielding relative to
> what IBM used for their test conditions.
> The memory was Corsair Twinx1024-3200C2. When first installed all
> of this memory had run for 24 hours with no errors in normal
> memtest86+ testing.
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
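As a rough check on the arithmetic in the quoted message, here is a small Python sketch. It uses only the figures quoted above (1 error per 256 MB per 30-day month, and 840 Gigabyte-hours of testing). The function names are made up for illustration, and the last step assumes errors arrive independently at a constant rate (a Poisson model), which is my assumption and not something either the IBM figure or the memtest86+ run establishes.

```python
import math

HOURS_PER_MONTH = 30 * 24  # 30-day month, as in the quoted arithmetic


def gb_hours_per_error(mem_gb, errors_per_month=1.0):
    """Convert 'errors per month in mem_gb of memory' to GB-hours per error."""
    return mem_gb * HOURS_PER_MONTH / errors_per_month


def expected_errors(exposure_gb_hours, gb_hours_per_err):
    """Expected error count for a given exposure at a given rate."""
    return exposure_gb_hours / gb_hours_per_err


# IBM figure as quoted: 1 bit error per 256 MB (0.25 GB) per month.
ibm_rate = gb_hours_per_error(0.25)

# Exposure of the quoted test: 840 Gigabyte-hours.
expected = expected_errors(840, ibm_rate)

# ASSUMPTION: Poisson arrivals. Probability of logging zero errors
# in the whole run if the IBM rate really applied.
p_zero = math.exp(-expected)

print(f"IBM rate: 1 error per {ibm_rate:.0f} GB-hours")
print(f"Expected errors in 840 GB-hours: {expected:.2f}")
print(f"P(no errors) under a Poisson model: {p_zero:.4f}")
```

Under that Poisson assumption, a clean run would be unlikely (under 1%) if the IBM rate held for this hardware, which is consistent with the quoted guess that the test conditions differed.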