[Beowulf] Re: Not quite Walmart, or, living without ECC?

David Mathog mathog at caltech.edu
Wed Nov 28 10:41:10 PST 2007


Joe Landman <landman at scalableinformatics.com> wrote


>    We have been using some GAMESS runs for about 3 years now.  Causes 
> systems to generate MCEs at prodigious rates if the memory system is 
> flaky.

I've started a thread in the memtest86+ forum here:

  http://forum.x86-secret.com/showthread.php?t=7739

for discussing memory errors which are found by other methods but not
detected by memtest86+.  That would probably be a better place than
here for further observations.

I must apologize for one sentence in my first post there, where I used
a phrase which did a really poor job of conveying what I meant.  There
is no "edit" on that site, so I can't fix it.  Where it said "assuming
these reports are correct" I didn't mean to imply that any of you
weren't seeing memory errors with these other methods that previous
memtest86+ tests missed.  What I meant was that "not found by
memtest86+" wasn't very well defined, either in terms of which test
modes were run or for how long.

Somebody reading this thread cannot tell from what has been posted so
far whether memtest86+ would have flagged the memory as bad had ECC or
cache been disabled, had the bit fade tests been run, or had one of
these modes been run for twice as long.  The best way to settle that
issue is to run these non-default modes on a system which the other
programs have already shown to have bad memory.  That will show whether
those memtest86+ configuration changes are enough to let it detect the
errors.  I don't currently have such a bad system, but if one of you
does, please take a moment to investigate this issue.
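
As a rough sketch only, the combinations worth trying could be listed
with a few lines of Python.  The setting names and values below are
labels made up for this checklist, not memtest86+ options, and the
first entry is simply assumed to be the baseline run; the settings
themselves still have to be changed by hand in the BIOS or in the
memtest86+ menus.

```python
from itertools import product

# Hypothetical labels for the non-default settings discussed above.
# These are NOT memtest86+ option names, just a checklist.
SETTINGS = {
    "ecc": ("enabled", "disabled"),
    "cache": ("enabled", "disabled"),
    "bit_fade": ("skipped", "run"),
    "duration": ("1x", "2x"),
}


def test_matrix():
    """Return every combination of the settings as a list of dicts."""
    names = list(SETTINGS)
    return [dict(zip(names, combo)) for combo in product(*SETTINGS.values())]


if __name__ == "__main__":
    # Print a numbered checklist; the first row is the assumed baseline.
    for number, config in enumerate(test_matrix(), 1):
        print(number, config)
```

Reporting which of these rows, if any, flags the known-bad memory would
make "not found by memtest86+" much better defined.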

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
