[Fwd: Re: [Beowulf] ECC exerciser/exorciser?]
landman at scalableinformatics.com
Mon Jan 26 15:33:06 PST 2009
Tony Travis wrote:
> Excluded by a SPAM filter + reposted, by the list owner's request :-)
I am on gmail.com as joe.landman if spam filters are doing bad things ...
> Joe Landman wrote:
>> Mark Hahn wrote:
>>> - do you have or know of a good exerciser for testing ECC's? yes, I
>>> know about memtest86, but I'm more curious about a load that could be
>>> run under
>>> linux. my thinking is that ecc's are triggered by bad reads, so something
>>> which allocates all memory and then continually reads it would be best.
>> That's memtest. We found it doesn't trigger MCEs, and often will report
>> a system as good, that once it leaves the lab, generates lots of MCEs on
>> customer code. So we run specific codes (GAMESS and others) to burn in
>> the machine.
> Hello, Joe.
> Do you mean Memtester?
There are two that I know of ... memtest and memtest86, one of which is
a fork of the other. While I like both for coarse testing, we run a
bunch of GAMESS runs to burn nodes in. Some folks like HPL for this. I
like large dense matrix computations that pound on the memory subsystem.
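
A toy sketch of the idea behind using a compute job as a burn-in: run the same deterministic dense-matrix calculation repeatedly and flag any pass whose result differs from the first. This is not GAMESS or HPL, and the function names, matrix size, and seed are made up for illustration. A real burn-in would use a production code sized to fill most of RAM.

```python
import hashlib
import random


def make_matrix(n, seed):
    """Build an n x n matrix of small integers from a fixed seed (reproducible)."""
    rng = random.Random(seed)
    return [[rng.randrange(-1000, 1000) for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Naive dense matrix product; integer arithmetic, so results are exact."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def digest(matrix):
    """Hash the result so passes can be compared cheaply."""
    h = hashlib.sha256()
    for row in matrix:
        h.update(repr(row).encode())
    return h.hexdigest()


def burn_in(passes=5, n=48, seed=2009):
    """Repeat one computation; return indices of passes that differ from pass 0."""
    a = make_matrix(n, seed)
    b = make_matrix(n, seed + 1)
    reference = digest(matmul(a, b))
    bad = []
    for i in range(1, passes):
        if digest(matmul(a, b)) != reference:
            bad.append(i)  # same inputs, different output: suspect hardware
    return bad


if __name__ == "__main__":
    print("mismatching passes:", burn_in())
```

On healthy hardware the list of mismatching passes should be empty. The point is only the compare-against-reference pattern, not the load itself.
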
> I stress test non-ECC memory in our compute nodes by running 100
> memtester passes on 128MB of the available RAM. This test often reveals
> problems in the memory management system that an initial 24h memtest86+
> burn-in on all the memory on a node doesn't detect. Memtester is a more
This is good to hear (that others find memtest86 and the like don't trigger
the errors that end users/customers see in the field).
> empirical stress test than Memtest86+, but I believe it's more realistic
> and I chose 128MB as typical for the type of jobs running on our system.
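
For reference, a minimal sketch of scripting the memtester run described above (100 passes over 128MB). The wrapper function names are hypothetical. It assumes memtester is installed and on PATH, and relies on its usual command form of a memory size followed by a loop count. memtester is usually run as root so that it can lock the pages it tests.

```python
import shutil
import subprocess


def memtester_command(size_mb=128, passes=100):
    """Build the argument list: memtester <size>M <loops>."""
    return ["memtester", f"{size_mb}M", str(passes)]


def run_memtester(size_mb=128, passes=100):
    """Run memtester and return its exit status (0 means no errors reported)."""
    cmd = memtester_command(size_mb, passes)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("memtester not found on PATH")
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    status = run_memtester()
    print("memtester exit status:", status)
```
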
I really like running end user code as a test. GAMESS is one, probably
some Gromacs and other similar things (NAMD, BLAST, HMMer) as well.
Combined with Octobonnie, it makes for some really good loads on machines :)
Right now we have customers hammering on JackRabbits using 15-20
simultaneous bonnies over channel bonded gigabit. A little stress test.
I prefer to stress it in the lab, because it's harder to fix it in the field.
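
One way to tell whether a lab load actually provoked ECC events is to compare the kernel's EDAC counters before and after the run. The sketch below assumes a Linux node with an EDAC driver loaded, so that `ce_count` (corrected) and `ue_count` (uncorrected) files exist under /sys/devices/system/edac/mc/mc*/. The helper names are made up for illustration.

```python
import glob
import os

# Assumption: EDAC sysfs layout; absent if no EDAC driver is loaded.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce_count": int or None, "ue_count": int or None}}."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as f:
                    entry[name] = int(f.read().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


def new_errors(before, after):
    """Return {(controller, counter): increase} for counters that went up."""
    delta = {}
    for mc, after_entry in after.items():
        for name, value in after_entry.items():
            old = before.get(mc, {}).get(name)
            if value is not None and old is not None and value > old:
                delta[(mc, name)] = value - old
    return delta


if __name__ == "__main__":
    before = read_edac_counts()
    # ... run the burn-in load here ...
    after = read_edac_counts()
    print("new ECC events:", new_errors(before, after))
```
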
> Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
> and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
> tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
> mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615