Memory type? (ECC vs non-ECC) - memory testing

Jeff Layton laytonjb at bellsouth.net
Fri Aug 17 16:45:36 PDT 2001


Thomas R Boehme wrote:

> > I think the question of "how certain you are of the results of the
> > computation" goes beyond memory and should also include the CPU, disks,
> > I/O, motherboard, cables, programs, etc.
> >
> That is correct. As I said, our problems were heat-related, not
> memory-related.
>
> > -- When was the last time you had a memory failure, compared with
> >    other things that needed fixing?
> >       - power cables, programming bugs, disk cabling, etc.
> >
>
> Well, I don't know: without ECC I have no way of telling when I had the last
> memory failure. And with all the cheap, bad memory chips out there, I would
> prefer to know. That's why ECC makes sense.

If you do have ECC memory, I recommend using the ECC monitor.
It logs ECC errors to the system logs, so if you see a few
errors, you can quickly check the node.

http://www.anime.net/~goemon/linux-ecc/
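As a rough illustration of the "see a few errors, then check the node" step, here is a minimal Python sketch that scans syslog-format lines and flags hosts with repeated ECC-related messages. It assumes the traditional "Mon DD HH:MM:SS host message" syslog layout and only assumes the monitor's messages contain the word "ECC"; the actual message text written by the linux-ecc module is not specified here. The function name nodes_to_check, the threshold value, and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Traditional syslog line layout, e.g. "Aug 17 16:45:36 node07 kernel: ...".
# Group 1 is the hostname, group 2 is the rest of the message.
SYSLOG_LINE = re.compile(r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s+(.*)$")

# Assumption: ECC monitor messages contain the word "ECC" (any case).
# The real wording used by the monitor may differ; adjust this pattern.
ECC_PATTERN = re.compile(r"\bECC\b", re.IGNORECASE)


def nodes_to_check(lines, threshold=3):
    """Return {host: count} for hosts with at least `threshold` ECC log lines.

    `threshold` is a hypothetical cutoff for "a few errors"; tune it per site.
    """
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected syslog layout
        host, message = match.groups()
        if ECC_PATTERN.search(message):
            counts[host] += 1
    return {host: n for host, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical sample lines; real messages come from your system logs.
    sample = [
        "Aug 17 16:45:36 node07 kernel: ECC: correctable error",
        "Aug 17 16:50:02 node07 kernel: ECC: correctable error",
        "Aug 17 16:51:10 node03 kernel: ECC: correctable error",
        "Aug 17 16:52:00 node03 sshd[42]: session opened",
    ]
    print(nodes_to_check(sample, threshold=2))
```

In practice the lines would be read from the node's log file (for example by iterating over an open file object), and the flagged hosts are the ones worth pulling for a memory check.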

Good Luck!

Jeff


>
>
> Sure, memory is not the only source of errors. And I do agree that
> programming bugs are probably the biggest problem, and one that can't
> really be fixed. I know my codes have numerous bugs; I just don't know where :-)
>
> Bye, Thommy
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
