Memory type? (ECC vs non-ECC) - memory testing
laytonjb at bellsouth.net
Fri Aug 17 16:45:36 PDT 2001
Thomas R Boehme wrote:
> > I think the question of "how certain are you of the results of the
> > computation" goes beyond memory and should also include the CPU,
> > disks, I/O, motherboard, cables, programs, etc.
> That is correct. Like I said, our problems were heat related and not a
> memory problem.
> > -- when was the last time you had a memory failure compared to
> > other things that needed fixing...
> > - power cable, programming bugs, disk cabling, etc..etc..
> Well I don't know - without ECC I have no way of telling when I had the last
> memory failure. And with all the bad cheap memory chips out there, I would
> prefer knowing it. That's why ECC makes sense.
If you do have ECC memory, I recommend running the ECC monitor.
It logs each ECC error to the system logs, so if a node starts
showing a few errors you can check it quickly.
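
To make the "check the system logs" step concrete, here is a minimal
sketch that counts ECC-related log lines per node. It makes two
assumptions that are not from this thread: that the logs use the
traditional syslog layout ("Mon DD HH:MM:SS host message"), and that the
monitor's messages contain the word "ECC". The names count_ecc_errors,
SYSLOG_LINE and ECC_PATTERN are hypothetical; adjust the pattern to
whatever your monitor actually writes.

```python
import re
from collections import Counter

# Assumption: traditional syslog lines, "Mon DD HH:MM:SS host message".
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+(?P<msg>.*)$"
)

# Assumption: the monitor's messages contain the word "ECC".
# Replace this with the exact text your monitor logs.
ECC_PATTERN = re.compile(r"\bECC\b", re.IGNORECASE)


def count_ecc_errors(lines, pattern=ECC_PATTERN):
    """Return a Counter mapping host name -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        # Skip lines that are not in the assumed syslog layout.
        if match and pattern.search(match.group("msg")):
            counts[match.group("host")] += 1
    return counts
```

Passing an open log file (or the collected logs from all nodes) to
count_ecc_errors and printing counts.most_common() puts the noisiest
nodes first, which is usually where you want to start looking.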
> Sure, it is not the only source. And I do agree that programming bugs are
> probably the biggest problem that can't really be fixed.
> I know my codes have numerous bugs -- I just don't know where :-)
> Bye, Thommy