[Beowulf] Fwd: H8DMR-82 ECC error

Greg Lindahl lindahl at pbm.com
Wed Aug 17 13:59:58 PDT 2011


> Memtest was OK; I ran 9 cycles without any problems.

You should be using the HPL implementation of the Linpack benchmark
for testing memory. It exercises all of the memory and all of the
cores, and is what most HPC vendors seem to use for node burn-in.
There's even a bootable DVD with a kernel with enhanced EDAC that was
mentioned here a while back.
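
To make HPL touch most of the memory, the problem size N has to be
chosen so the N x N matrix of doubles nearly fills RAM. A rough sketch
of that arithmetic follows; the 80% fill fraction, the block size of
128, and the function name are my assumptions for illustration, not
anything HPL mandates:

```python
import math

def hpl_problem_size(mem_bytes, fraction=0.8, block=128):
    """Return the largest N, rounded down to a multiple of `block`,
    such that an N x N matrix of 8-byte doubles fits in
    `fraction` of `mem_bytes`.

    `fraction` and `block` are illustrative defaults (assumptions),
    to be tuned per node.
    """
    n = int(math.sqrt(mem_bytes * fraction / 8))
    return n - n % block

# Example: a node with 16 GiB of RAM (hypothetical size).
n_16g = hpl_problem_size(16 * 2**30)
print(n_16g)
```

The result would go into the Ns line of HPL.dat; leave enough headroom
for the OS so the node doesn't swap.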

> Hardware Error
> CPU0 Machine Check Exception  4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorrected error, report: yes MisV: invalid
> CPU context corrupt: yes UECC Error
> Bus Unit Error: prefetch/ECC error in data read from NB: local node originated (SRC)
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic. 
> Participating Processors: local node originated (SRC)

And I take it that the location information given here (socket 0, bank
2) isn't useful?
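
For what it's worth, the status word in that log can be checked by
hand. Here is a minimal sketch that pulls out the flag bits. Bits
63 through 57 are the usual x86 machine-check status flags; the
CECC/UECC positions (bits 46 and 45) are my assumption based on the
AMD layout for this processor family, and the function and table
names are made up for the example:

```python
# Common x86 MCi_STATUS flag bits.
STATUS_FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}

# Assumption: AMD-specific ECC flags for this CPU family.
AMD_ECC_FLAGS = {46: "CECC", 45: "UECC"}

def decode_mc_status(status):
    """Return the set flag names and the low 16-bit error code."""
    table = {**STATUS_FLAGS, **AMD_ECC_FLAGS}
    flags = [name for bit, name in table.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}

decoded = decode_mc_status(0xB200200000000863)
print(decoded["flags"], hex(decoded["error_code"]))
```

The flags agree with the decoded text in the log (uncorrected,
reporting enabled, MISC invalid, context corrupt, UECC). Note that
ADDRV is clear, so the status word itself carries no address.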

-- greg



