[Beowulf] Seeing ECC errors since upgraded from Opteron 246 to 275

Greg Lindahl lindahl at pbm.com
Sat Aug 23 16:51:42 PDT 2008


On Wed, Aug 06, 2008 at 02:56:51PM -0500, Jason Clinton wrote:

> We have a tool on our website called "breakin" that is Linux 2.6.25.9
> patched with K8 and K10f Opteron EDAC reporting facilities. It can
> usually find and identify failed RAM in fifteen minutes (two hours at
> most). The EDAC patches to the kernel aren't that great about naming
> the correct memory rank, though.
> 
> Make sure you have multibit (sometimes says 4-bit) ECC enabled in your BIOS.
> 
> http://www.advancedclustering.com/software/breakin.html

I just gave this a try, and it seems to be a very nicely packaged
utility. Thanks for making it available. I've used some similar stuff
before, but this is really easy.

-- greg




