[Beowulf] RAM ECC errors
Henning Fehrmann henning.fehrmann at aei.mpg.de
Mon Feb 22 06:04:14 PST 2010
- Previous message: [Beowulf] which mpi library should I focus on?
- Next message: [Beowulf] Re: RAM ECC errors (Henning Fehrmann)
Hello,

we have started monitoring the rate of correctable errors (CE) in the RAM. We have also observed a few uncorrectable errors (UE).

The corresponding kernel module 'edac_core' can trigger a kernel panic when a UE occurs, which makes sense as a way to avoid corrupted results.

Is there a way to get some useful information before the kernel panics? In particular, we are looking for the process list, to find out which user was running what before the UE occurred.

Thank you.

Cheers,
Henning
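
[Editorial sketch, not part of the original message.] One possible way to approach the question from user space is to poll the EDAC error counters and write a process snapshot to disk whenever a counter grows. The sketch below assumes the counters are exposed as `mc*/ce_count` and `mc*/ue_count` under `/sys/devices/system/edac/mc`; the function names, the polling interval, and the log path are hypothetical. It cannot run after a panic has already started, so at best it records who was running when correctable errors (which may precede a UE) were counted.

```python
import glob
import os
import time


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} from sysfs.

    The directory layout (mcN/ce_count, mcN/ue_count) is an assumption.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        for key in ("ce", "ue"):
            try:
                with open(os.path.join(mc_dir, key + "_count")) as fh:
                    entry[key] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[os.path.basename(mc_dir)] = entry
    return counts


def counts_increased(old, new):
    """True if any counter in `new` is larger than the same counter in `old`."""
    for mc, entry in new.items():
        prev = old.get(mc, {})
        for key, value in entry.items():
            before = prev.get(key)
            if value is not None and before is not None and value > before:
                return True
    return False


def snapshot_processes(proc="/proc"):
    """Return (pid, uid, cmdline) for every process currently visible in /proc."""
    procs = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            uid = os.stat(os.path.join(proc, name)).st_uid
            with open(os.path.join(proc, name, "cmdline"), "rb") as fh:
                cmd = fh.read().replace(b"\0", b" ").decode(errors="replace").strip()
        except OSError:
            continue  # process exited while we were reading it
        procs.append((int(name), uid, cmd))
    return procs


def monitor(log_path, interval=10.0, root="/sys/devices/system/edac/mc",
            iterations=None):
    """Poll the counters; on any increase, append a process list to log_path."""
    last = read_edac_counts(root)
    done = 0
    while iterations is None or done < iterations:
        time.sleep(interval)
        current = read_edac_counts(root)
        if counts_increased(last, current):
            with open(log_path, "a") as log:
                log.write(f"{time.ctime()} EDAC counts {current}\n")
                for pid, uid, cmd in snapshot_processes():
                    log.write(f"  pid={pid} uid={uid} cmd={cmd}\n")
                log.flush()
                os.fsync(log.fileno())  # push the snapshot to disk promptly
        last = current
        done += 1
```

A call such as `monitor("/var/log/edac-procs.log")` (hypothetical path) would run until interrupted. Writing to a remote log host instead of the local disk may be a safer choice on a node that is about to panic.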
