[Beowulf] Multisocket mainboard hardware problems
Chris Samuel csamuel at vpac.org
Thu Jan 15 18:54:41 PST 2009
----- "Francesco Pietra" <francesco.pietra at accademialucchese.it> wrote:

> Therefore, is it any software way to check if the CPUs are fully in
> order, including the memory controller? lshw and other software
> provided only partial help in my hands.

Make sure that you have ECC turned to MAX in your BIOS, on our SuperMicro mainboards that enables scrubs of RAM and CPU caches as well as spotting ECC memory errors.

For some reason the SuperMicro BIOS's we've had recently have defaulted to turning ECC off which isn't particularly useful, especially on motherboards that can only take ECC memory! We found that the hard way recently, and you can work that out from the output of dmidecode like this:

  dmidecode | grep -A7 "Physical Memory Array" | grep "Error Correction" | grep ECC

Make sure you're also running mcelog to pull any MCE or ECC hardware reports that the kernel has recorded from the CPUs out to a logfile. We find that running it with the --k8 and --dmi options is important to decode more information about these events.

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency
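The dmidecode check in the message above prints nothing at all when ECC is off, which is easy to miss when looping over many nodes. A minimal sketch of a more explicit variant follows. It assumes the usual dmidecode record layout (a "Handle ..." line, then "Physical Memory Array", then indented fields); the function name ecc_status and its output wording are made up for illustration and are not part of dmidecode.

```shell
#!/bin/sh
# Hypothetical helper (not part of dmidecode): reads dmidecode output on
# stdin and prints one line per "Error Correction Type" field found inside
# a Physical Memory Array record.
ecc_status() {
    awk '
        # A new record starts with an unindented "Handle ..." line.
        /^Handle /              { in_array = 0 }
        # The record title tells us we are inside the memory array record.
        /Physical Memory Array/ { in_array = 1; next }
        in_array && /Error Correction Type:/ {
            # Strip the field label so only the value remains.
            sub(/^[ \t]*Error Correction Type:[ \t]*/, "")
            found = 1
            if ($0 ~ /ECC/) print "ECC active: " $0
            else            print "ECC not active: " $0
        }
        END { if (!found) print "no Physical Memory Array record found" }
    '
}
```

Typical use, as root, would be `dmidecode --type 16 | ecc_status`, where type 16 is the SMBIOS Physical Memory Array record. The same caveat as the original one-liner applies: this reports what the BIOS tables claim, so it is worth pairing with mcelog to confirm that errors are actually being reported.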
