[Beowulf] ECC exerciser/exorciser?
Mark Hahn hahn at mcmaster.ca
Mon Jan 26 07:30:50 PST 2009
Hi all,
we're having some trouble with nodes showing high ECC corrected error (CE) counts. I'm wondering whether you have any wisdom on the following:

- first, how would you go about setting a threshold for how high is an acceptable CE count? we by default are using the mce module, which by default polls at 1Hz. my thinking is that if we get overflow events (the multiple error bit is set), then it's too fast.

- do you have or know of a good exerciser for testing ECC's? yes, I know about memtest86, but I'm more curious about a load that could be run under linux. my thinking is that ecc's are triggered by bad reads, so something which allocates all memory and then continually reads it would be best.

- how about layout of memory -> dimms? take a single page, for example: I presume that the first cacheline (16B) will be "striped" across both channels of one bank (for instance, the first dimm-pair.) is it normal for the 17th byte to begin on the next dimm-pair (csrow)? dmidecode seems to indicate that 8 1GB dimms are mapped to contiguous addresses (which would imply no channel interleaving, which is wrong...)

- does "numactl --hardware" work correctly for you? I see something like:

    available: 2 nodes (0-1)
    node 0 size: 5375 MB
    node 0 free: 3550 MB
    node 1 size: 4095 MB
    node 1 free: 3874 MB

  9470 MB total, which, on a machine with only 8x 1GB dimms is unexpected...

thanks for any comments,
mark hahn
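
For illustration only, here is a rough sketch of the kind of Linux-hosted read exerciser the second question describes: allocate a buffer, fill it with a known pattern, then scan it repeatedly. It is not a tool from this thread. The names `exercise` and `read_ce_counts` are hypothetical, and the counter path assumes the kernel EDAC driver is loaded and exposes `ce_count` under `/sys/devices/system/edac/mc`, which differs from the mce module mentioned above. Corrected errors are fixed by the hardware before software sees the data, so the mismatch count should stay at zero; the interesting signal would be the change in the CE counters across a run.

```python
import glob
import os


def exercise(size_bytes, passes, pattern=0xA5):
    """Fill a buffer with `pattern`, then read all of it `passes` times.

    Returns the number of bytes that did not match the pattern, summed
    over all passes. With working ECC this should be 0, because
    corrected errors never reach software.
    """
    buf = bytearray([pattern]) * size_bytes
    mismatches = 0
    for _ in range(passes):
        # count() scans every byte, so each pass reads the whole buffer.
        mismatches += size_bytes - buf.count(pattern)
    return mismatches


def read_ce_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller_name: corrected_error_count}.

    Assumes the EDAC sysfs layout; returns an empty dict if absent.
    """
    counts = {}
    for path in glob.glob(os.path.join(edac_root, "mc*", "ce_count")):
        controller = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            counts[controller] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    before = read_ce_counts()
    # 64 MiB is only a demo size; a real run would need a buffer far
    # larger than the CPU caches so that reads actually reach DRAM.
    bad = exercise(64 * 1024 * 1024, passes=1)
    after = read_ce_counts()
    print("mismatched bytes:", bad)
    for mc in sorted(after):
        print(mc, "CE delta:", after[mc] - before.get(mc, 0))
```

A buffer that fits in cache will mostly be served from cache rather than from the dimms, so the size and number of passes would have to be scaled up considerably (and probably run once per NUMA node) before this says anything about ECC behaviour.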
