intermittent crashing of programs

Daniel Kidger Daniel.Kidger at quadrics.com
Thu Feb 21 09:46:00 PST 2002


Donald Becker wrote:
>I think of parity errors being connected to NMI as being an obscure
>legacy part of the PC architecture, much like the "A20" line being
>switched by the keyboard controller.  If the backwards compatibility
>broke, no one would notice.



Nope, not legacy. Just look, for example, at any brand-new Dell Pentium 4
system with RAMBUS ECC memory.

Any 'multibit errors' generate an NMI.

Single-bit errors in ECC memory get spotted by the BIOS too, but the O/S will
not be told, since they are corrected 'on-the-fly' by the hardware on
reading the data. Hence 'memtest' will never detect these single-bit errors.

The other thing to get is 'ecc.o'. This is a kernel module that polls the
motherboard chipset every second. It shows the single-bit and multibit
error counts in /proc/ram, collated by memory bank.

eg.
[dan@fridge8]$ cat /proc/ram
Chipset ECC capability : ECC detection and correction
Current ECC mode : ECC detection and correction
Bank    Size    Type    ECC     SBE     MBE
0       256M    RMBS    Y       202758  0
1       256M    RMBS    Y       0       5
2       256M    RMBS    Y       0       2
3       256M    RMBS    Y       0       0
4       256M    RMBS    Y       0       0
5       256M    RMBS    Y       0       257
6       256M    RMBS    Y       0       0
7       256M    RMBS    Y       0       0
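
A report like the one above can be scanned for bad banks with a few lines of Python. This is only a minimal sketch: it assumes the six-column layout seen in the sample output, and the names SAMPLE, parse_ecc_report and banks_with_errors are hypothetical helpers for illustration, not part of ecc.o. It works on a copy of the text rather than reading /proc/ram, since that file only exists when the module is loaded.

```python
# Sketch: pick out memory banks with non-zero error counts from text laid
# out like the /proc/ram sample above. The column order
# (Bank, Size, Type, ECC, SBE, MBE) is an assumption taken from that sample.

SAMPLE = """\
Chipset ECC capability : ECC detection and correction
Current ECC mode : ECC detection and correction
Bank    Size    Type    ECC     SBE     MBE
0       256M    RMBS    Y       202758  0
1       256M    RMBS    Y       0       5
2       256M    RMBS    Y       0       2
3       256M    RMBS    Y       0       0
4       256M    RMBS    Y       0       0
5       256M    RMBS    Y       0       257
6       256M    RMBS    Y       0       0
7       256M    RMBS    Y       0       0
"""


def parse_ecc_report(text):
    """Return (bank, sbe, mbe) tuples for every row of the bank table."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["Bank"]:
            # Header row: the per-bank rows start on the next line.
            in_table = True
            continue
        if in_table and len(fields) == 6 and fields[0].isdigit():
            rows.append((int(fields[0]), int(fields[4]), int(fields[5])))
    return rows


def banks_with_errors(rows):
    """Keep only banks that logged at least one single-bit or multibit error."""
    return [(bank, sbe, mbe) for bank, sbe, mbe in rows if sbe or mbe]


if __name__ == "__main__":
    for bank, sbe, mbe in banks_with_errors(parse_ecc_report(SAMPLE)):
        print(f"bank {bank}: {sbe} single-bit, {mbe} multibit")
```

On a machine with the module loaded, the same functions could be fed the contents of /proc/ram instead of SAMPLE, provided the layout matches.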



Yours,
Daniel.

--------------------------------------------------------------
Dr. Dan Kidger, Quadrics Ltd.      daniel.kidger at quadrics.com
One Bridewell St., Bristol, BS1 2AA, UK         0117 915 5505
----------------------- www.quadrics.com --------------------


