[Beowulf] Tyan 2466 crashes, no obvious reason why
David Mathog mathog at mendel.bio.caltech.edu
Sun Sep 5 13:32:06 PDT 2004
After a few more crashes with nothing in the log files, I ran a shell script that logged all sensor readings to a file every 10 seconds. When the node next crashed (6 hours after a restart) there was no significant difference between any of the numbers, be they voltage, RPM, or temperature. I would have expected that a failing power supply or on-board voltage regulator would most likely show up as noise in the sensors output, but it didn't.

This time I also left a monitor plugged into the node and was greeted by this message on the down machine:

  CPU 0: Machine Check Exception: 000000000000004
  Bank 0: e67aa00000000833 at 000000003f9c8688
  Bank 1: f600200000000853 at 00000000001ab948
  Kernel panic CPU context corrupt
  In interrupt handler - not syncing

That message must be new though, because the previous time I plugged in that monitor the system had recently crashed, and there was nothing on the screen then.

The motherboard capacitors have all been visually inspected and none of them are leaking, bulging, or otherwise showing signs of failure.

memtest86 is running now (and will be for the next 36 hours or so), but if it doesn't find anything, does the console error suggest a region of memory to test more intensively, or a particular test to run in memtest86?

Looks like I'm going to need a bunch of spare parts for a "fun" game of "swap components and wait for the crash"...

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
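
For anyone trying to read the Bank lines in the console message, here is a minimal Python sketch that splits each status word into its flag bits and error-code fields. It assumes the values follow the architectural x86 MCi_STATUS layout (flag bits 63 through 57, model-specific code in bits 31:16, MCA error code in bits 15:0); whether every field applies unchanged to the CPUs on this board is an assumption to check against the vendor documentation. The name decode_status is hypothetical and not part of any existing tool.

```python
# Flag bits of an x86 MCi_STATUS register, keyed by bit position.
# Assumption: the "Bank N:" values printed by the kernel use this layout.
FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # companion MISC register is valid
    58: "ADDRV",  # companion ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}


def decode_status(hex_status):
    """Split a 64-bit status word (hex string) into flags and code fields."""
    value = int(hex_status, 16)
    return {
        "flags": [name for bit, name in FLAGS.items() if (value >> bit) & 1],
        "model_code": (value >> 16) & 0xFFFF,  # model-specific error code
        "mca_code": value & 0xFFFF,            # architectural MCA error code
    }


if __name__ == "__main__":
    # Status words copied from the console message above.
    for bank, status in [(0, "e67aa00000000833"), (1, "f600200000000853")]:
        d = decode_status(status)
        print(f"Bank {bank}: flags={','.join(d['flags'])} "
              f"model_code={d['model_code']:#06x} mca_code={d['mca_code']:#06x}")
```

Under that assumed layout, both banks have UC and PCC set, which is consistent with the "CPU context corrupt" panic, and both have ADDRV set, so the value printed after "at" would be the address recorded with the error. Interpreting the mca_code values themselves requires the processor's machine-check documentation.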
