[Beowulf] Tyan 2466 crashes, no obvious reason why
David Mathog mathog at mendel.bio.caltech.edu
Fri Sep 3 11:31:30 PDT 2004
One of 20 identical nodes, each a Tyan 2466 with a single Athlon MP 2200+ and 1GB of ECC memory, is starting to flake out. For no apparent reason it just drops dead (as far as Linux is concerned) after anywhere from a few minutes to a few days. At that point the network is down, the serial lines are down, and as near as I can tell the OS has simply blown up. There is zip, nothing, nada in the log files to indicate a problem.

I pulled the unit and monitored it closely, and it does not seem to be an overheating problem: all the fans keep spinning as they should even after it has crashed, and the network port lights are still flashing. After a reboot, smartctl shows no errors on the hard drive. Running sensors every few seconds in a loop shows nothing odd happening to the voltages, temperatures, or fan speeds up through the last log point before it dies. Running memtest86 for 10 minutes showed no memory errors.

I'm thinking about replacing the power supply (for lack of a better idea). What else might be causing this? There's not much in these systems: just the one CPU, a floppy drive, 1GB of RAM, and a cheap S3 graphics card (normally not used). The other 19 identical nodes are working reliably.

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
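For reference, a sensor-logging loop like the one mentioned in the post (running `sensors` every few seconds until the node dies) can be sketched in Python. This is a minimal sketch, not the poster's actual script: it assumes lm_sensors' `sensors` command is on the PATH, and the log path `/var/log/sensors-watch.log` and the 5-second interval are hypothetical choices. Each sample is flushed and `fsync`ed so that the last readings taken before a hard hang are more likely to be on disk after the reboot.

```python
import os
import subprocess
import time
from datetime import datetime

# Hypothetical log location (an assumption); keep it on a local disk so it survives a reboot.
LOG_PATH = "/var/log/sensors-watch.log"
INTERVAL_S = 5  # "every few seconds"; hypothetical value, adjust as needed


def read_sensors() -> str:
    """Return the text output of lm_sensors' `sensors` command, or an error note."""
    try:
        result = subprocess.run(["sensors"], capture_output=True, text=True, timeout=10)
        return result.stdout
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return f"sensors unavailable: {exc}\n"


def append_sample(path: str, text: str) -> None:
    """Append one timestamped sample and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a") as log:
        log.write(f"=== {stamp} ===\n{text}")
        log.flush()
        # Without fsync, the last few samples could still be sitting in the
        # page cache when the machine hangs, and would be lost on reboot.
        os.fsync(log.fileno())


if __name__ == "__main__":
    while True:
        append_sample(LOG_PATH, read_sensors())
        time.sleep(INTERVAL_S)
```

After the next crash, the final timestamp in the log shows roughly when the node stopped responding, and the readings just before it show whether anything drifted at the end.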
More information about the Beowulf mailing list
