[Beowulf] Geriatric computer does not stay up
Eric Thibodeau kyron at neuralbs.com
Mon Dec 21 11:05:45 PST 2009
This smells like the hell I went through when one of the CPUs needed to be changed in our dep's Tyan VX50... Try swapping CPUs if you have spares.

ET

On 2009-12-16, at 5:36 PM, Jack Carrozzo wrote:

> I assume you've done this but forgot to mention it in the email - did
> you test the RAM?
>
> -Jack Carrozzo
>
> On Wed, Dec 16, 2009 at 5:27 PM, David Mathog <mathog at caltech.edu> wrote:
>> So we have a cluster of Tyan S2466 nodes and one of them has failed in
>> an odd way. (Yes, these are very old, and they would be gone if we had a
>> replacement.) On applying power the system boots normally and gets far
>> into the boot sequence, sometimes to the login prompt, then it locks up.
>> If booted failsafe it will stay up for tens of minutes before locking.
>> It locked once on "man smartctl" and once on "service network start".
>> However, on the next reboot, it didn't lock with another "man smartctl",
>> so it isn't like it hit a bad part of the disk and died. Smartctl test
>> has not been run, but "smartctl -a /dev/hda" on the one disk shows it as
>> healthy with no blocks swapped out. Power stays on when it locks, and
>> the display remains as it was just before the lock. When it locks it
>> will not respond to either the keyboard or the network. (The network
>> interface light still flashes.) There is nothing in any of the logs to
>> indicate the nature of the problem.
>>
>> The odd thing is that the system is remarkably stable in some ways. For
>> instance, the PS tests good and heat isn't the issue: after running
>> sensors in a tight loop to a log file, waiting for it to lock up, then
>> looking at the log on the next failsafe boot, there was negligible
>> fluctuation in any of the voltages, fan speeds, or temperatures. It
>> will happily sit for 30 minutes in the BIOS, or hours running memtest86
>> (without errors). The motherboard battery is good, and the inside of
>> the case is very clean, with no dust visible at all. I reset the BIOS,
>> but it didn't change anything.
>>
>> Here are my current hypotheses for what's wrong with this beast:
>>
>> 1. The drive is failing electrically, puts voltage spikes out on some
>> operations, and these crash the system.
>> 2. The motherboard capacitors are failing and letting too much noise in.
>> The noise which is fatal is only seen on an active system, so sitting
>> in the BIOS or in Memtest86 does not do it. (But the caps all look good,
>> no swelling, no leaks.) It will run memtest86 overnight though, just in
>> case.
>> 3. The PS capacitors are failing, so that when loaded there is enough
>> voltage fluctuation to crash the system. (Does not agree very well with
>> the sensors measurements, but it could be really high frequency noise
>> superimposed on a steady base voltage.)
>> 4. Evil Djinn ;-(
>>
>> Any thoughts on what else this might be?
>>
>> Thanks.
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
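A minimal Python sketch of the kind of "sensors in a tight loop to a log file" check David describes, plus a helper that summarizes the spread of each reading afterwards. The function names `log_sensors` and `reading_ranges`, the one-second default interval, and the regular expression for `sensors` output lines are all hypothetical assumptions for illustration; the exact lm_sensors output format varies by chip driver and version. The `os.fsync` call after each sample is the important design choice: without it, the last lines before a hard lockup can still be sitting in memory and never reach the disk.

```python
import os
import re
import subprocess
import time
from collections import defaultdict

def log_sensors(log_path, iterations, interval=1.0, cmd=("sensors",)):
    """Append timestamped output of `cmd` to log_path, one sample per interval.

    Each sample is flushed and fsync'ed before the next one is taken, so a
    hard lockup loses at most the sample in progress.
    """
    with open(log_path, "a") as log:
        for _ in range(iterations):
            result = subprocess.run(list(cmd), capture_output=True, text=True)
            log.write(f"=== {time.time():.3f}\n")  # sample separator with timestamp
            log.write(result.stdout)
            log.flush()             # Python buffer -> OS
            os.fsync(log.fileno())  # OS page cache -> disk
            time.sleep(interval)

# Assumed line shapes: "Label:  +1.74 V ...", "Label:  4560 RPM ...", "Label:  +45.0°C ..."
READING = re.compile(r"^([^:]+):\s+([+-]?\d+(?:\.\d+)?)\s*(V|RPM|°C|C)\b")

def reading_ranges(log_text):
    """Return {label: (min, max, unit)} across every sample found in log_text."""
    values = defaultdict(list)
    units = {}
    for line in log_text.splitlines():
        m = READING.match(line)
        if m:
            label = m.group(1).strip()
            values[label].append(float(m.group(2)))
            units[label] = m.group(3)
    return {label: (min(v), max(v), units[label]) for label, v in values.items()}
```

On the node one might run `log_sensors("/var/log/sensors-loop.log", iterations=3600)` (the path is hypothetical) until it locks, then on the next boot pass the file's contents to `reading_ranges` and compare min and max per sensor; a near-zero spread is what "negligible fluctuation" looks like. As David notes under hypothesis 3, sampling about once a second cannot see high-frequency noise superimposed on a steady base voltage.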
