Help on cluster hang problem...

Josip Loncaric josip at icase.edu
Wed May 30 11:11:25 PDT 2001


"Robert G. Brown" wrote:
> 
> On Tue, 29 May 2001, David Vos wrote:
> 
> > Hmmm.  I've seen Windows do that to enough computers I doubt the problem
> > is the power supply.  Although to make Linux hang like that is usually a
> > hardware problem.
> 
> I don't think it is Linux or Windows -- I think it is just a mismatch
> between the power supply capacity and the hardware configuration.

I've seen both types of failure: (1) insufficient power supply capacity
(fixed by upgrading to 400W power supplies) and (2) a total machine
crash where even the power button (held for more than 5 seconds) did
not work (rare, and not fixed; typically caused by malfunctioning
applications using VIA userspace access to devices).

ATX power is under software control.  In case (1) the power supply
can drop its 'power good' signal and the machine shuts off.  In case
(2) the CPU fails to tell the power supply to shut off.

The power switch is just a momentary contact switch.  The PC is
supposed to read it, interpret the length of time the switch was
closed as a 'suspend' or 'power off' request (this is usually defined
in the BIOS), and then send the appropriate signal to the power supply.
When the machine is totally crashed, this process cannot be carried out
as intended.  Unfortunately, inexpensive ATX power supplies seldom
include a normal (hard) power switch.  Of course, pulling the power cord
always works :-)
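
To make the button logic concrete, here is a rough Python sketch of the
decision described above.  It is only a toy model, not real firmware:
the names (PowerRequest, interpret_press, machine_alive), the 4-second
threshold, and the choice that a short press means 'suspend' are all
assumptions for illustration; the actual behavior is whatever the BIOS
defines.

```python
from enum import Enum


class PowerRequest(Enum):
    NONE = "none"
    SUSPEND = "suspend"
    POWER_OFF = "power off"


# Hypothetical threshold; the real value is set by the BIOS/firmware.
LONG_PRESS_SECONDS = 4.0


def interpret_press(held_seconds, machine_alive=True,
                    long_press=LONG_PRESS_SECONDS):
    """Map how long the momentary switch was closed to a power request.

    machine_alive=False models the totally crashed case: nothing reads
    the switch, so no signal is ever sent to the power supply.
    """
    if not machine_alive or held_seconds <= 0:
        return PowerRequest.NONE
    if held_seconds >= long_press:
        return PowerRequest.POWER_OFF
    # Assumption: a short press is configured to mean 'suspend'.
    return PowerRequest.SUSPEND
```

With these assumptions, interpret_press(0.5) gives a suspend request and
interpret_press(6.0) gives a power-off request, while
interpret_press(6.0, machine_alive=False) gives no request at all, which
is failure type (2).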

Sincerely,
Josip

-- 
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134



