[Beowulf] RE: S2466 systems won't reboot after linux poweroff

David Mathog mathog at mendel.bio.caltech.edu
Wed Dec 15 10:01:11 PST 2004


> Problem:  each node in a 20 node beowulf typically will
> not reboot following a linux poweroff command.  Power comes
> back on, but it never even shows the BIOS screens.
> 
> Hardware:
> 
> S2466 MPX mobo

Two of these nodes are flakey and aren't in the compute pool.
These were both upgraded to BIOS v4.06.  This DID resolve
the problem with a "poweroff" followed by "turning power switch
on" not rebooting.  In other words, they now boot as they should
following a poweroff/power switch on cycle.  The oddball
message cited in the first post that comes out the serial line
at the end of "poweroff" remains. 

Tests:

"poweroff" followed by "power switch on": worked 5/5 times

"reboot":  worked 5/5 times

However, the new BIOS didn't make these two nodes any more
stable - they still crash at about the same rate.

Conclusion: it might be worth the effort to upgrade the BIOS
if your cluster is down for some other reason anyway.

WARNING1:  All my nodes seemed to "forget" how to read floppy disks.
If the nodes had been up for a while and then were rebooted,
and a known good floppy placed in the drive,
they would NOT boot from it. If, however, while the node was up,
the same floppy was put into the drive and explicitly mounted,
listed, and unmounted a couple of times, THEN on the subsequent
reboot the system could read from the floppy.  I've never seen
this on any other system (Tyans are just full of surprises :-( ).
Subsequent to the V4.06 upgrade these nodes seem to recognize
the floppy better and so far have not had any problems
rebooting directly from a floppy without the kludge
described above.  However, if you are at BIOS V4.03
(which is what they were at, not V4.01 as I had previously
posted) you may have the same problems booting from floppy
in order to do the BIOS upgrades.  So either flash from the
net (I have no idea how) or verify that your floppy drives work
before rebooting the nodes to be upgraded.
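
For anyone who wants to script the floppy kludge described above,
here is a minimal sketch in shell.  The device node /dev/fd0, the
mount point /mnt/floppy, the default of 3 passes, and the DRY_RUN
switch are all assumptions for illustration, not what was actually
run on these nodes.  By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the floppy "warm-up" kludge: mount, list, and unmount the
# floppy a few times while the node is up, before rebooting from it.
# ASSUMPTIONS: device node, mount point, and pass count are guesses.
FLOPPY_DEV="${FLOPPY_DEV:-/dev/fd0}"     # assumed floppy device node
FLOPPY_MNT="${FLOPPY_MNT:-/mnt/floppy}"  # assumed existing mount point

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# warm_up_floppy [passes]: one mount/list/unmount cycle per pass.
# Stops and returns non-zero as soon as any step fails.
warm_up_floppy() {
    passes="${1:-3}"   # "a couple of times"; 3 is an arbitrary default
    i=1
    while [ "$i" -le "$passes" ]; do
        run mount "$FLOPPY_DEV" "$FLOPPY_MNT" || return 1
        run ls -l "$FLOPPY_MNT" || return 1
        run umount "$FLOPPY_MNT" || return 1
        i=$((i + 1))
    done
}
```

Calling "warm_up_floppy 2" prints two mount/ls/umount cycles.  To
actually touch the drive, run it as root with DRY_RUN=0 and a known
good floppy inserted; a non-zero return then means the drive could
not be read and the node should not be rebooted for flashing yet.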

WARNING2:  updating with:

 >phlash16 244v406.rom

left the BIOS settings as they were.  But:

 >flash

which ran flash.bat, WIPED the BIOS settings.

WARNING3:  These BIOS settings seem to be equivalent:

              v4.03        v4.06
quickboot     enabled      disabled
diagnostic    disabled     disabled
summary       disabled     disabled

If quickboot is enabled in v4.06 it appears to skip the
BIOS memory test entirely. It boots MUCH faster but you
may have a hard time pressing F2 in time to get back
into the BIOS setup to change it.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech



