[Beowulf] Remote console management
mathog at mendel.bio.caltech.edu
Fri Sep 23 08:25:43 PDT 2005
> We don't need the power strips. The inexpensive IPMI cards let
> us both hard and soft power cycle.
Those power strips are relatively expensive, but as a way
to remotely force a boot on a node they are cheaper per
node than the IPMI cards. Consider the following scenario:
a node fails and you're on the other side of the planet.
Being able to force a hard power cycle is all that's
required to diagnose the problem, provided the nodes are set
to boot as soon as power is applied and their boot order is
PXE, then hard disk. Force the hard power cycle with the node's
DHCP entry NOT offering an OS to network-boot, so the node
falls through to booting from its internal disk.
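Below is a minimal sketch of the DHCP side of this, assuming the cluster runs ISC dhcpd with one `host` stanza per node. The node name, MAC, IP addresses and the `pxelinux.0` boot file are hypothetical. Leaving out `filename` means no boot image is offered, so a node whose boot order is PXE then hard disk should fall through to its disk; exactly how a given PXE ROM behaves when no boot file is offered is firmware-dependent.

```python
from typing import Optional


def dhcpd_host_entry(name: str, mac: str, ip: str,
                     boot_file: Optional[str] = None,
                     next_server: Optional[str] = None) -> str:
    """Return an ISC dhcpd 'host' stanza for one node (illustrative only).

    boot_file=None: no network boot file is offered, so (assuming a
    PXE-then-disk boot order) the node boots from its internal disk.
    boot_file set: the node network-boots that file, e.g. a loader for
    a hypothetical diagnostic Linux image.
    """
    lines = [
        f"host {name} {{",
        f"  hardware ethernet {mac};",
        f"  fixed-address {ip};",
    ]
    if boot_file is not None:
        if next_server is not None:
            # TFTP server that holds the boot file
            lines.append(f"  next-server {next_server};")
        lines.append(f'  filename "{boot_file}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node, normal case: boot from the local disk
    print(dhcpd_host_entry("node17", "00:11:22:33:44:55", "10.0.0.17"))
    # Same node switched to the (hypothetical) diagnostic image
    print(dhcpd_host_entry("node17", "00:11:22:33:44:55", "10.0.0.17",
                           boot_file="pxelinux.0", next_server="10.0.0.1"))
```

After editing dhcpd.conf the DHCP server typically has to be restarted before the new entry is handed out.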
1. The node comes up - diagnose from the running machine.
2. The node fails to come up. Change that node's DHCP entry
so it network-boots a diagnostic Linux (your choice; something
that starts sshd automatically and doesn't try to mount
the hard drive, which may well be broken). Force another
hard power cycle:
A: The diagnostic OS comes up. Work from there.
B: The diagnostic OS does not come up. You have a hardware
failure. You don't know what kind, but it hardly matters
since you cannot fix it remotely. Admittedly, with an IPMI
card or serial console access you might be able to diagnose
it remotely, for instance by running memtest86. That
may or may not be important. If you have a service
contract, the node just gets shipped back to the manufacturer
and it's their problem.
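The decision procedure above can be sketched as a small script. This is a hedged illustration, not a tool from the post: `power_cycle` and `set_boot` are hypothetical hooks you would supply (for example, one driving the switched power strip and one rewriting the node's DHCP entry and restarting the server), and "comes up" is approximated here as "something accepts TCP connections on port 22", which is an assumption.

```python
import socket
import time
from typing import Callable


def ssh_reachable(host: str, timeout_s: float = 300.0, port: int = 22,
                  poll_s: float = 5.0) -> bool:
    """Poll until something accepts TCP connections on the SSH port, or give up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            time.sleep(poll_s)  # not up yet; wait and retry
    return False


def triage(node: str,
           power_cycle: Callable[[str], None],
           set_boot: Callable[[str, str], None],
           is_up: Callable[[str], bool] = ssh_reachable) -> str:
    """Walk through the steps: local disk, then diagnostic OS, then give up."""
    # DHCP offers no OS, so the node should fall through to its own disk.
    set_boot(node, "disk")
    power_cycle(node)
    if is_up(node):
        return "up: diagnose from the running machine"
    # Step 2: point the node's DHCP entry at the diagnostic Linux and retry.
    set_boot(node, "diag")
    power_cycle(node)
    if is_up(node):
        return "diag: work from the diagnostic OS"
    # Case B: neither boot worked, so assume a hardware failure.
    return "hardware: cannot be fixed remotely"
```

Passing the hooks in as callables keeps the sketch independent of any particular power strip, IPMI tool or DHCP server, and makes it easy to exercise with fakes.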
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech