[Beowulf] Remote console management

David Mathog mathog at mendel.bio.caltech.edu
Fri Sep 23 08:25:43 PDT 2005


> We don't need the power strips.  The inexpensive IPMI cards let
> us do both hard and soft power cycles.

Those power strips are relatively expensive, but as a way
to remotely force a boot on a node they are cheaper per
node than the IPMI cards. Consider the following scenario: 
a node fails and you're on the other side of the planet.
Being able to force a hard power cycle is all that's
required to diagnose the problem if the nodes are set to boot
on power on and their boot order is PXE then hard disk.
Example:

Force the hard power cycle, leaving DHCP set NOT to load an OS
over the network, so the PXE attempt falls through and the node
boots from the internal disk.
 1. The node comes up - diagnose from the running machine.
 2. The node fails to come up.  Change the DHCP entry for that
    node so it boots a diagnostic Linux (your choice, something
    that runs sshd automatically and doesn't try to mount
    the hard drive, which may well be broken).  Force another
    hard power cycle:
    A:  The diagnostic OS comes up.  Work from there.
    B:  The diagnostic OS does not come up.  You have a hardware
        failure.  You don't know what kind but it hardly matters
        since you cannot fix it remotely.  Admittedly, if you had
        the IPMI card or serial port access you might be able to
        diagnose it remotely, for instance by running memtest86.
        That may or may not be important. If you have a service
        contract it just gets shipped back to the manufacturer
        and it's their problem.
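
The sequence above could be scripted roughly as follows. This is only a
sketch, not something we actually run: power_cycle and set_pxe_target are
hypothetical callbacks you would have to write for your own power strip and
DHCP server, the "diagnostic" target name is made up, and "the node is up"
is assumed to mean that its SSH port (22) accepts a TCP connection.

```python
import socket
import time
from typing import Callable, Optional


def ssh_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_node(host: str, is_up: Callable[[str], bool],
                  attempts: int, delay: float) -> bool:
    """Poll is_up(host) up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_up(host):
            return True
        time.sleep(delay)
    return False


def triage(node: str,
           power_cycle: Callable[[str], None],
           set_pxe_target: Callable[[str, Optional[str]], None],
           is_up: Callable[[str], bool] = ssh_port_open,
           attempts: int = 30,
           delay: float = 10.0) -> str:
    """Walk the decision list: disk boot first, then a diagnostic OS over PXE.

    power_cycle and set_pxe_target are hypothetical, site-specific hooks.
    """
    # Step 1: DHCP offers no network OS, so PXE falls through to the disk.
    set_pxe_target(node, None)
    power_cycle(node)
    if wait_for_node(node, is_up, attempts, delay):
        return "disk-boot-ok"          # case 1: diagnose from the running machine

    # Step 2: point DHCP at the diagnostic OS and cycle power again.
    set_pxe_target(node, "diagnostic")
    power_cycle(node)
    if wait_for_node(node, is_up, attempts, delay):
        return "diagnostic-os-up"      # case 2A: work from the diagnostic OS
    return "hardware-failure"          # case 2B: nothing more to do remotely
```

Called as triage("node17", my_power_cycle, my_set_pxe_target), it returns one
of the three strings matching cases 1, 2A and 2B above.  The default of 30
attempts 10 seconds apart is arbitrary; pick whatever covers your nodes'
boot time.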

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


