[Beowulf] Remote console management

Guy Coates gmpc at sanger.ac.uk
Fri Sep 23 01:28:10 PDT 2005


> Could people on this list please report their experiences with these or
> other approaches?  In particular, does someone have a simple and
> inexpensive solution (say < $100/node) which lets them remotely:
>   - power cycle a machine
>   - examine/set BIOS values
>   - look at console output even for a dead/locked/unresponsive box
>   - ???

We've had good experiences with remote management. We are extremely fascist
about it, and won't buy cluster kit without it.

The good news is that the capabilities are a lot more robust than they
used to be, and that different vendors are now standardising their
implementations (the movement towards IPMI).

The bad news is that these features tend to be restricted to "server
grade" (i.e. expensive) hardware.

The systems I've had direct experience with are Dell's DRAC3, HP's ILO and
IBM's blade management module. IBM also do a management interface for
their non-blade servers, but I've not used it.

The systems all work in approximately the same way. Each server (or rack
of blades) has an extra ethernet management interface on it which runs an
embedded webserver and an ssh/telnet server. When you want to manage a
server you connect to the management interface via your preferred protocol
and do all the usual management things: pull up a console, power on/off
etc.
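
As a rough illustration of what "the usual management things" look like
on an interface that speaks IPMI over LAN, here is a minimal Python sketch
that builds ipmitool command lines. It is not what any particular vendor
interface requires: the host name "node001-mgmt", the user "admin" and the
action names are made up for the example, and DRAC, ILO and the blade
management module each have their own tools as well.

```python
# Sketch only: maps common management tasks onto ipmitool invocations.
# Assumes the management interface speaks IPMI over LAN (lanplus).
# Host names, user names and action names here are hypothetical.

ACTIONS = {
    "power-status": ["chassis", "power", "status"],
    "power-cycle": ["chassis", "power", "cycle"],
    "boot-pxe": ["chassis", "bootdev", "pxe"],   # change next boot device
    "console": ["sol", "activate"],              # serial-over-LAN console
}


def ipmi_command(host, user, action):
    """Return the argv list for one ipmitool call against one host."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    # -E makes ipmitool read the password from the IPMI_PASSWORD
    # environment variable instead of the command line.
    base = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E"]
    return base + ACTIONS[action]


if __name__ == "__main__":
    # Only prints the command; nothing is executed.
    print(" ".join(ipmi_command("node001-mgmt", "admin", "power-status")))
```

The function only builds the argument list, so you can inspect it before
handing it to subprocess or pasting it into a shell.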

Access to the BIOS varies; most management interfaces allow you to change
the boot order, but not much else. However, IBM, HP and Dell all provide
Linux utilities to upgrade and twiddle the BIOS, so you can do it from
userspace instead of the management interface.

These management interfaces also have a scripting interface nowadays, so
you can run scripts against large numbers of machines at the same time.
This is extremely useful, as you hardly ever have to do something to just
one machine in a cluster.
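
To give a flavour of running one operation against many machines at once,
here is a minimal Python sketch that queries power status across a list of
management interfaces in parallel. It assumes ipmitool is installed and
that the interfaces speak IPMI over LAN; the "nodeNNN-mgmt" host names,
the "admin" user and the function names are invented for the example, and
a vendor's own scripting interface would look different.

```python
# Sketch only: fan one ipmitool query out across many nodes in parallel.
# Assumes ipmitool is installed and IPMI_PASSWORD is set in the
# environment. Host and user names are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def power_status(host, user="admin", runner=subprocess.run):
    """Query one node; return (host, return code, output or error text)."""
    argv = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", "status"]
    try:
        result = runner(argv, capture_output=True, text=True, timeout=30)
        return host, result.returncode, result.stdout.strip()
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A dead or unreachable management interface should not
        # abort the whole run; report it and carry on.
        return host, -1, str(exc)


def power_status_all(hosts, max_workers=16, **kwargs):
    """Query every host concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: power_status(h, **kwargs), hosts))
```

You would call it with something like
power_status_all(["node%03d-mgmt" % i for i in range(1, 101)]) and then
print or filter the returned tuples. Threads are enough here because each
worker just waits on an external process and the network.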

>For a large cluster (100+ nodes) and sub $100/node, the cheapest
>solution is to give a PhD or grad student an extra $10k and get a
>small trolley with keyboard/monitor/mouse.

I'm glad I'm not that student when it is time to update the buggy BIOS on
a 100+ node cluster!

Guy

-- 
Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 494919








