revamping our beowulf

Robert G. Brown rgb at phy.duke.edu
Fri Sep 6 06:19:04 PDT 2002


On Thu, 5 Sep 2002, Tintin J Marapao wrote:

> Hi All,
> 
> In the lab I work in, we have a 10-node SuSE Linux cluster. It's about 2-3
> years old and has started to act really, really funky. We are planning to
> overhaul the whole thing, replacing the hard drive of the world node and
> installing a newer version, SuSE 8.0. Does anyone have any tips before I
> start thrashing it (aside from crossing my fingers)?
> I am actually more concerned about how I can go about cloning the nodes
> efficiently...with the least amount of anxiety.

Convert to Red Hat, learn to use kickstart, PXE, DHCP and yum.

With kickstart, PXE, DHCP and yum, one can develop a node "image"
(kickstart recipe) and boot the nodes via PXE/DHCP, kickstart install
them to an identical configuration, and maintain them transparently with
yum.
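
As a rough illustration of the DHCP side of this, here is a minimal Python
sketch that prints one ISC dhcpd "host" declaration per node.  The node
names, MAC addresses, IP addresses, server address and kickstart path are
all made up for the example.  It assumes the floppy-boot case, where
"filename" points at the kickstart file on the "next-server" host; with
PXE, "filename" would normally name the network boot loader instead.

```python
# Hypothetical node table: names, MAC addresses, and IPs are invented.
NODES = [
    ("node01", "00:11:22:33:44:01", "192.168.1.101"),
    ("node02", "00:11:22:33:44:02", "192.168.1.102"),
]

INSTALL_SERVER = "192.168.1.1"           # assumed install/NFS server
KS_PATH = "/export/kickstart/node.ks"    # assumed shared kickstart file


def host_stanza(name, mac, ip, server=INSTALL_SERVER, ks_path=KS_PATH):
    """Return one ISC dhcpd 'host' declaration for a single node."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {ip};\n"
        f"    next-server {server};\n"
        f"    filename \"{ks_path}\";\n"
        f"}}\n"
    )


def render_hosts(nodes):
    """Join the declarations for all nodes, separated by blank lines."""
    return "\n".join(host_stanza(*node) for node in nodes)


if __name__ == "__main__":
    # Paste the output into dhcpd.conf inside the appropriate subnet block.
    print(render_hosts(NODES))
```

Keeping the node table in one place means adding a node is a one-line
change, and every node still points at the same kickstart recipe.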

If your nodes are too old for PXE support in the BIOS, you can
accomplish the same thing with a suitable boot floppy and no PXE.  Boot
from a standard RH netboot floppy and enter "ks" at the boot prompt.
The node gets its identity and directions to the KS file and install
sources from DHCP, installs itself, and reboots into production thanks
to a terminating "reboot" command in %post.

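To make the shape of such a kickstart recipe concrete, here is a small
Python sketch that renders one as text: a command section, a %packages
list, and a %post section that ends in "reboot" as described above.  The
server address, install directory, partition sizes, package list and
%post command are placeholders, and the result is not a complete, tested
kickstart file for any particular release.

```python
def render_kickstart(server, install_dir, packages, post_commands):
    """Render a skeletal kickstart file as a single string.

    All arguments are illustrative; a real recipe needs more settings
    (root password, time zone, boot loader, and so on).
    """
    lines = [
        "install",
        f"nfs --server {server} --dir {install_dir}",
        "lang en_US",
        "keyboard us",
        "network --bootproto dhcp",
        "clearpart --all",
        "part swap --size 512",
        "part / --size 2048 --grow",
        "",
        "%packages",
    ]
    lines += packages
    lines += ["", "%post"]
    lines += post_commands
    # Terminating reboot so the node comes back up in production.
    lines.append("reboot")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_kickstart(
        server="192.168.1.1",            # assumed install server
        install_dir="/export/redhat",    # assumed install tree
        packages=["openssh-server"],     # placeholder package list
        post_commands=["echo installed > /root/INSTALL_DONE"],
    ))
```

Since every node reads the same file, editing this one recipe is what
changes the "image" for the whole cluster.
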
This approach has many lovely things about it.  All nodes are identical.
All nodes are automagically maintained to REMAIN identical.  All nodes
can be upgraded or reinstalled in about 30 minutes of your time from a
standing start at any time you wish.  If the nodes support PXE, you
don't even have to be there and the time required might be as little as
ten or fifteen minutes.  

I put on dog-and-pony shows with our cluster from time to time, and one
amusing trick goes like this.  Take an install boot floppy hacked to make
ks the timeout default (a bit of a pain, involving mkinitrd and so forth,
but not horribly difficult), put it into the drive of an idle node, punch
reset, wait until the floppy stops spinning, and remove it.  All the
while, talk about how easy it is to install and maintain a cluster, wave
your arms a bit, and talk about the importance of scalability in cluster
administration.  The node finishes its reinstallation back to EXACTLY the
same state it started from just as you finish the spiel (a few minutes,
depending on how loaded the install server is).

Some friends of mine on this list with Gbit ethernet and fast servers
have installed on the order of 60 nodes in about 10 minutes, without
even using PXE.

   rgb

> 
> Any input is welcome :)
> 
> Thanks,
> 
> Tintin

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





