Public slaves

Robert G. Brown rgb at phy.duke.edu
Fri Nov 16 05:45:38 PST 2001


On Thu, 15 Nov 2001, Andrew B. Raij wrote:

> I've heard much about scyld's cluster management tools, so  I thought it
> made sense to stick with scyld and modify things to fit my situation.  If
> I were to use kickstart and a standard linux distro, what would I be
> losing from scyld?

A better way to put it is: what do you actually need from Scyld on your
cluster?  As you say, Scyld has cluster management tools and so forth,
but clusters existed for years before Scyld and it isn't too hard to set
up a cluster without it.  Indeed, if your cluster is intended to be a
compute farm where you would like folks to be able to log into nodes one
at a time, by name, to do work (which seems quite likely if you want to
give explicit nodes permanent names), then Scyld is quite possibly not
your best bet, as it follows the "true beowulf" paradigm of the cluster
being a single-headed virtual parallel supercomputer, where you would no
more log into a node than you would log into a specific processor in a
big SMP box.

I will echo Bill's suggestion, as it is how we set up our clusters here
as well.  They are primarily compute farms used to run many instances of
embarrassingly parallel code, e.g. Monte Carlo or nuclear theory
computations (generating slices of ongoing collision processes, for
example).

The engineering of the cluster is pretty simple:

   a) Server(s) provide(s) shared NFS mounts to the nodes for users, DHCP
for the nodes, and NFS, FTP, or HTTP export of e.g. the RH distro and
kickstart files.
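For concreteness, the NFS side of this is just ordinary exports.  A
minimal sketch -- the network, paths, and options below are illustrative
placeholders, not our actual configuration:

# /etc/exports on the server (placeholder network and paths)
/home            152.3.182.0/255.255.255.0(rw)
/export/install  152.3.182.0/255.255.255.0(ro)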

   b) Build a kickstart file for the "typical node".  I can give you the
one we use here if you need it.  We make the nodes relatively "fat",
since they have small local hard disks and "small" local hard disks are
currently so absurdly large that you could drop three or four completely
different OS installations on them and still have room for swap and
twenty GB of user scratch space.  In fact, you could easily install RH
AND Scyld on the nodes and select which way you wanted to boot the
cluster at boot time.  It's just a matter of how you choose to
partition -- reserve a 4 GB partition per bootable OS.  The kickstart
file specifies how the node disk is to be laid out, the packages to be
installed, what (if any) video support, and more, culminating in a
post-install script that can be run to "polish" the setup -- installing
the appropriate /etc/passwd, /etc/shadow, /etc/group, building
/etc/fstab, and so forth.
   c) Set up the dhcpd.conf on the dhcp server.  Here is a typical node
entry for my "ganesh" cluster:

host g01 {
        hardware ethernet 00:01:03:BD:C5:7a;
        fixed-address 152.3.182.155;
        next-server install.phy.duke.edu;
        filename "/export/install/linux/rh-7.1/ks/beowulf";
        option domain-name "phy.duke.edu";
        option dhcp-class-identifier "PXEClient";
}

Note that this maps one MAC address to one IP number (in many cases one
would assign node addresses out of a private internal network space like
192.168.x.x -- these nodes for the time being are publicly accessible
and secured like any other workstation).  next-server names the install
server, by name or by IP number.  Elsewhere there are global
definitions for things like NIS servers, nameservers, and the like, so
the booted host knows how to resolve the name.  filename gives the path
to the kickstart file that will then direct the install.  If one wishes
to provide it from a web or ftp server instead, prepend the appropriate
http:// or ftp://.  The other options are local (and hopefully obvious
in purpose).  This particular node has PXE booting set up and can be
installed just by turning it on.  Without PXE, one probably needs a boot
floppy from the matching distribution and a floppy drive per node.
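The "elsewhere" above is the global section of dhcpd.conf, which looks
roughly like the following (all values here are placeholders):

# global section of dhcpd.conf (placeholder values)
option domain-name-servers 152.3.250.1, 152.3.250.2;
option nis-domain "phy.duke.edu";
option routers 152.3.182.1;
option subnet-mask 255.255.255.0;

subnet 152.3.182.0 netmask 255.255.255.0 {
        # per-node host entries like g01 above go here (or in a group)
}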

Once these things are set up, one merely boots the system.  If you use a
boot floppy, just enter "ks" at the boot prompt when requested, OR cut a
custom boot floppy where ks is the default (I generally do this for
nodes that can't PXE boot, as it means you don't need a monitor or
keyboard to reinstall).  Otherwise it is pretty much just turn it on.
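If you do cut such a floppy, the only real change is making "ks" the
default boot label.  A minimal sketch of the floppy's syslinux.cfg,
assuming you start from a stock RH boot floppy (a bare "ks" argument
tells the installer to fetch the kickstart file location from the
DHCP/bootp server, as configured above):

# syslinux.cfg on the custom boot floppy -- ks is the default label
default ks
prompt 0
label ks
        kernel vmlinuz
        append ks initrd=initrd.img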

On a good day, it will boot, find the dhcp server, get an IP number and
identity, and start building, loading, and mounting "install" ramdisks
as fast as the network and server load permit.  (If PXE booting, it does
all this in a somewhat different order, as it has to get the boot kernel
over the network first.)  It then rips through the kickstart file's
instructions (partition and format the disk, set up swap, and start
installing packages).  When finished, it runs the post script, which can
end with instructions to reboot the newly installed node, ready for
production.

On a good day, we can reinstall nodes in about 3-4 minutes.  In fact,
when I give folks a tour of our cluster, I generally include a reinstall
of a node just to show them how trivial it is.  We keep (or rather can
build dynamically on demand) a special "install" lilo.conf file on the
systems so that we can even reinstall them remotely from a script --
copy in the install lilo.conf, run lilo, reboot (system installs and
reboots into operational mode).  An impressive display of the
scalability of modern linux distributions, since exactly the same trick
will work for every workstation in an organization.  To manage a
network, one only needs to "work" on servers (as it should be).  Nodes,
workstation clients, desktops -- all of them should be complete
kickstart boilerplate with minimal customization, all encapsulated in a
(possibly host-specific) kickstart file.  If one crashes, becomes
corrupt, or is cracked, a three or four minute reinstall later and it
has a clean, fresh system.
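For the curious, the "install" lilo.conf trick amounts to pointing lilo
at the installer kernel and initrd (copied onto the local disk) with
"ks" on the command line.  A rough sketch, with illustrative file
locations rather than our actual ones:

# /etc/lilo.conf.install -- illustrative; the installer kernel and initrd
# are copied from the boot media into /boot/install/ beforehand
boot=/dev/hda
read-only
default=install
image=/boot/install/vmlinuz
        label=install
        initrd=/boot/install/initrd.img
        append="ks"

# the remote reinstall "script" is then essentially:
#   cp /etc/lilo.conf.install /etc/lilo.conf && lilo && reboot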

Regarding parallel computing support, of course your kickstart file can
include e.g. the MPI(s) of your preferred flavor(s), PVM, and so forth.
It can also include at least the standard remote workstation management
tools, e.g. syslog-ng, and perhaps a few that are more cluster
management/monitoring tools, although there is still a bit of a dearth
of these in the mainstream linuces.  You have to decide whether you are
willing to live with these tools in order to have nodes that look like
remote-access workstations, or whether you would prefer Scyld's paradigm
of nodes that look like multiple processors on a single system (with
matching "single system" management tools).

Or both.  Set it up to boot both ways on demand, and see which one works
better for you.  Neither one is particularly difficult to build and
configure, and the time you save making the truly correct decision for
your enterprise will likely pay for the time you spend figuring out the
truly correct decision to make.

    rgb

> 
> -Andrew
> 
> On Thu, 15 Nov 2001, William T. Rankin wrote:
> 
> > > From: "Andrew B. Raij" <raij at cs.unc.edu>
> > > 
> > > Hi everybody,
> > > 
> > > I'd like to set up a scyld cluster with slaves open to the public
> > > network.  I'd also like each slave to get the same ip of my choosing every
> > > time it is booted and slave ips shouldn't have to be confined to any
> > > specific range.  I understand that doing this is contradictory to the
> > > beowulf design but is it possible?  
> > 
> > What you are talking about is to set up all the nodes as general
> > purpose workstations and using them as a cluster.  This isn't
> > "contrary" to the beowulf design (that's how my first cluster was
> > set up).  It is contrary IIRC to the basic Scyld assumptions.
> > 
> > Have you considered just using kickstart with a standard linux
> > distribution to configure your machines?  Or is there something
> > specific to Scyld that you are interested in?
> > 
> > -bill
> > 
> > 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





