Good Tutorial for Clusters

Robert G. Brown rgb at phy.duke.edu
Wed May 8 01:47:43 PDT 2002


On Wed, 8 May 2002, Raju Mathur wrote:

> >>>>> "rgb" == Robert G Brown <rgb at phy.duke.edu> writes:
> 
>     rgb> [snip]
>     rgb> These days building a "generic" cluster is often not more
>     rgb> complex than installing an out-of-the-box distro, e.g. RH 7.3
>     rgb> (released yesterday, hooray) and hand-picking the beowulfish
>     rgb> packages already therein, such as pvm from the list of
>     rgb> available RPM's.
> 
> Curious: is redhat generically preferred for running Beowulf, or is
> that your personal choice?  In general (no, I don't want to start
> another distro war here!) is there any particular reason for
> preferring one distribution over another /for running Beowulf/ ?

It is almost certainly not generically preferred.  It is locally
preferred.  After all, here I sit about eight miles away from the RH
corporate office (at least until they finish moving to Raleigh, the
rats! at which point it will be more like 25 miles:-). Duke
(dulug.duke.edu) has one of the primary RH mirrors -- we moved 1.5 TB of
data off our mirror yesterday as folks grazed on 7.3.  We also have a
linux genius named Seth Vidal who has built a fully automated RH
installation site for the campus -- one can install RH onto any system
on campus in about five minutes (plus the time required to navigate the
setup panels), depending on load and bandwidth, and the local installs
come preconfigured to automatically update themselves nightly so we
basically never have unpatched systems on campus (security and functional
updates both).  The install setup fully supports DHCP and kickstart, so
we can install beowulf nodes in about three minutes over 100BT back to
the campus server with no hands at all.  We are thus so damn scalable
that one person, in ADDITION to being the physics department primary
sysadmin, "supports" close to 1000 linux boxes all over campus (Hmmm, I
wonder how many there really are at this point:-).  Of course, the dulug
mailing list and a few other very good linuxoid humans provide
additional support to newbies and others, but the sethbot is legendary
for answering most questions (including some that are truly boneheaded),
some of them slightly before they are asked.  (I wonder how he DOES that...:-)

RH is also the base for Scyld.

Now, to prevent being Debianized, or Mandrake-curse'd, or SuSE-Q'd, or
Slackwarified, I will openly and freely admit that in all probability
one can create an equally scalable and transparent operation with those
distros, possibly working a bit harder or a bit less hard (aye, that's
the rub:-).  However, we just happen to do RH, and at this point it is
now VERY VERY VERY easy.

The sethbot is working on a rewrite of yup (the yellow dog update tool)
that will make it even easier as well as much faster.  We have real
hopes of being able to yup-update a running system to 7.3, for example,
without having to do a full reinstall (probably will need a reboot, of
course, to manage the new kernel, and might need a bit of extra
configuration or reconfiguration to support new features, but the
PACKAGES should all update correctly without rewriting their existing
configurations or killing /etc, which is very nice).  If this works,
we'll probably require full (re)installs only at major distro releases
(8.0, for example) and even there we're working on ways for a system to
do an automated reinstall to a higher distribution number without losing
the basic configuration data, while preserving at least the same
optional packages that were in the previous install.

At that point our scalability will be approaching the theoretical
maximum.  Complete linux idiots will be able to manage a network install
into a reasonably bulletproof configuration, and once installed their
systems will automatically do all of those update-thingies that are so
critical to real security.  Unless they work actively to defeat it,
their system will track all the minor version releases without their
having to do anything but reboot post-update into the new kernel, and
will be ABLE to do a major version release update without ruining their
setup, although they may have to follow some instructions for that one.

Support will then consist of telling newbies to read the README on how
to install, and developing x.x into the duke release form (we add this
and that, test, and so forth before certifying it for yup-update or
reinstall to all campus hosts).  And bug fixes and answering questions,
of course.  Overall, one person plus a good LUG will indeed be able to
manage all the "wild" users (students and faculty on personal systems)
and do almost all the work required to support departmental operations
EXCEPT the actual management of their network -- this doesn't, of course,
remove the need for departmental administrators, it just makes their job
MUCH easier.

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
