[Beowulf] Parallel Development Tools

Robert G. Brown rgb at phy.duke.edu
Thu Oct 18 07:58:04 PDT 2007


On Thu, 18 Oct 2007, Bogdan Costescu wrote:

> On Thu, 18 Oct 2007, Robert G. Brown wrote:
>
>> but on highly INhomogeneous hardware, slow and undependable networks, and 
>> the like -- if it dies
>
> This discussion being on the beowulf list, I can agree with kickstart being 
> used on INhomogeneous hardware, but not on slow and undependable networks; how 
> do you intend to run the cluster later with a crappy network?

Ah, sure, but I personally use kickstart at home as well, where I run a
cluster that is extremely inhomogeneous, as it consists of whatever
hardware I've bought, year by year, over the last umpty years.  There
is the "professional cluster" -- a big server room -- where your point
is absolutely correct.  There is the "learning cluster," where it may
well not be correct -- these are the ones built by e.g. the high school
students I advise, who are often using leftover networking parts and
old computers.  Then there are "personal/development clusters" like my
home cluster, which is basically a notch up from a learning cluster but
way short of a true pro-grade production cluster.

And even pro-grade production clusters (usually) exist inside an
organization-wide LAN, where the ultimate endpoint is a user's desktop
or laptop.  Sure, those desktops aren't always kickstart installed, but
life is a lot easier in most environments if they are.  Ultimately, a
cluster is just a specialized LAN, and very similar tools work for
scalable LAN installation, maintenance, and management as for cluster
installation, maintenance, and management.  The primary differences lie
mostly in the selection of packages; there are even cluster designs
where the cluster IS all the desktops in an ordinary LAN, and in a LAN
where everybody is just browsing the web, reading mail and so on, such
a design lets one recover a huge number of otherwise wasted cycles for
zero marginal cost.

>> and checkpointing of some sort on the script(s) that finish off the system, 
>> so that if a particular package crashes the install one can just remove it 
>> from the list and restart the package list install to pick up where it left 
>> off and deal with the missing piece later
>
> I think that this is a limitation in updating the RPM database; you make a 
> transaction with a set of packages which has to have dependencies satisfied; 
> if you want to eliminate one package you need to recompute the package set, 
> as removing that one might remove many others pulled in through dependency 
> trees - so it's not so easily checkpointable.

I agree, but yum has a plugin now called "skip-broken" that can manage
all of that in real-time.  It is, however, a plugin and would need to be
installed before running the second-pass yum, and there is still the
issue of documenting it and being able to deal with it.  As always, I'm
thinking of a specific case where I was installing at home from a
freshly created mirror (so I didn't have to squeeze five or six installs
through a small pipe -- it is much faster to squeeze one mirror, then do
the five or six installs, then maybe update locally, and faster still to
just carry my laptop to Duke and do installs there on a gigabit
backbone:-).  One of the mirrored packages was broken with respect to
the repo dependencies, and it was a total pain to fix.  And of course I
had to do 2-3 installs to get one to come out right.

The skip-broken and protectbase plugins are pretty much a walking,
talking solution to most of the real evil that can occur here, but they
do rely on the soon-come two-pass install to be useful in the general
installation cycle.
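
For what it's worth, here is roughly the sort of checkpointing I have in
mind for the second pass, reduced to a toy Python sketch.  To be clear,
this is my own illustration, not anything kickstart or yum provides for
you: the file names (packages.list, install.done) are made up, and doing
one package per yum transaction deliberately trades away the whole-set
dependency solving Bogdan describes in exchange for being able to
restart where you left off and mop up failures later.

    #!/usr/bin/env python
    # Checkpointed second-pass install (illustrative sketch only).
    # Walk a plain-text package list one package at a time, record each
    # success in a checkpoint file, and on restart skip anything already
    # done.  Failures are collected so they can be dealt with later.
    import os
    import subprocess

    PKGLIST = "packages.list"   # hypothetical: one package name per line
    DONELOG = "install.done"    # hypothetical checkpoint file

    done = set()
    if os.path.exists(DONELOG):
        done = set(line.strip() for line in open(DONELOG))

    failed = []
    for pkg in (line.strip() for line in open(PKGLIST)):
        if not pkg or pkg in done:
            continue
        # -y keeps yum from prompting; a nonzero return means this
        # package (or its dependency chain) failed, so just move on.
        if subprocess.call(["yum", "-y", "install", pkg]) == 0:
            log = open(DONELOG, "a")
            log.write(pkg + "\n")
            log.close()
        else:
            failed.append(pkg)

    if failed:
        print("deal with these later: " + " ".join(failed))

In real life one would of course want skip-broken (and protectbase, to
keep third-party repos from clobbering the base) doing the heavy lifting
inside yum itself; the sketch is just the poor man's version of the same
idea.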

>> Of course this requires a binary and configurational standard at LEAST 
>> through the base install (the kernel, glibc, /etc layout, more base-class 
>> libraries).
>
> ... and packaging. Each of these pieces comes from a package and whatever 
> further-install program runs later will need to deal with other packages and 
> should know about what is installed already. And that's where a big problem 
> lies...

I agree, no question about it.  For a while the yum and rpm and apt
people were talking and working a bit on seeing if they could come up
with a universal and sharable scheme here.  These are the Forces of
Good I was referring to.  However, I'm not in touch with how much
progress they've made with it.  Packaging has room for infinite evil --
dependency loops or worse -- and often that evil creeps in in spite of
the distro managers and testing processes designed to keep it out.
Eight to twenty thousand packages, all contributed and maintained by
volunteers, some of them obsolete, some of them functional "by
accident" but subject to revealed bugs as the stuff around them
changes.
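
Just to make the "dependency loop" flavor of evil concrete, here is a
toy sketch (made-up package names, and of course rpm/yum/apt do
something far more elaborate than this) of the kind of cycle that can
quietly appear in a repo maintained by thousands of volunteers:

    # Toy illustration of a dependency loop: a depth-first walk over a
    # made-up package -> requires map that reports the first cycle found.
    deps = {
        "foo": ["bar"],
        "bar": ["baz"],
        "baz": ["foo"],   # ...and around we go
    }

    def find_cycle(pkg, path=()):
        if pkg in path:
            return path[path.index(pkg):] + (pkg,)
        for req in deps.get(pkg, []):
            cycle = find_cycle(req, path + (pkg,))
            if cycle:
                return cycle
        return None

    for p in deps:
        cycle = find_cycle(p)
        if cycle:
            print(" -> ".join(cycle))   # e.g. foo -> bar -> baz -> foo
            break

The point being that no individual package need be broken for the set
as a whole to be; this sort of evil only shows up in the aggregate.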

It is really like the dancing bears:  It isn't how gracefully they dance
that is amazing, but that they dance at all...;-)

    rgb


-- 
Robert G. Brown
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone(cell): 1-919-280-8443
Web: http://www.phy.duke.edu/~rgb
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


