[Beowulf] Difference between a Grid and a Cluster???

Thu Sep 22 04:57:45 PDT 2005

Leif Nixon writes:

> Robert G. Brown <rgb at phy.duke.edu> writes:
> 
>> Some of the gridware packages do exactly this -- you don't distribute
>> binaries, you distribute tarball (or other) packages and a set of rules
>> to build and THEN run your application.  I don't think that any of these
>> use rpms, although they should -- a well designed src rpm is a nearly
>> ideal package for this.  Similarly it would be quite trivial to set up a
>> private yum repository to maintain binary rpm's built for at least the
>> major hardware architectures and simply start a job with a suitable yum
>> install and end it with a suitable yum erase.
> 
> But then you need either to:
> 
> - Give users root access to install their rpms system-wide. Ouch.

Or just "give" users the nodes for the duration of their computations.
This is one of the ideas of e.g. COD -- computing on demand and related
schemes. Depending on the granularity of the computation scheduling
(that is, do they get the node for a full day?  A week?  A minute?) the
five minutes or so it takes to reboot a node into a pristine image can
be an acceptable startup cost.
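
To put rough numbers on that granularity question, here is a minimal
sketch.  It assumes the five-minute reimage figure above; the allocation
lengths and the function name are purely illustrative, not taken from
any real scheduler.

```python
# Fraction of a node slot lost to reimaging, assuming the roughly
# five-minute reboot-to-pristine-image estimate quoted above.
# The allocation lengths below are made-up examples.
REIMAGE_MINUTES = 5.0


def reimage_overhead(allocation_minutes, reimage_minutes=REIMAGE_MINUTES):
    """Return the fraction of the total slot spent rebooting into a clean image."""
    return reimage_minutes / (allocation_minutes + reimage_minutes)


if __name__ == "__main__":
    examples = [("1 minute", 1), ("1 hour", 60), ("1 day", 1440), ("1 week", 10080)]
    for label, minutes in examples:
        print(f"{label:>8}: {reimage_overhead(minutes):.1%} of the slot is reimaging")
```

With these assumptions a one-minute job spends most of its slot
rebooting, while a day-long allocation loses well under one percent.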

So sure, they can have root, but when they're done, poof, the node is
clean again.  Stuff like Warewulf makes this REALLY easy -- it takes
only a few minutes to build a clean node image and with a big disk you
could actually maintain a library of clean node images customized to
different purposes.  Users just install their packages into what amounts
to a ramdisk and their efforts for good or evil just go where the lights
go when the lights go out upon any sort of node interruption.

Also, since "installation" is a matter of dropping their packages into a
queue along with a web-authenticated set of instructions, the only
question is whether and how you trust the rpm contents.  You don't give
the users root privileges; you install their rpms for them as root
(possibly requiring that they be gpg signed and checked against a
database of keys, for an additional layer of authentication).  And then
see below.

> - Let users maintain private rpm databases (in their home directories,
>   for example). But then you lose the ability to track dependencies on
>   the system-wide rpm database - rpm can only handle one database at a
>   time.

Instead of using --root, I think one can use --relocate to install
things built for installation in e.g. /usr/bin, /usr/share/man, /usr/lib
into /home/user/usr/bin, /home/user/usr/share/man..., and then follow the
install with a chown -R user:group (to eliminate any suid root apps they
might have tried to sneak in that way -- once the files belong to the
user, they can be suid to themselves all they want).  That gives their
installs the Power of Yum to manage dependencies -- although yum might
need to be hacked with a new plugin to manage the relocate part, since I
think Seth Vidal was (for pretty good reasons) violently opposed to
relocation for rpms that weren't built for it...
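
The relocate-then-chown sequence might be sketched as follows.  This
only builds the two command lines and runs nothing.  The package name,
user, group, and the /home/<user> layout are hypothetical, and the helper
function is mine, not part of rpm or yum.  Note also that rpm will only
relocate prefixes the package itself declares relocatable unless
--badreloc is added, which is the objection mentioned above.

```python
import shlex


def relocated_install_commands(rpm_path, user, group, prefixes=("/usr",)):
    """Build (but do not run) the rpm and chown command lines.

    Assumes the user's private tree mirrors the system layout under
    /home/<user>, e.g. /usr -> /home/<user>/usr (a hypothetical layout).
    """
    home_root = f"/home/{user}"
    install = ["rpm", "-ivh"]
    for old in prefixes:
        # rpm's --relocate takes OLDPATH=NEWPATH
        install += ["--relocate", f"{old}={home_root}{old}"]
    install.append(rpm_path)
    # Hand the installed tree to the user so nothing stays owned by root.
    chown = ["chown", "-R", f"{user}:{group}"]
    chown += [f"{home_root}{old}" for old in prefixes]
    return [install, chown]


if __name__ == "__main__":
    for cmd in relocated_install_commands("myapp-1.0-1.i386.rpm", "alice", "users"):
        print(shlex.join(cmd))
```

Both commands would be run as root by whatever processes the install
queue, never by the user directly.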

> - Use some virtualization stuff to give users their own virtual
>   machine.

This is along the same lines as what I suggest above, but sounds more
permanent -- a chroot jail image inside a semipermanent node image?

This is all precisely the kind of thing that gridware is designed to
manage.  The only solution I've actually seen so far worked on the basis
of tarballs or something, and it was REALLY REALLY UGLY -- basically no
dependency resolution and infinite problems.  Worse than slackware,
that kind of thing.  I'm trying to remember the name of the packaging
system but drawing a blank -- I'll probably have to look it up.  It was
the ATLAS grid project, though -- there are members on the list and some
of them are probably struggling with it and cursing even now.

So yeah, it is very different from what we are used to in controlling
local clusters or LAN workstations, but there are solutions.  With a bit
of effort, one could, I think, even craft robust and secure solutions,
in more than one way.

One thing that works in favor of grid managers is that in most
dedicated-function grid environments the users are relatively carefully
selected, grant supported, and can be taken to task and held
accountable in lots of extremely unpleasant ways.  Screwing up the
resource deliberately or through failure to follow the rules can get you
kicked off the resource, which in turn can EASILY be a career-ending
move for grad students or postdocs and isn't all that healthy a move for
PIs.

"Consortium" grids or campus grids made by gluing together a lot of
standalone clusters with their own "owners" have to use a different
standard, as their users are sometimes more elusive and harder to track,
but not THAT much harder to track or hold accountable.  And in all cases
a well-designed, well-managed grid node is at most a five minute clean
reinstall away from a pristine node image, so you can leave any problems
behind in the time it takes to go for a cup of coffee.  Or maybe a beer.

   rgb

> 
> -- 
> Leif Nixon                       -            Systems expert
> ------------------------------------------------------------
> National Supercomputer Centre    -      Linkoping University
> ------------------------------------------------------------