Process Accounting on a Linux Cluster

Robert G. Brown rgb at phy.duke.edu
Fri Oct 27 16:22:58 PDT 2000


On Fri, 27 Oct 2000, Gerry Creager N5JXS wrote:

> Erik Paulson wrote:
> > 
> > On Fri, 27 Oct 2000, Dan Yocum wrote:
> > >
> > > Users are only allowed access to a master or I/O node on a cluster and
> > > jobs are submitted from there.  This is an integral part of all batch
> > > and beowulf processing systems.
> > >
> > > Dan
> > 
> > Bah - this is not an integral part, this is merely convenience for the
> > admin.
> 
> A number of us have found that, in fact, for both administrative and
> security reasons, not allowing node login has been a *GOOD* thing.  It's
> not merely convenience.

Yes, this is true.  But to be fair, it is still not "an integral part",
so Erik's main point is valid as well.  In part, this is associated with
the careless use of the word "cluster" in Dan's statement.

This is one of the (somewhat) differentiating things between a "true
beowulf" and a "cluster used to do compute-intensive work".  On a Scyld
beowulf, you CAN'T log in to a node, which is presumed to be on an
isolated network where the master node is also a de facto firewall.  On
a Scyld beowulf, "logging into" a node is as foreign an idea as logging
into processor 0 or processor 1 on a dual SMP box.

On a distributed compute cluster (like brahma), which might well be a
mix of dedicated nodes and desktops, you not only can log in to the
nodes, but they typically run a pretty normal linux (apart from lacking
a monitor and keyboard) and maintain pretty normal (that is, much, much
more) security, as they live on an open network with sshd open for
connections.
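
(Concretely, on such a cluster poking at a node is just ordinary ssh;
the node name and job name below are made up for illustration:

    ssh b001 uptime                  # load average on node b001
    ssh b001 ps ax | grep myjob      # is my job still running there?

which is precisely the sort of direct access that a Scyld node never
offers.)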

There are also certainly beowulfs by the "book definition" where the
nodes are login-accessible, although it isn't necessarily the purest
realization of the cluster-is-a-named-pseudosmp-supercomputer concept.
However, beowulfery is a spectrum: at one end is probably Scyld, and at
the other a bunch of kids running a canned demo in parallel on their
small network of desktop Linux workstations.  The issue isn't just one
of performance or parallel scaling.  In fact, it is entirely possible
to run massively parallel calculations using e.g. /bin/sh or
/usr/bin/perl plus ssh and nfs for job submission, job control, data
collection, and so forth on a cluster that is architecturally a "true
beowulf", and to have this be the EASIEST way to proceed.

If one has an embarrassingly parallel (EP) task developed and running
as a single-threaded application, porting it to PVM or MPI or even
installing MOSIX or some other over-the-counter task distribution
system may be way overkill.  If so, why bother?  I've run my EP Monte
Carlo (MC) code on vast clusters using shell scripts and expect.
Worked fine.  With a ratio of perhaps a day of CPU to a few seconds of
shell-based "IPC" time, my parallel scaling was, ahem, quite excellent
as well.  Embarrassingly parallel code is IDEAL for clusters.  One
WANTS to use clusters where the parallel scaling for the task is nearly
perfectly linear with slope one, and sometimes one has to expend a
great deal of energy to accomplish this.  Other times, one doesn't.
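
To put rough numbers behind that "ahem" (these are the estimates from
the paragraph above, not measurements):

    T = CPU time per job       ~ 86400 s  (a day)
    t = shell/"IPC" overhead   ~ 5 s      (a few seconds)
    efficiency ~ T/(T + t) = 86400/86405 ~ 0.99994

so on N nodes the speedup is about 0.99994*N, i.e. linear with slope
one for any N you are likely to be able to afford.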

In conclusion, there are good reasons to make nodes login-accessible
for some task mixes and environments, and good reasons not to in
others.  So perhaps we could leave out the "Bah" part (although to some
extent I sympathize;-) and agree that it isn't merely convenience.  As
Gerry correctly points out, there are perfectly valid functional,
performance, and security reasons to prohibit node logins, possibly
even at the EXPENSE of the INCONVENIENCE to administrators that can
occur the first time a node hangs at a level that might have been
recoverable IF the node were network accessible, without anyone driving
in in the middle of the night to do a power-cycle reboot.  But we
should ALSO agree that it is by no means a "requirement", or even
necessarily a good idea, to permit access to a cluster "only via a
single master node with node logins (direct or through the master)
prohibited".

Peace.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email: rgb at phy.duke.edu