Why no rlogin to nodes?

Robert G. Brown rgb at phy.duke.edu
Mon Oct 16 11:04:46 PDT 2000


On Mon, 16 Oct 2000, Walt Dabell wrote:

> So are you saying that I _SHOULD_ run SB? I'm not now, and hadn't
> intended to... There would have to be a monumental reason to switch
> at this point. But for the future cluster... 

If you have a true beowulf, you should at least give it a try.  I know
Don and Erik pretty well (and just met Dan Ridge this weekend:-) and
they are pretty sharp.  As in air molecules spontaneously ionize when
they get near their heads and their feet spark when they walk;-).  I
spent more than an hour with Erik picking over its structure and how
bproc (which he mostly wrote) does this and that.  It is about as close
to "beowulf in a box" as you can get, and if you are concerned with
scalability or efficiency/overhead, it is almost certainly the most
scalable and most efficient setup you are likely to ever see.

One very nice thing about it that I plan to try is that you don't have
to totally commit to it to give it a spin.  I think, from what Erik and
Don said at different times, that it works best if you can give the
distribution a GB or two of hard disk on the "master node", but the
nodes can boot non-destructively from their CDs or floppies (leaving
their current install images intact).  Since the nodes pretty much run
out of ramdisk after boot, they don't need room on any local disk.
Since they don't use NFS (or at least don't have to use NFS) they don't
even need any room on an exported volume.  The nodes are cosmically thin
and don't even support a true login.  I'm hoping to test it in the next
day or so on my home beowulf without altering the current setup of my
nodes/slave workstations in the slightest.

The minimal node hardware configuration sounded like MoBo+CPU+128MB(or
more)+NIC+[floppy OR CD] -- call it $500 in Celeron or Duron packaging;
a hundred or two more in PIII or full Athlon.  You might need a single
video card at first to set the BIOS not to bitch when it discovers no
KVM and to tell it to boot from the CD-ROM, if that's the configuration
you choose (which is the one they prefer, since floppies deteriorate
from dust when sitting open in a floppy drive for six months).  You can
also remote-boot the whole thing diskless if you have the appropriate
NIC PROMs (and save, I suppose, the cost of a floppy or CD drive).
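
For the diskless route, the node just needs a bootp/dhcp answer from the
master pointing it at a boot image.  A minimal sketch with ISC dhcpd
(addresses, MAC and image path are all invented for illustration; a
bootptab entry would do the same job):

    # /etc/dhcpd.conf fragment on the master
    subnet 192.168.1.0 netmask 255.255.255.0 { }
    host node1 {
        hardware ethernet 00:50:da:aa:bb:cc;   # the node's NIC
        fixed-address 192.168.1.101;           # address it boots with
        filename "/tftpboot/node.nbi";         # image fetched over tftp
    }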

Hopefully they'll correct me if I've got any of the above wrong.

So you don't have to do much of anything to try it out, provided that
you've got a tiny bit of free space on a "master" node somewhere.  If
you don't like it, just put lilo back the way it was and delete it.
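
For the record, "putting lilo back" just means removing whatever stanza
the install added to /etc/lilo.conf and rerunning /sbin/lilo.  I haven't
looked at what Scyld actually writes there, so the second stanza below
is purely illustrative (kernel names and partitions are made up):

    # /etc/lilo.conf: your original entry stays untouched
    image=/boot/vmlinuz-2.2.17
        label=linux
        root=/dev/hda1
        read-only

    # hypothetical stanza a Scyld install might add; delete it to back out
    image=/boot/vmlinuz-scyld
        label=scyld
        root=/dev/hda3
        read-only

    # ...then rerun /sbin/lilo to rewrite the boot block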

They made the email cluster at ALSC (systems they'd never seen before)
into a beowulf, and then turned it back into the email cluster by just
pulling out the Scyld CDs and rebooting.  Pretty cool, actually.

> > You do not (and should not) run rshd, rlogind, telnetd and ftpd.  The
> > only service a typical workstation needs to offer these days to enable
> > just about all the kinds of incoming access one requires to support
> > parallel calculations, remote logins, remote file copies (bidirectional)
> > and so forth is sshd.  sshd replaces rshd (but is run standalone, not
> 
> Doesn't ssh have some extra overhead? I assume too that you
> were making that recommendation assuming that I was not running
> a dedicated cluster. Current and future clusters sit behind a mother
> node, future also behind a firewall. Is there an advantage to running
> ssh on such a cluster? I assume not?

Before the Scyld solution, I would have said yes, and would still say
yes on a roll-your-own cluster.  First of all, yes, ssh has extra
overhead, but a) that is its strength more than its weakness, and b) the
overhead is almost certainly negligible to a parallel programmer unless
they're doing something like using the shell to farm out lots of little
programs and manage I/O for them.  If your usage pattern is "start a
big PVM or MPI calculation and then come back a day later" you hardly
care if the remote shell that starts the local pvmd's on the slave nodes
takes 0.2 seconds vs 0.1 seconds (vs the astronomically small amount of
time it takes bproc to do it, since it doesn't have to fork a shell at
all on the node to run the program).  Even the longest time is still
negligible.  
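
If you want to put a number on it on your own hardware, the crude
measurement is just (node name invented):

    # compare the per-invocation startup cost of the two remote shells
    time rsh node1 /bin/true
    time ssh node1 /bin/true

Either way you'll get a fraction of a second, which is noise against a
run measured in hours or days.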

The reasons I think it is good to have ssh on the nodes of
roll-your-own beowulfs are:

  a) It is a pain to put a monitor on a headless node, so you'll want to
be able to run an xterm on the nodes.  ssh facilitates this and even
sets things up so you can run X-GUI commands on the nodes (should you
have/need/prefer an X interface to something) transparently (see the
sketch after this list).
  b) ssh allows you to install an /etc/environment file on nodes to set
key environment variables for all users without needing access to their
home directories or dot-files.
  c) You've got to have SOMETHING to get onto the nodes and ssh is
superior in so many ways to telnet and/or rsh.
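
A quick sketch of (a) and (b); the hostname and variable values are
invented, and the exact X-forwarding knob depends on which ssh you run
(ForwardX11 yes in ssh_config, or a command-line switch on newer
clients):

    # (a) run an xterm on a node, display tunnelled back over the ssh link
    ssh node1 xterm

    # (b) /etc/environment on each node, seen by every user at login;
    # no per-user dot-files needed
    PVM_ROOT=/usr/share/pvm3
    PATH=/usr/local/bin:/usr/bin:/bin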

You can always turn off encryption if you trust the internal beowulf
network.
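
Whether you actually can depends on the build: some ssh versions still
accept the old "none" cipher, but many have it compiled out.  In that
case a cheap cipher recovers most of the speed, e.g. (hostname and
command invented):

    ssh -c blowfish node1 some_command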

However, if you have a trusted internal network beowulf with headless
nodes, I suspect Scyld is the way to go.

   rgb

> 
> Thanks for all the input!
> 
>      Walt Dabell  302-831-1499  walt at bartol.udel.edu
> Computer Facilities Manager - Bartol Research Institute
>                      University of Delaware
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu