[Beowulf] What services do you run on your cluster nodes?
becker at scyld.com
Wed Sep 24 10:46:10 PDT 2008
On Tue, 23 Sep 2008, Eric Thibodeau wrote:
> Ashley Pittman wrote:
> > On Mon, 2008-09-22 at 15:44 -0400, Eric Thibodeau wrote:
> >> Ashley Pittman wrote:
> >>> On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote:
> >>> If it were up to me I'd turn *everything* possible off except sshd and
> >>> ntp.
You can and should go further than this if you are designing a cluster
system to scale up to large node counts.
NTP is the wrong time synchronization model for compute nodes. The daemon
frequently wakes up, and the best you can hope for is that it does nothing.
When it does do something, it's jerking your clock forward or back.
The model you want is pretty much timing a foot race: you don't care if it
starts exactly at noon, but you do care if someone is adjusting the
stopwatch while the race is being run. (Doh, we are supposed to only use
automotive analogies. OK, substitute "car race" above.)
Our implementation of time synchronization is simpler to understand and
more suitable. If the wall-clock time drifts far enough for a human to
notice, it's slammed toward the correct time by a full unit.
That keeps the timestamps on log entries usable (it turns out users hate
compute nodes being five minutes wrong, who woulda' guessed) while letting
you trust the microsecond clock when doing diagnostics. There isn't any
guessing "was that 10 millisecond gap real, or was NTP mucking with me".
The clock is either stable or makes an obvious jump, e.g. a half or full second.
We put the time synchronization in the remote process starting system to
avoid an extra daemon and extra work. That system already gets timestamps
from the control messages, and knows which connection is the cluster
master. (You only want to adjust the clock from one head node.)
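The step-only policy above can be sketched in a few lines. This is a hypothetical illustration, not the Scyld implementation: the function name, the 0.5-second "human noticeable" threshold, and the half-second step unit are all assumptions made for the example.

```python
# Sketch of step-only clock correction, as opposed to NTP's continuous
# slewing. Threshold and step size are illustrative assumptions.

NOTICEABLE_DRIFT = 0.5   # seconds; below this, leave the clock alone
STEP_UNIT = 0.5          # when correcting, slam in whole half-second units

def clock_step(local_time, master_time):
    """Return the adjustment to apply to the local clock, or 0.0.

    The master's time comes for free from the timestamp already carried
    in a control message. The clock is either untouched (so microsecond
    intervals stay trustworthy for diagnostics) or stepped by a large,
    obvious unit toward the master's time -- never a partial slew.
    """
    offset = master_time - local_time
    if abs(offset) < NOTICEABLE_DRIFT:
        return 0.0
    # Step toward the master in whole units.
    units = int(offset / STEP_UNIT)
    return units * STEP_UNIT
```

A 10-millisecond gap in a log is therefore always real: any correction is at least a full step unit, easy to spot and discount.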
SSH is slow and expensive for internal cluster process creation. It does
more than you want and less than you need. It takes several seconds to
initiate a connection, and always encrypts. There are good reasons to do
this on a long distance connection, but not within a cluster. Then when
it does start a process, it's imprecise: it just passes a text string
to a shell. It doesn't even do the easy things to make certain that the
application it runs is the same as what would run on the originating host.
It doesn't duplicate the environment variables, verify that the executable
and libraries are the same as the originating host, duplicate the library
link order, put you in the same directory, or provide a general mechanism
for passing credentials.
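The launch context that plain `ssh host cmd` discards could be captured and checked like this. A hypothetical sketch only: the function names and the dictionary fields are made up for illustration and are not any real launcher's API.

```python
# Sketch of the context a cluster process-starting system could carry
# along with a launch request; names are illustrative assumptions.
import hashlib
import os

def capture_launch_context(executable_path):
    """On the originating host: gather what ssh silently drops."""
    with open(executable_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "environ": dict(os.environ),   # duplicate the environment variables
        "cwd": os.getcwd(),            # start in the same directory
        "exe_sha256": digest,          # check it's the same binary remotely
    }

def verify_on_node(context, executable_path):
    """On the compute node: refuse to run a mismatched binary."""
    with open(executable_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == context["exe_sha256"]
```

Library versions, link order, and credentials would need the same treatment; the point is that a purpose-built launcher can make these checks cheap, while ssh gives you no hook for any of them.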
> >>> The problem however is the maintenance cost of doing this, it's
> >>> fine if you've only got one cluster and one app but as soon as you try
> >>> to support multiple users on multiple distributions the cost of ensuring
> >>> everything is shut down on all of them skyrockets, and it becomes
> >>> easier to stick with the status quo :(
That's true for the "strip down a distribution" approach. You have to
repeat the work for each new release. And test it, since an
updated release might change to rely on additional daemons, e.g. DBus.
Donald Becker becker at scyld.com
Penguin Computing / Scyld Software
Annapolis MD and San Francisco CA