[Beowulf] What services do you run on your cluster nodes?

Joe Landman landman at scalableinformatics.com
Mon Sep 22 19:02:10 PDT 2008

Prentice Bisbal wrote:
> The more services you run on your cluster node (gmond, sendmail, etc.)
> the less performance is available for number crunching, but at the same
> time, administration difficulty increases. For example, if you turn off
> postfix/sendmail, you'll no longer get automated e-mails from your
> system to alert you to a problem.

Does every node need to be running sendmail/postfix?  In most cases, 
nodes should be fairly "dumb", in the sense of having as little as 
possible actively running.  They need little more than an 
authentication service, a login/process start service, and a disk 
service (NFS, panfs, glusterfs, ...).
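
One way to act on this is to compare what a node is actually running 
against a short allowlist that covers just those three roles.  The 
sketch below is a minimal illustration in Python.  The service names 
(`sshd`, `nscd`, `netfs`) are assumptions standing in for whatever your 
distribution uses, and collecting the list of running services from 
the node is left out.

```python
# Hypothetical allowlist: the names are assumptions; substitute your
# distribution's actual services for auth, login/process start, and disk.
MINIMAL_SERVICES = {
    "sshd",   # login / process start
    "nscd",   # authentication / name-service caching (assumed choice)
    "netfs",  # network filesystem mounts (assumed choice)
}


def extra_services(running, allowed=MINIMAL_SERVICES):
    """Return the running services that are not on the allowlist, sorted."""
    return sorted(set(running) - set(allowed))


if __name__ == "__main__":
    running_on_node = ["sshd", "nscd", "netfs", "sendmail", "cups"]
    print(extra_services(running_on_node))  # ['cups', 'sendmail']
```

Anything the function reports is a candidate for disabling, not an 
automatic verdict.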

> My question is this: how extreme do you go in disabling non-essential
> services on your cluster nodes? Do you turn off *everything* that's not
> absolutely necessary, or do you leave some things running to make
> administration easier?

As long as you have an ssh portal in as root, you should be fine for 
admin.  From an admin point of view, though, as you scale up the number 
of nodes, you want the admin load to remain constant, that is, not to 
scale with increasing node count.  Moreover, you want to actively reduce 
the number of moving parts as you scale up, since moving parts tend to 
break.  These are things like installs, or images.  We have customers 
who occasionally (against our advice) test the limits of their "cluster 
installer".  What is interesting is that they can't *successfully* 
install/image more than about 20-24 nodes at a time.  Yes, they can 
install more than that, but the systems they install that way seem to 
have problems which go away at the next reload.
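
If you do have to reimage a large number of nodes with such an 
installer, one workaround consistent with that observation is to 
install in waves rather than all at once.  A minimal Python sketch, 
assuming a hypothetical node naming scheme and a wave size of 20 (the 
low end of the 20-24 figure above).  It only computes the schedule and 
does not drive any particular installer.

```python
def install_waves(nodes, wave_size=20):
    """Split a list of nodes into consecutive waves of at most wave_size."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    return [nodes[i:i + wave_size] for i in range(0, len(nodes), wave_size)]


if __name__ == "__main__":
    # Hypothetical node names node001 .. node064.
    nodes = [f"node{n:03d}" for n in range(1, 65)]
    waves = install_waves(nodes)
    print([len(w) for w in waves])  # [20, 20, 20, 4]
```

Each wave would be started only after the previous one has finished 
and checked out.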

Basically, as you scale up the system, you want to scale down, if not 
completely eliminate, node-level admin.  You definitely don't want the 
nodes to be spending cycles (and therefore power, time, resources) on 
things that they really ought not to spend time on.
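
With only an ssh portal in as root, one way to keep admin effort from 
growing with node count is to write a change once and fan it out to 
every node in parallel.  The sketch below assumes passwordless root ssh 
to each node; `run_everywhere`, `ssh_runner`, the node names, and the 
worker count of 32 are hypothetical choices, not part of any particular 
cluster toolkit.  The runner is a parameter so the logic can be 
exercised without a cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command):
    """Run command on node as root over ssh; return (exit code, stdout)."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"root@{node}", command],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


def run_everywhere(nodes, command, runner=ssh_runner, max_workers=32):
    """Run the same command on every node in parallel; map node -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda n: runner(n, command), nodes)
        return dict(zip(nodes, results))


if __name__ == "__main__":
    # Dry run with a stand-in runner so the sketch executes without a cluster.
    fake = lambda node, cmd: (0, f"{node}: {cmd}\n")
    out = run_everywhere(["node001", "node002"], "uptime", runner=fake)
    for node, (code, text) in out.items():
        print(node, code, text.strip())
```

The unit of admin work here is one command, not one login per node, 
so the effort stays roughly the same whether there are 20 nodes or 200.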


> I'm curious to see how everyone else has their cluster(s) configured.

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
