[Beowulf] What services do you run on your cluster nodes?
Robert G. Brown rgb at phy.duke.edu
Tue Sep 23 03:09:36 PDT 2008
On Mon, 22 Sep 2008, Joe Landman wrote:
> Prentice Bisbal wrote:
>> The more services you run on your cluster nodes (gmond, sendmail, etc.),
>> the less performance is available for number crunching. At the same
>> time, turning services off makes administration harder. For example, if
>> you turn off postfix/sendmail, you'll no longer get automated e-mails
>> from your system to alert you to a problem.
>
> Does every node need to be running sendmail/postfix? In most cases, nodes
> should be fairly "dumb", in the sense of having as little as possible
> actively running. They need little more than an authentication service,
> a login/process start service, and a disk service (NFS, panfs, glusterfs,
> ...).
One can always run xmlsysd instead, which is a very lightweight
on-demand information service. It costs you, basically, a socket, and
you can poll the nodes to get their current runstate every five seconds,
every thirty seconds, every minute, every five minutes. Pick a
granularity that drops its impact on a running computation to a level
you consider tolerable, while still providing you with node-level state
information when you need it.
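
To make the granularity trade-off concrete, here is a minimal sketch of
on-demand polling in Python. It is not xmlsysd's actual client code: the
function names, the port, and the request bytes are all hypothetical
placeholders, so take the real values from your xmlsysd documentation. The
only idea it illustrates is "one socket per node, polled at an interval you
choose".

```python
import socket
import time


def fetch_state(host, port, request, timeout=2.0):
    """Open one TCP connection, send one request, return the raw reply.

    `port` and `request` are assumptions supplied by the caller; this
    sketch does not know the daemon's real port or protocol.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        try:
            while True:
                data = sock.recv(4096)
                if not data:          # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:        # peer kept the socket open; stop reading
            pass
    return b"".join(chunks).decode("utf-8", errors="replace")


def poll_nodes(hosts, fetch, interval, rounds, sleep=time.sleep):
    """Poll every host once per round, waiting `interval` seconds between rounds.

    `fetch` is any callable taking a hostname and returning its state.
    Unreachable nodes are recorded as None rather than aborting the round.
    """
    history = []
    for i in range(rounds):
        snapshot = {}
        for host in hosts:
            try:
                snapshot[host] = fetch(host)
            except OSError:
                snapshot[host] = None
        history.append(snapshot)
        if i < rounds - 1:            # no need to wait after the last round
            sleep(interval)
    return history


def overhead_fraction(cost_per_poll, interval):
    """Fraction of a node's time spent answering polls (both in seconds)."""
    return cost_per_poll / interval
```

For example, if answering one poll costs a node about 3 ms (an invented
figure, measure your own), polling every 30 seconds costs
overhead_fraction(0.003, 30), i.e. one part in ten thousand, and polling
every 5 seconds costs six times that. Pick the interval whose fraction you
can live with.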
Just a thought...;-)
rgb
>
>> My question is this: how extreme do you go in disabling non-essential
>> services on your cluster nodes? Do you turn off *everything* that's not
>> absolutely necessary, or do you leave some things running to make
>> administration easier?
>
> As long as you have an ssh portal in as root, you should be fine for admin.
> Though, from an admin point of view, as you scale up the number of nodes, you
> want the admin load to remain constant, that is, not to scale with increasing
> node count. Moreover, you want to actively reduce the number of moving
> parts, as it were, as you scale up, as moving parts tend to break. These are
> things like installs, or images. We have customers who occasionally (against
> our advice) test the limits of their "cluster installer". What is
> interesting is that they can't *successfully* install/image more than about
> 20-24 nodes at a time. Yes, they can install more than that, but the
> systems they install that way seem to have some problems, which go away at
> the next reload.
>
> Basically as you scale up the system, you want to scale down, if not
> completely eliminate, node level admin. You definitely don't want the nodes
> to be spending cycles (and therefore power, time, resources) on things that
> they really ought not to spend time on.
>
> Joe
>
>>
>> I'm curious to see how everyone else has their cluster(s) configured.
>>
>
>
>
--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
