[Beowulf] What services do you run on your cluster nodes?
becker at scyld.com
Tue Sep 23 10:03:29 PDT 2008
On Mon, 22 Sep 2008, Perry E. Metzger wrote:
> Prentice Bisbal <prentice at ias.edu> writes:
> > The more services you run on your cluster node (gmond, sendmail, etc.)
> > the less performance is available for number crunching, but at the same
> > time, administration difficulty increases. For example, if you turn off
> > postfix/sendmail, you'll no longer get automated e-mails from your
> > system to alert you to a problem.
> If a machine isn't sending out more than, say, 20,000 email
> messages an hour, you won't notice the additional load Postfix puts on
> a modern machine with any reasonable measurement tool.
> FYI, a modern box running postfix can handle millions of messages per
> hour before it starts getting into trouble.
The overall load isn't the issue, it's the scheduling interference.
If you have a dozen nodes working on a fine-grained, lock-step
computation, a node taking a millisecond off every second goes unnoticed.
If you have a few hundred nodes working on the problem, that millisecond
becomes a huge problem.
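To see why the node count matters so much, here is a back-of-envelope
model (the specific numbers — 10 ms of work per step, a 1% chance of a
1 ms daemon wakeup — are my own illustrative assumptions, not from the
post). A barrier-synchronized step runs at the speed of the slowest
node, so one delayed node delays them all:

```python
# Back-of-envelope model of OS-noise amplification under a barrier.
# All constants below are illustrative assumptions.

work_ms = 10.0   # compute time per step, per node (assumed)
delay_ms = 1.0   # cost of one daemon wakeup (assumed)
p_hit = 0.01     # per-node chance of a wakeup during a step (assumed)

def expected_step_ms(nodes):
    # Probability that at least one of the N nodes is hit this step;
    # the barrier makes the whole cluster pay for that one node.
    p_any = 1.0 - (1.0 - p_hit) ** nodes
    return work_ms + p_any * delay_ms

for n in (12, 300):
    step = expected_step_ms(n)
    slowdown = (step / work_ms - 1.0) * 100
    print(f"{n:4d} nodes: expected step {step:.3f} ms (~{slowdown:.1f}% slower)")
```

With these numbers, a dozen nodes lose about 1% to noise, while three
hundred nodes lose nearly 10% — same daemons, same per-node load, just
more chances per step for the barrier to catch someone napping.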
We recognized this effect over a decade ago. It was a motivation
when we designed the Scyld cluster system in early 2000, and was a key
point when we started talking about it back then. The effect has been
independently discovered many times, but I think ours was one of the
earliest designs built around it.
We solved the problem by using a full featured, fully-installed head
("master") node that ran all standard services, and having the rest of the
nodes be start-from-zero compute slaves that don't run anything but the
application. This is quite different from the "what can I eliminate"
mindset. Designs that start from a full install and strip it down often
eliminate too much, or fail to recognize that unused "idle" services
aren't really idle. Idle daemons frequently wake up, look around, and go back to
sleep. Look at the research that has gone into making the Linux kernel
"tick free". The focus has been on power savings rather than HPC, but
their findings provide third-party confirmation. They eliminated periodic
timer ticks, instead using a countdown timer to wake the kernel only when
needed. Except that so many things wake up, look around, and go back to
sleep that they didn't see much savings!
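You can get a rough feel for this wakeup noise from user space. The
sketch below is my own illustrative probe, not the methodology from the
post: repeatedly request a short sleep and record how far past the
deadline the process actually wakes up. On a node busy with daemons,
those wakeups and timer coalescing show up as overshoot:

```python
# Rough user-space probe of scheduling jitter (illustrative sketch).
import time

def measure_overshoot(samples=200, interval_s=0.001):
    """Sleep `interval_s` repeatedly; return per-sample overshoot in seconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is scheduler/timer noise.
        overshoots.append(max(0.0, elapsed - interval_s))
    return overshoots

if __name__ == "__main__":
    o = sorted(measure_overshoot())
    print(f"median overshoot: {o[len(o) // 2] * 1e6:.0f} us, "
          f"worst: {o[-1] * 1e6:.0f} us")
```

The median tells you about the timer granularity; the worst case is
where the daemons and the "stacked coincidences" live.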
The secondary effects are the real cost, and they are difficult to
directly measure. Every time a daemon wakes, it kills the application's
... uhmm, "momentum". It flushes a bunch of cache lines and TLB (page-table
lookaside) entries. It might kick out a few pages and D-cache entries. These might
break up application I/O that could otherwise be coalesced into a big
request. How much time does all this cost? Well, much of the time not
very much. But occasionally the coincidences stack up and
become really expensive. Like a single driver braking in rush-hour
traffic, one delayed node stalls the whole cluster-wide app.
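A small Monte Carlo sketch makes the "cheap on average, expensive in
coincidence" point concrete. The delay distribution here is entirely my
own assumption (mostly zero, occasionally a ~50 us cache/TLB refill,
rarely a ~1 ms page eviction); the mechanism — the cluster pays the
per-step maximum, not the per-node average — is the point:

```python
# Monte Carlo sketch of coincidences stacking up under a barrier.
# The delay distribution below is an illustrative assumption.
import random

random.seed(1)  # deterministic for reproducibility

def node_delay_us():
    r = random.random()
    if r < 0.95:
        return 0.0      # no interference this step
    elif r < 0.999:
        return 50.0     # cache/TLB refill after a daemon wakeup
    else:
        return 1000.0   # rare expensive event (e.g. pages kicked out)

nodes, steps = 300, 1000
per_node, per_cluster = 0.0, 0.0
for _ in range(steps):
    delays = [node_delay_us() for _ in range(nodes)]
    per_node += sum(delays) / nodes   # what each node loses on average
    per_cluster += max(delays)        # what the lock-step app loses

print(f"avg overhead per node per step: {per_node / steps:6.1f} us")
print(f"avg overhead the cluster pays:  {per_cluster / steps:6.1f} us")
```

Each node loses only a few microseconds per step on average, but with
300 nodes some node almost always draws a bad number, so the barrier's
max is one to two orders of magnitude worse than the per-node mean.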
Next posting: how the app itself can be the cause of slow-downs, and why
cluster-specific name services and library/executable memory
"wire-downs" solve problems.
Donald Becker becker at scyld.com
Penguin Computing / Scyld Software
Annapolis MD and San Francisco CA