[Beowulf] What services do you run on your cluster nodes?
Perry E. Metzger
perry at piermont.com
Wed Sep 24 07:00:55 PDT 2008
Patrick Geoffray <patrick at myri.com> writes:
> Perry E. Metzger wrote:
>>> You realize that most big HPC systems are using interconnects that
>>> don't generate many or any interrupts, right?
>> Of course. Usually one uses interrupt pacing/mitigation even in
>> gig ethernet on a modern machine -- otherwise you're not going to get
>> reasonable performance. (For 10Gig, you have to do even uglier
> What Greg is trying to say is that high-speed interconnects used in
> HPC do not raise interrupts at all. Data is delivered directly in
> user-space, and the app (or the communication library) busy polls on
See the message I sent to Larry Stewart a few minutes ago -- no need
for me to repeat myself...
> However, it is only important for large machines with tightly coupled
> codes. For the majority of the cases, it's just being anal.
Even on large machines with very tight coupling, unless you've done
very special things to the kernel, have no random incoming interrupts
(many devices on modern hardware demand attention far more often than
every few hours, even if you aren't touching them), have turned off
SMM (System Management Mode), are doing no disk I/O, etc., you have
to be a *little* tolerant of timing not being what you want, because
things will get in the way. Not often, but a lot more often than
every few hours, so if a problem every few hours on one node in the
cluster is an issue, you're going to have trouble on stock PC
hardware.
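One rough way to see this kind of interference on a given node is to spin on a monotonic clock and record every gap between consecutive readings that is much longer than a single loop iteration should take; each such gap is time the loop was not running (an interrupt, SMM, preemption, and so on). This is a minimal Python sketch in the spirit of "detour"-style OS noise benchmarks, assuming `time.perf_counter_ns` as the clock. The function names and the 50 microsecond threshold are illustrative, not taken from any existing tool.

```python
import time


def find_detours(timestamps_ns, threshold_ns):
    """Given consecutive clock readings (in ns), return (start_ns, gap_ns)
    for every gap between neighbouring readings longer than threshold_ns."""
    detours = []
    for prev, cur in zip(timestamps_ns, timestamps_ns[1:]):
        gap = cur - prev
        if gap > threshold_ns:
            detours.append((prev, gap))
    return detours


def measure_detours(duration_s, threshold_ns):
    """Spin on a monotonic clock for duration_s seconds and record every
    gap between successive readings longer than threshold_ns.

    Only the detours are stored, so memory stays small even though the
    loop takes a very large number of readings per second."""
    detours = []
    now = time.perf_counter_ns()
    end = now + int(duration_s * 1_000_000_000)
    while now < end:
        nxt = time.perf_counter_ns()
        if nxt - now > threshold_ns:
            detours.append((now, nxt - now))
        now = nxt
    return detours


if __name__ == "__main__":
    # Hypothetical 50 us threshold: far longer than one loop iteration.
    found = measure_detours(duration_s=1.0, threshold_ns=50_000)
    total_ns = sum(gap for _, gap in found)
    print(f"{len(found)} detours, {total_ns / 1e6:.3f} ms lost in 1 s")
```

The Python interpreter adds some noise of its own, so the numbers are only indicative; the same loop written in C around `clock_gettime(CLOCK_MONOTONIC, ...)` and pinned to one core would likely give a cleaner picture.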
A Postfix daemon going off at 2am to send out a grep of the logs is
down in the noise compared to that sort of thing. Not that I think
this is the right way to manage a machine -- you want machines sending
each other machine-generated, machine-parsed status information -- but
I'm just pointing out that an extra daemon doing nothing isn't your
biggest problem.
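The alternative suggested above, nodes exchanging status that programs generate and parse rather than mailing a log grep to a human, could look something like this Python sketch. The JSON schema, field names, node names, and the disk threshold are all hypothetical, and the transport that carries records from node to collector is left out.

```python
import json
import time


def make_status(node, load_avg, disk_free_pct, errors):
    """Build a machine-readable status record for one node (hypothetical schema)."""
    return json.dumps({
        "node": node,
        "time": int(time.time()),
        "load_avg": load_avg,
        "disk_free_pct": disk_free_pct,
        "errors": errors,
    })


def nodes_needing_attention(messages, min_disk_free_pct=10.0):
    """Parse status records and return the names of nodes that report
    errors or have less free disk than min_disk_free_pct."""
    flagged = []
    for msg in messages:
        status = json.loads(msg)
        if status["errors"] or status["disk_free_pct"] < min_disk_free_pct:
            flagged.append(status["node"])
    return flagged


msgs = [
    make_status("node01", 0.5, 42.0, []),
    make_status("node02", 3.9, 4.5, []),
    make_status("node03", 1.2, 60.0, ["ECC error on DIMM 2"]),
]
print(nodes_needing_attention(msgs))  # ['node02', 'node03']
```

Because the collector parses records instead of a human reading mail, it can aggregate across hundreds of nodes and alert only on the ones that need attention.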
Perry E. Metzger perry at piermont.com