[Beowulf] What services do you run on your cluster nodes?

Lawrence Stewart larry.stewart at sicortex.com
Tue Sep 23 18:58:50 PDT 2008

On Sep 23, 2008, at 9:18 PM, Perry E. Metzger wrote:

> Greg Lindahl <lindahl at pbm.com> writes:
>> On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote:
>>> As for the daemons, remember that with a proper scheduler, you will
>>> switch straight from an incoming network interrupt to a high  
>>> priority
>>> process that is expecting the incoming packet, and that even works
>>> correctly on some (but not all) Linux kernels. A user process cannot
>>> take priority over other tasks, at least not without someone being
>>> quite deliberate about it.
>> You realize that most big HPC systems are using interconnects that
>> don't generate many or any interrupts, right?
> Of course. Usually one even uses interrupt pacing/mitigation even in
> gig ethernet on a modern machine -- otherwise you're not going to get
> reasonable performance. (For 10Gig, you have to do even uglier
> tricks.)
> However, my argument still holds without any change. Until you
> actually process the packet, which happens in the kernel, userland
> won't see it anyway, and when the kernel processes it, it is free to
> switch to whatever userland process it wishes, and (under normal
> circumstances) it will do the right thing.

I think Greg is talking about HPC interconnects that do OS bypass, and
Perry is talking about the kernel IP stack.  Different things.

IB, Quadrics, Myrinet, and SiCortex stuff does not go through the kernel,
does not interrupt, does not schedule.  Typically the application thread
calling SEND directly interacts with the NIC, and at the other end, the
thread calling RECV directly polls the NIC queue to receive a packet.
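
To make the polling model concrete, here is a minimal sketch in C.  It is
not our actual code: the NIC queue is simulated by a small ring in ordinary
memory, and every name in it (nic_queue, q_send, q_poll, q_recv, RING_SLOTS,
SLOT_BYTES) is hypothetical.  On real hardware the ring would be NIC memory
or a DMA region mapped into the process, but the shape is the same: SEND
fills a slot and sets a flag, RECV spins on the flag, and no system call or
interrupt is involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u   /* hypothetical ring depth */
#define SLOT_BYTES 64u  /* hypothetical maximum packet size */

struct slot {
    atomic_uint ready;              /* 0 = empty, 1 = packet present */
    uint32_t len;
    unsigned char data[SLOT_BYTES];
};

/* Stand-in for a NIC queue mapped into user space (an assumption). */
struct nic_queue {
    struct slot slots[RING_SLOTS];
    unsigned head;                  /* next slot the sender fills */
    unsigned tail;                  /* next slot the receiver polls */
};

void q_init(struct nic_queue *q)
{
    for (unsigned i = 0; i < RING_SLOTS; i++)
        atomic_init(&q->slots[i].ready, 0);
    q->head = 0;
    q->tail = 0;
}

/* SEND: copy the payload into the next slot, then publish it.
 * Returns 0 on success, -1 if the ring is full. */
int q_send(struct nic_queue *q, const void *buf, uint32_t len)
{
    assert(len <= SLOT_BYTES);
    struct slot *s = &q->slots[q->head % RING_SLOTS];
    if (atomic_load_explicit(&s->ready, memory_order_acquire))
        return -1;
    memcpy(s->data, buf, len);
    s->len = len;
    /* Release: the payload is visible before the flag is. */
    atomic_store_explicit(&s->ready, 1, memory_order_release);
    q->head++;
    return 0;
}

/* One poll of the queue.  Returns the number of bytes copied
 * (truncated to cap), or -1 if no packet is waiting. */
int q_poll(struct nic_queue *q, void *buf, uint32_t cap)
{
    struct slot *s = &q->slots[q->tail % RING_SLOTS];
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return -1;
    uint32_t len = s->len < cap ? s->len : cap;
    memcpy(buf, s->data, len);
    /* Hand the slot back to the sender. */
    atomic_store_explicit(&s->ready, 0, memory_order_release);
    q->tail++;
    return (int)len;
}

/* RECV: spin in user space until a packet shows up. */
int q_recv(struct nic_queue *q, void *buf, uint32_t cap)
{
    int n;
    while ((n = q_poll(q, buf, cap)) < 0)
        ;   /* busy-wait: no sleep, no syscall, no context switch */
    return n;
}
```

The sketch assumes one sender and one receiver per queue, so head and tail
each have a single owner and only the ready flag needs to be atomic.  The
price of the busy-wait is a core spent spinning, which is the usual trade
on a dedicated compute node.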

A sufficiently fancy ethernet controller could do similar things,  
direct" and so forth.

In our code, the fast path from application calling SEND to the application
returning from RECV at the other end is 250 machine instructions.  There
is no time for the kernel to get in the way, no time to switch contexts or
address spaces or save registers.

I'm sure the OS kernel is a fine thing.  We throw it an occasional TLB miss as
a bribe not to bother the application.  Useful for initialization and ECC, but
you wouldn't loan it your car.

