[Beowulf] What services do you run on your cluster nodes?

Perry E. Metzger perry at piermont.com
Tue Sep 23 04:26:17 PDT 2008

Greg Lindahl <lindahl at pbm.com> writes:
>> By the way, if you really can't afford for things to "go away" for
>> 1/250th of a second very often, I have horrible news for you: NO
> You haven't done much HPC, have you? Why do you think we build
> interconnects with latencies on order 1 microsecond?

Insult me all you like. If you have any stock Intel-architecture
machine, it will vanish into System Management Mode (SMM) for long
periods. SMM operates below the level of the operating system. The
System Management Interrupt (SMI), which the OS cannot control, comes
in; the processor enters the protected SMM BIOS code, which has its
own protected memory, stays there as long as it needs to do its job,
and then returns. The OS is totally out of the loop on this.
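
For anyone who wants to check whether SMIs are actually firing on a
node, here is a rough sketch. It rests on several assumptions: a Linux
box with the msr kernel module loaded, root access, and an Intel CPU
that implements the MSR_SMI_COUNT register at address 0x34 (Nehalem
and later; older parts do not have it). The function name and the
msr_path parameter are mine, for illustration only.

```python
import os
import struct

# Assumption: Intel MSR_SMI_COUNT, available on Nehalem and later CPUs.
MSR_SMI_COUNT = 0x34


def read_smi_count(cpu=0, msr_path=None):
    """Return the number of SMIs the given CPU has taken since reset.

    Reads 8 bytes at offset MSR_SMI_COUNT from the Linux msr device.
    msr_path is a hypothetical override so the parser can be exercised
    against an ordinary file.
    """
    path = msr_path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        # The msr device treats the file offset as the MSR address.
        raw = os.pread(fd, 8, MSR_SMI_COUNT)
    finally:
        os.close(fd)
    (value,) = struct.unpack("<Q", raw)
    # Only the low 32 bits hold the SMI count.
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print("SMIs on CPU 0 since reset:", read_smi_count(0))
```

Read it twice some seconds apart: if the number goes up, the machine
spent time in SMM in between, and the OS never saw it.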

If you aren't aware of that, well, fine, but it is true. You can call
me ignorant all day and it will still be true.

I've had to build systems that synchronize their clocks to very high
precision, and SMM is a big issue in that situation. You can easily
watch the effect on any machine where you're doing high-precision
timings. It is very, very difficult to do hard realtime on PCs because
of this.
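
A minimal sketch of the kind of timing test I mean: spin on the
monotonic clock and record any gap between consecutive reads that is
larger than a threshold. The function name, the one-second run and the
100 microsecond threshold are arbitrary choices of mine. Note that
from user space a gap caused by SMM looks the same as one caused by
the scheduler, an ordinary interrupt, or the Python runtime itself, so
this only shows that the machine "went away", not why. Pinning the
process to an otherwise idle core and running it at realtime priority
narrows down the suspects.

```python
import time


def measure_gaps(duration_s=1.0, threshold_ns=100_000):
    """Busy-loop on the monotonic clock for duration_s seconds.

    Returns (max_gap_ns, gaps), where gaps lists every interval between
    consecutive clock reads that was at least threshold_ns long.
    """
    gaps = []
    max_gap = 0
    prev = time.perf_counter_ns()
    end = prev + int(duration_s * 1e9)
    while prev < end:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > max_gap:
            max_gap = gap
        if gap >= threshold_ns:
            gaps.append(gap)
        prev = now
    return max_gap, gaps


if __name__ == "__main__":
    max_gap, gaps = measure_gaps()
    print(f"largest gap: {max_gap} ns")
    print(f"gaps of 100 us or more: {len(gaps)}")
```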

So if you claim that Postfix will somehow do something unacceptable
to your latencies and cause your mesh to desynchronize, but you don't
know about SMM, well, fine. Postfix won't prevent your machine from
processing interrupts, or prevent your OS from properly switching to a
high-priority process following an interrupt. SMM will, and you can't
get rid of it.

