[Beowulf] precise synchronization of system clocks

Robert G. Brown rgb at phy.duke.edu
Tue Sep 30 09:37:50 PDT 2008

On Tue, 30 Sep 2008, Robert G. Brown wrote:

> On Tue, 30 Sep 2008, Lux, James P wrote:
> This is a very nice response, and I think you're on a very good track.
> IIRC from discussion a few years ago, GPS can yield what, microsecond or
> better timing (if used to adjust drift and resync all clocks)?  In
> principle sub-microsecond, since a microsecond is order of 300 meters
> and GPS can get you within 30.
>> The GPS synchronization problem is actually substantially easier.  The
>> propagation delay from satellite to receiver is varying in a very
>> predictable manner (in fact, the nav solution solves for it); the signal is
>> specifically designed for accurate timing (i.e. A PN  code generated from a
>> Cs clock is a darn good way to transmit timing and frequency information)

I hate to reply to myself, but driving into Duke another idea occurred
to me.  One reason NTP sucks is that its latency is variable, ranging
from tens of usec to msec depending on all sorts of
uncontrolled state parameters.  Therefore the timestamp placed on the
packet by a timeserver is unpredictably displaced from the time it is
received on a node or LAN client.
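
To make that concrete, here is a sketch (mine, not anything from the
thread) of the standard NTP on-wire offset/delay arithmetic, which
assumes the outbound and return paths have equal latency; the timestamps
below are invented illustrative numbers, not measurements:

```python
# Sketch of the standard NTP offset/delay calculation (RFC 5905),
# illustrating why variable, asymmetric latency corrupts the estimate.
# t0: client send, t1: server receive, t2: server send, t3: client receive.

def ntp_offset_delay(t0, t1, t2, t3):
    """Return (clock offset, round-trip delay) under NTP's assumption
    that the outbound and return paths have equal latency."""
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Symmetric case: 100 us each way, true offset zero -- estimate is ~zero.
print(ntp_offset_delay(0.0, 100e-6, 150e-6, 250e-6))

# If queueing makes the return path take 900 us instead, the symmetric
# assumption silently misattributes half the asymmetry (-400 us here)
# to clock offset, even though the clocks never moved.
print(ntp_offset_delay(0.0, 100e-6, 150e-6, 1050e-6))
```

That -400 usec phantom offset is exactly the "unpredictable displacement"
above: NTP cannot tell queueing asymmetry from clock error.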

However, if one timestamps at the switch, there are no hops.  The
latency between transmission and reception is basically a fairly
predictable interval associated with forming and stamping the packet and
signal propagation in the wire, plus the inverse process at the other
end.  I looked, and lo, this has been thought of:


I have no idea if anyone has built such a beast -- it looks like there
are some patent-pending notices on this.  Also, there are articles
claiming a total latency of 200 nsec in 10 Gb/sec Ethernet switches,
which (if true) might permit one to achieve sub-usec synchronization on
top of at least 10GbE.

If one COULD get a really reliable <1 usec time signal from any master
to all clients, one could actually think about building a really
interesting cluster that e.g. took a master time signal (even on a
dedicated line that did nothing else), handled the interrupt with a
special kernel module that preempted all other tasks (?) and checked its
clock against the timestamp, and then used a modified scheduling
algorithm (maybe?) to ensure that housekeeping chores occur
synchronously across all nodes relative to this clock to the extent
possible.  Use a completely different network for all IPC, and make all
IPC traffic synchronous (taking the frame available for work into
account).

That seems doable -- the first part even pretty easy, although I would
guess that one mucks about with the scheduler at one's peril.  But at
that point one might be able to achieve near-synchronous behavior by
deliberately placing microsleeps into the application code to
"encourage" the kernel to take housekeeping timeslices at the same time,
or perhaps there is a better way.  I tried the same sort of thing in
some of my microbenchmarking code (sleeping right before a pass through
a test cycle, so that one has the best chance of not being interrupted),
and it helps, but it is far from foolproof.  It might require adding
something creative to the kernel to get all the windows in sync.
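
For the curious, the trick looks something like this -- a sketch of the
idea, not my actual benchmark code; the function names and the
1 msec pre-sleep are mine:

```python
# Sketch of the "sleep right before a timed pass" trick: yielding the
# CPU just before the measurement gives the kernel a chance to run
# pending housekeeping now, so the timed loop itself is less likely to
# be preempted mid-measurement.
import time

def timed_pass(work, n_iters, presleep=1e-3):
    """Sleep briefly, then time n_iters calls of work()."""
    time.sleep(presleep)          # encourage the scheduler to fire now
    t0 = time.perf_counter()
    for _ in range(n_iters):
        work()
    return time.perf_counter() - t0

def best_of(work, n_iters=10000, passes=5):
    """Take the minimum over several passes; the outliers are the
    passes that got interrupted anyway."""
    return min(timed_pass(work, n_iters) for _ in range(passes))

elapsed = best_of(lambda: sum(range(100)))
```

It helps, as I said, but a timer interrupt can still land inside the
loop -- hence the thought that real synchrony needs kernel cooperation.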

Of course, I don't do this kind of computing, so this is all somewhat
abstract interest.  But it is worth noting that if one ever gets the
delay from a single time sync source to all nodes to where it is more or
less uniform (within light/propagation speed delays on e.g. different
wire lengths) then a simple self-damping algorithm that drives one to
the mean time (presumed/set to be the same across all nodes after
applying systematic corrections) should let one get well within 1 usec
of synchronized starting with a 1 usec sigma on the source -- narrowing
like 1/sqrt(N) until uncorrected systematic errors dominate.
Nanosecond?  I don't know -- that starts to get into interactions with
CPU microstate, but with a sufficiently disciplined kernel, perhaps.
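
A toy version of that averaging argument, with made-up numbers (the
50 nsec "systematic" term stands in for, say, an uncorrected
cable-length difference):

```python
# Sketch of the self-damping averaging argument: if each sync pulse
# carries independent jitter of sigma = 1 usec, the mean of N offset
# samples narrows roughly as sigma/sqrt(N) -- until the fixed
# systematic error becomes the floor.  Illustrative, not measured.
import random
import statistics

random.seed(42)
sigma = 1e-6          # 1 usec jitter per sync pulse
systematic = 50e-9    # uncorrected fixed skew, e.g. wire-length difference

def offset_estimate(n_samples):
    """Mean of n_samples jittered offset measurements."""
    samples = [systematic + random.gauss(0.0, sigma)
               for _ in range(n_samples)]
    return statistics.mean(samples)

# Residual error shrinks ~1/sqrt(N) toward the 50 nsec systematic floor.
for n in (1, 100, 10000):
    print(n, offset_estimate(n))
```

So averaging buys you the random part essentially for free; it is the
systematic part (and the CPU microstate) that caps how far down you get.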

OK, MUST stop thinking about this now.  But it is a fun problem.  Too
bad I don't have a gazillion dollars and time on my hands...;-)


Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
