[Beowulf] precise synchronization of system clocks
becker at scyld.com
Tue Sep 30 11:05:03 PDT 2008
On Mon, 29 Sep 2008, Lawrence Stewart wrote:
> > The IEEE-1588 "Precision Time Protocol" can provide such levels of
> > global clock synchronization.
> That's the one I was trying to remember, but I didn't compose a good
> query and couldn't find it.
> IIRC the NIC timestamps arriving packets right off the wire? We have
> an on-chip logic analyzer gadget that can do that, but the
> synchronization problem we have is only to find one-time offsets, so
> we didn't need to go this deep.
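For reference, the offset arithmetic underneath IEEE 1588 (and NTP) is simple once you have four timestamps from one Sync / Delay_Req exchange. Below is a minimal sketch, not code from any PTP implementation. It assumes the path delay is the same in both directions, which is the protocol's own assumption. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SyncExchange:
    """One hypothetical master/slave exchange, in seconds."""
    t1: float  # master sends Sync          (master clock)
    t2: float  # slave receives Sync        (slave clock)
    t3: float  # slave sends Delay_Req      (slave clock)
    t4: float  # master receives Delay_Req  (master clock)


def offset_and_delay(x: SyncExchange) -> tuple[float, float]:
    """Return (slave offset from master, one-way path delay).

    Assumes a symmetric path: forward and reverse delays are equal.
    """
    forward = x.t2 - x.t1   # = path delay + offset
    backward = x.t4 - x.t3  # = path delay - offset
    offset = (forward - backward) / 2.0
    delay = (forward + backward) / 2.0
    return offset, delay


if __name__ == "__main__":
    # Hypothetical case: slave clock 50 us ahead of master, 10 us path delay.
    ex = SyncExchange(t1=0.0, t2=60e-6, t3=100e-6, t4=60e-6)
    print(offset_and_delay(ex))
```

Any asymmetry in the path, or ambiguity about where in the packet each timestamp was taken, goes straight into the offset estimate as error.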
Only a very few NICs add a timestamp at receive time, and the Linux kernel
doesn't have a portable way to extract those timestamps. Even with a
hardware receive timestamp, the number is less useful (and less accurate)
than you might initially expect. Some chunk of code really should correct
for inaccuracies in the NIC clock: apply an offset and a linear drift correction.
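A minimal sketch of that offset-plus-linear-drift correction, assuming you can collect pairs of (raw NIC timestamp, host time) for the same events. The pairing mechanism and the function names are hypothetical; the fit itself is ordinary least squares of host_time = offset + rate * nic_time.

```python
def fit_offset_and_drift(nic_ts, host_ts):
    """Least-squares fit of host_time = offset + rate * nic_time."""
    n = len(nic_ts)
    if n < 2 or n != len(host_ts):
        raise ValueError("need at least two paired (nic, host) samples")
    mean_x = sum(nic_ts) / n
    mean_y = sum(host_ts) / n
    sxx = sum((x - mean_x) ** 2 for x in nic_ts)
    if sxx == 0:
        raise ValueError("NIC timestamps must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nic_ts, host_ts))
    rate = sxy / sxx  # 1.0 means no drift; 1.0001 means the NIC runs 100 ppm slow
    offset = mean_y - rate * mean_x
    return offset, rate


def nic_to_host(nic_t, offset, rate):
    """Map a raw NIC timestamp onto the host timebase."""
    return offset + rate * nic_t


if __name__ == "__main__":
    # Hypothetical samples: NIC clock reads 5 s behind and runs 100 ppm slow.
    nic = [0.0, 1.0, 2.0, 3.0, 4.0]
    host = [5.0 + 1.0001 * t for t in nic]
    offset, rate = fit_offset_and_drift(nic, host)
    print(offset, rate, nic_to_host(2.5, offset, rate))
```

The fit has to be refreshed periodically, since NIC oscillators drift with temperature rather than holding one fixed rate.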
If you want real accuracy from the timestamp, you need to know whether it
represents the initial symbol, header, final byte or terminating symbol of
the packet. Oh, and the sending system has to synchronize transmission of
the packet. There is only one NIC I know of that had a "defer
transmission until time T" feature, and it appeared that no one had
actually used/debugged that feature. (It appeared to be intended for
low-rate, higher-priority quasi-isochronous traffic, as it was a separate ...)
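To put a number on the "which byte was timestamped" question: on Ethernet, the gap between a timestamp taken at the first preamble symbol and one taken at the last byte is just the serialization time of the frame. A rough sketch, assuming standard Ethernet framing (7-byte preamble plus 1-byte start frame delimiter) and ignoring PHY encoding latency; the function names are illustrative only.

```python
PREAMBLE_AND_SFD_BYTES = 8  # Ethernet preamble (7 bytes) + start frame delimiter (1 byte)


def serialization_time(frame_bytes: int, link_bps: float,
                       include_preamble: bool = True) -> float:
    """Seconds needed to clock the frame onto the wire at link_bps."""
    total = frame_bytes + (PREAMBLE_AND_SFD_BYTES if include_preamble else 0)
    return total * 8 / link_bps


def end_to_start_timestamp(end_ts: float, frame_bytes: int, link_bps: float) -> float:
    """Convert a last-byte timestamp into a first-preamble-symbol timestamp."""
    return end_ts - serialization_time(frame_bytes, link_bps)


if __name__ == "__main__":
    for size in (64, 1518):  # minimum and maximum untagged frame sizes
        print(size, serialization_time(size, 1e9))
```

At gigabit speed that ambiguity runs from well under a microsecond for a minimum frame to about 12 us for a full-size frame, and it changes with every packet length.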
Back to the original topic: why is there a belief that we need
accurate time synchronization? The paper referenced was:
> "The Case of Missing Supercomputer Performance: Achieving Optimal
> Performance on the 8,192 processor ASCI Q" (Petrini, Kerbyson and Pakin)
If you read it, you find that they started by suspecting the already-known
problem: that the performance hit they were seeing with large-node-count,
lock-step applications was because of scheduling "noise". They were
running a bunch of daemons that were frequently waking up, doing a
trivial amount of work and going back to sleep.
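Why large node counts make this so much worse: in a lock-step application every step waits for the slowest process, so one daemon waking on any node delays everyone. Below is a back-of-the-envelope sketch, assuming each node is hit independently with the same small probability per step and that a hit costs a fixed delay. The numbers are hypothetical, not taken from the paper.

```python
def prob_step_delayed(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes is interrupted in a step.

    Assumes interruptions are independent across nodes.
    """
    return 1.0 - (1.0 - p_node) ** n_nodes


def expected_step_time(compute: float, noise: float,
                       p_node: float, n_nodes: int) -> float:
    """Lock-step model: the whole step stretches by `noise` if any node is hit."""
    return compute + noise * prob_step_delayed(p_node, n_nodes)


if __name__ == "__main__":
    # Hypothetical: 1 ms of compute per step, 1-in-1000 chance per node of a
    # 1 ms daemon interruption during that step.
    for n in (1, 64, 1024, 8192):
        print(n, prob_step_delayed(1e-3, n), expected_step_time(1e-3, 1e-3, 1e-3, n))
```

Noise that is negligible on one node becomes a near-certain per-step cost at thousands of nodes, which matches the "large-node-count, lock-step" symptom.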
Their first, too-simple tests didn't confirm this. Only when they
rewrote their tests to keep all of the cores on a node busy were they
able to accurately reproduce the effect and confirm that it was indeed
OS daemons (in their case, TruCluster and Quadrics network control)
causing the performance loss.
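The basic measurement idea can be sketched as a "fixed work quantum" loop: time the same small chunk of work over and over, and treat samples much slower than the best one as time stolen by something else. This is an illustrative Python sketch only. The interpreter adds its own jitter, so a real noise benchmark would be written in C, and the names here are hypothetical.

```python
import time


def fixed_work_quantum(iterations: int, samples: int) -> list[float]:
    """Time an identical busy loop `samples` times; return durations in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        acc = 0
        for i in range(iterations):  # the fixed amount of "work"
            acc += i
        durations.append(time.perf_counter() - start)
    return durations


def noise_summary(durations: list[float]) -> dict:
    """Compare the slowest sample to the fastest; large ratios suggest interference."""
    best = min(durations)
    if best <= 0:
        raise ValueError("timer resolution too coarse for this work quantum")
    worst = max(durations)
    return {"min": best, "max": worst, "max_over_min": worst / best}


if __name__ == "__main__":
    print(noise_summary(fixed_work_quantum(50_000, 100)))
```

The key lesson from the paper is that one copy on an otherwise idle node will usually look clean. To reproduce the effect, run one copy per core simultaneously, for example by pinning each instance to its own core with taskset on Linux.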
It's easy to misremember what the paper actually says. They addressed
the problem by mapping processes so that the nodes kept one core free
for OS daemons and random kernel work.
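One way to express "keep one core free" when laying out ranks is shown below. This is a hypothetical mapping helper, not the scheme used on ASCI Q. It assigns ranks round-robin to the remaining cores on each node and leaves a reserved core for daemons.

```python
def map_ranks(n_ranks: int, cores_per_node: int, reserved_core: int = 0):
    """Assign ranks to (node, core) pairs, never using `reserved_core`.

    The reserved core is left idle for OS daemons and kernel work.
    """
    usable = [c for c in range(cores_per_node) if c != reserved_core]
    if not usable:
        raise ValueError("no usable cores left after reserving one")
    per_node = len(usable)
    return [(r // per_node, usable[r % per_node]) for r in range(n_ranks)]


if __name__ == "__main__":
    # Hypothetical 4-core nodes: ranks land on cores 1-3, core 0 stays free.
    for rank, (node, core) in enumerate(map_ranks(6, 4)):
        print(rank, node, core)
```

On Linux, each process could then pin itself with os.sched_setaffinity(0, {core}), leaving the scheduler to put daemons on the reserved core. The cost is giving up one core in N of compute to get predictable step times.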
What the paper DOES NOT say is that you need a globally synchronized clock
to fix the problem. They happened to have Quadrics, which had global
synchronization operations. Given that large, expensive hammer, it was
natural to propose using the network to synchronize the execution of the
"noise" (junk jobs), rather than rethink the need for running them at
all. IIRC, it was companies such as Octiga Bay that actually implemented
global-clock gang scheduling of system daemons, again with a network that
implemented global synchronization operations.
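For completeness, here is a minimal sketch of what "global-clock gang scheduling of system daemons" amounts to. It assumes every node already has a synchronized clock, for example via IEEE 1588. Each node defers its daemon work to the next shared time slot, so the noise lands on all nodes at once instead of at random times. The function and its parameters are hypothetical and not how Quadrics or Octiga Bay did it.

```python
import math


def next_daemon_window(now: float, period: float, phase: float = 0.0) -> float:
    """Start time of the next shared daemon slot: phase + k * period, k integer.

    Every node evaluates this against the same synchronized clock, so all
    nodes pick the same slot regardless of when they ask.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    k = math.ceil((now - phase) / period)
    return phase + k * period


if __name__ == "__main__":
    # Two nodes asking at different moments still agree on the same window.
    print(next_daemon_window(11.0, 10.0), next_daemon_window(17.0, 10.0))
```

The whole approach depends on the clocks agreeing to well within the slot length, which is exactly the expensive part. Not running the junk daemons at all avoids that requirement.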
Donald Becker becker at scyld.com
Penguin Computing / Scyld Software
Annapolis MD and San Francisco CA