[Beowulf] precise synchronization of system clocks

Donald Becker becker at scyld.com
Tue Sep 30 11:05:03 PDT 2008

On Mon, 29 Sep 2008, Lawrence Stewart wrote:
> > The IEEE-1588 "Precision Time Protocol" can provide such levels of
> > global clock synchronization.
> That's the one I was trying to remember, but I didn't compose a good
> query and couldn't find it.
> IIRC the NIC timestamps arriving packets right off the wire?  We have
> an on-chip logic analyzer gadget that can do that, but the
> synchronization problem we have is only to find one-time offsets, so
> we didn't need to go this deep.

Only a very few NICs add a timestamp at receive time, and the Linux kernel
doesn't have a portable way to extract those timestamps.  Even with a
hardware receive timestamp, the number is less useful (and less accurate)
than you might initially expect.  Some chunk of code really should correct
for inaccuracies in the NIC clock -- apply an offset and linear drift
correction.
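
As a rough illustration of that kind of correction (a sketch only, not
something the kernel or any particular driver provides): assume you have
collected pairs of (NIC timestamp, reference-clock time) for the same
events.  A least-squares line through them gives an offset and a linear
rate, which can then be applied to later NIC timestamps.  The function
names and the sample numbers below are made up for the example.

```python
def fit_offset_and_drift(nic_ts, ref_ts):
    """Least-squares fit of ref = offset + rate * nic.

    nic_ts, ref_ts: equal-length sequences of times in seconds
    (hypothetical paired samples of the same events).
    Returns (offset, rate); rate - 1.0 is the fractional drift.
    """
    n = len(nic_ts)
    if n < 2 or n != len(ref_ts):
        raise ValueError("need at least two paired samples")
    mean_nic = sum(nic_ts) / n
    mean_ref = sum(ref_ts) / n
    var = sum((x - mean_nic) ** 2 for x in nic_ts)
    if var == 0:
        raise ValueError("NIC timestamps must not all be equal")
    cov = sum((x - mean_nic) * (y - mean_ref)
              for x, y in zip(nic_ts, ref_ts))
    rate = cov / var
    offset = mean_ref - rate * mean_nic
    return offset, rate


def correct(nic_t, offset, rate):
    """Map a raw NIC timestamp onto the reference timescale."""
    return offset + rate * nic_t


# Made-up samples: the NIC clock reads 0 when the reference clock
# reads 2.5 s, and the NIC clock runs 50 ppm fast.
nic = [0.0, 10.0, 20.0, 30.0]
ref = [2.5 + t / (1 + 50e-6) for t in nic]
offset, rate = fit_offset_and_drift(nic, ref)
```

A straight-line model only covers a constant offset and constant drift;
anything temperature-dependent would need the fit to be redone
periodically.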

If you want real accuracy from the timestamp you need to know if it 
represents the initial symbol, header, final byte or terminating symbol of 
the packet.  Oh, and the sending system has to synchronize transmission of 
the packet.  There is only one NIC I know of that had a "defer 
transmission until time T" feature, and it appeared that no one had 
actually used/debugged that feature.  (It appeared to be intended for 
low-rate, higher priority quasi-isochronous traffic, as it was a separate 
transmit queue.)

Back to the original topic: why is there a belief that we need 
accurate time synchronization?  The paper referenced was:

> "The Case of the Missing Supercomputer Performance: Achieving Optimal
> Performance on the 8,192 Processors of ASCI Q" (Petrini, Kerbyson and Pakin)
> http://hpc.pnl.gov/people/fabrizio/papers/sc03_noise.pdf

If you read it you find that they started by suspecting the already-known
problem: that the performance hit they were seeing with large-node-count,
lock-step applications was because of scheduling "noise".  They were 
running a bunch of daemons that were frequently waking up, doing a 
trivial amount of work and going back to sleep.

Their first, too-simple tests didn't confirm this.  Only when they 
re-wrote their tests to keep all of the cores on a node busy were they
able to accurately reproduce the effect and confirm that indeed it was 
OS daemons (in their case, TruCluster and Quadrics network control) 
causing the performance loss.
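
To make the idea of such a test concrete, here is a minimal sketch of a
"fixed work quantum" probe: time the same small chunk of busy work over
and over and look for samples that took much longer than the fastest
one.  This is not the benchmark from the paper; the function names,
iteration counts and the 1.5x threshold are assumptions picked for
illustration, and an interpreter adds jitter of its own.

```python
import time


def sample_quanta(n_samples=200, work_iters=20000):
    """Time n_samples repetitions of an identical chunk of busy work.

    Returns a list of elapsed times in seconds.  On an undisturbed core
    the values should cluster tightly; samples far above the minimum
    suggest the loop was interrupted by something else.
    """
    samples = []
    for _ in range(n_samples):
        start = time.perf_counter()
        x = 0
        for i in range(work_iters):
            x += i
        samples.append(time.perf_counter() - start)
    return samples


def slow_fraction(samples, factor=1.5):
    """Fraction of samples slower than factor * the fastest sample."""
    fastest = min(samples)
    slow = sum(1 for s in samples if s > factor * fastest)
    return slow / len(samples)
```

Per the point above, a single copy of this on an otherwise idle node
would likely show little, since the daemons have spare cores to run on.
You would need one copy per core, all running at once.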

It's easy to mis-remember what the paper actually says.  They addressed 
the problem by mapping processes so that management nodes kept one core 
free for OS daemons and random kernel work.

What the paper DOES NOT say is that you need a globally synchronized clock 
to fix the problem.  They happened to have Quadrics, which had global 
synchronization operations.  Given that large, expensive hammer it was 
natural to propose using the network to synchronize the execution of the 
"noise" (junk jobs), rather than re-think the need for running them at 
all.  IIRC, it was companies such as OctigaBay that actually implemented 
global-clock gang scheduling of system daemons, again with a network that 
implemented global synchronization operations.

Donald Becker				becker at scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com		www.scyld.com
Annapolis MD and San Francisco CA
