[Beowulf] precise synchronization of system clocks

Vincent Diepeveen diep at xs4all.nl
Tue Sep 30 02:53:52 PDT 2008


Hmm,

1 µs accuracy, whereas the CPU has a hardware counter for all this.

To be honest, I find 1 microsecond very inaccurate now that network cards
have latencies close to that.

Let's take a simple example with two nodes, node A and node B.

Node A has time X.
Node A ships time X to B.

Then we do a loop.

Node A ships data to B and B responds to A.

A then measures the two-way ping-pong (round-trip) latency and, based on
that, gives B a new time X'.
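A minimal sketch of one round of this loop, under assumptions that are mine and not from the post: A's clock is the reference, B applies X' the moment it arrives, and A assumes the one-way latency is half the measured round trip. `Node` and `ping_pong_sync` are hypothetical names for a simulation, not a real API.

```python
from typing import Callable


class Node:
    """Simulated node whose clock reads true_time + offset (hypothetical model)."""

    def __init__(self, offset: float = 0.0) -> None:
        self.offset = offset

    def clock(self, true_time: float) -> float:
        return true_time + self.offset


def ping_pong_sync(a: Node, b: Node, latency: Callable[[], float],
                   t0: float = 0.0) -> float:
    """One round: A pings B, B answers, A measures the round trip on its own
    clock and sends B the new time X' = A's clock + RTT/2.

    latency() returns one simulated one-way delay in seconds.
    Returns B's remaining error (B's clock minus A's clock) after the round.
    """
    t = t0                          # true time
    a_send = a.clock(t)
    t += latency()                  # ping A -> B
    t += latency()                  # pong B -> A
    rtt = a.clock(t) - a_send       # round trip as seen by A
    x_prime = a.clock(t) + rtt / 2  # assume one-way latency = RTT/2
    t += latency()                  # X' travels A -> B
    b.offset = x_prime - t          # B sets its clock to X' on arrival
    return b.offset - a.offset
```

With a perfectly constant latency the round leaves no error: `ping_pong_sync(Node(0.0), Node(5e-3), lambda: 1e-6)` returns 0.0 even though B started 5 ms off. With jitter, the leftover error of a round is (d1 + d2)/2 - d3 for that round's three one-way delays.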

Nowadays network cards need a microsecond or two for this.

Doing that a couple of thousand times, we should get a fairly accurate
time in B, far more accurate than 1 microsecond, as the deviation in the
one-way ping-pong latency isn't really big. It's quite constant.
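One way to read "doing that a couple of thousand times": B keeps the correction from every round and applies their average instead of only the last one. The sketch below simulates that; the averaging step, the Gaussian jitter model, its default numbers and the name `estimate_offset` are assumptions for illustration, not measurements.

```python
import random
import statistics


def estimate_offset(n_rounds: int, b_offset: float,
                    mean_latency: float = 1.5e-6, jitter: float = 0.1e-6,
                    seed: int = 1) -> float:
    """Simulate n_rounds of the ping-pong loop and return B's residual clock
    error (seconds) after applying the averaged correction.

    Assumed model: A's clock equals true time, B's clock is off by b_offset,
    and each one-way delay is mean_latency plus Gaussian jitter.
    """
    rng = random.Random(seed)

    def latency() -> float:
        return mean_latency + rng.gauss(0.0, jitter)

    corrections = []
    t = 0.0
    for _ in range(n_rounds):
        a_send = t
        t += latency() + latency()        # ping A -> B, pong B -> A
        rtt = t - a_send
        x_prime = t + rtt / 2             # time A sends to B
        t += latency()                    # X' travels A -> B
        b_reading = t + b_offset          # B's own clock when X' arrives
        corrections.append(x_prime - b_reading)
        t += 1e-3                         # idle gap before the next round
    # Apply the mean correction; the result is what is left of B's error.
    return b_offset + statistics.fmean(corrections)
```

Comparing `estimate_offset(1, 5e-3)` with `estimate_offset(10000, 5e-3)` should show the residual shrinking roughly as 1/sqrt(N), provided the jitter of successive rounds is independent; with these assumed defaults a single round is typically off by about 0.1 µs.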

Only the deviation (jitter) of that latency limits the accuracy with which
you can synchronize the clocks.
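Why only the deviation matters can be written out for one round of the loop. In this sketch the notation is mine: d_1 is the ping delay A to B, d_2 the pong delay B to A, d_3 the delay of the message carrying X', and T is A's clock when it sends X'. Each delay is split into a common mean mu plus jitter delta_i, assumed independent with variance sigma^2.

```latex
% error of B's clock right after it applies X'
X' = T + \frac{d_1 + d_2}{2}, \qquad
\varepsilon = X' - (T + d_3) = \frac{d_1 + d_2}{2} - d_3

% with d_i = \mu + \delta_i the mean latency cancels
\varepsilon = \frac{\delta_1 + \delta_2}{2} - \delta_3

% independent jitter of variance \sigma^2, averaged over N rounds
\operatorname{Var}(\varepsilon) = \tfrac{1}{4}\sigma^2 + \tfrac{1}{4}\sigma^2 + \sigma^2 = \tfrac{3}{2}\sigma^2,
\qquad
\sigma_{\bar\varepsilon} = \sigma \sqrt{\frac{3}{2N}}
```

This assumes the mean latency is the same in both directions; if A to B averages mu_AB and B to A averages mu_BA, a constant bias of (mu_BA - mu_AB)/2 remains that averaging cannot remove.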

Now this is a simple two-node example. A cluster can of course combine the
measurements of many nodes and synchronize to that, just like the position
calculation for GPS uses several satellites. Using many nodes brings the
average error down. Of course, to synchronize many nodes, each node uses its
own clock as a new 'source' of measurement; if for the synchronization
accuracy we always assume the same clock from node A, then getting the error
down is a lot tougher.
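A minimal sketch of the multi-node idea, under an interpretation that is my assumption: a node averages offset estimates from k reference nodes. If the references' errors are independent, the averaged error falls as 1/sqrt(k); if they all inherited the same error from node A, averaging does not reduce it. `combine` and `rms_error` are hypothetical helpers, and the Gaussian error model is assumed.

```python
import random
import statistics


def combine(estimates: list[float]) -> float:
    """Combine offset estimates from several reference nodes (plain mean)."""
    return statistics.fmean(estimates)


def rms_error(k: int, sigma: float, correlated: bool,
              trials: int = 2000, seed: int = 0) -> float:
    """RMS error of the combined estimate over many simulated trials.

    correlated=True: every reference carries the same error (all of them were
    synchronized from node A). correlated=False: each reference's error is
    independent. Both cases use Gaussian errors with std dev sigma (assumed).
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        if correlated:
            common = rng.gauss(0.0, sigma)
            errors = [common] * k
        else:
            errors = [rng.gauss(0.0, sigma) for _ in range(k)]
        total_sq += combine(errors) ** 2
    return (total_sq / trials) ** 0.5
```

For example, with k = 16 and sigma = 0.1 µs, `rms_error(16, 1e-7, correlated=False)` should come out near 0.025 µs, while `rms_error(16, 1e-7, correlated=True)` stays near 0.1 µs.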

Vincent


On Sep 29, 2008, at 11:21 PM, Lombard, David N wrote:

> On Mon, Sep 29, 2008 at 01:10:49PM -0700, Prentice Bisbal wrote:
>> In the previous thread I instigated about running services in cluster
>> nodes, there was some mentioning of precisely synchronizing the system
>> clocks and this issue is also mentioned in this paper:
>>
>> "The Case of Missing Supercomputer Performance: Achieving Optimal
>> Performance on the 8,192 processor ASCI Q" (Petrini, Kerbyson and Pakin)
>> http://hpc.pnl.gov/people/fabrizio/papers/sc03_noise.pdf
>>
>> I've also read a few other papers on the topic, and it seems you need to
>> sync the system clocks to ~1 uS. On top of that, I imagine you also need
>> to sync the activities of each system so they all stop to do the same
>> system-level tasks at the same time.
>
> The IEEE-1588 "Precision Time Protocol" can provide such levels of global
> clock synchronization.
>
> Shameless plug: See "Hardware Assisted Precision Time Protocol (PTP,
> IEEE-1588) - Design and Case Study" presented at the recent LCI conference;
> <http://www.linuxclustersinstitute.org/conferences/archive/2008/technicalpapers.html>
>
> -- 
>
> David N. Lombard, Intel, Irvine, CA
> I do not speak for Intel Corporation; all comments are strictly my own.