[Beowulf] precise synchronization of system clocks
Nifty niftyompi Mitch
niftyompi at niftyegg.com
Tue Sep 30 19:04:21 PDT 2008
On Tue, Sep 30, 2008 at 11:37:12AM -0400, Robert G. Brown wrote:
>> Keeping it beowulf'y, if you want fine grained synchronization so that you
>> don't lose performance when doing barriers, you're probably going to need
>> some sort of common clock. The typical microprocessor crystal just isn't
>> good enough. Actually, though, when talking about this sort of sync, aren't
>> we getting close to SIMD sort of processing? Is a "cluster of commodity
>> computers" actually a "good" way to be doing this sort of thing?
> There is a natural synchronization driven by task advancement and
> barriers already. The problem, I think, is in getting "everything else"
> to be at least moderately synchronous, as it is the noise of this that
> degrades the otherwise synchronous task is it not? If one could
> convince the kernel to "start" all of its housekeeping task timeslices
> within (say) 1 usec worst case across all nodes, you would effectively
> parallelize and synchronize this noise.....
... more snip...
This almost makes sense, except that I suspect the precision of that
natural synchronization is much coarser than the numbers we are talking
about, and coarser than what I suspect NTP can maintain with a local master.
Different interconnects and different transports should show a spectrum
of per-rank precision coming out of an MPI barrier. It might be
interesting to gather some data for different cluster sizes, different
transports and different MPI implementations.
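The measurement itself is simple to sketch. A real run would use MPI_Barrier
plus MPI_Wtime across actual ranks on the interconnect of interest; the toy
below uses Python threads and threading.Barrier on a single node as a
stand-in, so the numbers illustrate only the method, not any transport.

```python
# Sketch: measure how far apart "ranks" emerge from a barrier.
# Stand-in for MPI_Barrier/MPI_Wtime; threads replace MPI ranks here.
import threading
import time

def barrier_exit_spread(nranks=8):
    """Max minus min barrier-exit timestamp across ranks, in seconds."""
    barrier = threading.Barrier(nranks)
    exits = [0.0] * nranks

    def rank(i):
        barrier.wait()                   # all ranks block here
        exits[i] = time.perf_counter()   # timestamp the release

    threads = [threading.Thread(target=rank, args=(i,)) for i in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(exits) - min(exits)

print(f"exit spread: {barrier_exit_spread() * 1e6:.1f} usec")
```

Repeating this across cluster sizes and transports would give the spectrum
of per-rank precision mentioned above.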
Also, while focusing on the network/transport in this discussion, none of
us commented on rotational latency as a source of uncertainty in kernel
state. Even if we had the ability to synchronize the systems exactly,
starting a process would still lag for want of rotational/seek disk latency.
Shared-memory machines and transports will behave differently.
Very high accuracy, high precision clock synchronization is a very real
problem for some data-gathering systems. Once the data is gathered, the
computation should be less sensitive. These are different problems, and
the former might be addressed by the data-sampling devices themselves.
Synchronization brings problems of its own.... for example, a well
synchronized campus can hammer the yp server and file servers when cron
triggers the same actions on 5000+ systems... I try never to run fetchmail
on the hour or half hour...
I suspect that some system cron tasks should no longer run from cron. Common
housekeeping tasks necessary for system health should be run via the batch
system, in a way that is fashionably late enough not to hammer site services.
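One way to be "fashionably late" is a deterministic per-host splay: hash the
hostname into a stable offset so each machine delays its housekeeping by a
different, well-spread amount rather than all firing at the same instant.
This is a hedged sketch of the idea, not a recipe; the 1800-second window
is an assumption, not any site standard.

```python
# Sketch: derive a stable per-host delay from the hostname so that
# synchronized clocks do not mean synchronized load on site services.
import hashlib
import socket

def housekeeping_splay(hostname=None, window=1800):
    """Seconds this host should wait past the scheduled start time."""
    name = hostname or socket.gethostname()
    digest = hashlib.sha256(name.encode()).digest()
    # First four bytes of the hash, reduced modulo the splay window.
    return int.from_bytes(digest[:4], 'big') % window

# In a wrapper around the housekeeping job:
#   time.sleep(housekeeping_splay())
```

The same offset could just as well be applied by a batch scheduler when
queueing the job.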
One site service of interest is AC power. A modern processor sitting
in an idle state that then starts a well-optimized loop will jump from
a couple of watts to 100 watts in as many clocks as the pipelines are
deep behind the instruction decode and instruction-cache fill. A 1000
processor (4000 core) cluster might jump from 4000 watts to 100000 watts
in the blink of an eye (err, did the lights blink?). Buffered through
the power supplies that dI/dt is smaller, but it is still interesting on
the mains, which are synchronized.
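For a rough sense of scale, here is the arithmetic on that power step using
the figures above (4 kW idle to 100 kW busy) and an assumed 208 V three-phase
feed at unity power factor; the feed voltage and power factor are
illustrative assumptions, not site data.

```python
# Back-of-the-envelope per-phase line current before and after the step.
# Three-phase line current: I = P / (sqrt(3) * V_line-to-line * pf).
import math

def mains_current_step(p_idle_w=4_000, p_busy_w=100_000, v_ll=208, pf=1.0):
    """Per-phase line current (amps) at idle and at full load."""
    def line_current(p_w):
        return p_w / (math.sqrt(3) * v_ll * pf)
    return line_current(p_idle_w), line_current(p_busy_w)

i_idle, i_busy = mains_current_step()
print(f"{i_idle:.0f} A -> {i_busy:.0f} A per phase")
```

A step of a couple of hundred amps per phase, arriving in microseconds, is
exactly the sort of dI/dt a site's power distribution has to absorb.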
T o m M i t c h e l l
Found me a new hat, now what?