[Beowulf] Really efficient MPIs??
Hakon.Bugge at scali.com
Wed Nov 28 11:29:40 PST 2007
At 16:07 28.11.2007, "Michael H. Frese" <Michael.Frese at NumerEx.com> wrote:
>Oops, sorry. Early morning typing-while-sleeping.
>The latencies claimed by Argonne for core-to-core
>on-board communication with MPICH2 compiled using the ch3:nemesis
>device are 0.3-0.5 microseconds, not 0.06. There's also no claim
>about what happens when you use it for mixed on-board and off-board comms.
>Our recent dual-core 64-bit AMD boards get 0.6 microsecond latency
>core-to-core, while our older 32-bit ones get 1.6. That's all by
Unless you use an MPI that lets you control how processes are bound
to cores (or use taskset), you really don't know what you're measuring.
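
As a minimal sketch of what binding looks like from inside a process, assuming Linux and Python's os.sched_setaffinity: the helper name pin_to_core is made up for illustration and is not part of any MPI. From the shell, "taskset -c 0 ./a.out" has roughly the same effect for a whole program.

```python
import os

def pin_to_core(core_id):
    """Bind the calling process to one core and return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        # Assumption: Linux. Other platforms do not expose this call.
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, {core_id})  # pid 0 means the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed cores before pinning:", allowed)
    print("affinity after pinning:", sorted(pin_to_core(allowed[0])))
```

Each rank of a latency test would call something like this with a different core id before the timing loop, so that the pair of cores being measured is known rather than left to the scheduler.
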
On modern systems, two cores could be a) on the same die, b) on the
same socket but different dies, c) on different sockets, and d) on
different sockets where the traffic is routed through a third one.
Moreover, on Clovertown, the Snoop filter could be enabled or disabled.
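
A rough way to check where two cores sit relative to each other on Linux is to read the topology files under sysfs. The sketch below assumes the kernel exposes physical_package_id for each CPU; package_id and same_socket are illustrative names. It only separates "same socket" from "different sockets", so it cannot tell case a) from b) or case c) from d).

```python
import os

# Assumption: Linux sysfs layout for CPU topology.
SYSFS_CPU = "/sys/devices/system/cpu"

def package_id(cpu, root=SYSFS_CPU):
    """Return the physical package (socket) id that the kernel reports for a CPU."""
    path = os.path.join(root, "cpu%d" % cpu, "topology", "physical_package_id")
    with open(path) as f:
        return int(f.read().strip())

def same_socket(cpu_a, cpu_b, root=SYSFS_CPU):
    """True if both CPUs report the same physical package id."""
    return package_id(cpu_a, root) == package_id(cpu_b, root)

if __name__ == "__main__":
    probes = [
        os.path.join(SYSFS_CPU, "cpu%d" % c, "topology", "physical_package_id")
        for c in (0, 1)
    ]
    if all(os.path.exists(p) for p in probes):
        print("cpu0 and cpu1 share a socket:", same_socket(0, 1))
    else:
        print("topology files not found; nothing to report")
```

Reporting this alongside a latency number at least tells the reader which kind of core pair was measured.
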
So, a core-to-core comparison by two different people, using
different MPIs and different systems, probably measures two different