[Beowulf] MPICH vs. OpenMPI
jan.heichler at gmx.net
Fri Apr 25 05:04:38 PDT 2008
Friday, 25 April 2008, you wrote:
HB> Hi Jan,
HB> At Wed, 23 Apr 2008 20:37:06 +0200, Jan Heichler <jan.heichler at gmx.net> wrote:
>> From what I saw, OpenMPI has several advantages:
>> - better performance on multi-core systems
>>   because of a good shared-memory implementation
HB> A couple of months ago, I conducted a thorough
HB> study on intra-node performance of different MPIs
HB> on Intel Woodcrest and Clovertown systems. I
HB> systematically tested point-to-point performance
HB> between processes on a) the same die on the same
HB> socket (sdss), b) different dies on same socket
HB> (ddss) (not on Woodcrest of course) and c)
HB> different dies on different sockets (ddds). I
HB> also measured the message rate using all 4 / 8
HB> cores on the node. The point-to-point benchmarks used
HB> were ping-ping and ping-pong (Scali's 'bandwidth' and osu_latency+osu_bandwidth).
HB> I evaluated Scali MPI Connect 5.5 (SMC), SMC 5.6,
HB> HP MPI 126.96.36.199, MVAPICH 0.9.9, MVAPICH2 0.9.8, Open MPI 1.1.1.
HB> Of these, Open MPI was the slowest for all
HB> benchmarks and all machines, up to 10 times slower than SMC 5.6.
You are not going to share these benchmark results with us, are you? They would be very interesting to see!
HB> Now since Open MPI 1.1.1 is quite old, I just
HB> redid the message rate measurement on an X5355
HB> (Clovertown, 2.66GHz). On an 8-byte message size,
HB> OpenMPI 1.2.2 achieves 5.5 million messages per
HB> second, whereas SMC 5.6.2 reaches 16.9 million
HB> messages per second (using all 8 cores on the node, i.e., 8 MPI processes).
HB> Comparing OpenMPI 1.2.2 with SMC 5.6.1 on
HB> ping-ping latency (usec) on an 8-byte payload yields:
HB> mapping   OpenMPI   SMC
HB> sdss      0.95      0.18
HB> ddss      1.18      0.12
HB> ddds      1.03      0.12
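For reference, the ratios implied by the figures quoted above can be worked out with a short Python sketch. The numbers are copied from the quoted message; the variable names are illustrative only and do not come from either poster.

```python
# Figures copied from the quoted message; all names here are illustrative.
latency_usec = {
    # mapping: (OpenMPI 1.2.2, SMC 5.6.1), ping-ping latency, 8-byte payload
    "sdss": (0.95, 0.18),
    "ddss": (1.18, 0.12),
    "ddds": (1.03, 0.12),
}

# How many times higher the OpenMPI latency is for each mapping.
latency_ratio = {
    mapping: openmpi / smc for mapping, (openmpi, smc) in latency_usec.items()
}

# Message rate in million messages per second (8 MPI processes, 8-byte messages).
rate_openmpi, rate_smc = 5.5, 16.9
rate_ratio = rate_smc / rate_openmpi

for mapping, ratio in latency_ratio.items():
    print(f"{mapping}: OpenMPI latency is {ratio:.1f}x that of SMC")
print(f"message rate: SMC is {rate_ratio:.1f}x that of OpenMPI")
```

With the quoted numbers, the latency gap comes out at roughly 5x to 10x depending on the mapping, and the message-rate gap at about 3x.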
Impressive. But I never doubted that commercial MPIs are faster.
HB> So, Jan, I would be very curious to see any documentation of your claim above!
I ran a benchmark of a customer application on an 8-node dual-socket, dual-core Opteron cluster; unfortunately I can't remember the name.
I used OpenMPI 1.2, mpich 1.2.7p1, mvapich 0.97-something and Intel MPI 3.0, IIRC.
I don't have the detailed data available, but from memory:
Latency was worst for mpich (just TCP/IP ;-) ), then Intel MPI, then OpenMPI, with mvapich the fastest.
On a single machine, mpich was the worst, then mvapich, then OpenMPI; Intel MPI was the fastest.
The difference between mvapich and OpenMPI was quite big; Intel MPI had only a small advantage over OpenMPI.
Since this was not a low-level benchmark, I don't know which communication pattern the application used, but it seemed to me that the shared-memory configuration of OpenMPI and Intel MPI was far better than that of the other two.