[Beowulf] Q: IB message rate & large core counts (per node)?

Tom Elken tom.elken at qlogic.com
Mon Mar 15 17:03:14 PDT 2010


> On Behalf Of Gilad Shainer
> 
> ... OSU has different benchmarks
> so you can measure message coalescing or real message rate. 

[ As a refresher for the wider audience, as Gilad defined earlier: "Message coalescing is when you incorporate multiple MPI messages in a single network packet."  And I agree with this definition :) ]

Gilad,  

Sorry for the delayed QLogic response on this.  I was on vacation when this thread started up.  But now that it has been revived, ... 

Which OSU benchmarks have message-coalescing built into the source?  

> Nowadays it seems that QLogic
> promotes the message rate as non-coalescing data, and I almost got
> bought by their marketing machine till I looked at the data on the
> wire... interesting what the bits and bytes and symbols can tell you...

Message-coalescing has been done in benchmark source code, such as HPC Challenge's MPI RandomAccess benchmark.  In that case, coalescing is performed when the SANDIA_OPT2 define is turned on during the build.
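To make the idea concrete, that style of source-level coalescing does roughly the following (a hypothetical sketch for illustration only, not the actual HPCC source; NRANKS, BUCKET and post_update are made-up names): updates are bucketed by destination rank and flushed as one larger message, rather than one message per 8-byte update.

    /* Hypothetical sketch of source-level coalescing -- not the actual
     * HPCC RandomAccess / SANDIA_OPT2 code.  Updates are queued per
     * destination rank and sent as one MPI message per bucket, instead
     * of one MPI message per 8-byte update. */
    #include <mpi.h>

    #define NRANKS 64      /* assumed fixed here just for illustration */
    #define BUCKET 1024    /* updates queued per destination before a flush */

    static unsigned long long bucket[NRANKS][BUCKET];
    static int fill[NRANKS];

    static void post_update(int dest, unsigned long long val)
    {
        bucket[dest][fill[dest]++] = val;
        if (fill[dest] == BUCKET) {   /* many logical updates, one send */
            MPI_Send(bucket[dest], BUCKET, MPI_UNSIGNED_LONG_LONG,
                     dest, 0, MPI_COMM_WORLD);
            fill[dest] = 0;
        }
    }

That is perfectly legitimate as an application optimization, but the wire then sees far fewer, larger packets than the logical message count would suggest.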

More typically, message coalescing is a feature of the MPI implementation itself, which uses various heuristics to decide when it is active.

MVAPICH has an environment variable -- VIADEV_USE_COALESCE -- which can turn this feature on or off.  HP-MPI has coalescing heuristics on by default when using IB verbs, and off by default when using QLogic's PSM.  More recent versions of Open MPI enable message-coalescing heuristics when running over IB verbs.

There is nothing wrong with message-coalescing features in an MPI.  But when you are trying to measure the raw message rate of the network adapter, it is best not to use a message-coalescing feature, so that you measure what you set out to measure.  QLogic MPI does not have a message-coalescing feature, and that is what we use to measure MPI message rate on our IB adapters.  We also measure using MVAPICH with its message-coalescing feature turned off, and get virtually identical message-rate performance to that with QLogic MPI.

I don't know what you were measuring on the wire, but with the osu_mbw_mr benchmark and QLogic MPI, for the small 1- to 8-byte message sizes where we achieve maximum message rate, each message is in its own 56-byte packet with no coalescing.  I asked a couple of our engineers who have looked at a lot of PCIe traces to make sure of this.
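For reference, what that measurement boils down to is a loop like the one below (a simplified sketch of an osu_mbw_mr-style test, not the actual OSU source, and trimmed to a single sender/receiver pair; the real benchmark runs one pair per core pair and adds a final handshake).  Every MPI_Isend here is a separate message; nothing in the benchmark combines them.

    /* Simplified osu_mbw_mr-style message-rate loop -- a sketch, not the
     * OSU source.  Rank 0 posts a window of small non-blocking sends to
     * rank 1, waits for completion, repeats, and reports messages/sec. */
    #include <mpi.h>
    #include <stdio.h>

    #define WINDOW 64      /* sends in flight per iteration */
    #define ITERS  10000
    #define MSGLEN 8       /* bytes per message */

    int main(int argc, char **argv)
    {
        char sbuf[MSGLEN] = {0}, rbuf[MSGLEN];
        MPI_Request req[WINDOW];
        int rank, i, w;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank < 2) {                /* one sender/receiver pair */
            t0 = MPI_Wtime();
            for (i = 0; i < ITERS; i++) {
                for (w = 0; w < WINDOW; w++) {
                    if (rank == 0)
                        MPI_Isend(sbuf, MSGLEN, MPI_BYTE, 1, 0,
                                  MPI_COMM_WORLD, &req[w]);
                    else
                        MPI_Irecv(rbuf, MSGLEN, MPI_BYTE, 0, 0,
                                  MPI_COMM_WORLD, &req[w]);
                }
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            }
            t1 = MPI_Wtime();
            if (rank == 0)
                printf("%.0f messages/sec\n",
                       (double)ITERS * WINDOW / (t1 - t0));
        }

        MPI_Finalize();
        return 0;
    }

Run with coalescing disabled in the MPI (or with an MPI that has no such feature), that is the kind of measurement behind the message-rate numbers discussed above.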

Regards,
-Tom


> 
> 
> 
> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
> On Behalf Of Greg Lindahl
> Sent: Friday, February 19, 2010 2:06 PM
> To: beowulf at beowulf.org
> Subject: Re: [Beowulf] Q: IB message rate & large core counts (per
> node)?
> 
> > Mellanox's latest message rate numbers with ConnectX-2 more than
> > doubled versus the old cards, and are for real message rate -
> > separate messages on the wire. The competitor numbers use message
> > coalescing, so they are not real separate messages on the wire,
> > or not really message rate.
> 
> Gilad,
> 
> I think you forgot which side you're supposed to be supporting.
> 
> The only people I have ever seen publish message rate with coalesced
> messages are DK Panda (with Mellanox cards) and Mellanox.
> 
> QLogic always hated coalesced messages, and if you look back in the
> archive for this mailing list, you'll see me denouncing coalesced
> messages as meaningless about 1 microsecond after the first result was
> published by Prof. Panda.
> 
> Looking around the Internet, I don't see any numbers ever published by
> PathScale/QLogic using coalesced messages.
> 
> At the end of the day, the only reason microbenchmarks are useful is
> when they help explain why one interconnect does better than another
> on real applications. No customer should ever choose which adapter to
> buy based on microbenchmarks.
> 
> -- greg
> (formerly employed by QLogic)
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> 



