[Beowulf] Performance characterising a HPC application

Mon Mar 26 09:57:51 PDT 2007

On Mon, Mar 26, 2007 at 09:38:43AM -0700, Gilad Shainer wrote:

> This is yet another example of "fair" comparison. Unlike Qlogic,
> Mellanox offer a family of products for PCIe servers, and there are
> multiple MPI versions that support those products. The performance
> depends on the hardware you pick and the software you use.

There are 4 MPIs that support InfiniPath's InfiniBand extension. The
servers in this comparison were basically identical: 3 GHz dual-socket,
dual-core Woodcrests.

If you'd like to suggest a better MPI to HP, please do so.

> Why don't you look at 
> http://www.clustermonkey.net//content/view/178/33/? 

You didn't say where you got the InfiniPath Fluent numbers. If they're
from my whitepaper, I was not running version 6.3 of Fluent, and my
numbers were not run with HP-MPI. In another month, I'll have a new
whitepaper with a Fluent chart, run with 6.3 and HP-MPI.

> This shows that Mellanox SDR beats Qlogic, even on latency sensitive
> applications, and that was before ConnectX.

Fluent isn't latency-sensitive at the L problem size. Real customer
runs with Fluent are much larger than the L problem size. (I just
spent a few days visiting a Formula 1 racing team, so this is
fresh in my mind.)

> As for the overhead portion, this paper does not compare hardware to
> hardware overhead, and it is greatly influenced by the MPI software
> implementation.

If you'd like to suggest to Doug how your numbers could be improved,
please do so. You've had since September 2005 to do it.

> But who cares what exactly they measured, right?..... Anyway, it
> is very reasonable to believe that On-loading architecture has lower
> CPU overhead than Off-loading one...

You're correct, "everybody knows" that onload must have higher
overhead. Why bother testing it? "Everybody knows" that the CPU with
the highest GHz is fastest.

-- greg