Any news on InfiniBand?

Anthony Skjellum tony at MPI-SoftTech.Com
Wed Feb 26 21:46:00 PST 2003

Patrick, aren't you from a competing vendor? :-)

On 26 Feb 2003, Patrick Geoffray wrote:

> Hi Tony,
> On Wed, 2003-02-26 at 16:46, Anthony Skjellum wrote:
> > Because people ask for polling latency (it is not the right model for most
> > real applications), we did provide this number.  We have found an almost
> > universal advantage in avoiding polling MPI on almost all networks,
> > including Ethernet+TCP, Giganet, and Myrinet.
> Can you elaborate? Unless you oversubscribed your processors, I don't see
> the point.
Of course that is not the case.

> It may just be a side effect of your implementation: if you implement
> the progression with threads, sure, you want to block so as not to waste
> cycles in those progression threads. If you progress in hardware,
> blocking on interrupts is useless unless you oversubscribe your
> processors and so want to force the context switches.

If you have hardware progress at the MPI level, you don't use software to do it.
The only case I am aware of is Cplant's Portals, which has this in its MCP...
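The polling-versus-blocking trade-off being argued here can be sketched outside of MPI entirely. Below is a hypothetical Python analogy (not MPI code; the queue stands in for the NIC): a busy-polling receiver spins testing for a message, roughly like looping on MPI_Test, while a blocking receiver sleeps until data arrives, leaving the CPU free for computation.

```python
import queue
import threading
import time

def polling_recv(q):
    # Busy-poll: repeatedly test for a message, burning CPU while waiting.
    # Roughly analogous to spinning on MPI_Test in a polling-mode MPI.
    spins = 0
    while True:
        try:
            return q.get_nowait(), spins
        except queue.Empty:
            spins += 1  # every iteration here is a wasted cycle

def blocking_recv(q):
    # Block: sleep until data arrives (think interrupt + wakeup);
    # the CPU is free for other work in the meantime.
    return q.get()

results = {}
q1, q2 = queue.Queue(), queue.Queue()

t = threading.Thread(target=lambda: results.update(poll=polling_recv(q1)))
t.start()
time.sleep(0.05)        # the polling receiver spins this whole time
q1.put("hello")
t.join()

q2.put("hello")
results["block"] = blocking_recv(q2)

print(results["poll"][0], results["block"])
print("wasted spins while polling:", results["poll"][1])
```

The question underneath the thread is who pays: polling gives the lowest latency when the CPU has nothing better to do, while blocking trades a wakeup cost (interrupt, context switch) for a free CPU, which matters once an application overlaps computation with communication.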

> Does IB offer hooks to progress the MPI protocol in hardware?
IB NICs have plenty of performance room to offer this, but
that is not part of the standard.

> > Our white paper shows the polling and non-polling implementation curves for
> > bandwidth.  We normally encourage people not to use polling mode, where
> > overhead is low.
> If your metric is bandwidth, you don't care about the interrupt
> overhead: it's completely hidden by the communication cost because
> bandwidth usually means large messages.

This is clear. Any MPI, polling or blocking, can achieve high bandwidth.

> One of the flaws of IB is to use a paradigm built on VI. MPI did not map
> well onto VI, and I expect the same thing for IB.
We had good luck with MPI/Pro over VI, and none of the problems your
general statement suggests.  In fact, Dell did independent comparisons
of Myrinet and Giganet and found much lower overheads for Giganet than with
GM at the time... it was quite a good technology at scales up to 128.
I'd recommend the papers of Jenwei Hsieh and Tau Leng to everyone, to see
how, even with the need for a progress thread, large Giganet transfers
used only 3% or so of the CPU, whereas comparable Myrinet transfers were in
the 20%+ range (as far as I recall, but see the white papers).

What evidence can you offer that MPI doesn't work well over VI?

As for IB, there are reliable connections, but also RD (Reliable Datagram),
which is quite interesting to look at... The connection you draw is a confusing one.

In fact, the truth is that IB will work very well for small and medium scale
(maybe to 1000 nodes), before suffering problems with connections and other
issues.  It will probably be quite convenient to use, easy to multiprogram,
and offer a lot more robustness than you can get with a weak NIC.

On the other hand, for a very scalable network, Myrinet is obviously going
to have serious advantages with its simple, source-routed network
and very low overheads...
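To make the source-routing point concrete, here is a toy sketch (invented port numbers and topology, not Myrinet's actual header format): the sender precomputes the whole path as a list of output ports, and each switch just consumes the head of the list, so the switches need no routing tables or lookups at all.

```python
def route(path, topology, src):
    # Source routing: the packet header carries the full route as a list
    # of output ports chosen by the sender. Each switch pops the next
    # port and forwards -- no routing table, no lookup, minimal state.
    node = src
    hops = list(path)
    while hops:
        port = hops.pop(0)            # switch consumes the head of the route
        node = topology[node][port]   # forward out of that port
    return node

# Hypothetical 4-node line A-B-C-D; each dict maps output port -> neighbor.
topology = {
    "A": {0: "B"},
    "B": {0: "A", 1: "C"},
    "C": {0: "B", 1: "D"},
    "D": {0: "C"},
}

# A reaches D via ports 0 (A->B), 1 (B->C), 1 (C->D).
print(route([0, 1, 1], topology, "A"))
```

That simplicity is where the low switch overhead comes from; the trade-off is that senders must know the network topology up front.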

The picture of these technologies is much more interesting than your mail
suggests, and it does not point entirely to one technology or the other.


> Patrick
> --
> Patrick Geoffray, Phd
> Myricom, Inc.

Anthony Skjellum PhD, CTO       | MPI Software Technology, Inc.
101 South Lafayette St, Ste. 33 | Starkville, MS 39759, USA
Ph: +1-(662)320-4300 x15        | FAX: +1-(662)320-4301     | tony at

Middleware that's hard at work for you and your enterprise.(SM)

More information about the Beowulf mailing list