Any news on Infiniband ?
Patrick Geoffray patrick at myri.com
Wed Feb 26 22:50:25 PST 2003
On Thu, 2003-02-27 at 00:46, Anthony Skjellum wrote:
> Patrick, aren't you from a competitor vendor? :-)

Of course, and all of my statements have to be taken in that context :-)

> On 26 Feb 2003, Patrick Geoffray wrote:
> > > real applications), we did provide this number. We have found almost universal
> > > advantage to avoid polling MPI in almost all networks, including Ethernet+TCP
> > > Giganet, and Myrinet.

This was the source of my comment: your universal advantage in avoiding polling comes from your MPI implementation; it's a software design choice. That does not mean blocking is the universal solution for message completion.

> If you have hardware progress of MPI level, you don't use software to do it.
> I am only aware that Cplant Portals has this in their MCP...

It's interesting that hardware providing such capabilities has existed for some time now, yet nobody ever really took full advantage of it. I think that was because the people doing the low-level work were not in touch with the people doing MPI. It's also interesting that this is changing now, maybe under pressure from IB.

> > One of the flaws of IB is to use a paradigm built on VI. MPI did not map
> > well on VI, and I expect the same thing for IB.
> We had good luck with MPI/Pro over VI

I think the fundamental advantage of MPI/Pro was that it addressed the progress problem and tried to improve communication/communication overlap. I don't think VI provided any specific features to make that better:

* VI is connection oriented, which means scalability issues and high setup overhead.
* It is based on large descriptors, which means high latency: the latency on Giganet was much higher than it should have been for a pure silicon solution.
* Memory registration is explicit, which means it was not optimized, and it was a nightmare to move it out of the critical path.
* The matching space is too small for MPI, so you need a progress engine in the host.

> general statement suggests.
> In fact, Dell did independent comparisons
> of Myrinet and Giganet and found much lower overheads for Giganet than with

There is no such thing as an "independent comparison", especially when Dell and Giganet had a distribution deal. It's called marketing.

> GM at the time... it was quite a good technology at scales up to 128.

The good part of Giganet was, IMHO, the packet engine built on top of an ATM chip, with a very good pipeline for medium-sized messages. VI was in fashion at that time, and it certainly looked appealing to jump on the bandwagon initially pushed by Microsoft, Compaq and Oracle. There was the same appeal for IB a few years ago. But let me remind you of the current state of VI: dead in the water.

> I'd recommend the papers of Jenwei Hsieh and Tau Leng to everyone, to look at
> how, even with the need for a progress thread, large Giganet transfers
> were only using 3% or so of CPU, whereas similar Myrinet was in the 20%+
> range (as far as I recall, but see the white papers).

This is the specific advantage of the progress thread. It is an MPI implementation trade-off to deal with GM's constraints (which, by the way, are every bit as bad as VI's).

> What evidence can you offer that MPI doesn't work well over VI?

VI, GM and IP don't work well for MPI. They do not share MPI semantics, and MPI implementations built on top of them use a lot of code just to work around them. If you have to run a progress thread, cache memory registrations, or do the rendezvous and matching yourself, the layer was not designed for MPI.

> As for IB, there are reliable connections, but also RD, which is quite
> interesting to look at ...

The connection you draw is a confusing one. I will add IB to the list of communication layers that were not designed for MPI (VI, GM and IP). The IB Trade Association did not give MPI a single thought when writing the huge specs. Having Reliable Datagram won't help one bit in writing more efficient MPI middleware.
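To make "do the matching yourself" concrete: MPI requires every incoming message to be matched, in order, against posted receives by source and tag (either of which may be a wildcard), and messages that arrive before any matching receive must be buffered. When the NIC cannot do this, the host library does it in software. The following is a minimal Python sketch of that host-side logic, assuming a single communicator and ignoring rendezvous, cancellation and buffer management. The names `Matcher`, `Envelope`, `PostedRecv`, `ANY_SOURCE` and `ANY_TAG` are hypothetical and not taken from any MPI implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

# Hypothetical wildcard values for this sketch (real MPI uses MPI_ANY_SOURCE / MPI_ANY_TAG).
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Envelope:
    """An arrived message: who sent it, its tag, and its data."""
    source: int
    tag: int
    payload: bytes = b""


@dataclass
class PostedRecv:
    """A receive posted by the application; data is filled in on a match."""
    source: int
    tag: int
    data: Optional[bytes] = None


def matches(recv: PostedRecv, env: Envelope) -> bool:
    # A receive matches if its source and tag agree with the message or are wildcards.
    return recv.source in (ANY_SOURCE, env.source) and recv.tag in (ANY_TAG, env.tag)


class Matcher:
    """Host-side matching engine with posted-receive and unexpected-message queues."""

    def __init__(self) -> None:
        self.posted: Deque[PostedRecv] = deque()
        self.unexpected: Deque[Envelope] = deque()

    def post_recv(self, recv: PostedRecv) -> PostedRecv:
        # Search already-arrived messages oldest first, so messages from one
        # sender are consumed in the order they arrived.
        for i, env in enumerate(self.unexpected):
            if matches(recv, env):
                del self.unexpected[i]  # safe: we return right after mutating
                recv.data = env.payload
                return recv
        self.posted.append(recv)  # no match yet: wait for an arrival
        return recv

    def on_arrival(self, env: Envelope) -> None:
        # Called by the progress engine for each incoming message;
        # posted receives are tried in the order they were posted.
        for i, recv in enumerate(self.posted):
            if matches(recv, env):
                del self.posted[i]
                recv.data = env.payload
                return
        self.unexpected.append(env)  # nobody asked for it yet: buffer it
```

For example, a message from rank 1 with tag 7 that arrives before any receive sits in `unexpected` until `post_recv(PostedRecv(ANY_SOURCE, 7))` claims it. Every one of these queue walks costs host CPU cycles, which is the kind of work Patrick argues should be handled natively by the NIC.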
> In fact, the truth is that IB will work very well for small and medium scale
> (maybe to 1000 nodes), before suffering problems with connections and other
> issues. It will probably be quite convenient to use, easy to multiprogram,
> and offer a lot more robustness than you can get with a weak NIC.

You don't want to hear my truth about IB, because I am completely biased and it won't be pretty. Specifically for MPI, IB will work as well as VI or GM, that is, on three legs with two iron balls on each, walking backward.

> The picture of the technologies is much more interesting than your mail
> suggests, and not pointing all for one technology or the other.

For the HPC world, the de facto communication standard is MPI. To support MPI effectively, you need native support from the hardware, either in silicon or in firmware. There are two solutions today that can provide such support: Quadrics and Myrinet. When everybody soon moves to 4X (which, IMHO, is overkill given the current state of PCI and host machines), the difference for the HPC community will be in the efficiency of MPI. In that respect, I think IB is like VI: it sucks :-)

Patrick
--
Patrick Geoffray, PhD
Myricom, Inc.
http://www.myri.com
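The connection-scaling concern raised in this thread (VI being connection oriented, and the remark that IB may run into connection problems around 1000 nodes) comes down to counting. If every process keeps one reliable connection to every peer, each process holds one connection per peer, per-node state grows linearly with job size, and the total number of connections grows quadratically. The following is a minimal Python sketch of that arithmetic, assuming a fully connected job with one connection per process pair. `full_mesh_connections` and `ConnectionCount` are hypothetical names, and the sketch says nothing about how much memory a connection costs on any real NIC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionCount:
    processes: int    # total processes in the job
    per_process: int  # connections each process must hold
    per_node: int     # connections whose state lives on one node
    total: int        # distinct process pairs in the whole job


def full_mesh_connections(nodes: int, procs_per_node: int = 1) -> ConnectionCount:
    """Count connections if every process keeps one reliable connection per peer.

    Assumes a fully connected job and one connection per process pair; it
    ignores lazy (on-demand) connection setup, which an MPI library may use.
    """
    if nodes < 1 or procs_per_node < 1:
        raise ValueError("nodes and procs_per_node must be positive")
    n = nodes * procs_per_node
    per_process = n - 1
    return ConnectionCount(
        processes=n,
        per_process=per_process,
        per_node=procs_per_node * per_process,
        total=n * (n - 1) // 2,
    )
```

For example, `full_mesh_connections(1000)` gives 999 connections per process and 499,500 in total. With two processes per node, each node holds 3,998 connections and the job has 1,999,000.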
More information about the Beowulf mailing list
