[Beowulf] Re: Re: Home beowulf - NIC latencies
Patrick Geoffray patrick at myri.com
Wed Feb 16 02:07:27 PST 2005
Joachim Worringen wrote:
> AFAIK, Myrinet's MPI (MPICH-GM), for example, does use the standard
> (partly naive) collective operations of MPICH. Considering this, plus
> the fact

Replacing the collectives from MPICH-1 was not high on the todo list because there were more important things to optimize, with more effect on applications than the scheduling of some collectives. For scaling real codes on large machines, your priority is not there: not enough bang for your time.

> - that it's not all that hard to use GM for pt-2-pt efficiently. We have
> done this in our MPI, too, with the same level of performance.

Then you have no idea how hard it is to use GM efficiently and *correctly*. Enough to run pingpong? Sure, that's a piece of cake. But how do you recover from fatal errors on the wire or from resource exhaustion, avoid spending most of your time pinning/unpinning pages, avoid thrashing the translation cache on the NIC, etc.? Did you address all of these issues in your MPI? Maybe, but it requires design decisions that sit above the device layer. At some point you have to make choices, and in a Swiss-Army-Knife (SAK) implementation, you choose the common ground, or the existing ground.

> - that you probably do not know anything on ScaMPI's current internal

True, I know zip about ScaMPI's design. This is exactly why I don't know how they use GM. Without knowing that, how can you infer hardware characteristics from benchmark results ?!?

> design (Intel is MPICH2 plus some Intel-propietary device hacking) and
> little about it's performance (if this is wrong, let us know)

Intel MPI is MPICH2 plus some multi-device glue. Intel got something right in their design: they ask the vendor to provide the native device layers instead of doing everything themselves. That's how a SAK implementation could actually be decent. However, the reference implementation uses uDapl. That means that there is stuff above the device layers that is needed to make the MPI-over-uDapl performance decent. Some of it can be used for other devices, the rest cannot. The question is: if I need something above the device layer to make my stuff decent, could I have it? I would think so. Now, if it conflicts with something needed for another device, what happens? Someone makes a choice.

> - that all code apart from the device, and also the device architecture
> of MPICH-GM are more or less 10-year-old swiss-army-knive MPICH code
> (which is not a bad thing per se)

MPICH-1 is not a SAK. You cannot take an MPICH binary and run it on all of the devices to which MPICH has been ported. You can *compile* it on multiple targets, but nothing more. Furthermore, many ch2 things were not used in ch_gm. If you look at it, most of the common code of MPICH is not performance related, with the exception of the collectives (and again, they are not that bad). MPICH-2 has been moving more things to the device-specific part, which is the right direction.

> you should maybe think again before judging on the efficiency of other
> MPI implementations.

I could not care less about the efficiency of other MPI implementations. None of my business. My point is that it is ridiculous to assume that using a SAK MPI implementation factors out the software part, so that all remaining performance differences are hardware related. As Greg pointed out, an interconnect is a software/hardware stack, all the way up to the MPI lib. Throw away the native MPI lib and you have a lame duck. Compare lame ducks and you go nowhere.

When you have a commercial MPI, you don't have much choice but to support many interconnects. You cannot ask the vendors to write their part unless you are Intel, so you write it yourself. You do your best, because you need to sell your stuff, and you call it good. Is there value in that? Today, yes, because binary compatibility makes life easier. However, my second point is that binary compatibility should be addressed by the MPI community, not by commercial MPI implementations.

Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com
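
Editorial illustration, not part of the original message: the remark about spending most of your time pinning/unpinning pages is usually addressed with some form of registration (pin-down) cache. The toy Python model below is only a sketch under stated assumptions. `PinCache`, `acquire`, `demo`, and the 4096-byte page size are hypothetical names and values chosen for illustration; nothing here is the GM API or a description of how MPICH-GM actually does it. It simply counts how many simulated pin operations are saved when the same buffer is reused.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class PinCache:
    """Toy LRU cache of pinned pages (hypothetical; not the GM API)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pinned = OrderedDict()  # page number -> True, kept in LRU order
        self.pin_calls = 0           # simulated (expensive) pin operations
        self.unpin_calls = 0         # simulated unpin operations

    def _pin(self, page):
        # Stand-in for the costly OS/NIC registration call.
        self.pin_calls += 1
        self.pinned[page] = True

    def _unpin_oldest(self):
        # Evict the least recently used page to make room.
        self.pinned.popitem(last=False)
        self.unpin_calls += 1

    def acquire(self, addr, length):
        """Ensure every page covering [addr, addr + length) is pinned."""
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page in self.pinned:
                self.pinned.move_to_end(page)  # hit: only refresh LRU order
            else:
                if len(self.pinned) >= self.capacity:
                    self._unpin_oldest()
                self._pin(page)


def demo():
    cache = PinCache(capacity_pages=4)
    for _ in range(100):              # reuse the same two-page buffer 100 times
        cache.acquire(0x10000, 8192)
    return cache.pin_calls, cache.unpin_calls
```

In this model, 100 sends from the same two-page buffer cost two pin operations instead of 200. A real implementation would also have to avoid evicting pages that belong to in-flight transfers and invalidate entries when memory is freed or remapped, which is part of why the message calls the *correct* use of GM hard.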
