[Beowulf] Re: Beowulf Digest, Vol 37, Issue 58
Christian Bell christian.bell at qlogic.com
Mon Mar 26 07:59:56 PDT 2007
On Mon, 26 Mar 2007, Håkon Bugge wrote:

> Hi Christian,
>
> At 01:19 24.03.2007, beowulf-request at beowulf.org wrote:
> >I've yet to see a significant number of message-passing applications
> >show that an RDMA offload engine, as opposed to any other messaging
> >engine, is a stronger performance determinant. That's probably
> >because there are other equally important and desirable features
> >implemented in other messaging engines.
>
> I find this statement hard to justify from
> available benchmark data. Looking at the LS-DYNA
> neon_refined_revised submissions to
> www.topcrunch.org, you can add one in favour of
> RDMA ;-). Scali MPI Connect, utilizing SDR IB,
> performs better than all comparable systems,
> except for one case, where Infinipath is faster.
> This is somewhat surprising to me, given the
> latency and message rate advantage Infinipath has
> compared to traditional IB. Therefore, let me use
> this opportunity to stress that its not only the
> interconnect architecture, but also the software
> harnessing it (read MPI) that matters.

Hi Håkon,

I'm not sure I would call significant a submission whose configurations are not compared at scale (apparently a large versus a small switch, and a much heavier shared-memory component at small process counts). For example, in your submitted configurations, inter-node communication over the interconnect is never more heavily involved than intra-node shared memory, and when the interconnect does become dominant at 32 procs, that's when InfiniPath is faster.

On the flip side, you're right that these results show the importance of an MPI implementation (at least for shared memory), which also means your product is well positioned for the next generation of node configurations in this regard. However, because of the node configurations and because this is really one benchmark, I can't take these results as indicative of general interconnect performance.

Oh, and because you're forcing me to compare results on this table, I now see what Patrick at Myricom was saying: the largest config you show that stresses the interconnect (8x2x2) takes 596s walltime on a similar Mellanox DDR and 452s walltime on InfiniPath SDR (yes, the pipe is "100%" smaller but the performance is 25% better). We have performance engineers who gather this type of data and who've seen these trends on other benchmarks, and they'll be happy to correct any misconceptions, I'm certain.

Now I feel like I'm sticking my tongue out like a shameless vendor, and yet my original discussion is not really about beating the InfiniPath drum, which your reply insinuates. Rather, I was trying to point out that what curses MPI, its inability to semantically match the interfaces designed for offload, is also what makes MPI effective on other grounds. Namely, a receiver-driven model with no remote completion guarantees leaves enough room for implementors to provide efficient network performance in many, perhaps unconventional, forms. Distilling the MPI discussion into "cpu overhead" is focusing on a very specialized (i.e. narrow) part of solving the MPI problem, a problem for which RDMA offload is not a panacea.

    . . christian

--
christian.bell at qlogic.com
(QLogic SIG, formerly Pathscale)
