[Beowulf] Re: Beowulf Digest, Vol 37, Issue 58
Håkon Bugge Hakon.Bugge at scali.com
Mon Mar 26 14:34:09 PDT 2007
- Previous message: [Beowulf] Re: Beowulf Digest, Vol 37, Issue 58
- Next message: [Beowulf] Performance characterising a HPC application
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi again Christian,

At 16:59 26.03.2007, Christian Bell wrote:
>Hi Håkon,
>
>I'm unsure if i would call significant a submission comparing results
>between configurations not compared at scale (in appearance large versus
>small switch, much heavier shared-memory component at small process
>counts). For example, in your submitted configurations, the interconnect
>communication (inter-node) is never involved more than shared memory
>(intra-node) and when the interconnect does become dominant at 32 procs,
>that's when InfiniPath is faster.

Not sure how you count this. In my "world", every process communicates with more remote processes than local ones in all cases except the single-node runs. I.e., in a two-node case with 2 or 4 processes per node, a process has 1 or 3 other local processes and 2 or 4 other remote processes. Excluding the single-node cases, we have six runs (2x2, 4x2, 8x2, 2x4, 4x4, 8x4), and RDMA is faster than message passing in 5 of them.

As to the 32-core case, I am running equally fast as InfiniPath on this one, but with a product that is not released (yet). Hence I haven't published it. And based on this I did not call it a significant finding, but merely an indication of RDMA being faster (up to 16 cores) or equally fast as message passing for _this_ application and dataset.

>On the flip side, you're right that these results show the importance of
>an MPI implementation (at least for shared memory), which also means your
>product is well positioned for the next generation of node configurations
>in this regard. However, because of the node configurations and because
>this is really one benchmark, I can't take these results as indicative of
>general interconnect performance. Oh, and because you're forcing me to
>compare results on this table, I now see what Patrick at Myricom was
>saying -- the largest config you show that stresses the interconnect
>(8x2x2) takes 596s walltime on a similar Mellanox DDR and 452s walltime on
>InfiniPath SDR (yes, the pipe is "100%" smaller but the performance is 25%
>better).

Just to avoid any confusion, the 596s number is _not_ with Scali MPI Connect (SMC), but with a competing MPI implementation. SMC achieves 551s using SDR. I must admit your InfiniPath number is new to me, as topcrunch reports 482s for this configuration with InfiniPath.

>We have performance engineers who gather this type of data and who've
>seen these trends on other benchmarks, and they'll be happy to right any
>wrong misconceptions, I'm certain.
>
>Now I feel like I'm sticking my tongue out like a shameless vendor and
>yet my original discussion is not really about beating the InfiniPath
>drum, which your reply insinuates.

Well, my intent was to draw the wulfers' attention to some published facts containing apples-to-apples comparisons, in an interesting discussion of RDMA vs. message passing. Given the significant (yes, I mean it) difference in latency and message rates, I was indeed surprised.

My question still is: if there existed an RDMA API with characteristics similar to those of the best message passing APIs, how would a good MPI implementation perform?

Håkon
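The peer-count argument and the walltime figures quoted in this message can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original post: it assumes the run labels mean nodes x processes-per-node and that every process talks to every other process, and the function names `peer_counts` and `walltime_reduction` are hypothetical.

```python
# Illustrative arithmetic only. Assumptions (not stated in the post):
#  - a run label "NxP" means N nodes with P processes per node
#  - every process communicates with every other process

def peer_counts(nodes, procs_per_node):
    """Return (local_peers, remote_peers) as seen by a single process."""
    local = procs_per_node - 1                # other processes on the same node
    remote = (nodes - 1) * procs_per_node     # processes on all other nodes
    return local, remote


def walltime_reduction(baseline_s, other_s):
    """Fraction by which other_s is shorter than baseline_s."""
    return (baseline_s - other_s) / baseline_s


if __name__ == "__main__":
    # The six multi-node runs listed in the message.
    for nodes, ppn in [(2, 2), (4, 2), (8, 2), (2, 4), (4, 4), (8, 4)]:
        local, remote = peer_counts(nodes, ppn)
        print(f"{nodes}x{ppn}: {local} local peers, {remote} remote peers")

    # Walltimes quoted in the thread, in seconds.
    for label, base, other in [
        ("596s vs 452s", 596, 452),
        ("551s vs 452s", 551, 452),
        ("551s vs 482s", 551, 482),
    ]:
        print(f"{label}: {walltime_reduction(base, other):.1%} less walltime")
```

Under these assumptions each of the six multi-node runs has more remote than local peers per process, which is the counting used in the reply above; the second loop simply restates the quoted walltimes as relative differences.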
