[Beowulf] MPI - time for packing, unpacking, creating a message...
Dan.Kidger at quadrics.com, Tue May 26 04:37:22 PDT 2009
- Previous message: [Beowulf] MPI - time for packing, unpacking, creating a message...
- Next message: [Beowulf] MPI - time for packing, unpacking, creating a message...
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The original question was about relatively small messages: only 500 elements (doubles) each. You can often get better throughput if you send, say, two smaller messages rather than one large one, because the interconnect can generate multiple RDMA requests that proceed concurrently. This old paper from 2003 illustrates this:
http://www.docstoc.com/docs/5579957/Quadrics-QsNetII-A-network-for-Supercomputing-Applications

Page 25 shows a graph in which 1, 2, 4 and 8 RDMAs are issued concurrently. For large messages (>256KB) there is no significant difference in the achieved total bandwidth; it is limited by the PCIe/PCI-X interface or by the interconnect fabric itself. But at smaller message sizes there are measurable differences: e.g. two 1K messages show a higher total bandwidth than a single 2K message.

Daniel

p.s. Did you really mean to compare three 500-element transfers with a single 2000-element transfer, rather than the same total message size in both cases?

p.p.s. Case A is really a broadcast. Interconnects that implement broadcast in hardware are bound to do A faster than B.

From: beowulf-bounces at beowulf.org On Behalf Of Bruno Coutinho
Sent: 23 May 2009 16:44
To: tribur at vision.ee.ethz.ch
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] MPI - time for packing, unpacking, creating a message...

If you are using Gigabit Ethernet with jumbo frames (9000 bytes, for example), A will send 3 packets of 4000 bytes each (500 doubles x 8 bytes), and B will send one packet of 9000 bytes and one of 7000 bytes (2000 doubles = 16000 bytes).

For the CPU, B is better, because it makes one system call while A makes three. And as many high-speed interconnects today need large packets to fully utilize their bandwidth, I think B should be faster. But the only way to be sure is to test.

2009/5/18 <tribur at vision.ee.ethz.ch>

Hi all,

can anyone tell me whether A) or B) is likely to be faster?

A) Process 0 sends 3x500 elements, e.g. doubles, to 3 different processes, using something like

    if (rank == 0) {
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 2, 2, MPI_COMM_WORLD);
        MPI_Send(sendbuf, 500, MPI_DOUBLE, 3, 3, MPI_COMM_WORLD);
    } else {
        MPI_Recv(recvbuf, 500, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

B) Process 0 sends 2000 elements to process 1, using

    if (rank == 0)
        MPI_Send(sendbuf, 2000, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
    else
        MPI_Recv(recvbuf, 2000, MPI_DOUBLE, 0, rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
