[Beowulf] Programming Help needed
Larry Stewart stewart at serissa.com
Sat Nov 7 03:22:04 PST 2009
On Fri, Nov 6, 2009 at 5:43 PM, amjad ali <amjad11 at gmail.com> wrote:

> Hi all,
>
> Suppose that the grid/mesh is decomposed for n processors, such
> that each processor has a number of elements that share their side/face
> with different processors. What I do is start non-blocking MPI
> communication at the partition boundary faces (faces shared between any two
> processors), and then start computing values on the internal/non-shared
> faces. When I complete this computation, I put WAITALL to ensure MPI
> communication completion. Then I do computation on the partition boundary
> faces (the shared ones). This way I try to hide the communication behind
> computation. Is it correct?

There are two issues here.

First, correctness. The data for messages that arrive while you are computing may be written into memory asynchronously with respect to your program. Be sure that you are not depending on values in memory that may be overwritten by data arriving from other ranks.

Second, overlap is good, but whether you actually get any overlap depends on the details. For example, the work of communicating with other ranks, sending messages, and so forth must be done by something. For Ethernet, a lot of that work is done by the OS kernel, and in general by some core on each node. If you expect to be using all the cores in a node to run your program, who is left to do the communications work? Some implementations will timeshare the processors, giving the appearance of overlap but not actually running faster, while other implementations simply won't do any work until the WAITALL that demands progress.

If you have multicore nodes, and you don't need every last core to run your program, it can help to allocate only some of the cores on each node to your program, leaving some "idle" to run the OS and the communications. The job control system should have a way to do this.

You can test to find out if you are getting any overlap by artificially reducing the actual communications work to near zero and seeing if the program runs any faster.
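To make the reasoning behind that test concrete, here is a small back-of-the-envelope model. It is a simplification of my own, not something from the thread: it assumes a step costs either `max(interior, comm) + boundary` when communication is fully hidden, or the plain sum when it is not. The function names and the timings in the usage note are hypothetical.

```python
def step_time(t_interior, t_boundary, t_comm, overlapped):
    """Idealized wall time of one step under a simple two-case model."""
    if overlapped:
        # Communication proceeds while the interior is being computed.
        return max(t_interior, t_comm) + t_boundary
    # No progress happens until the wait, so communication adds directly.
    return t_interior + t_comm + t_boundary


def exposed_comm(t_with_comm, t_without_comm):
    """Time per step that communication adds to the critical path.

    t_with_comm: measured step time of the normal run.
    t_without_comm: measured step time with communication cut to near zero.
    """
    return max(0.0, t_with_comm - t_without_comm)
```

With made-up timings of 10 units of interior work, 2 of boundary work, and 4 of communication, the model gives 12 units per step with full overlap and 16 without. If cutting communication to near zero takes the measured step from 16 down to 12, then `exposed_comm(16, 12)` is 4 and none of the communication was hidden. If the time stays at 12, the communication was already hidden behind the interior work.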
