[Beowulf] MPI_Alltoall
Ashley Pittman ashley at quadrics.com
Tue Apr 12 06:52:49 PDT 2005
- Previous message: [Beowulf] MPI_Alltoall
- Next message: [Beowulf] NASTRAN on cluster
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 2005-04-12 at 02:29 -0700, Rita Zrour wrote:
> Hello I have a question,
> when i do many MPI_Alltoall in my program always the
> first MPI_Alltoall take too much time to be done.
>
> I don't know where the first communication is always
> expensive. Is that a problem of memory???????

Many MPI implementations do "lazy" allocation of resources such as comms buffers and descriptors. It's not unusual for the first iteration of a loop to have to allocate these on the fly; later iterations simply re-use cached descriptors/handles as needed. This isn't unique to MPI but happens nearly everywhere in the software world. Perhaps alltoall exposes it more because it has more simultaneous pending send/recvs than anything else?

Plus of course I assume you are actually initialising your data before you send it. Far too many people write "benchmarks" that just send un-initialised mmap()ed memory and end up measuring page fault performance rather than network bandwidth. Proper benchmarks (for the most part) zero all data before they send it and do a handful of warmup laps before doing any measurements. Even without extra allocation/faulting, simply having the data cache-hot can make a difference to measured performance.

Ashley,
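The advice above (touch the data first, run a few warmup laps, then measure) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the original post: the names `alloc_touched`, `time_laps`, `copy_ctx` and `copy_lap` are hypothetical, and a plain `memcpy` stands in for the communication step so the sketch builds without an MPI installation. In a real MPI program the lap callback would presumably wrap a single `MPI_Alltoall` call, and `MPI_Wtime` could replace the POSIX clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback type: one "lap" of the operation being measured. */
typedef void (*lap_fn)(void *ctx);

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocate a buffer and write a zero to every byte so that each page is
 * faulted in before any timed region. The volatile pointer keeps the
 * compiler from folding malloc + zeroing into a lazy calloc.
 * Returns NULL if the allocation fails. */
unsigned char *alloc_touched(size_t bytes)
{
    unsigned char *buf = malloc(bytes);
    if (buf != NULL) {
        volatile unsigned char *p = buf;
        for (size_t i = 0; i < bytes; i++)
            p[i] = 0;
    }
    return buf;
}

/* Run `warmup` untimed laps, then `laps` timed laps.
 * Returns mean seconds per timed lap, or -1.0 if laps <= 0
 * (in which case the callback is never invoked). */
double time_laps(lap_fn lap, void *ctx, int warmup, int laps)
{
    assert(lap != NULL);
    if (laps <= 0)
        return -1.0;
    for (int i = 0; i < warmup; i++)
        lap(ctx);                      /* lazy setup and faults happen here */
    double start = now_seconds();
    for (int i = 0; i < laps; i++)
        lap(ctx);                      /* only these laps are measured */
    return (now_seconds() - start) / laps;
}

/* Stand-in workload (an assumption for illustration): a buffer copy
 * in place of the real communication call. */
struct copy_ctx {
    unsigned char *src;
    unsigned char *dst;
    size_t bytes;
    int calls;                         /* how many laps have run so far */
};

void copy_lap(void *ctx)
{
    struct copy_ctx *c = ctx;
    memcpy(c->dst, c->src, c->bytes);
    c->calls++;
}
```

A caller would allocate both buffers with `alloc_touched`, fill in a `struct copy_ctx`, and call something like `time_laps(copy_lap, &c, 3, 10)`. Timing the very first lap separately and comparing it with the warmed-up mean is one way to see how much of the first-call cost is setup rather than transfer.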
