[Beowulf] hpcc MPIRandomAccess bug
Håkon Bugge Hakon.Bugge at scali.com
Tue Aug 23 06:50:30 PDT 2005
Running the latest version of hpcc, I have from time to time seen missing updates in the MPIRandomAccess benchmark. The benchmark tolerates up to 1% missing updates, so the benchmark "passes" as such. However, in a pure MPI implementation of this benchmark, missing updates are unacceptable, so I investigated the issue. It would be interesting to know whether anyone else has seen the same.

This applies to hpcc version 1.0, sub-benchmark MPIRandomAccess, with the define USE_MULTIPLE_RECV set (set by default) and the define MAX_RECV greater than 1 (default 16).

Rudimentary explanation: Each process sends buckets of updates to randomly chosen other processes, using the tag UPDATE_TAG. When a process has sent all its updates, it sends a message with FINISHED_TAG to every other process to indicate that it has finished. Each process keeps N posted MPI_Irecvs with MPI_ANY_SOURCE and MPI_ANY_TAG. At regular intervals, a process checks for completion of any of the posted MPI_Irecvs by calling MPI_Testany. Most messages carry UPDATE_TAG, and the process updates its part of a global array accordingly. If the message selected by MPI_Testany carries FINISHED_TAG, the process decrements a counter initialized to the total number of processes minus one. When this counter reaches zero and the process has sent all its updates, it has completed its work. It then cancels its N outstanding MPI_Irecvs by calling MPI_Cancel followed by MPI_Wait on each of them.

This implementation of the algorithm contains a bug. Assume that "our" process has sent all its updates and has received FINISHED_TAG from all but one other process. Assume N is 2. The "oldest" posted MPI_Irecv has matched the last message containing updates (i.e. carrying UPDATE_TAG) from this remote process. The youngest posted MPI_Irecv has matched the last message sent by this remote process (i.e. the one carrying FINISHED_TAG). Hence, the ordering between sends and receives is maintained.
Then, our process calls MPI_Testany. This MPI call may return *any* of the posted receives that have completed. It *might* pick the message carrying FINISHED_TAG. If so, our process thinks it is finished, since it has now received FINISHED_TAG from all remote processes. It then cancels the posted MPI_Irecvs, including the one holding the UPDATE_TAG message. Hence, an update (or a bucket of updates) is lost.

A proposed work-around will shortly be available from me.

-h
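The race can be reproduced with a small C model of the termination loop. This is a hypothetical sketch, not hpcc code, and it does not call MPI: `struct slot`, `model_testany` and `model_run` are invented names, each slot stands for one posted MPI_Irecv, and the `prefer` argument stands in for MPI_Testany's freedom to return any completed request. The `check_cancelled` flag shows one possible remedy, which is an assumption on my part and not necessarily the work-around announced above: after MPI_Cancel and MPI_Wait, inspect the status with MPI_Test_cancelled, and if the cancellation did not take effect, process the received bucket instead of dropping it. Given MPI's non-overtaking ordering (which the scenario above relies on), every UPDATE_TAG message from a process has matched an earlier-posted receive by the time its FINISHED_TAG message is seen, so checking the outstanding receives should catch the leftovers.

```c
#include <assert.h>
#include <stdbool.h>

/* Tag names follow the post; the values are arbitrary. */
enum tag { UPDATE_TAG, FINISHED_TAG };

/* One posted MPI_Irecv in this toy model (no real MPI is used). */
struct slot {
    bool completed; /* matched a message that has not been processed yet */
    enum tag tag;   /* tag of that message */
    int nupdates;   /* updates carried by an UPDATE_TAG message */
};

/* Stand-in for MPI_Testany, which may return ANY completed request.
   Returns slot `prefer` if it is completed, else the lowest-index
   completed slot, or -1 if none is completed. */
static int model_testany(const struct slot *s, int n, int prefer) {
    if (prefer >= 0 && prefer < n && s[prefer].completed)
        return prefer;
    for (int i = 0; i < n; i++)
        if (s[i].completed)
            return i;
    return -1;
}

/* Runs the termination loop described in the post and returns the number
   of updates applied. `remaining` is the number of FINISHED_TAG messages
   still expected. With check_cancelled == false, outstanding receives are
   dropped as in hpcc 1.0; with true, a receive whose cancellation did not
   take effect is processed instead (hypothetical remedy). */
static int model_run(struct slot *s, int n, int remaining, int prefer,
                     bool check_cancelled) {
    int applied = 0;
    while (remaining > 0) {
        int i = model_testany(s, n, prefer);
        assert(i >= 0);         /* the model assumes the messages have arrived */
        s[i].completed = false; /* consumed; real code would re-post the receive */
        if (s[i].tag == UPDATE_TAG)
            applied += s[i].nupdates;
        else
            remaining--;
    }
    /* Cancel phase. Real code: MPI_Cancel(&req[i]); MPI_Wait(&req[i], &st);
       and, for the remedy, MPI_Test_cancelled(&st, &flag). A cancel either
       succeeds or the receive completes, not both; this model treats a
       receive that already holds a message as not cancellable. */
    for (int i = 0; i < n; i++) {
        bool cancelled = !s[i].completed;
        if (check_cancelled && !cancelled) {
            /* Every FINISHED_TAG was already counted, so this is an update. */
            assert(s[i].tag == UPDATE_TAG);
            applied += s[i].nupdates;
        }
        s[i].completed = false;
    }
    return applied;
}
```

With the two-slot scenario from the post (slot 0 holding a 5-update bucket, slot 1 holding FINISHED_TAG, one FINISHED_TAG still expected), `model_run` returns 0 when FINISHED_TAG is picked first without the check, and 5 in every other combination.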
