[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes


Michael Di Domenico mdidomenico4 at gmail.com
Sun Nov 15 14:29:13 PST 2009


You might want to ask on the linux-rdma list (formerly the openfabrics
list). It's been a while since I looked at IB error messages, but which
stack/version are you running?

On Sat, Nov 14, 2009 at 4:43 PM, Martin Siegert <siegert at sfu.ca> wrote:
> Hi,
>
> I am running into problems when sending large messages (about
> 180000000 doubles) over IB. A fairly trivial example program is attached.
>
> # mpicc -g sendrecv.c
> # mpiexec -machinefile m2 -n 2 ./a.out
> id=1: calling irecv ...
> id=0: calling isend ...
> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 199132400 opcode 549755813  vendor error 105 qp_idx 3
>
> This is with OpenMPI-1.3.3.
> Does anybody know a solution to this problem?
>
> If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
> and never returns.
> I asked on the openmpi users list but got no response ...
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Head, Research Computing
> WestGrid Site Lead
> IT Services                                phone: 778 782-4691
> Simon Fraser University                    fax:   778 782-4242
> Burnaby, British Columbia                  email: siegert at sfu.ca
> Canada  V5A 1S6
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>



