[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes

Michael Di Domenico mdidomenico4 at gmail.com
Sun Nov 15 14:29:13 PST 2009


You might want to ask on the linux-rdma list (formerly the openfabrics list).  It's
been a while since I looked at IB error messages, but what
stack/version are you running?
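
For example, if the stack is OFED-based, something like the following
would show the OFED release, the HCA driver/firmware details, and the
Open MPI build in use (ofed_info may not exist on non-OFED installs):

# ofed_info | head -1
# ibv_devinfo
# ompi_info | grep "Open MPI:"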

On Sat, Nov 14, 2009 at 4:43 PM, Martin Siegert <siegert at sfu.ca> wrote:
> Hi,
>
> I am running into problems when sending large messages (about
> 180000000 doubles) over IB. A fairly trivial example program is attached.
>
> # mpicc -g sendrecv.c
> # mpiexec -machinefile m2 -n 2 ./a.out
> id=1: calling irecv ...
> id=0: calling isend ...
> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 199132400 opcode 549755813  vendor error 105 qp_idx 3
>
> This is with OpenMPI-1.3.3.
> Does anybody know a solution to this problem?
>
> If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
> and never returns.
> I asked on the openmpi users list but got no response ...
>
> Cheers,
> Martin
>
> --
> Martin Siegert
> Head, Research Computing
> WestGrid Site Lead
> IT Services                                phone: 778 782-4691
> Simon Fraser University                    fax:   778 782-4242
> Burnaby, British Columbia                  email: siegert at sfu.ca
> Canada  V5A 1S6
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>
