[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes

Martin Siegert siegert at sfu.ca
Mon Nov 16 12:56:21 PST 2009


Hi Michael,

On Mon, Nov 16, 2009 at 10:49:23AM -0700, Michael H. Frese wrote:
> Martin,
>
> Could it be that your MPI library was compiled using a small memory model?  
> The 180 million doubles sounds suspiciously close to a 2 GB addressing 
> limit.
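> (For scale: 180,000,000 doubles * 8 bytes/double is about 1.44e9 bytes,
> i.e. roughly 1.4 GB per message.)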
>
> This issue came up on the list recently under the topic "Fortran Array size 
> question."
>
>
> Mike

I am running MPI applications that use more than 16 GB of memory, so
I do not believe that this is the problem. Also, -mmodel=large
does not appear to be a valid argument for gcc under x86_64:
gcc -DNDEBUG -g -fPIC -mmodel=large   conftest.c  >&5
cc1: error: unrecognized command line option "-mmodel=large"
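
(It looks like the option is actually spelled -mcmodel on x86_64, e.g.
-mcmodel=medium or -mcmodel=large. A quick check - a hypothetical big.c,
just to see where the default small code model gives up - is a static
array above 2 GB:

/* big.c: ~2.4 GB of static data, more than the default small code
   model can address with 32-bit relocations */
static double a[300000000];

int main(void)
{
    a[0] = 1.0;
    return a[0] > 0.0 ? 0 : 1;
}

# gcc big.c                  -> link fails with "relocation truncated to fit"
# gcc -mcmodel=medium big.c  -> links fine

Heap memory obtained via malloc is not limited by the code model in any
case.)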

- Martin

> At 05:43 PM 11/14/2009, Martin Siegert wrote:
>> Hi,
>>
>> I am running into problems when sending large messages (about
>> 180000000 doubles) over IB. A fairly trivial example program is attached.
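>>
>> (A minimal sketch of the sort of sendrecv.c meant here - reconstructed for
>> illustration since the attachment itself is not shown; the initialization,
>> message tag and error handling are assumptions:)
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> #define N 180000000              /* doubles per message, about 1.44 GB */
>>
>> int main(int argc, char **argv)
>> {
>>     int id, np, i;
>>     double *buf;
>>     MPI_Request req;
>>     MPI_Status status;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &id);
>>     MPI_Comm_size(MPI_COMM_WORLD, &np);
>>     if (np < 2) {
>>         if (id == 0) fprintf(stderr, "need at least 2 processes\n");
>>         MPI_Finalize();
>>         return 1;
>>     }
>>
>>     buf = malloc((size_t)N * sizeof(double));
>>     if (buf == NULL) {
>>         fprintf(stderr, "id=%d: malloc failed\n", id);
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     }
>>
>>     if (id == 0) {
>>         /* sender: fill the buffer and post a nonblocking send to rank 1 */
>>         for (i = 0; i < N; i++) buf[i] = (double)i;
>>         printf("id=%d: calling isend ...\n", id);
>>         MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
>>         MPI_Wait(&req, &status);
>>     } else if (id == 1) {
>>         /* receiver: post the matching nonblocking receive from rank 0 */
>>         printf("id=%d: calling irecv ...\n", id);
>>         MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
>>         MPI_Wait(&req, &status);
>>     }
>>
>>     free(buf);
>>     MPI_Finalize();
>>     return 0;
>> }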
>>
>> # mpicc -g sendrecv.c
>> # mpiexec -machinefile m2 -n 2 ./a.out
>> id=1: calling irecv ...
>> id=0: calling isend ...
>> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error 
>> polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 
>> 199132400 opcode 549755813  vendor error 105 qp_idx 3
>>
>> This is with OpenMPI-1.3.3.
>> Does anybody know a solution to this problem?
>>
>> If I use MPI_Allreduce instead of MPI_Isend/Irecv, the program just hangs
>> and never returns.
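>>
>> (The Allreduce variant is, roughly, a single call such as
>>     MPI_Allreduce(sendbuf, recvbuf, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
>> in place of the Isend/Irecv/Wait pair - the reduction operation and the
>> separate receive buffer are assumptions.)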
>> I asked on the openmpi users list but got no response ...
>>
>> Cheers,
>> Martin
>>
>> --
>> Martin Siegert
>> Head, Research Computing
>> WestGrid Site Lead
>> IT Services                                phone: 778 782-4691
>> Simon Fraser University                    fax:   778 782-4242
>> Burnaby, British Columbia                  email: siegert at sfu.ca
>> Canada  V5A 1S6


