[Beowulf] MPI_Isend/Irecv failure for IB and large message sizes
Martin Siegert siegert at sfu.ca
Mon Nov 16 13:24:50 PST 2009
Hi Mark,

On Sun, Nov 15, 2009 at 03:38:08PM -0500, Mark Hahn wrote:
>> I am running into problems when sending large messages (about
>> 180000000 doubles) over IB. A fairly trivial example program is attached.
>
> sorry if you've already thought of this, but might you have RLIMIT_MEMLOCK
> set too low? (ulimit -l)

Good point. By now I have played with all kinds of ulimits (the nodes
have 16GB of memory and 16GB of swap space - this program is not even
coming close to those limits). This is the current setting:

# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 139264
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) unlimited
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 139264
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

... same error :-(

>> [[60322,1],1][btl_openib_component.c:2951:handle_wc] from b1 to: b2 error polling LP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 199132400 opcode 549755813 vendor error 105 qp_idx 3
>
> 105 looks like it might be an errno to me:
> #define ENOBUFS 105 /* No buffer space available */
>
> regards, mark.

BTW: when using Intel-MPI (MPICH2) the program segfaults with
l = 268435456 = 2^31/8, which makes me suspect that they use MPI_BYTE
to transfer the data internally and multiply the variable count by 8
without checking whether the integer overflows ...

- Martin
