LAM and channel bonding doesn't work?
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Martin Siegert siegert at sfu.caThu Jul 12 19:05:23 PDT 2001
- Previous message: HPL Compile Problems: resolved.
- Next message: LAM and channel bonding doesn't work?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi, sorry to bug you with this, but I am banging my head against the wall for days now and can find the source of my problem. Thus, I need help... There seems to be a bug in LAM (I tested versions 6.3.2 and 6.5.2 which I compiled myself with --with-rpi=usysv, and the RPMs lam-6.5.1-1.i386.rpm (RH 7.1), lam-6.5.2-usysv.1.i386.rpm, lam-6.5.3-usysv.1.i386.rpm) that shows itself only when channel bonding is used. Furthermore, it shows up only with a user defined datatype using MPI_Type_vector. Furthermore, it only shows up when the system size and consequently the message size is sufficiently large. Also it only shows up when nonblocking sends/receives (MPI_Isend, MPI_Irecv) are used. I append my test program below (again, sorry for this; this is as condensed as I could make it). Anyway the problem does not show up with mpich-1.2.1 and mpipro-1.6.3 only with lam, which is a problem because I rely on the latency of lam for performance. Anyway, if you by now hit the d key, because of all this bizarreness, I can't blame you - nevertheless here is the problem: I compile the program: # mpicc -O2 -o vector-test vector-test.c and run it on two processors that must not be on the same machine (program works fine using two processors on a SMP box): # lamboot -v LAM 6.5.2/MPI 2 C++/ROMIO - University of Notre Dame Executing hboot on n0 (b001 - 1 CPU)... Executing hboot on n1 (b002 - 1 CPU)... topology done # mpirun -np 2 -O vector-test -2 4000 id=1: MPI_Isend done. id=1: MPI_Irecv done. id=0: MPI_Isend done. id=0: MPI_Irecv done. id=0: elapsed time: 1.982758 s id=1: elapsed time: 1.856275 s # mpirun -np 2 -O vector-test -2 8000 id=0: MPI_Isend done. and at that point the program hangs forever :-( Strangely enough, if only one processor does a isend and the other a irecv the program runs through: # mpirun -np 2 -O vector-test 8000 id=0: MPI_Irecv done. id=0: elapsed time: 4.800264 s id=1: MPI_Isend done. id=1: elapsed time: 4.737434 s Also with blocking send/recv it runs: # mpirun -np 2 -O vector-test -b -2 8000 id=0: MPI_Recv done. id=1: MPI_Send done. id=0: MPI_Send done. id=0: elapsed time: 16.022285 s id=1: MPI_Recv done. id=1: elapsed time: 15.358338 s Under mpich there is no problem either: # mpirun -np 2 vector-test 8000 id=1: MPI_Isend done. id=0: MPI_Isend done. id=0: MPI_Irecv done. id=0: elapsed time: 10.939983 s id=1: MPI_Irecv done. id=1: elapsed time: 11.625667 s Does anybody have an idea what the problem could be? How can I debug this? If there is somebody on the list who uses LAM and channel bonding: would you be willing to verify or disprove this? - You probably have to change the system size, I strongly suspect that the critical size is memory dependent: my numbers (4000 and 8000) are for 512MB of RAM. Sorry again for dumping this on the list. Regards, Martin ======================================================================== Martin Siegert Academic Computing Services phone: (604) 291-4691 Simon Fraser University fax: (604) 291-4242 Burnaby, British Columbia email: siegert at sfu.ca Canada V5A 1S6 ======================================================================== ---<cut here: vector-test.c>-------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <math.h> #include <mpi.h> /* allocate a 2d array of doubles with subscript range dm[min_x,...,max_x][min_y,...,max_y] contigously in memory */ double **alloc_darray2d(int min_x, int max_x, int min_y, int max_y, int *ialloc_err) { int i,nx=max_x-min_x+1,ny=max_y-min_y+1; double **dm; *ialloc_err=0; dm=(double **)malloc((size_t) nx*sizeof(double*)); if (dm == NULL){ *ialloc_err=1; return dm; } dm -= min_x; dm[min_x]=(double *)malloc((size_t) nx*ny*sizeof(double)); if (dm[min_x]==NULL){ *ialloc_err=2; return dm; } dm[min_x]-=min_y; for (i=min_x+1; i <= max_x; i++) dm[i] = dm[i-1]+ny; /* return pointer to array of pointers to rows */ return dm; } void free_darray2d(double **dm,int min_x, int max_x, int min_y, int max_y) { free((void *) (dm[min_x]+min_y)); free((void *) (dm+min_x)); } int main(int argc,char *argv[]){ /* This program tests message passing with a user defined data type: each processor allocates a matrix of size L x (L/numprocs), where numprocs is the # of processors that are used in the computation (from MPI_Comm_size). The matrix is split into numprocs blocks of size (L/numprocs) x (L/numprocs). Hence, the data in each block are are (L/numprocs) arrays of size (L/numprocs). These arrays are stored a stride L appart. These blocks are defined as a new MPI datatype using MPI_Type_vector. Each processor sends one block to the next processor and receives one block from the previous processor. */ double **d_matrix; double *work,*local_data; double start_time,elapsed_time; int l,lm1,lx,ly,i,j,k,ialloc_err; int myid,numprocs,l_proc,l2_proc,n_local,id_send,id_recv,idx_send,idx_recv; int blocking=0,send_and_receive=0; int c,errflag=0; extern char *optarg; extern int optind; MPI_Datatype block; MPI_Request s_req; MPI_Request r_req; MPI_Status s_status; MPI_Status r_status; MPI_Init(&argc,&argv); MPI_Comm_rank( MPI_COMM_WORLD, &myid ); MPI_Comm_size( MPI_COMM_WORLD, &numprocs ); while ((c = getopt(argc, argv, "b2")) != -1 ) { switch (c) { case 'b' : blocking=1; break; case '2' : send_and_receive=1; break; case '?' : errflag++; } } if (argc != optind+1) errflag++; if (sscanf(argv[optind++],"%i",&l) != 1) errflag++; if (errflag) { if (myid == 0) { fprintf(stderr,"usage: %s [-b] [-2] L\n",argv[0]); fprintf(stderr,"with LxL/(# of procs) being the size of the matrix" " (int).\n" "If -b is specified blocking send/recv are used.\n" "If -2 is specified, each process sends and receives." "\n"); } MPI_Finalize(); exit(1); } l=((int)(((double)l)/numprocs+0.5))*numprocs; lm1=l-1; l_proc=l/numprocs; ly=l_proc; l2_proc=l_proc*l_proc; n_local=l_proc*l; /* allocate matrix, work array */ d_matrix=alloc_darray2d(0,l_proc-1,0,lm1,&ialloc_err); if (ialloc_err) { fprintf(stderr,"id=%i: matrix allocation error %i\n",myid,ialloc_err); MPI_Abort(MPI_COMM_WORLD,ialloc_err); exit(ialloc_err); } work=(double *)malloc(n_local*sizeof(double)); if (work == NULL){ fprintf(stderr,"id=%i: work allocation error\n",myid); MPI_Abort(MPI_COMM_WORLD,1); exit(1); } /* define datatype of a block to be sent: a total of l_proc*l_proc elements, stored in l_proc arrays of size l_proc that are a stride l apart */ MPI_Type_vector(l_proc,l_proc,l,MPI_DOUBLE,&block); MPI_Type_commit(&block); /* initialize matrix */ for (i=0; i<l_proc; i++){ for (j=0; j<l; j++){ d_matrix[i][j] = (double)myid; } } /* send block to id+1, recv block from id-1 the send block starts at myid*l_proc the data are received as a contigous block of l_proc*l_proc doubles */ id_send = (myid + 1) % numprocs; id_recv = (myid - 1 + numprocs) % numprocs; idx_send = myid*l_proc; idx_recv = id_recv*l2_proc; start_time=MPI_Wtime(); if (blocking) { if (myid % 2) { MPI_Send(&d_matrix[0][idx_send],1,block,id_send,myid, MPI_COMM_WORLD); fprintf(stderr,"id=%i: MPI_Send done.\n",myid); if (send_and_receive) { MPI_Recv(&work[idx_recv],l2_proc,MPI_DOUBLE,id_recv,id_recv, MPI_COMM_WORLD,&r_status); fprintf(stderr,"id=%i: MPI_Recv done.\n",myid); } } else { MPI_Recv(&work[idx_recv],l2_proc,MPI_DOUBLE,id_recv,id_recv, MPI_COMM_WORLD,&r_status); fprintf(stderr,"id=%i: MPI_Recv done.\n",myid); if (send_and_receive) { MPI_Send(&d_matrix[0][idx_send],1,block,id_send,myid, MPI_COMM_WORLD); fprintf(stderr,"id=%i: MPI_Send done.\n",myid); } } } else { if (send_and_receive) { MPI_Irecv(&work[idx_recv],l2_proc,MPI_DOUBLE,id_recv,id_recv, MPI_COMM_WORLD,&r_req); MPI_Isend(&d_matrix[0][idx_send],1,block,id_send,myid, MPI_COMM_WORLD,&s_req); MPI_Wait(&s_req,&s_status); fprintf(stderr,"id=%i: MPI_Isend done.\n",myid); MPI_Wait(&r_req,&r_status); fprintf(stderr,"id=%i: MPI_Irecv done.\n",myid); } else { if (myid % 2) { MPI_Isend(&d_matrix[0][idx_send],1,block,id_send,myid, MPI_COMM_WORLD,&s_req); MPI_Wait(&s_req,&s_status); fprintf(stderr,"id=%i: MPI_Isend done.\n",myid); } else { MPI_Irecv(&work[idx_recv],l2_proc,MPI_DOUBLE,id_recv,id_recv, MPI_COMM_WORLD,&r_req); MPI_Wait(&r_req,&r_status); fprintf(stderr,"id=%i: MPI_Irecv done.\n",myid); } } } elapsed_time=MPI_Wtime()-start_time; printf("id=%i: elapsed time: %f s\n",myid,elapsed_time); MPI_Finalize(); }
- Previous message: HPL Compile Problems: resolved.
- Next message: LAM and channel bonding doesn't work?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
