[Beowulf] use an MPI library through a shared library

Mathieu Gontier mg.mailing-list at laposte.net
Wed Dec 5 00:28:05 PST 2007


Yep, I use ldd every day. But here the problem comes from a corrupted 
structure passed between MorphMPI and MPI:

typedef struct{
  int MorphMPI_SOURCE;
  int MorphMPI_TAG;
  int MorphMPI_ERROR;
  void* mpi_status ;
} MorphMPI_Status ;

Here the mpi_status member points to a real MPI_Status. In MPICH:

typedef struct{
  int MPI_SOURCE;
  int MPI_TAG;
  int MPI_ERROR;
  int count ;
} MPI_Status ;

Then, when my MorphMPI_Status is passed to MorphMPI_Get_count(), the 
pointer MorphMPI_Status::mpi_status itself is not corrupted, but the 
count field of the MPI_Status it points to is: the value should be 4, 
and instead it is random.

I tried changing the layout of MorphMPI_Status (adding another integer 
to align it on 64 bits, keeping only the void*, ...) without success.

As a reminder, this problem appears only when MPI is used through a 
dynamically linked MorphMPI library.

Does anyone have an idea?

Mathieu Gontier
Core Development Engineer

Read the attached v-card for telephone, fax, address
Look at our website http://www.fft.be
 



Joe Landman wrote:
> Greetings Mathieu:
>
> Mathieu Gontier wrote:
>
> [...]
>
>> So, I have run into a small problem whatever the MPI library used (I tried 
>> with MPICH-1.2.5.2, MPICHGM and IntelMPI).
>> When MorphMPI is linked statically with my parallel application, 
>> everything is ok; but when MorphMPI is linked dynamically with my 
>> parallel application, MPI_Get_count returns a wrong value.
>>
>> I concluded it is difficult to use an MPI library through a shared 
>> library. I wonder if anyone has more information about it (in this 
>
> Not likely.  I would suggest ldd.  It is your friend.
>
> For example:
>
> joe at pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
>         libm.so.6 => /lib/libm.so.6 (0x00002b5409d17000)
>         libmpi.so.0 => not found
>         libopen-rte.so.0 => not found
>         libopen-pal.so.0 => not found
>         librt.so.1 => /lib/librt.so.1 (0x00002b5409f99000)
>         libdl.so.2 => /lib/libdl.so.2 (0x00002b540a1a2000)
>         libnsl.so.1 => /lib/libnsl.so.1 (0x00002b540a3a6000)
>         libutil.so.1 => /lib/libutil.so.1 (0x00002b540a5c0000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0x00002b540a7c3000)
>         libc.so.6 => /lib/libc.so.6 (0x00002b540a9de000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002b5409af9000)
>
> Notice that libmpi.so.0 is not found, so I can't run this by hand. 
> Unless I force the issue using LD_LIBRARY_PATH
>
> joe at pegasus-i:~/workspace/source-mpi$ export LD_LIBRARY_PATH="/home/joe/local/lib64/:/home/joe/local/lib/"
> joe at pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
>         libm.so.6 => /lib/libm.so.6 (0x00002ae35ca50000)
>         libmpi.so.0 => /home/joe/local/lib/libmpi.so.0 (0x00002ae35ccd1000)
>         libopen-rte.so.0 => /home/joe/local/lib/libopen-rte.so.0 (0x00002ae35cfe8000)
>         libopen-pal.so.0 => /home/joe/local/lib/libopen-pal.so.0 (0x00002ae35d2b3000)
>         librt.so.1 => /lib/librt.so.1 (0x00002ae35d514000)
>         libdl.so.2 => /lib/libdl.so.2 (0x00002ae35d71d000)
>         libnsl.so.1 => /lib/libnsl.so.1 (0x00002ae35d921000)
>         libutil.so.1 => /lib/libutil.so.1 (0x00002ae35db3b000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0x00002ae35dd3e000)
>         libc.so.6 => /lib/libc.so.6 (0x00002ae35df59000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002ae35c832000)
>
> and it might even run ...
>
> joe at pegasus-i:~/workspace/source-mpi$ ./matmul_mpi_3.exe
> D[tid=0]: running on machine = pegasus-i
> D: checking arguments: N_args=1
> D: arg[0] = ./matmul_mpi_3.exe
> Allocating memory ...
> array size in MB = 7.629 MB
>  (remember, you have 2 of these)normalization a: 0.05510,  b: 0.00173
> 0 : loop_min = 0, loop_max = 1000
> ...
>
> Do you have some sort of LD_LIBRARY_PATH set up?  Or something set in 
> /etc/ld.so.conf that points to where these things are?  Remember, 
> mpirun/mpiexec's alternative purpose in life is to set up the correct 
> run time environment for you, so you might want to see what is going 
> on with the environment in your equivalent command.
>
>


