[Beowulf] use an MPI library through a shared library


Mathieu Gontier mg.mailing-list at laposte.net
Wed Dec 5 00:28:05 PST 2007


Yep, I use ldd every day. But here the problem comes from a corrupted 
structure in MorphMPI and MPI:

typedef struct{
  int MorphMPI_SOURCE;
  int MorphMPI_TAG;
  int MorphMPI_ERROR;
  void* mpi_status ;
} MorphMPI_Status ;

where the member mpi_status is used to point to a real MPI_Status. In MPICH:

typedef struct{
  int MPI_SOURCE;
  int MPI_TAG;
  int MPI_ERROR;
  int count ;
} MPI_Status ;

Then, when my MorphMPI_Status is passed to MorphMPI_Get_count(), the 
pointer MorphMPI_Status::mpi_status itself is not corrupted, but the 
count member it points to (MorphMPI_Status::mpi_status->count) is: the 
value should be 4, and instead it is effectively random.
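The relationship between the two structures can be sketched as follows. This is only an illustration under stated assumptions: Sketch_MPI_Status is a stand-in that copies the MPICH layout quoted above (a real build would take MPI_Status from mpi.h), and the sketch_* helpers are hypothetical names, not MorphMPI's actual functions. It shows what a count lookup through the void* has to do, and that the result is only meaningful while the native status object is still alive and is the one the MPI call filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the MPICH layout quoted above (assumption: a real build
 * would use MPI_Status from mpi.h instead). */
typedef struct {
  int MPI_SOURCE;
  int MPI_TAG;
  int MPI_ERROR;
  int count;
} Sketch_MPI_Status;

/* Wrapper structure as given in the message. */
typedef struct {
  int MorphMPI_SOURCE;
  int MorphMPI_TAG;
  int MorphMPI_ERROR;
  void *mpi_status;
} MorphMPI_Status;

/* Hypothetical helper: attach caller-owned native storage to the wrapper.
 * The native object must outlive every later use of the wrapper. */
void sketch_attach(MorphMPI_Status *ms, Sketch_MPI_Status *native) {
  assert(ms != NULL && native != NULL);
  ms->mpi_status = native;
}

/* Hypothetical helper: mirror the public fields after a receive. */
void sketch_sync(MorphMPI_Status *ms) {
  const Sketch_MPI_Status *native = (const Sketch_MPI_Status *)ms->mpi_status;
  ms->MorphMPI_SOURCE = native->MPI_SOURCE;
  ms->MorphMPI_TAG = native->MPI_TAG;
  ms->MorphMPI_ERROR = native->MPI_ERROR;
}

/* Hypothetical stand-in for the lookup a Get_count wrapper performs:
 * follow the pointer and read the native count field. */
int sketch_get_count(const MorphMPI_Status *ms) {
  const Sketch_MPI_Status *native = (const Sketch_MPI_Status *)ms->mpi_status;
  return native->count;
}
```

If the library and the application were compiled against different definitions of the native status, the cast inside sketch_get_count would read the wrong bytes even though the pointer value itself looks fine.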

I tried changing the MorphMPI_Status structure (adding another integer 
to align it on 64 bits, keeping only the void*, ...) without success.
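A small layout report can help rule out a disagreement about the structure between the shared library and the application. The sketch below is an assumption-laden diagnostic, not a fix: sketch_report_layout is a hypothetical helper, and the idea is to compile the same function into both the library and the executable and compare the two printed lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Wrapper structure as given in the message. */
typedef struct {
  int MorphMPI_SOURCE;
  int MorphMPI_TAG;
  int MorphMPI_ERROR;
  void *mpi_status;
} MorphMPI_Status;

/* Hypothetical helper: print how this translation unit lays out the
 * structure and return the offset of the pointer member. "who" is a
 * free-form label such as "library" or "application". */
size_t sketch_report_layout(FILE *out, const char *who) {
  assert(out != NULL && who != NULL);
  size_t off = offsetof(MorphMPI_Status, mpi_status);
  fprintf(out,
          "%s: sizeof(MorphMPI_Status)=%zu offsetof(mpi_status)=%zu "
          "sizeof(void*)=%zu\n",
          who, sizeof(MorphMPI_Status), off, sizeof(void *));
  return off;
}
```

On a typical 64-bit Linux build the compiler already pads the three ints so that the pointer sits at offset 16 and the structure is 24 bytes, which would be consistent with manual padding making no difference. The same kind of report applied to MPI_Status from the mpi.h used for each build would show whether both sides agree on the native layout, since that layout may differ between MPI implementations and versions.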

As a reminder, this problem appears only when MPI is used through a 
dynamically linked MorphMPI library.

Does anyone have an idea?

Mathieu Gontier
Core Development Engineer

Read the attached v-card for telephone, fax, address
Look at our web-site http://www.fft.be
 



Joe Landman wrote:
> Greetings Mathieu:
>
> Mathieu Gontier wrote:
>
> [...]
>
>> So, I run into a small problem whatever MPI library is used (I tried 
>> with MPICH-1.2.5.2, MPICHGM and IntelMPI).
>> When MorphMPI is linked statically with my parallel application, 
>> everything is OK; but when MorphMPI is linked dynamically with my 
>> parallel application, MPI_Get_count returns a wrong value.
>>
>> I concluded that it is difficult to use an MPI library through a shared 
>> library. I wonder if someone has more information about it (in this 
>
> Not likely.  I would suggest ldd.  It is your friend.
>
> For example:
>
> joe@pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
>         libm.so.6 => /lib/libm.so.6 (0x00002b5409d17000)
>         libmpi.so.0 => not found
>         libopen-rte.so.0 => not found
>         libopen-pal.so.0 => not found
>         librt.so.1 => /lib/librt.so.1 (0x00002b5409f99000)
>         libdl.so.2 => /lib/libdl.so.2 (0x00002b540a1a2000)
>         libnsl.so.1 => /lib/libnsl.so.1 (0x00002b540a3a6000)
>         libutil.so.1 => /lib/libutil.so.1 (0x00002b540a5c0000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0x00002b540a7c3000)
>         libc.so.6 => /lib/libc.so.6 (0x00002b540a9de000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002b5409af9000)
>
> Notice that libmpi.so.0 is not found, so I can't run this by hand, 
> unless I force the issue using LD_LIBRARY_PATH:
>
> joe@pegasus-i:~/workspace/source-mpi$ export LD_LIBRARY_PATH="/home/joe/local/lib64/:/home/joe/local/lib/"
> joe@pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
>         libm.so.6 => /lib/libm.so.6 (0x00002ae35ca50000)
>         libmpi.so.0 => /home/joe/local/lib/libmpi.so.0 (0x00002ae35ccd1000)
>         libopen-rte.so.0 => /home/joe/local/lib/libopen-rte.so.0 (0x00002ae35cfe8000)
>         libopen-pal.so.0 => /home/joe/local/lib/libopen-pal.so.0 (0x00002ae35d2b3000)
>         librt.so.1 => /lib/librt.so.1 (0x00002ae35d514000)
>         libdl.so.2 => /lib/libdl.so.2 (0x00002ae35d71d000)
>         libnsl.so.1 => /lib/libnsl.so.1 (0x00002ae35d921000)
>         libutil.so.1 => /lib/libutil.so.1 (0x00002ae35db3b000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0x00002ae35dd3e000)
>         libc.so.6 => /lib/libc.so.6 (0x00002ae35df59000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002ae35c832000)
>
> and it might even run ...
>
> joe@pegasus-i:~/workspace/source-mpi$ ./matmul_mpi_3.exe
> D[tid=0]: running on machine = pegasus-i
> D: checking arguments: N_args=1
> D: arg[0] = ./matmul_mpi_3.exe
> Allocating memory ...
> array size in MB = 7.629 MB
>  (remember, you have 2 of these)normalization a: 0.05510,  b: 0.00173
> 0 : loop_min = 0, loop_max = 1000
> ...
>
> Do you have some sort of LD_LIBRARY_PATH set up?  Or something set in 
> /etc/ld.so.conf that points to where these things are?  Remember, 
> mpirun/mpiexec's alternative purpose in life is to set up the correct 
> run time environment for you, so you might want to see what is going 
> on with the environment in your equivalent command.
>
>
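To see the environment a process actually gets when started by mpirun or mpiexec, the search path can be printed from inside the program itself. The following is a minimal sketch; both function names are hypothetical helpers, and the only library calls assumed are standard C ones (getenv, strchr, fprintf).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: print each entry of a colon-separated search path
 * and return the number of entries. An empty entry (as in "/a::/b") is
 * printed as an empty line item. */
int sketch_print_search_path(FILE *out, const char *path_list) {
  assert(out != NULL);
  if (path_list == NULL || *path_list == '\0') {
    fprintf(out, "search path is unset or empty\n");
    return 0;
  }
  int n = 0;
  const char *start = path_list;
  for (;;) {
    const char *end = strchr(start, ':');
    size_t len = end ? (size_t)(end - start) : strlen(start);
    fprintf(out, "  [%d] %.*s\n", n, (int)len, start);
    n++;
    if (end == NULL) {
      break;
    }
    start = end + 1;
  }
  return n;
}

/* Hypothetical helper: report LD_LIBRARY_PATH as seen by this process. */
int sketch_report_ld_library_path(FILE *out) {
  return sketch_print_search_path(out, getenv("LD_LIBRARY_PATH"));
}
```

Calling sketch_report_ld_library_path(stderr) early in main, once when the program is run by hand and once under the launcher, makes any difference between the two environments visible.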


