[Beowulf] MPICH problem

Isaac Dooley idooley at isaacdooley.com
Tue May 25 12:06:24 PDT 2004


I've had similar problems on a Linux Xeon cluster with Ethernet and mpich2-0.96p2, and I ended up just using mpich-1.2.5.2 instead. Both were compiled from source with gcc (for C MPI programs). With version 2-0.96p2 I could not get any sample program to run on more than a single node (on a single node it did work), not even programs that just initialize MPI and don't do any real message passing.

Which version are you using?

Isaac Dooley


>I'm having some problems running MPI programs on a Beowulf cluster.
>The cluster is composed of 12 Linux machines, and the compilation of the
>MPICH libraries ran fine. I've also configured the machines.LINUX file
>so that it lists all machines available in the cluster. When I try to
>run a program I get the following error:
>
>$ mpirun -np 3 cpi
>rm_924:  p4_error: rm_start: net_conn_to_listener failed: 33064
>p0_22381:  p4_error: Child process exited while making connection to remote process on a01: 0
>/opt/mpich/bin/mpirun: line 1: 22381 Broken pipe             /nfshome/ex/cpi -p4pg /nfshome/ex/PI22264 -p4wd /nfshome/ex
>
>/nfshome is an NFS-shared directory, and a01 is accessible via rsh.
>Can someone help me with this error?
