[Beowulf] MPICH problem

Isaac Dooley idooley at isaacdooley.com
Tue May 25 12:06:24 PDT 2004


I've run into similar problems on a Linux Xeon cluster with Ethernet and mpich2-0.96p2, and ended up just using mpich-1.2.5.2 instead. Both were compiled from source with gcc (for C MPI programs). With version 2-0.96p2 I could not get any sample program to run on more than a single node (a single node, incidentally, worked), even those that just initialize MPI and don't do any real message passing.

Which version are you using?

Isaac Dooley


>I'm having some problems running some MPI programs on a Beowulf cluster.
>The cluster is composed of 12 Linux machines and the compilation of the
>mpich libraries ran well. I've also configured the machines.LINUX file
>so that it lists all machines available in the cluster. When I try to
>run some program I get the following error:
>
>$ mpirun -np 3 cpi
>rm_924:  p4_error: rm_start: net_conn_to_listener failed: 33064
>p0_22381:  p4_error: Child process exited while making connection to remote process on a01: 0
>/opt/mpich/bin/mpirun: line 1: 22381 Broken pipe             /nfshome/ex/cpi -p4pg /nfshome/ex/PI22264 -p4wd /nfshome/ex
>
>/nfshome is an NFS-shared directory. Host a01 is accessible by rsh.
>Can someone help me with this error?
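
Since the machines.LINUX file and rsh access both come up above, here is a rough sketch of how one might list the hosts in that file and print an rsh test command for each, to confirm that every node (not only a01) is reachable. It assumes the usual mpich-1 layout of one "hostname" or "hostname:ncpus" entry per line with "#" comments. The function names and the host a02 are made up for illustration; only a01 appears in the original post.

```python
# Sketch: read an mpich-1 style machines file and list hosts to test by hand.
# Assumed format: one "hostname" or "hostname:ncpus" per line; "#" starts a comment.

def parse_machines(text):
    """Return a list of (hostname, ncpus) tuples from machines-file text."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, count = line.partition(":")
        count = count.strip()
        # Entries without a valid ":n" suffix are treated as one slot.
        ncpus = int(count) if sep and count.isdigit() else 1
        hosts.append((name.strip(), ncpus))
    return hosts


def rsh_check_commands(hosts):
    """Build one 'rsh <host> true' command line per distinct host."""
    seen = []
    for name, _ in hosts:
        if name not in seen:
            seen.append(name)
    return ["rsh %s true" % name for name in seen]


if __name__ == "__main__":
    # Hypothetical file contents; a02 is an invented host name.
    sample = "# machines.LINUX\na01\na02:2\n"
    hosts = parse_machines(sample)
    print("total slots:", sum(n for _, n in hosts))
    for cmd in rsh_check_commands(hosts):
        print(cmd)
```

Running each printed command by hand from the node where mpirun is started shows whether rsh to that host works without a password prompt. The script only builds the command strings and does not open any connections itself.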




