[Beowulf] MPICH problem
idooley at isaacdooley.com
Tue May 25 12:06:24 PDT 2004
I've run into similar problems on a Linux Xeon cluster with Ethernet and mpich2-0.96p2, and ended up just using mpich-126.96.36.199 instead. Both were compiled from source with gcc (for C MPI programs). With version 2-0.96p2 I could not get any sample program to run on more than a single node (a single node, incidentally, worked), not even the ones that just initialize MPI and don't do any real message passing.
Which version are you using?
>I'm having some problems running MPI programs on a Beowulf cluster.
>The cluster is composed of 12 Linux machines, and the compilation of the
>mpich libraries went well. I've also configured the machines.LINUX file
>so that it lists all machines available in the cluster. When I try to
>run a program I get the following error:
>$ mpirun -np 3 cpi
>rm_924: p4_error: rm_start: net_conn_to_listener failed: 33064
>p0_22381: p4_error: Child process exited while making connection to remote process on a01: 0
>/opt/mpich/bin/mpirun: line 1: 22381 Broken pipe /nfshome/ex/cpi -p4pg /nfshome/ex/PI22264
>/nfshome is an NFS-shared directory, and a01 is accessible by rsh.
>Can someone help me with this error?
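Since the report above only mentions a01 being reachable by rsh, it may be worth confirming that every host in machines.LINUX answers a remote command from the node where mpirun is started. The following is a rough sketch, not an MPICH tool: check_hosts and the RSH variable are names made up for illustration, and it assumes one "host" or "host:ncpus" entry per line with "#" comment lines.

```shell
#!/bin/sh
# Sketch: report which hosts in an MPICH machines file answer a remote command.
# RSH is a hypothetical override for the remote-shell command (rsh by default).
RSH="${RSH:-rsh}"

# check_hosts FILE: prints "ok <host>" or "FAIL <host>" per entry and
# returns 1 if at least one host could not be reached, 0 otherwise.
check_hosts() {
    machines_file="$1"
    failed=0
    # Read one entry per line, including a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comment lines.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        # An entry may be "host" or "host:ncpus"; keep only the host part.
        host="${line%%:*}"
        # </dev/null keeps the remote shell from eating the rest of the file.
        # $RSH is left unquoted so it may carry options (e.g. "ssh -x").
        if $RSH "$host" true </dev/null >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$machines_file"
    return "$failed"
}
```

After sourcing the file, it could be called as `check_hosts /opt/mpich/share/machines.LINUX` (the path is a guess based on the /opt/mpich prefix in the error output). A passing check only shows that the remote shell connects from the launching node; it does not test the TCP connections that the p4 processes open between nodes afterwards.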