[Beowulf] error in mpich!!!!

Glen Gardner Glen.Gardner at verizon.net
Fri Sep 17 15:45:53 PDT 2004


It looks like rsh is not working properly, and MPICH is being confused by
the "Connection refused" messages.

Make sure you can rsh to every node from the controlling node, and
make sure all the nodes can rsh to each other.
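
As a rough sketch of that check, something like the following could be run on each node in turn. The node names are taken from your output; the NODES and RSH variables and the check_nodes function are only names I made up for illustration, so adjust them to match your machines file. It treats any output at all as a failure, because tstmachines also complains about unexpected output.

```shell
#!/bin/sh
# Hypothetical helper: verify that rsh from this node to each listed node
# succeeds and prints nothing.
# NODES and RSH are assumptions; override them to suit your cluster.
NODES="${NODES:-swingle swingle3 swingle4}"
RSH="${RSH:-rsh}"

check_nodes() {
    failed=0
    for node in $NODES; do
        # "-n true" is the same command tstmachines runs: no stdin,
        # and no output is expected from the remote side.
        # stderr is captured too, so "Connection refused" counts as output.
        if out=$($RSH "$node" -n true 2>&1) && [ -z "$out" ]; then
            echo "ok   $node"
        else
            echo "FAIL $node: $out"
            failed=1
        fi
    done
    return $failed
}
```

After sourcing or pasting the function, run check_nodes on the controlling node first, then log in to each of the other nodes and run it there as well. Any FAIL line shows the message rsh printed for that node.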


Glen

Vinicius wrote:

>What is this error, please?
>
>[root@swingle mpich-1.2.6]# bin/tstmachines
>Errors while trying to run rsh swingle -n true
>Unexpected response from swingle:
>--> connect to address 192.168.3.1: Connection refused
>connect to address 192.168.3.1: Connection refused
>trying normal rsh (/usr/bin/rsh)
>If your .cshrc, login, .bashrc, or other startup file
>contains a command that generates any output when logging in,
>such as fortune or hostname or even echo, you should modify
>that startup file to only print such a message when the
>process is attached to a terminal.  Examples of how to do
>this are in the Users Manual.  If you do not do this, MPICH
>will still work, but this script and the test programs will
>report problems because they compare expected output from
>what the programs produce.
>Unexpected response from swingle3:
>--> connect to address 192.168.3.3: Connection refused
>connect to address 192.168.3.3: Connection refused
>trying normal rsh (/usr/bin/rsh)
>If your .cshrc, login, .bashrc, or other startup file
>contains a command that generates any output when logging in,
>such as fortune or hostname or even echo, you should modify
>that startup file to only print such a message when the
>process is attached to a terminal.  Examples of how to do
>this are in the Users Manual.  If you do not do this, MPICH
>will still work, but this script and the test programs will
>report problems because they compare expected output from
>what the programs produce.
>Unexpected response from swingle4:
>--> connect to address 192.168.3.4: Connection refused
>connect to address 192.168.3.4: Connection refused
>trying normal rsh (/usr/bin/rsh)
>If your .cshrc, login, .bashrc, or other startup file
>contains a command that generates any output when logging in,
>such as fortune or hostname or even echo, you should modify
>that startup file to only print such a message when the
>process is attached to a terminal.  Examples of how to do
>this are in the Users Manual.  If you do not do this, MPICH
>will still work, but this script and the test programs will
>report problems because they compare expected output from
>what the programs produce.
>The test of rsh <machine> true  failed on some machines.
>This may be due to problems in your .login or .cshrc files;
>some common problems are described when detected.  Look at the
>output above to see what the problem is.
>
>If the problem is something like 'permission denied', then the
>remote shell command rsh does not allow you to run programs.
>See the documentation about remote shell and rhosts.
>
>Errors while trying to run rsh swingle -n /bin/ls
>/usr/local/mpich-1.2.6/mpichfoo
>Unexpected response from swingle:
>--> connect to address 192.168.3.1: Connection refused
>connect to address 192.168.3.1: Connection refused
>trying normal rsh (/usr/bin/rsh)
>/usr/local/mpich-1.2.6/mpichfoo
>Unexpected response from swingle3:
>--> connect to address 192.168.3.3: Connection refused
>connect to address 192.168.3.3: Connection refused
>trying normal rsh (/usr/bin/rsh)
>/bin/ls: /usr/local/mpich-1.2.6/mpichfoo: No such file or directory
>Unexpected response from swingle4:
>--> connect to address 192.168.3.4: Connection refused
>connect to address 192.168.3.4: Connection refused
>trying normal rsh (/usr/bin/rsh)
>/bin/ls: /usr/local/mpich-1.2.6/mpichfoo: No such file or directory
>The ls test failed on some machines.
>This usually means that you do not have a common filesystem on
>all of the machines in your machines list; MPICH requires this
>for mpirun (it is possible to handle this in a procgroup file; see
>the documentation for more details).
>
>Other possible problems include:
>The remote shell command rsh does not allow you to run ls.
>See the documentation about remote shell and rhosts.
>You have a common file system, but with inconsistent names.
>See the documentation on the automounter fix.
>
>
>3 errors were encountered while testing the machines list for LINUX
>No machines seem to be available!
>[root@swingle mpich-1.2.6]# cd examples/
>[root@swingle examples]# cd basic/
>[root@swingle basic]# mpirun -np 3 cpi
>connect to address 192.168.3.1: Connection refused
>Trying krb4 rsh...
>connect to address 192.168.3.1: Connection refused
>trying normal rsh (/usr/bin/rsh)
>connect to address 192.168.3.1: Connection refused
>Trying krb4 rsh...
>connect to address 192.168.3.1: Connection refused
>trying normal rsh (/usr/bin/rsh)
>Process 0 of 3 on swingle
>pi is approximately 3.1415926544231323, Error is 0.0000000008333392
>wall clock time = 0.002702
>Process 1 of 3 on swingle
>Process 2 of 3 on swingle

-- 
Glen E. Gardner, Jr.
AA8C
AMSAT MEMBER 10593



http://members.bellatlantic.net/~vze24qhw/index.html





