[Beowulf] errors while testing machines

Glen Gardner Glen.Gardner at verizon.net
Sat Dec 11 19:46:41 PST 2004


The error in the 5th step is caused by a chatty login message. This
makes MPI complain, but it ought to work anyway.
You want to turn off the motd, and if you are using FreeBSD, create a file
called ".hushlogin" in the user's home directory.
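
One way to check for a chatty login is to run a command that prints nothing over rsh and see whether anything comes back. The sketch below assumes `rsh` is on the PATH and that the remote host already accepts passwordless rsh. The helper names `find_login_chatter` and `probe_host` are hypothetical and are not part of MPICH.

```python
import subprocess
import sys
from typing import List


def find_login_chatter(output: str) -> List[str]:
    # The remote command is `true`, which prints nothing, so every
    # non-blank line must come from a banner, motd, or login script
    # (for example an `echo` in .cshrc).
    return [line for line in output.splitlines() if line.strip()]


def probe_host(host: str, timeout: float = 10.0) -> List[str]:
    # Similar in shape to the check tstmachines reports on: run a command
    # over rsh with stdin taken from /dev/null (-n) and inspect the reply.
    result = subprocess.run(
        ["rsh", "-n", host, "true"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Chatter can arrive on either stream, so inspect both.
    return find_login_chatter(result.stdout + "\n" + result.stderr)


def main(hosts: List[str]) -> None:
    for host in hosts:
        chatter = probe_host(host)
        if not chatter:
            print(f"{host}: clean")
        else:
            print(f"{host}: {len(chatter)} unexpected line(s)")
            for line in chatter:
                print(f"    {line}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 probe.py 192.168.0.25 host1` should report "clean" for each node once the motd and any login-script output are silenced.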

The next error has to do with the paths to MPICH and to the program being
launched.
All the nodes need to be able to "see" the MPI binaries and the
executable program.
The paths to MPI and to the program being launched need to be the same on
all nodes, including the root node.
Make sure the path is set up properly in the environment. You may need
to check your mount points and set up NFS properly.
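
To check mount points on a client node, you can find which mounted filesystem actually holds a given path. The sketch below is a minimal version that assumes fstab-style mount text, meaning "device mountpoint fstype ..." columns. On Linux that is `/proc/mounts`; on FreeBSD, `mount -p` prints the same layout. The function names are hypothetical, and escaped spaces in mount points (`\040`) are not handled. Run it on each client. On the NFS server itself, the exported directory is expected to be a local disk.

```python
import os
import sys
from typing import NamedTuple, Optional


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str


def covering_mount(path: str, mounts_text: str) -> Optional[Mount]:
    """Return the mount whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best: Optional[Mount] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount = Mount(fields[0], fields[1], fields[2])
        mp = mount.mountpoint
        # "/usr/local" must cover "/usr/local/mpich" but not "/usr/localx".
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        # ">=" lets a later entry on the same mount point win, since the
        # most recent mount is the one that is actually visible.
        if inside and (best is None or len(mp) >= len(best.mountpoint)):
            best = mount
    return best


def is_on_nfs(path: str, mounts_text: str) -> bool:
    mount = covering_mount(path, mounts_text)
    return mount is not None and mount.fstype.startswith("nfs")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/mounts") as f:  # Linux only; see the note above
        mounts = f.read()
    for p in sys.argv[1:]:
        m = covering_mount(p, mounts)
        where = f"{m.device} on {m.mountpoint} ({m.fstype})" if m else "no mount found"
        print(f"{p}: {where}")
```

For example, passing the MPICH directory on 192.168.0.25 should show an `nfs` mount from host1 if the shared filesystem is set up.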


The last one probably has to do with name resolution.
The root node usually won't need to be in the machines.LINUX file, but
all other nodes need to be.
I believe you need to list machines by hostname, not IP address, so be
sure that both machines have the same hosts file, the same .rhosts, etc.
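
A quick check of a machines file can flag entries given as IP addresses and names that do not resolve on the node where the check runs. This sketch assumes the file has one host per line, an optional `:N` processor count, and `#` comments. IPv6 literals are not handled, because the `:N` split would break them. The function names are hypothetical. Running it on every node is a rough way to compare name resolution across the cluster.

```python
import ipaddress
import socket
import sys
from typing import List, Tuple


def parse_machines(text: str) -> List[str]:
    """Return host entries, skipping blanks and comments, dropping ':N'."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split(":", 1)[0])
    return hosts


def is_ip_literal(name: str) -> bool:
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def check_machines(text: str) -> List[Tuple[str, str]]:
    """Return (entry, problem) pairs for entries that look troublesome."""
    problems = []
    for host in parse_machines(text):
        if is_ip_literal(host):
            problems.append((host, "IP address; consider listing the hostname"))
            continue
        try:
            socket.gethostbyname(host)  # uses /etc/hosts and/or DNS
        except socket.gaierror:
            problems.append((host, "does not resolve on this node"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:  # path to your machines.LINUX file
        for host, problem in check_machines(f.read()):
            print(f"{host}: {problem}")
```

With the file described in the question, the `192.168.0.25` entry would be flagged as an IP address.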

Glen


The next message indicates that the path to the executable "mpichfoo" 
was not found.  
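
Judging from the error text, the ls test asks each remote node to `/bin/ls` the full path of a scratch file (`mpichfoo`). A clean pass would then echo back exactly that path. The sketch below separates the possible outcomes so that a missing file can be told apart from login chatter. The function names are hypothetical, and the interpretation of tstmachines' behavior is an assumption based on the message above.

```python
import subprocess
import sys


def ls_test_result(path: str, output: str) -> str:
    """Classify the reply to `rsh host -n /bin/ls path`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if lines == [path]:
        return "ok"          # exactly the path came back
    if any("no such file or directory" in line.lower() for line in lines):
        return "missing"     # the remote node cannot see the file
    if path in lines:
        return "chatter"     # file found, but extra login output too
    return "unexpected"      # e.g. rsh permission problems


def run_ls_test(host: str, path: str, timeout: float = 10.0) -> str:
    result = subprocess.run(
        ["rsh", "-n", host, "/bin/ls", path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return ls_test_result(path, result.stdout + "\n" + result.stderr)


if __name__ == "__main__" and len(sys.argv) == 3:
    print(run_ls_test(sys.argv[1], sys.argv[2]))
```

A "missing" result points at the shared-filesystem and path issue, while "chatter" points at the login-message issue.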

akhtar Rasool wrote:

> After the extraction of MPICH in /usr/local
>
>  
>
> 1- tcsh              
>
> 2- ./configure -with-comm=shared --prefix=/usr/local
>
> 3-  make
>
> 4-  make install
>
> 5-  util/tstmachines
>
> In the 5th step, the error was:
>
> Errors while trying to run  rsh 192.168.0.25 -n /bin/ls  
> /usr/local/mpich/mpich-1.2.5.2/mpichfoo     unexpected response from 
> 192.168.0.25
>
>  
>
>     /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: no such file or directory
>
> The ls test failed on some machines.
>
> This usually means that you do not have a common filesystem on all of the
> machines in your machines list; MPICH requires this for mpirun (it is
> possible to handle this in a procgroup file; see the......)
>
> Other possible problems include:
>
> The remote shell command rsh does not allow you to run ls.
>
> See the doc about remote shell and rhosts.
>
>  
>
> You have common filesystem, but with inconsistent names
>
> See the doc on the automounter fix
>
> 1 error were encountered while testing the machines list for LINUX
>
> only these machines seem to be available
>
> host1
>
> Since this is only a two-node cluster, host1 is the server onto
> which MPICH is being installed, and 192.168.0.25 is the client.
>
> rsh logs in freely on both nodes.
>
> On the server side, the file "machines.LINUX" contains:
>
> -192.168.0.25
>
> -host1
>
> Kindly help
>
> Akhtar
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>  
>

-- 
Glen E. Gardner, Jr.
AA8C
AMSAT MEMBER 10593
Glen.Gardner at verizon.net


http://members.bellatlantic.net/~vze24qhw/index.html



