[Beowulf] errors while testing machines
Glen Gardner Glen.Gardner at verizon.net
Sat Dec 11 19:46:41 PST 2004
- Previous message: [Beowulf] errors while testing machines
- Next message: [Beowulf] Error while tstmachines still not solved
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The error in the 5th step is caused by a chatty login message. This makes MPI complain, but it ought to work anyway. You want to turn off the motd and, if using FreeBSD, create a file called ".hushlogin" in the user's home directory.

The next error has to do with the paths to MPICH and to the program being launched. All the nodes need to be able to "see" the MPI binaries and the executable program. The paths to MPI and to the program being launched need to be the same on all nodes, including the root node. Make sure the path is set up properly in the environment. You may need to check your mount points and set up NFS properly.

The last one probably has to do with name resolution. The root node usually won't need to be in the machines.linux file, but all other nodes need to be. I believe you need to list machines by hostname, not IP address, so be sure that both machines have the same hostfile, same .rhosts, etc.

Glen

The next message indicates that the path to the executable "mpichfoo" was not found.

akhtar Rasool wrote:

> After the extraction of MPICH in /usr/local
>
> 1- tcsh
> 2- ./configure -with-comm=shared --prefix=/usr/local
> 3- make
> 4- make install
> 5- util/tstmachines
>
> in the 5th step error was
>
> Errors while trying to run rsh 192.168.0.25 -n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo
> unexpected response from 192.168.0.25
> /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: no such file or directory
>
> The ls test failed on some machines.
> This usually means that you do not have a common filesystem on all of the
> machines in your machines list; MPICH requires this for mpirun (it is
> possible to handle this in a procgroup file; see the......)
>
> Other possible problems include:
>
> The remote shell command rsh does not allow you to run ls.
> See the doc about remote shell & rhosts
>
> You have common filesystem, but with inconsistent names
> See the doc on the automounter fix
>
> 1 error were encountered while testing the machines list for LINUX
> only these machines seem to be available
> host1
>
> Now since this is only a two node cluster, host1 is the server on to
> which MPICH is being installed & 192.168.0.25 is the client.
> rsh on both nodes is logging freely.
>
> On the server side the file "machines.LINUX" contains
>
> -192.168.0.25
> -host1
>
> Kindly help
>
> Akhtar

--
Glen E. Gardner, Jr.
AA8C
AMSAT MEMBER 10593
Glen.Gardner at verizon.net
http://members.bellatlantic.net/~vze24qhw/index.html
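
The checks described in the reply can be approximated by hand before rerunning tstmachines. The following is a minimal sketch, not part of MPICH: the names RSH, list_hosts and check_hosts are made up for illustration, and it assumes a machines file with one host per line, an optional ":n" processor count, and "#" comments. It reuses the rsh invocation shown in the quoted error and treats any output other than the bare path (a login banner, or "no such file or directory") as a failure.

```shell
#!/bin/sh
# Sketch only: RSH, list_hosts and check_hosts are illustrative names,
# not MPICH settings or commands.

# Remote shell command to use; override if your cluster uses something else.
RSH="${RSH:-rsh}"

# Print the host names in a machines file.
# Assumed format: one host per line, optional ":n" suffix, "#" comments.
list_hosts() {
    sed -e 's/#.*//' -e 's/:.*//' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

# Run /bin/ls on the given path on every host in the machines file.
# A host passes only if the remote output is exactly the path itself;
# extra text such as a motd banner, or an ls error, counts as a failure.
check_hosts() {
    machines="$1"
    target="$2"
    failed=0
    for host in $(list_hosts "$machines"); do
        out=$($RSH "$host" -n /bin/ls "$target" 2>&1)
        if [ "$out" = "$target" ]; then
            echo "ok   $host"
        else
            echo "FAIL $host: $out"
            failed=1
        fi
    done
    return $failed
}
```

After sourcing the file, something like `check_hosts machines.LINUX /usr/local/mpich/mpich-1.2.5.2/bin/mpirun` would show which nodes cannot see the given path (the mpirun location here is only a guess at where a binary might live). A FAIL line that begins with banner text points at the login message; one that reports a missing file points at the shared filesystem or path setup.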
