[Beowulf] Newbie question on mpich2 installation
Saleem Hasan hasan at grant.phys.subr.edu
Mon Feb 7 18:15:25 PST 2005
- Previous message: [Beowulf] mpich future
- Next message: [Beowulf] cheap 48 port gigabit ethernet switch w/ jumbo frames?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello all,

I apologise for what may be a very simple issue, but it is giving me trouble. I would really appreciate some advice.

To learn how to set up a cluster, I have installed mpich2 on a Linux machine running Red Hat 8.0. I have a second machine, also running RH 8.0. w2 is the master and w1 is the slave. I have installed mpich2 on w2 (/home/mpi) and used NFS to share /home with w1. I have also set up passwordless ssh between w1 and w2.

I am able to bring up mpd on the local machine (w2) and run mpdtrace and mpdallexit. I am following the installation procedure from the MPICH2 home page, but I am unable to boot mpd on the slave.

The first time I ran mpdboot -n 2 -f /home/mpi/mpd.hosts, I got a message that there was no mpd.conf file on w1 and that this could be why mpd was not coming up on the slave. I added an mpd.conf (secretword) to /etc on the slave as well. Now I get a different message:

    [root@w2 mpich2-1.0]# mpdboot -n 2 -f /home/mpi/mpd.hosts
    mpdboot_w2.maverick.net_0 (mpdboot 357): error trying to start mpd(boot) at 1 w1.maverick.net; output:
      mpdboot_w1_1 (err_exit 379): mpd failed to start correctly on w1
      reason: 1: invalid msg from mpd :{}:
      mpdboot_w1_1 (err_exit 385): contents of mpd logfile in /tmp:
      logfile for mpd with pid 1654
    mpdboot_w2.maverick.net_0 (err_exit 379): mpd failed to start correctly on w2.maverick.net

Even though the last line says mpd failed to start correctly on w2, mpdtrace gives w2.

The log file on w1 (the slave) contains the following:

    logfile for mpd with pid 1654
    w1_1060 failed ; cause: unable to obtain socket for rhs in ring
    traceback: [('/home/mpi/mpich2-install/bin/mpd.py', '1192', '_enter_existing_ring'),
                ('/home/mpi/mpich2-install/bin/mpd.py', '173', '_mpd_init'),
                ('/home/mpi/mpich2-install/bin/mpd.py', '1374', '?')]

Thank you very much.

Saleem Hasan
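The log line "unable to obtain socket for rhs in ring" appears to mean that the mpd on w1 could not open a connection to its neighbour in the ring. One possible first check, sketched below, is whether each node resolves the other's hostname to a routable address and can open a TCP connection to it. This is only a minimal sketch and not part of MPICH2: the helper names `resolve`, `can_connect`, and `check_hosts` are hypothetical, port 22 is used only because ssh is already known to work between the nodes, and the script assumes a Python 3 interpreter, which a stock Red Hat 8.0 install would not have.

```python
import socket


def resolve(host):
    """Return the sorted unique addresses `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, port=22):
    """Map each host to (resolved addresses, whether `port` is reachable)."""
    return {host: (resolve(host), can_connect(host, port)) for host in hosts}
```

Running `check_hosts(["w1.maverick.net", "w2.maverick.net"])` on both machines would show what each name resolves to from each side. A hostname that resolves to a loopback address such as 127.0.0.1 on its own machine is one thing worth looking for, since other nodes cannot reach a daemon at that address; whether that is the cause here is not established by the logs above.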
