lam - recon works lamboot doesn't!
Steve Yam styam at hns.com
Mon Aug 6 12:41:45 PDT 2001
I believe your problem is that your /etc/hosts file maps each machine's hostname to 127.0.0.1 (the loopback address). This breaks lamboot because, when lamboot starts lamd on the 2nd (3rd, etc.) node, it passes along the IP address of the 1st node. If kitkat's /etc/hosts contains "127.0.0.1 kitkat", then the remote lamd on snickers will try to contact 127.0.0.1 for the 1st node instead of kitkat's real IP address, and on snickers 127.0.0.1 is snickers itself. Your log shows exactly this: the lamd on n1 is started with "-H 127.0.0.1". You can solve the problem by changing that /etc/hosts entry to point to kitkat's actual IP address instead of the loopback address.

-Steve Yam
Hughes Network Systems

Eric Linenberg wrote:
> I am trying to run lam and recon works A-OK, but lamboot gives me errors.
> Could someone possibly give me some insight into this problem! I have read
> everything I can to no avail. Help a newbie!
>
> Thanks,
> eric
>
> [guest at kitkat bin]$ lamboot -d -v -b beowulf
>
> LAM 6.5.4/MPI 2 C++/ROMIO - University of Notre Dame
>
> lamboot: boot schema file: /usr/local/lam/etc/beowulf
> lamboot: opening hostfile /usr/local/lam/etc/beowulf
> lamboot: found the following hosts:
> lamboot: n0 kitkat
> lamboot: n1 snickers
> lamboot: n2 twix
> lamboot: n3 rolo
> lamboot: n4 butterfinger
> lamboot: found 5 host node(s)
> lamboot: origin node is 0 (kitkat)
> Executing hboot on n0 (kitkat - 2 CPUs)...
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H
> 127.0.0.1 -P 35993 -n 0 -o 0
> ""
> hboot: process schema = "/usr/local/lam/etc/lam-conf.lam"
> hboot: found /usr/local/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /usr/local/bin/lamd
> hboot: attempting to execute
> [1] 24080 lamd -H 127.0.0.1 -P 35993 -n 0 -o 0 -d
> Executing hboot on n1 (snickers - 2 CPUs)...
> lamboot: -b used, assuming same shell on remote nodes
> lamboot: got local shell /bin/bash
> lamboot: attempting to execute "/usr/bin/rsh snickers -n hboot -t -c
> lam-conf.lam -d -v -s -I "-H 127.0.0.1 -P 35993 -n 1 -o 0 ""
> hboot: process schema = "/usr/local/lam/etc/lam-conf.lam"
> hboot: found /usr/local/lam/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /usr/local/lam/bin/lamd
> [1] 918 lamd -H 127.0.0.1 -P 35993 -n 1 -o 0 -d
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.4/MPI 2 C++/ROMIO - University of Notre Dame
>
> Executing tkill on n0 (kitkat)...
> Executing tkill on n1 (snickers)...
> lamboot did NOT complete successfully
>
> thanks,
> -eric
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
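The loopback mapping described in the reply can be spotted mechanically. What follows is a minimal sketch, not part of the original reply and not a LAM tool: it scans /etc/hosts-style text and reports any real machine name listed on a loopback line (127.x.x.x or ::1), which is the kind of entry ("127.0.0.1 kitkat") that leads to lamd being started with "-H 127.0.0.1". The function name `loopback_hostnames` and the demo entries, including 192.168.1.10 as kitkat's address, are hypothetical. It only reads the file; it does not consult DNS or NIS, which can also supply a hostname's address. Run it on the node you start lamboot from (kitkat in this thread).

```python
import ipaddress


def loopback_hostnames(hosts_text: str) -> list[str]:
    """Return hostnames that /etc/hosts-style text maps to a loopback address.

    Names containing "localhost" or "loopback" (e.g. localhost.localdomain,
    ip6-localhost, ip6-loopback) are skipped, since those are expected there.
    """
    flagged = []
    for raw in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; ignore the line
        if not addr.is_loopback:
            continue
        # A real machine name on a loopback line would resolve to 127.x / ::1.
        flagged.extend(
            name for name in fields[1:]
            if "localhost" not in name and "loopback" not in name
        )
    return flagged


if __name__ == "__main__":
    # Hypothetical "before" entry, like the one described in the reply.
    before = "127.0.0.1   kitkat localhost.localdomain localhost\n"
    # Hypothetical "after" entries: 192.168.1.10 stands in for kitkat's real address.
    after = "127.0.0.1   localhost.localdomain localhost\n192.168.1.10 kitkat\n"
    print("before:", loopback_hostnames(before))
    print("after: ", loopback_hostnames(after))

    # Check this machine's own hosts file, if it is readable.
    try:
        with open("/etc/hosts") as f:
            print("/etc/hosts:", loopback_hostnames(f.read()))
    except OSError as err:
        print("could not read /etc/hosts:", err)
```

An empty list for the origin node's own name is what you want; if that name shows up, move it onto a line with the machine's real address, as suggested above.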
More information about the Beowulf mailing list
