[Beowulf] LAM -beowulf problems
Reuti reuti at staff.uni-marburg.de
Sun Dec 24 13:53:02 PST 2006
Hi,

first of all, I would suggest looking into the most recent version of LAM/MPI, which is 7.1.2, or OpenMPI. Which shell are you using? For bash, you may have to add the directory with your LAM/MPI binaries to PATH in .bashrc, i.e. a file that is sourced during a non-interactive login.

-- Reuti

On 20.12.2006 at 17:45, Mr. Sumit Saxena wrote:

> Hi
> I am new to linux as well as beowulf, please help me.
> I tried to hook up two machines and run LAM but I am not able to
> lamboot. I can lamboot on each machine individually but not from
> master to master and slave. I have provided the link of the libraries
> of LAM in my ld.so.conf as well as .bash_profile, still I see the
> following error message. Also I am able to ssh into machines without
> passwords. I followed the following document to setup my machines
> http://tldp.org/HOWTO/html_single/Beowulf-HOWTO/
> ++++++++++++++++++++++++++++++++++++++++++++++
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing hboot on n0 (surya01 - 1 CPU)...
> Executing hboot on n1 (surya02 - 1 CPU)...
> bash: line 1: hboot: command not found
> -----------------------------------------------------------------------------
> LAM failed to execute a LAM binary on the remote node "surya02".
> Since LAM was already able to determine your remote shell as "hboot",
> it is probable that this is not an authentication problem.
>
> LAM tried to use the remote agent command "ssh"
> to invoke the following command:
>
> ssh -x surya02 -n hboot -t -c lam-conf.lam -v -s -I "-H 192.168.13.1 -P 33628 -n 1 -o 0 "
>
> This can indicate several things. You should check the following:
>
> - The LAM binaries are in your $PATH
> - You can run the LAM binaries
> - The $PATH variable is set properly before your
>   .cshrc/.profile exits
>
> Try to invoke the command listed above manually at a Unix prompt.
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing tkill on n0 (surya01)...
>
> ++++++++++++++++++++++++++++++++++++++++++++++
> please help
> kind regards
> Sumit
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
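
The .bashrc suggestion in the reply could look roughly like the sketch below, meant for the top of ~/.bashrc on each node. The install prefix /usr/local/lam/bin is only an assumption for illustration (the thread does not say where LAM was installed); substitute the directory that actually contains hboot and lamboot.

```shell
# Sketch for ~/.bashrc on every node (master and slaves).
# ASSUMPTION: LAM's binaries live in /usr/local/lam/bin; this path is
# hypothetical and not taken from the original message.
LAM_BIN="/usr/local/lam/bin"

# Prepend the directory only if it is not already on PATH, so that
# sourcing the file repeatedly does not keep growing the variable.
case ":$PATH:" in
  *":$LAM_BIN:"*) ;;                 # already present, nothing to do
  *) PATH="$LAM_BIN:$PATH" ;;
esac
export PATH

# To check from the master that a non-interactive ssh command now
# finds the binary (not run here, it needs the remote node):
#   ssh -x surya02 -n 'command -v hboot'
```

If the remote .bashrc returns early for non-interactive shells, as some distribution defaults do, the PATH lines probably need to sit above that check. Setting PATH only in .bash_profile would not be expected to help here, since that file is read by login shells rather than by the shell ssh starts to run a single command.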
