[Beowulf] Fwd: warewulf - cannot log into nodes

Gus Correa gus at ldeo.columbia.edu
Tue Nov 27 10:52:38 PST 2012


On 11/27/2012 02:14 AM, Duke Nguyen wrote:
> On 11/27/12 1:44 PM, Christopher Samuel wrote:
>>
>> On 27/11/12 15:51, Duke Nguyen wrote:
>>
>>> Thanks! Yes, I am trying to get the system to work with
>>> Torque/Maui/OpenMPI now.
>> Make sure you build Open-MPI with support for Torque's TM interface,
>> that will save you a lot of hassle as it means mpiexec/mpirun will
>> find out directly from Torque what nodes and processors have been
>> allocated for the job.
>
> Christopher, how would I check that? I got Torque/Maui/OpenMPI up,
> working with root (not with normal user yet :( !!!), tried mpirun and it
> worked fine:
>
> # /usr/lib64/openmpi/bin/mpirun -pernode --hostfile \
>     /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
> Hello world!  I am process number: 3 on host node0118
> Hello world!  I am process number: 1 on host node0104
> Hello world!  I am process number: 0 on host node0103
> Hello world!  I am process number: 2 on host node0117
>
> Thanks,
>
> D.

D.

Try omitting the hostfile from your mpirun command line:
put the mpirun command inside a Torque/PBS script and submit
the script with qsub. Like this:

*********************************
myPBSScript.tcsh
*********************************
#! /bin/tcsh
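# Assumes your Torque 'nodes' file has np=8
#PBS -l nodes=2:ppn=8
#PBS -q batch@mycluster.mydomain
#PBS -N hello
# Torque starts the job in your home directory; return to where qsub was run
cd $PBS_O_WORKDIR
# $PBS_NODEFILE has one line per allocated processor
@ NP = `cat $PBS_NODEFILE | wc -l`
mpirun -np ${NP} ./mpihello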
*********************************

$ qsub myPBSScript.tcsh


If OpenMPI was built with Torque support,
the job will run on the nodes/processors allocated by Torque.
[The nodes/processors are listed in $PBS_NODEFILE,
but you don't need to refer to it in the mpirun line if
OpenMPI was built with Torque support. If OpenMPI lacks
Torque support, then you can use $PBS_NODEFILE as your hostfile:
mpirun -hostfile $PBS_NODEFILE -np ${NP} ./mpihello]
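
A minimal sketch of the hostfile fallback, written in POSIX sh rather
than tcsh. The helper name count_slots is made up for illustration and
is not part of Torque or OpenMPI. It relies only on the fact that
$PBS_NODEFILE holds one host name per allocated processor, so the line
count is the number to pass to -np.

```shell
#!/bin/sh
# count_slots is a hypothetical helper, not a Torque or OpenMPI command.
# $PBS_NODEFILE has one line per allocated processor, so the number of
# non-empty lines is the process count to hand to mpirun -np.
count_slots() {
    grep -c . "$1"
}

# Inside a Torque job, and only when OpenMPI lacks Torque support,
# one might then write:
#   NP=`count_slots "$PBS_NODEFILE"`
#   mpirun -np "$NP" -hostfile "$PBS_NODEFILE" ./mpihello
```

With nodes=2:ppn=8 the nodefile would have 16 lines, so the helper
would print 16.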

If Torque was installed in a standard place, say under /usr,
then OpenMPI configure will pick it up automatically.
If not in a standard location, then add
--with-tm=/torque/directory
to the OpenMPI configure line.
[./configure --help is your friend!]
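
A small sketch for the non-standard-location case. The helper name
tm_flag and the example prefix /opt/torque are assumptions made up for
illustration. The sketch only checks that the Torque TM header (tm.h)
sits under the given prefix before printing the configure option; it
does not replace whatever checks configure itself performs.

```shell
#!/bin/sh
# tm_flag is a hypothetical helper. Given a Torque install prefix, it
# prints the matching OpenMPI configure option if include/tm.h exists
# under that prefix, and fails with a message otherwise.
tm_flag() {
    if [ -f "$1/include/tm.h" ]; then
        printf '%s\n' "--with-tm=$1"
    else
        echo "no tm.h under $1/include" >&2
        return 1
    fi
}

# Example use (the prefix /opt/torque is only an assumption):
#   ./configure `tm_flag /opt/torque`
```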

Another check:

$ ompi_info | grep tm
[ompi_info alone prints tons of output; grepping it for "tm" shows
whether Torque was picked up.]
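
A plain grep for "tm" can also match unrelated lines, so here is a
slightly stricter sketch. It assumes ompi_info lists each component on
a line of the form "MCA <framework>: <component> (...)" and that Torque
support shows up as a "tm" component in the ras or plm framework; the
exact layout may differ between OpenMPI versions. The helper name
has_tm is made up for illustration.

```shell
#!/bin/sh
# has_tm is a hypothetical helper: it reads ompi_info output on stdin
# and succeeds if a "tm" component is listed for the ras or plm
# framework (assumed line layout: "MCA ras: tm (...)").
has_tm() {
    grep -Eq 'MCA (ras|plm): tm( |$)'
}

# Typical use on a machine with OpenMPI installed:
#   ompi_info | has_tm && echo "Torque (tm) support found"
```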

I hope this helps,
Gus Correa


