[Beowulf] Fwd: warewulf - cannot log into nodes

Gus Correa gus at ldeo.columbia.edu
Thu Nov 29 09:12:16 PST 2012


On 11/29/2012 06:35 AM, Duke Nguyen wrote:
> On 11/29/12 5:52 PM, Duke Nguyen wrote:
>> On 11/28/12 1:56 AM, Gus Correa wrote:
>>> On 11/27/2012 01:52 PM, Gus Correa wrote:
>>>> On 11/27/2012 02:14 AM, Duke Nguyen wrote:
>>>>> On 11/27/12 1:44 PM, Christopher Samuel wrote:
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> On 27/11/12 15:51, Duke Nguyen wrote:
>>>>>>
>>>>>>> Thanks! Yes, I am trying to get the system to work with
>>>>>>> Torque/Maui/OpenMPI now.
>>>>>> Make sure you build Open-MPI with support for Torque's TM interface;
>>>>>> that will save you a lot of hassle, as it means mpiexec/mpirun will
>>>>>> find out directly from Torque which nodes and processors have been
>>>>>> allocated for the job.
>>>>> Christopher, how would I check that? I got Torque/Maui/OpenMPI up,
>>>>> working as root (not as a normal user yet :( !!!), tried mpirun,
>>>>> and it worked fine:
>>>>>
>>> PS - Do 'qsub myjob' as a regular user, not as root.
>>>
>>>>> # /usr/lib64/openmpi/bin/mpirun -pernode --hostfile
>>>>> /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>>>>> Hello world! I am process number: 3 on host node0118
>>>>> Hello world! I am process number: 1 on host node0104
>>>>> Hello world! I am process number: 0 on host node0103
>>>>> Hello world! I am process number: 2 on host node0117
>>>>>
>>>>> Thanks,
>>>>>
>>>>> D.
>>>> D.
>>>>
>>>> Try omitting the hostfile from your mpirun command line:
>>>> put the mpirun command inside a Torque/PBS script instead,
>>>> and submit it with qsub.  Like this:
>>>>
>>>> *********************************
>>>> myPBSScript.tcsh
>>>> *********************************
>>>> #! /bin/tcsh
>>>> # [Assuming your Torque 'nodes' file has np=8]
>>>> #PBS -l nodes=2:ppn=8
>>>> #PBS -q batch@mycluster.mydomain
>>>> #PBS -N hello
>>>> @ NP = `cat $PBS_NODEFILE | wc -l`
>>>> mpirun -np ${NP} ./mpihello
>>>> *********************************
>>>>
>>>> $ qsub myPBSScript.tcsh
>>>>
>>>>
>>>> If OpenMPI was built with Torque support,
>>>> the job will run on the nodes/processors allocated by Torque.
>>>> [The nodes/processors are listed in $PBS_NODEFILE,
>>>> but you don't need to refer to it in the mpirun line if
>>>> OpenMPI was built with Torque support. If OpenMPI lacks
>>>> Torque support, then you can use $PBS_NODEFILE as your hostfile:
>>>> mpirun -hostfile $PBS_NODEFILE.]
>>>>
>>>> If Torque was installed in a standard place, say under /usr,
>>>> then OpenMPI configure will pick it up automatically.
>>>> If not in a standard location, then add
>>>> --with-tm=/torque/directory
>>>> to the OpenMPI configure line.
>>>> [./configure --help is your friend!]
>>>>
>>>> Another check:
>>>>
>>>> $ ompi_info [tons of output that you can grep for "tm" to see
>>>> if Torque was picked up.]
>>>>
>>>>
>>
>> OK, after a huge headache with torque/maui, I finally found out
>> that my master node's system was a mess :D. Multiple versions of
>> torque (via yum, via src, etc.) were causing confusion for different
>> users logging in (root or normal users) - well, mainly because I had
>> followed different guides on the net. Then I decided to delete
>> everything related to pbs (torque, maui, openmpi) and start from
>> scratch. So I built torque rpms for the master/nodes and installed
>> them, then built a maui rpm with support for torque, then built an
>> openmpi rpm with support for torque too. This time I think I got
>> almost everything:
>>
>> [mpiwulf@biobos:~]$ ompi_info | grep tm
>>                   MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>                   MCA plm: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>                   MCA ess: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>
>> openmpi now works with infiniband:
>>
>> [mpiwulf@biobos:~]$ /usr/local/bin/mpirun -mca btl ^tcp -pernode
>> --hostfile /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>> Hello world!  I am process number: 3 on host node0118
>> Hello world!  I am process number: 1 on host node0104
>> Hello world!  I am process number: 2 on host node0117
>> Hello world!  I am process number: 0 on host node0103
>>
>> openmpi also works with torque:
>>
>> ----------------
>> [mpiwulf@biobos:~]$ cat test/KCBATCH
>> #!/bin/bash
>> #
>> #PBS -l nodes=6:ppn=1
>> #PBS -N kcTEST
>> #PBS -m be
>> #PBS -e qsub.er.log
>> #PBS -o qsub.ou.log
>> #
>> { time {
>> /usr/local/bin/mpirun /home/mpiwulf/test/mpihello
>> } }&>output.log
>>
>> [mpiwulf@biobos:~]$ qsub test/KCBATCH
>> 21.biobos
>>
>> [mpiwulf@biobos:~]$ cat output.log
>> --------------------------------------------------------------------------
>>
>> The OpenFabrics (openib) BTL failed to initialize while trying to
>> allocate some locked memory.  This typically can indicate that the
>> memlock limits are set too low.  For most HPC installations, the
>> memlock limits should be set to "unlimited".  The failure occured
>> here:
>>
>>    Local host:    node0103
>>    OMPI source:   btl_openib_component.c:1200
>>    Function:      ompi_free_list_init_ex_new()
>>    Device:        mthca0
>>    Memlock limit: 65536
>>
>> You may need to consult with your system administrator to get this
>> problem fixed.  This FAQ entry on the Open MPI web site may also be
>> helpful:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>> --------------------------------------------------------------------------
>>
>> --------------------------------------------------------------------------
>>
>> WARNING: There was an error initializing an OpenFabrics device.
>>
>>    Local host:   node0103
>>    Local device: mthca0
>> --------------------------------------------------------------------------
>>
>> Hello world!  I am process number: 5 on host node0103
>> Hello world!  I am process number: 0 on host node0104
>> Hello world!  I am process number: 2 on host node0110
>> Hello world!  I am process number: 4 on host node0118
>> Hello world!  I am process number: 1 on host node0109
>> Hello world!  I am process number: 3 on host node0117
>> [node0104:02221] 5 more processes have sent help message
>> help-mpi-btl-openib.txt / init-fail-no-mem
>> [node0104:02221] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
>> [node0104:02221] 5 more processes have sent help message
>> help-mpi-btl-openib.txt / error in device init
>>
>> real    0m0.291s
>> user    0m0.034s
>> sys     0m0.043s
>> ----------------
>>
>> Unfortunately I still got the problem of "error registering openib
>> memory" with non-interactive jobs. Any experience with this would be great.
>
> Got it now, though I *do not* really like the solution. I had to edit
> the pbs_mom init script:
>
> # vi /etc/rc.d/init.d/pbs_mom
>
> and make sure to have:
>
> ulimit -l unlimited
> #ulimit -n 32768
>
> and now openib works fine :).
>
> D.
>
>
Hi Duke

It is great news that you've figured it all out and got everything working.

Yes, on a cluster, installing Torque, Maui, and any MPI (OpenMPI,
MVAPICH2, MPICH2, etc.) from source works much better than installing
from yum, because the builds can be configured to match your hardware,
the compilers of your choice, resource manager support (Torque),
and so on.
The yum RPMs are probably fine for work on a single workstation, though.
I should have mentioned that in my previous email.
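
For example, here is a rough sketch of an OpenMPI source build for a
setup like yours (the GNU compilers and the /usr/local locations for
both OpenMPI and Torque are only assumptions; adjust to your system):

$ cd openmpi-1.6.3
$ ./configure CC=gcc CXX=g++ FC=gfortran \
    --prefix=/usr/local --with-tm=/usr/local
$ make -j 4 all && make install
$ ompi_info | grep tm    [to confirm the tm components were built]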

It is worth keeping an eye on the Torque and Maui admin guides and
mailing lists, and likewise for the various MPI mailing lists,
as they are active and helpful:

http://www.adaptivecomputing.com/support/documentation/
http://www.supercluster.org/mailman/listinfo/torqueusers
http://www.supercluster.org/mailman/listinfo/mauiusers

http://www.open-mpi.org/community/lists/ompi.php
http://mvapich.cse.ohio-state.edu/support/mailing_lists.shtml
http://www.mpich.org/support/mailing-lists/

***

If you have an NFS filesystem or directory shared across the cluster,
you can install applications (MPI libraries, compilers, etc.) there
(though for Torque it is better to install on local disks, as you did).
This scheme scales OK for small clusters and simplifies installation
and maintenance.  If, say, you want to keep several versions of MPI
built with different compilers, maintaining everything on every
node's local disk can be time consuming.
However, your solution of creating RPMs works for any cluster,
and is probably the best approach for large clusters.
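
For instance (the directory names below are just an illustration),
each build could get its own versioned prefix on the shared
filesystem:

/shared/apps/openmpi/1.6.3-gcc
/shared/apps/openmpi/1.6.3-intel
/shared/apps/openmpi/1.4.5-gcc

[each one configured with --prefix=/shared/apps/openmpi/<version>-<compiler>]

The versions then live side by side, visible on all nodes, and the
environment modules package below makes it easy for users to pick one.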

***

I suggest that you take a look at the environment modules package,
which is a great tool that allows users to switch
their environment among different compilers, MPI versions, etc.:

http://modules.sourceforge.net/

In my opinion it works much better than hardwiring static choices
in .bashrc/.tcshrc or in /etc/profile.d.
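
As a quick illustration (the module names follow the hypothetical
layout above, and assume modulefiles that do the usual PATH and
LD_LIBRARY_PATH prepends), a user's session could look like this:

$ module avail
$ module load openmpi/1.6.3-gcc
$ which mpirun
/shared/apps/openmpi/1.6.3-gcc/bin/mpirun
$ module switch openmpi/1.6.3-gcc openmpi/1.4.5-gcc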

***

Yes, limits on locked memory are a hurdle for OpenFabrics
registered memory.
It seems to be an OpenFabrics problem, not an OpenMPI problem.
It may affect only the OMPI 1.6 series, not the older 1.4, but I
am not sure about this.
There have been several recent posts on the OpenMPI list about problems
similar to the one you had, if you care to check their archives:

http://www.open-mpi.org/community/lists/users/

For most real applications and number-crunching parallel jobs,
the default (and small) Linux stack size may also be a hurdle,
so you may want to make it unlimited or at least larger.
Likewise, you may want to increase the maximum
number of open file handles.  This can be done in the
pbs_mom init script and perhaps also in /etc/security/limits.conf.
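
For example (the values are only illustrative), on each compute node
you could put entries like these in /etc/security/limits.conf:

*   soft   memlock   unlimited
*   hard   memlock   unlimited
*   soft   stack     unlimited
*   hard   stack     unlimited
*   soft   nofile    32768
*   hard   nofile    32768

and keep the corresponding ulimit lines in the pbs_mom init script:

ulimit -l unlimited    [locked memory, for openib registered memory]
ulimit -s unlimited    [stack size]
ulimit -n 32768        [maximum open file handles]

Note that limits.conf only applies to PAM login sessions; pbs_mom is
started at boot and does not go through PAM, so the ulimit lines in
its init script are what your batch job processes actually inherit.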

I hope this helps,
Gus Correa


