[Beowulf] Fwd: warewulf - cannot log into nodes

Duke Nguyen duke.lists at gmx.com
Fri Nov 30 01:43:51 PST 2012


On 11/30/12 12:12 AM, Gus Correa wrote:
> On 11/29/2012 06:35 AM, Duke Nguyen wrote:
>> On 11/29/12 5:52 PM, Duke Nguyen wrote:
>>> On 11/28/12 1:56 AM, Gus Correa wrote:
>>>> On 11/27/2012 01:52 PM, Gus Correa wrote:
>>>>> On 11/27/2012 02:14 AM, Duke Nguyen wrote:
>>>>>> On 11/27/12 1:44 PM, Christopher Samuel wrote:
>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>> Hash: SHA1
>>>>>>>
>>>>>>> On 27/11/12 15:51, Duke Nguyen wrote:
>>>>>>>
>>>>>>>> Thanks! Yes, I am trying to get the system work with
>>>>>>>> Torque/Maui/OpenMPI now.
>>>>>>> Make sure you build Open-MPI with support for Torque's TM interface;
>>>>>>> that will save you a lot of hassle, as it means mpiexec/mpirun will
>>>>>>> find out directly from Torque which nodes and processors have been
>>>>>>> allocated for the job.
>>>>>> Christopher, how would I check that? I got Torque/Maui/OpenMPI up and
>>>>>> working with root (not with a normal user yet :( !!!), tried mpirun,
>>>>>> and it worked fine:
>>>>>>
>>>> PS - Do 'qsub myjob' as a regular user, not as root.
>>>>
>>>>>> # /usr/lib64/openmpi/bin/mpirun -pernode --hostfile
>>>>>> /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>>>>>> Hello world! I am process number: 3 on host node0118
>>>>>> Hello world! I am process number: 1 on host node0104
>>>>>> Hello world! I am process number: 0 on host node0103
>>>>>> Hello world! I am process number: 2 on host node0117
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> D.
>>>>> D.
>>>>>
>>>>> Try to omit the hostfile from your mpirun command line,
>>>>> put it inside a Torque/PBS script, and submit it with qsub.
>>>>> Like this:
>>>>>
>>>>> *********************************
>>>>> myPBSScript.tcsh
>>>>> *********************************
>>>>> #! /bin/tcsh
>>>>> #PBS -l nodes=2:ppn=8 [Assuming your Torque 'nodes' file has np=8]
>>>>> #PBS -q batch@mycluster.mydomain
>>>>> #PBS -N hello
>>>>> @ NP = `cat $PBS_NODEFILE | wc -l`
>>>>> mpirun -np ${NP} ./mpihello
>>>>> *********************************
>>>>>
>>>>> $ qsub myPBSScript.tcsh
>>>>>
>>>>>
>>>>> If OpenMPI was built with Torque support,
>>>>> the job will run on the nodes/processors allocated by Torque.
>>>>> [The nodes/processors are listed in $PBS_NODEFILE,
>>>>> but you don't need to refer to it in the mpirun line if
>>>>> OpenMPI was built with Torque support. If OpenMPI lacks
>>>>> Torque support, then you can use $PBS_NODEFILE as your hostfile:
>>>>> mpirun -hostfile $PBS_NODEFILE.]
>>>>>
>>>>> If Torque was installed in a standard place, say under /usr,
>>>>> then OpenMPI configure will pick it up automatically.
>>>>> If not in a standard location, then add
>>>>> --with-tm=/torque/directory
>>>>> to the OpenMPI configure line.
>>>>> [./configure --help is your friend!]
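>>>>>
>>>>> As a minimal sketch (the prefix and Torque path below are just
>>>>> placeholders, adjust them to your own layout):
>>>>>
>>>>> $ ./configure --prefix=/usr/local/openmpi \
>>>>>               --with-tm=/usr/local/torque
>>>>> $ make all install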
>>>>>
>>>>> Another check:
>>>>>
>>>>> $ ompi_info [tons of output that you can grep for "tm" to see
>>>>> if Torque was picked up.]
>>>>>
>>>>>
>>> OK, after a huge headache of torque/maui things, I finally found out
>>> that my master node's system was a mess :D. There were multiple versions
>>> of torque (via yum, via src, etc...), which caused confusion for
>>> different users logging in (root or normal users) - well, mainly because
>>> I had followed different guides on the net. Then I decided to delete
>>> everything related to pbs (torque, maui, openmpi) and start from
>>> scratch. So I built torque rpms for the master/nodes and installed them,
>>> then built a maui rpm with support for torque, then built an openmpi
>>> rpm with support for torque too. This time I think I got almost
>>> everything:
>>>
>>> [mpiwulf at biobos:~]$ ompi_info | grep tm
>>>                    MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>                    MCA plm: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>                    MCA ess: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>
>>> openmpi now works with infiniband:
>>>
>>> [mpiwulf at biobos:~]$ /usr/local/bin/mpirun -mca btl ^tcp -pernode
>>> --hostfile /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>>> Hello world!  I am process number: 3 on host node0118
>>> Hello world!  I am process number: 1 on host node0104
>>> Hello world!  I am process number: 2 on host node0117
>>> Hello world!  I am process number: 0 on host node0103
>>>
>>> openmpi also works with torque:
>>>
>>> ----------------
>>> [mpiwulf at biobos:~]$ cat test/KCBATCH
>>> #!/bin/bash
>>> #
>>> #PBS -l nodes=6:ppn=1
>>> #PBS -N kcTEST
>>> #PBS -m be
>>> #PBS -e qsub.er.log
>>> #PBS -o qsub.ou.log
>>> #
>>> { time {
>>> /usr/local/bin/mpirun /home/mpiwulf/test/mpihello
>>> } }&>output.log
>>>
>>> [mpiwulf at biobos:~]$ qsub test/KCBATCH
>>> 21.biobos
>>>
>>> [mpiwulf at biobos:~]$ cat output.log
>>> --------------------------------------------------------------------------
>>>
>>> The OpenFabrics (openib) BTL failed to initialize while trying to
>>> allocate some locked memory.  This typically can indicate that the
>>> memlock limits are set too low.  For most HPC installations, the
>>> memlock limits should be set to "unlimited".  The failure occured
>>> here:
>>>
>>>     Local host:    node0103
>>>     OMPI source:   btl_openib_component.c:1200
>>>     Function:      ompi_free_list_init_ex_new()
>>>     Device:        mthca0
>>>     Memlock limit: 65536
>>>
>>> You may need to consult with your system administrator to get this
>>> problem fixed.  This FAQ entry on the Open MPI web site may also be
>>> helpful:
>>>
>>> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>>
>>> WARNING: There was an error initializing an OpenFabrics device.
>>>
>>>     Local host:   node0103
>>>     Local device: mthca0
>>> --------------------------------------------------------------------------
>>>
>>> Hello world!  I am process number: 5 on host node0103
>>> Hello world!  I am process number: 0 on host node0104
>>> Hello world!  I am process number: 2 on host node0110
>>> Hello world!  I am process number: 4 on host node0118
>>> Hello world!  I am process number: 1 on host node0109
>>> Hello world!  I am process number: 3 on host node0117
>>> [node0104:02221] 5 more processes have sent help message
>>> help-mpi-btl-openib.txt / init-fail-no-mem
>>> [node0104:02221] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>> [node0104:02221] 5 more processes have sent help message
>>> help-mpi-btl-openib.txt / error in device init
>>>
>>> real    0m0.291s
>>> user    0m0.034s
>>> sys     0m0.043s
>>> ----------------
>>>
>>> Unfortunately I still got the problem of "error registering openib
>>> memory" with non-interactive jobs. Any experience with this would be great.
>> Got it now, though I *do not* really like the solution. I had to edit
>> the pbs_mom init script:
>>
>> # vi /etc/rc.d/init.d/pbs_mom
>>
>> and make sure to have:
>>
>> ulimit -l unlimited
>> #ulimit -n 32768
>>
>> and now openib works fine :).
>>
>> D.
>>
>>
> Hi Duke
>
> It is great news that you've figured it all out and got everything working.

Thanks! :)

>
> Yes, in a cluster, installing Torque, Maui,
> and any MPI (OpenMPI, MVAPICH2, MPICH2,
> etc.) works much better from source than from yum,
> because they can/will be configured to match your hardware,
> the compilers of your choice, resource manager support (Torque),
> etc.
> But the yum RPMs are probably fine for a single workstation.
> I should have mentioned that in my previous email.

No problem. It was actually better for me; I learnt more :).

>
> It is worth keeping an eye on the Torque and Maui admin guides and
> mailing lists, and likewise on the various MPI mailing lists,
> as they are active and helpful:
>
> http://www.adaptivecomputing.com/support/documentation/
> http://www.supercluster.org/mailman/listinfo/torqueusers
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
> http://www.open-mpi.org/community/lists/ompi.php
> http://mvapich.cse.ohio-state.edu/support/mailing_lists.shtml
> http://www.mpich.org/support/mailing-lists/

Thanks, yes, I am already on most of these lists (but have not voiced
much, since I don't even know what to ask :D).

>
> ***
>
> If you have an NFS filesystem or directory shared across the cluster,
> you can install applications (MPI, compilers, etc) there
> (but for Torque it is better to install it on local disks, as you did).
> This scheme scales OK for small clusters,
> and simplifies the installation and maintenance process.
> Say, if you want to keep different versions of MPI,
> compiled with different compilers, etc., maintaining everything
> can be time consuming.
> However, your solution of creating RPMs works for any cluster,
> and is probably the best for large clusters.

Hmm, good to know :). I decided to build rpms because, after installing
from source (./configure, make, make install), it is sometimes very hard
to remove the package. Installing via rpm is much better if I later want
to remove it with rpm or yum.
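
For reference, the rough pattern is just (assuming the tarball ships its
own .spec file; exact version numbers and output paths will differ on
your system):

$ rpmbuild -ta torque-<version>.tar.gz
$ rpm -ivh ~/rpmbuild/RPMS/x86_64/torque-*.rpm

and a similar idea for maui and openmpi (those may need their own .spec
files or build scripts); "rpm -e <package>" then removes them cleanly later.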

>
> ***
>
> I suggest that you take a look at the environment modules package,
> which is a great tool that allows users to switch
> their environment between different compilers, MPI versions, etc.:
>
> http://modules.sourceforge.net/
>
> In my opinion they work much better than hardwiring static choices
> in .bashrc/.tcshrc or in /etc/profile.d
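>
> As a tiny illustration (the names and paths here are made up), a
> modulefile such as /usr/share/Modules/modulefiles/openmpi/1.6.3 could
> contain just:
>
>     #%Module1.0
>     prepend-path PATH            /usr/local/openmpi-1.6.3/bin
>     prepend-path LD_LIBRARY_PATH /usr/local/openmpi-1.6.3/lib
>
> and users then pick it up with "module load openmpi/1.6.3", or swap
> versions with "module switch".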

Very interesting and useful advice. I will try this for sure.

>
> ***
>
> Yes, limits on locked memory are a hurdle for OpenFabrics
> registered memory.
> It seems to be an OpenFabrics problem, not an OpenMPI problem.
> It may affect only the OMPI 1.6 series, not the older 1.4, but I
> am not sure about this.
> There have been several recent posts on the OpenMPI list of problems
> similar to the one you had, if you care to check their archives:
>
> http://www.open-mpi.org/community/lists/users/
>
> For most real applications and number crunching parallel jobs,
> the default (and small) Linux stacksize may also be a hurdle,
> so you may want to make it unlimited or at least larger.
> Likewise, you may want to increase the maximum
> number of open file handles.  This can be done in the
> pbs_mom script and perhaps also in /etc/security/limits.conf.
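>
> For example (the values below are only illustrative, pick what suits
> your applications), the /etc/security/limits.conf entries could be:
>
>     *  soft  memlock  unlimited
>     *  hard  memlock  unlimited
>     *  soft  stack    unlimited
>     *  hard  stack    unlimited
>     *  soft  nofile   32768
>     *  hard  nofile   32768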

I found a much better way: instead of modifying /etc/init.d/pbs_mom, I
created a file /etc/sysconfig/pbs_mom and set the memlock limit there.
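
As far as I can tell, the stock init script sources /etc/sysconfig/pbs_mom
if it exists, so the file only needs the same line:

ulimit -l unlimited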

Bests,

D.


