[Beowulf] Using LSF

bala balahindustani at gmail.com
Wed Feb 20 20:55:59 PST 2008


On 2/21/08, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > submit the jobs through a job scheduler (LSF in this case). We used the
> > machinefile option with mpirun to order the nodes on which the processes
> > has to be started.
> >
> > But i am not able to do this with the current setup where LSF is used for
> > scheduling and SLURM for resource management.
> > I have tried a few of the options like using the -m options to bsub for
> > specifying the preference and so on. But of no success.
>
> this sounds like our HP-XC systems.  but I'm a bit mystified:
> you can get the node assignment from LSF, and then use srun -m hostfile
> to force slurm to set up the rank-node mappings as you like.
> (note: not -m to LSF.)  did you try that?
>

Yes, it is an HP-XC system, and I have also tried the -m option to srun.
This is what I tried, using a sample MPI program that prints each rank and the node it runs on:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);

    /* Query the job size, this process's rank, and the node it runs on. */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);

    printf("This is %d out of %d: %s\n", rank, size, name);
    MPI_Finalize();

    return 0;
}

This was submitted to LSF using

bsub -n 4 -e errfile -ext "SLURM[nodelist=n2,n1,n4,n3]" \
    /opt/hpmpi/bin/mpirun -srun -m hostfile ./a.out

The environment variable SLURM_HOSTFILE was set to the hostfile with the
nodes on which the binary had to be run in the order n2,n1,n4,n3.
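
For reference, here is a minimal sketch of how such a hostfile could be produced from the node order given above. It assumes the file named by SLURM_HOSTFILE holds one node name per line, in the desired rank order; the helper name nodelist_to_hostfile is hypothetical and not part of SLURM, LSF, or HP-MPI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn a comma-separated node list such as
 * "n2,n1,n4,n3" into hostfile text with one node name per line
 * (assumed format for the file named by SLURM_HOSTFILE).
 * Returns 0 on success, -1 if the output buffer is too small. */
int nodelist_to_hostfile(const char *nodelist, char *out, size_t outsz)
{
    size_t used = 0;
    const char *p = nodelist;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");      /* length of the next node name */
        if (len > 0) {
            if (used + len + 2 > outsz)    /* name + '\n' + final '\0' */
                return -1;
            memcpy(out + used, p, len);
            used += len;
            out[used++] = '\n';
        }
        p += len;
        if (*p == ',')
            p++;                           /* skip the separator */
    }
    if (outsz == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Calling this with "n2,n1,n4,n3" fills the buffer with the four node names in that order, one per line, which can then be written to the file that SLURM_HOSTFILE points to.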

I got the following error in my error file:

a.out: MPI_Init: node to rank map is not correct myrank :0 mynode:1
a.out: MPI_Init: node to rank map is not correct myrank :1 mynode:0
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: node to rank map is not correct myrank :3 mynode:2
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: Cannot set srun startup protocol
srun: error: n2: task0: Exited with exit code 1
a.out: MPI_Init: node to rank map is not correct myrank :2 mynode:3
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
srun: Terminating job


-- 
Best Regards,
Balamurugan. R