[Beowulf] Using LSF

bala balahindustani at gmail.com
Wed Feb 20 20:55:59 PST 2008


On 2/21/08, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > submit the jobs through a job scheduler (LSF in this case). We used the
> > machinefile option with mpirun to order the nodes on which the processes
> > have to be started.
> >
> > But I am not able to do this with the current setup, where LSF is used for
> > scheduling and SLURM for resource management.
> > I have tried a few of the options, like using the -m option to bsub for
> > specifying the preference and so on, but with no success.
>
> this sounds like our HP-XC systems.  but I'm a bit mystified:
> you can get the node assignment from LSF, and then use srun -m hostfile
> to force slurm to set up the rank-node mappings as you like.
> (note: not -m to LSF.)  did you try that?
>

Yes, it is an HP-XC system, and I have tried the -m option to srun as well.
This is what I tried, with a sample MPI program that prints each rank and the
node it runs on:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size, len;
    /* MPI requires a buffer of at least MPI_MAX_PROCESSOR_NAME chars */
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Get_processor_name(name, &len);    /* node this rank runs on */

    printf("This is %d out of %d: %s \n", rank, size, name);
    MPI_Finalize();

    return 0;
}

This was submitted to LSF using

bsub -n 4 -e errfile -ext "SLURM[nodelist=n2,n1,n4,n3]" \
    /opt/hpmpi/bin/mpirun -srun -m hostfile ./a.out

The environment variable SLURM_HOSTFILE was set to the path of the hostfile,
which lists the nodes on which the binary has to run, in the order
n2, n1, n4, n3.

I got the following error in my error file:

a.out: MPI_Init: node to rank map is not correct myrank :0 mynode:1
a.out: MPI_Init: node to rank map is not correct myrank :1 mynode:0
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: node to rank map is not correct myrank :3 mynode:2
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
a.out: MPI_Init: Cannot set srun startup protocol
srun: error: n2: task0: Exited with exit code 1
a.out: MPI_Init: node to rank map is not correct myrank :2 mynode:3
a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
a.out: MPI_Init: Cannot set srun startup protocol
srun: Terminating job


-- 
Best Regards,
Balamurugan. R