[Beowulf] Using LSF
bala balahindustani at gmail.com
Wed Feb 20 20:55:59 PST 2008
- Previous message: [Beowulf] Using LSF
- Next message: [Beowulf] python2.4 error when loose MPICH2 TI with Grid Engine
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 2/21/08, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > submit the jobs through a job scheduler (LSF in this case). We used the
> > machinefile option with mpirun to order the nodes on which the processes has
> > to be started.
> >
> > But i am not able to do this with the current setup where LSF is used for
> > scheduling and SLURM for resource management.
> > I have tried a few of the options like using the -m options to bsub for
> > specifying the preference and so on. But of no success.
>
> this sounds like our HP-XC systems. but I'm a bit mystified:
> you can get the node assignment from LSF, and then use srun -m hostfile
> to force slurm to set up the rank-node mappings as you like.
> (note: not -m to LSF.) did you try that?
>

Yes, it is an HP-XC system, and I have also tried the -m option to srun.

This is what I tried, with a sample MPI program that prints each rank and the node it runs on:

    #include "stdio.h"
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int ierr,rank,size,len;
        char name[100];

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD,&size);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);
        MPI_Get_processor_name(name,&len);

        printf("This is %d out of %d: %s \n", rank,size,name);
        MPI_Finalize();

        return 0;
    }

This was submitted to LSF using

    bsub -n 4 -e errfile -ext "SLURM[nodelist=n2,n1,n4,n3]" /opt/hpmpi/bin/mpirun -srun -m hostfile ./a.out

The environment variable SLURM_HOSTFILE was set to the hostfile listing the nodes on which the binary had to run, in the order n2,n1,n4,n3.
I got the following error in my error file:

    a.out: MPI_Init: node to rank map is not correct myrank :0 mynode:1
    a.out: MPI_Init: node to rank map is not correct myrank :1 mynode:0
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    a.out: MPI_Init: node to rank map is not correct myrank :3 mynode:2
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    a.out: MPI_Init: Cannot set srun startup protocol
    srun: error: n2: task0: Exited with exit code 1
    a.out: MPI_Init: node to rank map is not correct myrank :2 mynode:3
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    srun: Terminating job

--
Best Regards,
Balamurugan. R
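For anyone trying to read the error file above, the "node to rank map is not correct" lines can be reduced to (rank, node) pairs with a few lines of C. This is only a rough sketch: parse_rank_node is a hypothetical helper name, not something provided by HP-MPI, SLURM, or LSF, and it assumes the messages keep the exact "myrank :N mynode:M" wording shown in the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of HP-MPI, SLURM, or LSF).
 * Extracts the rank and node index from one line of the error file.
 * Returns 1 if the line carries a "myrank :N mynode:M" pair, 0 otherwise. */
int parse_rank_node(const char *line, int *rank, int *node)
{
    const char *p;

    assert(line != NULL && rank != NULL && node != NULL);

    /* Only the "node to rank map is not correct" lines contain "myrank". */
    p = strstr(line, "myrank");
    if (p == NULL)
        return 0;

    /* Blanks in the format match any run of whitespace, including none,
     * so both "myrank :0" and "mynode:1" are accepted. */
    return sscanf(p, "myrank : %d mynode : %d", rank, node) == 2;
}
```

Feeding each line of errfile to this function yields the pairs (0,1), (1,0), (3,2) and (2,3) for the log above; the other messages are skipped because they contain no "myrank" field.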
