[Beowulf] Running Different master and slave executables under MPI

J.Wood j.wood at qmul.ac.uk
Sun May 1 04:12:21 PDT 2005

Hello,

When I ran MPI on a cluster of SGI workstations, I was able to load one
executable onto the master processor and a different executable onto the
slave processors. I want to do something similar on the parallel cluster
at my college. This is a stand-alone parallel computer with about 150
individual processors (cn1, cn2, ......, cn100, etc.).

'fe03.esc.qmul.ac.uk' below is the name of the front-end for the parallel
cluster.

Below, I attach the QSUB file and the myprocgroup file I use.

I also attach the error message from the system. Can you help?

Best regards,

Jim Wood

The QSUB file:

# Specify the number of nodes requested, the number of
# processors per node, and the CPU-time limit.

#PBS -l nodes=3:ppn=2,cput=5:00:00 -W x=\"NACCESSPOLICY:SINGLEJOB\"


echo "Allocated nodes are:"

echo "NUM PROC is: $NPROC"

cd /home/hep/wood/wmpi

mpirun -p4pg myprocgroup nlb.m.cryp.mod
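The script prints the heading "Allocated nodes are:" and the value of `$NPROC`, but nothing in it prints the node list itself. Here is a minimal sketch of how that list could be shown. It assumes a Torque/PBS setup where `$PBS_NODEFILE` names a file with one hostname per allocated processor slot, so `nodes=3:ppn=2` would give six lines. The function name `show_allocated_nodes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the hosts PBS allocated to this job and the
# number of processor slots. Assumes the node file lists one hostname per
# allocated slot (the usual Torque/PBS layout of $PBS_NODEFILE).
show_allocated_nodes() {
    nodefile=$1
    echo "Allocated nodes are:"
    # Print each distinct host once, keeping the order PBS listed them in
    awk '!seen[$0]++' "$nodefile"
    # Total slots = number of lines in the node file (tr strips padding
    # that some wc implementations add)
    echo "NUM PROC is: $(wc -l < "$nodefile" | tr -d ' ')"
}

# Usage inside the job script (PBS sets PBS_NODEFILE at run time):
# show_allocated_nodes "$PBS_NODEFILE"
```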

The myprocgroup file:

fe03.esc.qmul.ac.uk 0 /home/hep/wood/wmpi/nlb.m.cryp.mod
fe03.esc.qmul.ac.uk 1 /home/hep/wood/wmpi/nlb.s.cryp.mod
fe03.esc.qmul.ac.uk 1 /home/hep/wood/wmpi/nlb.s.cryp.mod
fe03.esc.qmul.ac.uk 1 /home/hep/wood/wmpi/nlb.s.cryp.mod
fe03.esc.qmul.ac.uk 1 /home/hep/wood/wmpi/nlb.s.cryp.mod
fe03.esc.qmul.ac.uk 1 /home/hep/wood/wmpi/nlb.s.cryp.mod
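Every line of this procgroup file names the front-end `fe03.esc.qmul.ac.uk`, while the job asks PBS for three compute nodes with two processors each. The sketch below shows one way to build the same master/slave layout (one master line with count 0, then one slave process per remaining slot) from the hosts PBS actually allocated. It assumes the MPICH p4 procgroup line format `host count program`, where the first line's count is the number of extra processes on the host where `mpirun` starts. It also assumes that `$PBS_NODEFILE` lists one hostname per slot, with the job's own host first, as Torque typically does. The function `make_procgroup` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write an MPICH p4 procgroup that runs the master
# executable on the first allocated slot and one slave per remaining slot.
# $1 = node file (one hostname per slot), $2 = master path, $3 = slave path
make_procgroup() {
    nodefile=$1
    master=$2
    slave=$3
    # First slot: assumed to be the host where mpirun starts. A count of 0
    # means "no extra processes beyond the master itself".
    first=$(head -n 1 "$nodefile")
    echo "$first 0 $master"
    # Every other slot gets exactly one slave process
    tail -n +2 "$nodefile" | while read -r host; do
        if [ -n "$host" ]; then
            echo "$host 1 $slave"
        fi
    done
}

# Usage inside the job script (paths as in the procgroup file above):
# make_procgroup "$PBS_NODEFILE" \
#     /home/hep/wood/wmpi/nlb.m.cryp.mod \
#     /home/hep/wood/wmpi/nlb.s.cryp.mod > myprocgroup
# mpirun -p4pg myprocgroup nlb.m.cryp.mod
```

With `nodes=3:ppn=2` this yields six processes in total, matching the one-master, five-slave layout of the hand-written file, but spread over the allocated nodes.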
Job output, including the error messages:

Allocated nodes are:
NUM PROC is: 6
rm_6571:  p4_error: rm_start: net_conn_to_listener failed: 32877
p0_9640:  p4_error: Child process exited while making connection to remote process on fe03.esc.qmul.ac.uk: 0
p0_9640: (18.562668) net_send: could not write to fd=4, errno = 32
P4 procgroup file is myprocgroup.
