[Beowulf] Tight MPICH2 Integration with SGE
Sangamesh B forum.san at gmail.com
Fri Jan 25 06:41:20 PST 2008
Hi all,
I'm setting up tight integration of MPICH2 (not MPICH) with SGE on a
cluster of nodes with dual dual-core AMD64 Opteron processors.
I followed the Sun document located at:
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
The document explains the following three kinds of tight integration (TI),
each using a different process manager (PM):
TI using PM: gforker
TI using PM: SMPD – daemonless
TI using PM: SMPD – daemon-based
TI with gforker worked and I tested it successfully,
but I could not get TI with daemonless SMPD to work.
Here is what I did.
I installed MPICH2 configured with the smpd process manager.
SGE is installed at /opt/gridengine.
I created an MPICH2-SM directory in /opt/gridengine/mpi, following these
lines from the document:
start_proc_args /usr/sge/mpich2_smpd_rsh/startmpich2.sh -catch_rsh $pe_hostfile
stop_proc_args /usr/sge/mpich2_smpd_rsh/stopmpich2.sh
I copied startmpi.sh and stopmpi.sh from /opt/gridengine/mpi into the
/opt/gridengine/mpi/MPICH2-SM directory, because the document does not say
what these scripts should contain.
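For context, a start script of this kind is normally handed the path that SGE substitutes for $pe_hostfile and turns it into a machine file with one line per allocated slot. The following is only a minimal sketch of that logic, assuming the usual hostfile layout of host, slot count, queue, and processor range per line. The function name hostfile_to_machines is hypothetical and is not taken from the shipped scripts.

```shell
#!/bin/sh
# Sketch only: hostfile_to_machines is a hypothetical helper name.
# Assumed input format, one line per host:
#   <host> <slots> <queue> <processor-range>
hostfile_to_machines() {
    # Require exactly one argument: the path SGE substitutes for $pe_hostfile.
    if [ $# -ne 1 ]; then
        echo "hostfile_to_machines: got wrong number of arguments" >&2
        return 1
    fi
    # Print each host name once per allocated slot.
    while read -r host nslots _rest; do
        [ -n "$nslots" ] || continue   # skip blank or malformed lines
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}
```

Inside a start script this might be called as `hostfile_to_machines "$1" > "$TMPDIR/machines"`, which only works if the PE definition actually passes the hostfile path as an argument.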
Using qmon, I created the MPICH2-SM PE:
# qconf -sp MPICH2-SM
pe_name MPICH2-SM
slots 999
user_lists rootuserset
xuser_lists NONE
start_proc_args /opt/gridengine/mpi/MPICH2-SM/startmpich2sm.sh
stop_proc_args /opt/gridengine/mpi/MPICH2-SM/stopmpich2sm.sh
allocation_rule $round_robin
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
I added this PE to the default queue, all.q.
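For comparison with the howto lines quoted earlier, here is a sketch of a PE definition that passes `-catch_rsh $pe_hostfile` to the start script and sets control_slaves to TRUE, which SGE needs before slave tasks can be started under its control with `qrsh -inherit`. The helper name print_pe_definition is hypothetical. The remaining fields are copied unchanged from the qconf output above and may also need adjusting, so treat this as an assumption to check against the howto rather than a known-good configuration.

```shell
#!/bin/sh
# Sketch only: print_pe_definition is a hypothetical helper, not part of SGE.
# $1 = directory holding the start/stop scripts.
print_pe_definition() {
    script_dir=$1
    # \$ keeps $pe_hostfile and $round_robin literal, so that SGE
    # (not this shell) expands them.
    cat <<EOF
pe_name            MPICH2-SM
slots              999
user_lists         rootuserset
xuser_lists        NONE
start_proc_args    $script_dir/startmpich2sm.sh -catch_rsh \$pe_hostfile
stop_proc_args     $script_dir/stopmpich2sm.sh
allocation_rule    \$round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
EOF
}

print_pe_definition /opt/gridengine/mpi/MPICH2-SM
```

The output could be saved to a file and loaded with `qconf -Mp <file>` instead of editing the PE in qmon.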
Then I submitted the job with the following script:
# cat sgeSM.sh
#!/bin/sh
#$ -cwd
#$ -pe MPICH2-SM 4
#$ -e msge2.Err
#$ -o msge2.out
#$ -v MPI_HOME=/opt/MPI_LIBS/MPICH2-GNU/MPICH2-SM/bin
#$ -v MEME_DIRECTORY=/opt/MEME-MAX
$MPI_HOME/mpiexec -np 4 -machinefile /root/MFM /opt/MEME-MAX/bin/meme_p \
    /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna -mod tcm -nmotifs 10 -nsites 100 \
    -minw 5 -maxw 50 -revcomp -text -maxsize 200500
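The script above hard-codes both the process count and a machine file under /root. SGE sets NSLOTS and TMPDIR inside a running job, so a tightly integrated job would more usually take both from there. The sketch below assumes the start script writes its machine file to $TMPDIR/machines, which is an assumption about the copied script. The helper build_mpiexec_cmd is hypothetical and only prints the command line.

```shell
#!/bin/sh
#$ -cwd
#$ -pe MPICH2-SM 4
# Sketch only: build_mpiexec_cmd is a hypothetical helper that prints the
# command line instead of running it, so the result can be inspected first.
MPI_HOME=${MPI_HOME:-/opt/MPI_LIBS/MPICH2-GNU/MPICH2-SM/bin}

build_mpiexec_cmd() {
    # NSLOTS and TMPDIR are set by SGE inside a job; the fallbacks only let
    # this sketch run outside SGE.
    echo "$MPI_HOME/mpiexec -np ${NSLOTS:-1}" \
         "-machinefile ${TMPDIR:-/tmp}/machines $*"
}

build_mpiexec_cmd /opt/MEME-MAX/bin/meme_p /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna
```

Once the printed command looks right, the echo can be dropped so that the command actually runs. A daemonless SMPD setup may need further mpiexec options from the howto, which are not shown here.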
It gave the following error:
# cat msge2.Err
startmpich2sm.sh: got wrong number of arguments
rm: cannot remove `/tmp/92.1.all.q/machines': No such file or directory
rm: cannot remove `/tmp/92.1.all.q/rsh': No such file or directory
I suspect the problem is with the startmpich2sm.sh and stopmpich2sm.sh
scripts.
Can anyone guide me in resolving this issue?
Thanks & Regards,
Sangamesh
HPC Engineer
