[Beowulf] Re: python2.4 error when loose MPICH2 TI with Grid Engine
reuti at staff.uni-marburg.de
Sun Mar 2 01:45:06 PST 2008
On 22.02.2008 at 09:23, Sangamesh B wrote:
> Dear Reuti & members of beowulf,
> I need to execute a parallel job through Grid Engine.
> MPICH2 is installed with the mpd process manager.
> I added a parallel environment, MPICH2, to SGE:
> $ qconf -sp MPICH2
> pe_name MPICH2
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /share/apps/MPICH2/startmpi.sh -catch_rsh
> stop_proc_args /share/apps/MPICH2/stopmpi.sh
> allocation_rule $pe_slots
> control_slaves FALSE
> job_is_first_task TRUE
> urgency_slots min
> I added this PE to the default queue, all.q.
> mpdboot is done; mpds are running on two nodes.
> The script for submitting this job through SGE is:
> $ cat subsamplempi.sh
> #$ -S /bin/bash
> #$ -cwd
> #$ -N Samplejob
> #$ -q all.q
> #$ -pe MPICH2 4
> #$ -e ERR_$JOB_NAME.$JOB_ID
> #$ -o OUT_$JOB_NAME.$JOB_ID
> /opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile $TMP_DIR/machines ./samplempi
> echo "Executed"
> exit 0
> The job is submitted, but it does not execute. The error and
> output files contain:
> $ cat ERR_Samplejob.192
> /usr/bin/env: python2.4: No such file or directory
> $ cat OUT_Samplejob.192
> -catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/
> Fri Feb 22 12:57:18 IST 2008
> So the problem seems to be with python2.4.
> $ which python2.4
> I googled this error and then created a symbolic link:
> # ln -sf /opt/rocks/bin/python2.4 /bin/python2.4
> The same error still occurs after this.
> I guess the problem might be something else, i.e. Grid Engine might not
> be getting the link to the running mpd.
> The procedure I followed to configure the PE might also be wrong.
> So I hope you can clear up my doubts and help me resolve
> this error.
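One way to narrow down the python2.4 message: /usr/bin/env resolves the interpreter name through the PATH of the process that runs the script, so the lookup has to be checked in the job's own environment on the node where the job runs, not in an interactive shell elsewhere. The following is a minimal sketch only; the helper name check_interpreter is hypothetical and is not part of SGE or MPICH2. It could be called near the top of the job script.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE or MPICH2): report whether an
# interpreter can be found through PATH, which is how /usr/bin/env looks it up.
check_interpreter() {
    local name="$1"
    local path
    if path=$(command -v "$name" 2>/dev/null); then
        echo "found: $name -> $path"
        return 0
    fi
    # Print the PATH and host actually seen by this process.
    echo "missing: $name not found in PATH=$PATH on $(hostname)" >&2
    return 1
}

# Example call; the result depends on the node the job lands on.
check_interpreter python2.4 || true
```

If the helper reports the interpreter as missing inside the job but not in a login shell, the two environments differ in PATH or in what is installed on that node.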
> 1. Is the PE configuration of MPICH2 + grid engine right?
If you want to integrate MPICH2 with MPD, it's similar to a PVM setup.
The daemons must be started in start_proc_args on every node with a
dedicated port number per job. You don't say what your startmpi.sh is
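
The "dedicated port number per job" idea can be sketched as below. This is only an illustration under stated assumptions: the function name job_port, the base of 20000 and the range of 5000 are hypothetical choices, not values taken from SGE or MPICH2. JOB_ID is the variable Grid Engine sets in the job environment; the fallback 192 is the job number from the output files quoted above.

```shell
#!/bin/bash
# Hypothetical helper: map a Grid Engine job ID to a port of its own, so
# that daemons belonging to different jobs on one node do not share a port.
job_port() {
    local job_id="$1"
    local base=20000   # assumed first port of the range
    local range=5000   # assumed number of ports reserved for jobs
    echo $(( base + job_id % range ))
}

# JOB_ID is set by Grid Engine inside a job; fall back to 192 outside of one.
port=$(job_port "${JOB_ID:-192}")
echo "job ${JOB_ID:-192} would use port $port"
```

Two jobs whose IDs differ by a multiple of the range would still get the same port, so the range should be larger than the number of jobs that can coexist on the cluster.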
> 2. Without tight integration, is there a way to run an MPICH2 (mpd)
> based job using Grid Engine?
> 3. For MPICH2 tight integration, which is better: smpd daemon-based
> or daemonless?
Depends: if you have just one mpirun per job which will run for days,
I would go for the daemonless startup. But if you issue many mpirun
calls in your jobscript which will just run for seconds I would go
for the daemon based startup, as the mpirun will be distributed to
the slaves faster.
> 4. Can we do MVAPICH2 tight integration with SGE? Are there any
> differences in the process managers with respect to MVAPICH2?
Maybe, if the startup is similar to standard MPICH2.
> Thanks & Best Regards,
> Sangamesh B