[Beowulf]

William Burke wburke999 at msn.com
Sat Mar 26 18:31:22 PST 2005


Reuti,

>> I'd suggest moving over to the SGE users list at:
>> http://gridengine.sunsource.net/servlets/ProjectMailingListList

I have, but I do not see my name yet. How long is the verification process?

>> Although there is a special Myrinet directory, you can also try to use
>> the files in the mpi directory instead.

The mpi directory's mpich.template doesn't use mpirun.ch_gm, so how does it
know which version of mpirun to use? If I use the mpi directory, what changes
do I have to make?

>> Can you please give more details of your queue and PE setup (qconf -sq/sp
>> output)?

SEE BELOW

>> Do you have an admin account for SGE? I'd prefer not to do anything in
>> SGE as root.

Yes, it's grid... SEE BELOW

>> Not really an issue: you have to make a small change to the
>> mpirun.ch_gm.pl to make all jobs stay in the same process group, so that
>> they are correctly killed in case of a job abort:

I have to double-check that in:

http://gridengine.sunsource.net/howto/mpich-integration.html

Here is the new problem I have with the PE:

My jobs won't run. When I submit my script, it goes into pending mode (status
qw) for about 10 seconds, SGE then dispatches it to N hosts (status t), the
job hangs in status t, and then it quickly exits. When I investigated the
Jobscript_name.{pe|po}JobID output files, I found that SGE can't create links
in the /WEMS/grid/tmp/549.1.Production.q/ directory.

It looks like the startmpi.sh script links files in $TMPDIR, and from my
understanding the value of $TMPDIR is derived from the tmpdir parameter in the
queue's configuration. I have set this attribute to '/WEMS/grid/tmp/', but
according to the error log qsub_wrf.sh.pe549 it is
'/WEMS/grid/tmp/549.1.Production.q/'.

Possibly the source of the problem is here, so what added the
'549.1.Production.q' suffix?
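
A note on the naming: every path logged in CHECK 6 follows the pattern <tmpdir>/<job_id>.<task_id>.<queue>, and the one run whose path contains '//' (job 389) would be explained by the tmpdir value ending in '/'. The following is a minimal shell sketch that reproduces this naming under that assumption; job_tmpdir is a hypothetical helper, not part of SGE.

```shell
#!/bin/sh
# Sketch: reproduce the per-job temporary directory names seen in CHECK 6.
# Assumed pattern (inferred from the logged paths):
#   <tmpdir>/<job_id>.<task_id>.<queue>
job_tmpdir() {
    base=$1   # the queue's tmpdir attribute, e.g. /WEMS/grid/tmp
    job=$2    # job ID, e.g. 549
    task=$3   # array task ID; 1 for a non-array job
    queue=$4  # cluster queue name, e.g. Production.q
    # Plain concatenation: a trailing '/' in $base yields '//' in the result
    printf '%s/%s.%s.%s\n' "$base" "$job" "$task" "$queue"
}

job_tmpdir /WEMS/grid/tmp 549 1 Production.q    # /WEMS/grid/tmp/549.1.Production.q
job_tmpdir /WEMS/grid/tmp/ 389 1 Production.q   # /WEMS/grid/tmp//389.1.Production.q
```

The sketch deliberately does not strip the trailing slash, so it matches the CHECK 6 log; on POSIX systems a doubled slash in the middle of a path still names the same directory.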

 

I then checked the permissions of /WEMS/grid/tmp:

[wems at wems grid]$ ls -ltr /WEMS/grid | grep tmp
drwxrwxrwx    2 root     root         4096 Mar 26 17:34 tmp

As a sanity check, I echo out the ls -ltr of $TMPDIR within startmpi.sh:

drwxr-xr-x    2 65534    65534        4096 Mar 26  2005 549.1.Production.q

As expected, there is no UID/GID 65534 in my /etc/passwd. Furthermore, with
mode drwxr-xr-x only the owner (UID 65534) has write permission, so if it
(N1GE) is the only one writing to and reading this directory, what else could
be preventing writes into that directory? I thought maybe there was a lock
file in /WEMS/grid/tmp, so I checked:

[wems at wems tmp]$ ls -al /WEMS/grid/tmp
total 8
drwxrwxrwx    2 root     root         4096 Mar 26 17:34 .
drwxr-xr-x   22 grid     grid         4096 Mar 26 04:20 ..

That was to no avail, so I am out of solutions. Is this a known issue when
using Myrinet, MPICH and tight integration, or am I overlooking something? I
am using the sge_mpirun script instead of the mpirun script. Have you seen any
problem like this before?
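
One way to narrow this down from inside the job: with mode drwxr-xr-x, only the owning UID (65534 here) may create entries in the directory, so any other user, including the job owner, gets "Permission denied" from ln and from the shell's output redirection, as in CHECK 2. Here is a minimal sketch of a check that could be dropped into the job script or startmpi.sh. check_dir is a hypothetical helper, and it assumes GNU stat (stat -c) as found on Linux.

```shell
#!/bin/sh
# Sketch: report who owns a directory and whether the current user can
# create files in it (the operation that fails in CHECK 2).
# Assumes GNU stat; other systems need different stat flags.
check_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory"
        return 2
    fi
    owner=$(stat -c '%u:%g %A' "$dir")   # owner UID:GID and mode string
    me=$(id -u):$(id -g)                 # UID:GID the job is running as
    if [ -w "$dir" ]; then
        echo "$dir owner=$owner me=$me writable"
        return 0
    fi
    echo "$dir owner=$owner me=$me NOT writable"
    return 1
}

# In the job script or startmpi.sh one could call: check_dir "$TMPDIR"
```

Calling it just before the ln commands would show whether the job's UID differs from the directory owner.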

I also suspect that the editor may be reading the start_proc_args argument of
the mpich PE configuration incorrectly, since the editor wraps the argument
string onto the next line (see CHECK 5). The log file
/WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs (CHECK 7) says:
"The mpirun command "\" does not exist"

SEE CHECK 5, CHECK 8, then CHECK 7 BELOW.
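
To illustrate how a literal backslash could end up reported as the mpirun command: if a script read the start_proc_args line of the PE configuration and took its last word as the mpirun path, the trailing "\" of the wrapped line in CHECK 5 would be picked up instead of the real path. The functions here are a hypothetical sketch of that failure mode, not the actual code of startmpi.sh or sge_mpirun.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real sge_mpirun: take the mpirun path as the
# last word of the start_proc_args line and check that it is executable.
mpirun_from_pe_line() {
    # Print the last whitespace-separated field of the given line
    printf '%s\n' "$1" | awk '{ print $NF }'
}

check_mpirun() {
    mpirun=$1
    if [ ! -x "$mpirun" ]; then
        # printf %s prints the argument verbatim (no backslash processing)
        printf 'The mpirun command "%s" does not exist\n' "$mpirun"
        return 1
    fi
    printf 'using %s\n' "$mpirun"
}

# First line of the wrapped start_proc_args from CHECK 5; its last word is "\"
wrapped='start_proc_args   /WEMS/grid/mpi/myrinet/startmpi.sh -catch_rsh  \'
check_mpirun "$(mpirun_from_pe_line "$wrapped")" || true
```

With the wrapped line this prints the same message as CHECK 7. Comparing the output of qconf -sp mpich with CHECK 5 would show whether the scripts actually see the wrapped form.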

 

Oh yeah, this may be a silly question, but where does SGE get $pe_hostfile and
$TMPDIR from, and how does it set these variables? I would like some
clarification.
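
Partly answering this from the logs: CHECK 6 shows that $PE_HOSTFILE points into the execution host's spool directory (.../spool/<host>/active_jobs/<job>.<task>/pe_hostfile), and that $TMPDIR is the per-job directory under the queue's tmpdir. startmpi.sh then turns the host file into the machines file mentioned in CHECK 2 and CHECK 3. Here is a minimal sketch of that conversion. It assumes each pe_hostfile line has the form "<host> <slots> <queue> <processor range>" and that MPICH wants one line per slot. pe_hostfile_to_machines is a hypothetical name, and the real startmpi.sh may do this differently.

```shell
#!/bin/sh
# Sketch: turn an SGE pe_hostfile into an MPICH machines file, one line per
# granted slot. Assumed pe_hostfile line format:
#   <hostname> <slots> <queue> <processor range>
pe_hostfile_to_machines() {
    hostfile=$1
    # Repeat each hostname ($1) as many times as its slot count ($2)
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$hostfile"
}

# Inside a job: pe_hostfile_to_machines "$PE_HOSTFILE" > "$TMPDIR/machines"
```

Writing to "$TMPDIR/machines" is exactly the step that fails with "Permission denied" in CHECK 2, so this conversion can only succeed once the job can write to $TMPDIR.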

 

Thanks,

William

Things that I checked

CHECK 0.5

[root at wems wrfprd]# cat qsub_wrf.sh
#!/bin/sh
#$ -S /bin/ksh
#$ -pe mpich 32
#$ -l h_rt=10800
#$ -q Production.q
#
#. /WEMS/wems/external/WRF/wrfsi/etc/setup-mpi.sh
cd /WEMS/wems/data/WRF/wni001a/wrfprd
echo 'This is the job ID '$JOB_ID > /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
echo 'This is the pe_hostfile '$PE_HOSTFILE >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
echo 'This is the tmpdir '$TMPDIR >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
/WEMS/grid/mpi/myrinet/sge_mpirun /WEMS/wems/external/WRF/wrfsi/../run/wrf.exe >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs 2>&1

CHECK 1

[wems at wems wems]$ qsub -pe mpich 32 -P test -q Production.q /WEMS/wems/data/WRF/wni001a/wrfprd/qsub_wrf.sh

CHECK 2

[wems at wems grid]$ cat qsub_wrf.sh.pe549
ln: creating symbolic link `/WEMS/grid/tmp/549.1.Production.q/mpirun.sge' to `/WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm': Permission denied
/WEMS/grid/mpi/myrinet/startmpi.sh[142]: cannot create /WEMS/grid/tmp/549.1.Production.q/machines: Permission denied
cat: /WEMS/grid/tmp/549.1.Production.q/machines: No such file or directory
ln: creating symbolic link `/WEMS/grid/tmp/549.1.Production.q/rsh' to `/WEMS/grid/mpi/rsh': Permission denied

CHECK 3

[wems at wems grid]$ cat qsub_wrf.sh.po549
-catch_rsh /WEMS/grid/wems-hosts2 /WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
this is the value of mpirun  /WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
I am doing a ls -ltr on $TMPDIR
total 4
drwxr-xr-x    2 65534    65534        4096 Mar 26  2005 549.1.Production.q
Machine file is /WEMS/grid/tmp/549.1.Production.q/machines

CHECK 4

[wems at wems grid]$ cat Queue-config
qname                 Production.q
hostlist              @Parallel
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            2
qtype                 BATCH
ckpt_list             NONE
pe_list               mpich
rerun                 FALSE
slots                 2
tmpdir                /WEMS/grid/tmp
shell                 /bin/ksh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            Test_A
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              test
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

CHECK 5

[wems at wems grid]$ cat mpich-PE-config
pe_name           mpich
slots             78
user_lists        Test_A
xuser_lists       NONE
start_proc_args   /WEMS/grid/mpi/myrinet/startmpi.sh -catch_rsh  \
                  /WEMS/grid/wems-hosts2  \
                  /WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
stop_proc_args    /WEMS/grid/mpi/myrinet/stopmpi.sh
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

CHECK 6

[wems at wems wems]# cat /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
This is the pe_hostfile /WEMS/grid/default/spool/wems18/active_jobs/388.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp/388.1.Production.q
This is the pe_hostfile /WEMS/grid/default/spool/wems07/active_jobs/389.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp//389.1.Production.q
This is the pe_hostfile /WEMS/grid/default/spool/wems24/active_jobs/390.1/pe_hostfile

This is the tmpdir /WEMS/grid/tmp/398.1.Production.q
This is the pe_hostfile /WEMS/grid/default/spool/wems22/active_jobs/549.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp/549.1.Production.q
This is the pe_hostfile
This is the tmpdir

CHECK 7

[wems at wems wems]$ cat /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
This is the job ID 549
The mpirun command "\" does not exist
There must be a problem with the mpich parallel environment

CHECK 8

[root at wems wrfprd]# cat qsub_wrf.sh
#!/bin/sh
#$ -S /bin/ksh
#$ -pe mpich 32
#$ -l h_rt=10800
#$ -q Production.q
#
#. /WEMS/wems/external/WRF/wrfsi/etc/setup-mpi.sh
cd /WEMS/wems/data/WRF/wni001a/wrfprd
echo 'This is the job ID '$JOB_ID > /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
echo 'This is the pe_hostfile '$PE_HOSTFILE >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
echo 'This is the tmpdir '$TMPDIR >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
/WEMS/grid/mpi/myrinet/sge_mpirun /WEMS/wems/external/WRF/wrfsi/../run/wrf.exe >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs 2>&1
exit

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Wednesday, March 23, 2005 6:26 PM
To: William Burke
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf]

Hi,

I'd suggest moving over to the SGE users list at:

http://gridengine.sunsource.net/servlets/ProjectMailingListList

But anyway, let's sort things out:

Quoting William Burke <wburke999 at msn.com>:

> I can't get PE to work on a 50-node class II Beowulf. It has a front-end
> Sunfire v40 (qmaster host) and 49 Sunfire v20s (execution hosts) running
> Linux, configured to communicate data over Myrinet using MPICH-GM version
> 1.2.6.14a.

Although there is a special Myrinet directory, you can also try to use the
files in the mpi directory instead.

> These are the requirements the N1GE environment has to handle:
>
> 1.  Serial jobs for pre-processing the data - average runtime 15 minutes.
> 2.  Output is pipelined into parallel processing jobs - runtime range
>     1-6 hours.
> 3.  Post-processing serial jobs run concurrently.
>
> I have set up a Parallel Environment called mpich-gm and a straightforward
> FIFO scheduling scheme for testing. When I submit parallel jobs they hang in
> limbo in a 'qw' state pending submission. I am not sure why the scheduler
> does not see the jobs that I submit.
>
> I used the myrinet mpich template located in the $SGE_ROOT/<sge_cell>/mpi/myrinet
> directory to configure the PE (parallel environment), plus I copied the
> sge_mpirun script to the $SGE_ROOT/<sge_cell>/bin directory. I configured a
> Production.q queue that runs only parallel jobs. As a last sanity check I ran
> a trace on the scheduler, submitted a simple parallel job, and these are the
> results that I got from the logs:

Can you please give more details of your queue and PE setup (qconf -sq/sp
output)?

> JOB RUN Window
>
> [wems at wems examples]$ qsub -now y -pe mpich-gm 1-4 -b y hello++
> Your job 277 ("hello++") has been submitted.
> Waiting for immediate job to be scheduled.
>
> Your qsub request could not be scheduled, try again later.
> [wems at wems examples]$ qsub -pe mpich-gm 1-4 -b y hello++
> Your job 278 ("hello++") has been submitted.
> [wems at wems examples]$ qsub -pe mpich-gm 1-4 -b y hello++
> Your job 279 ("hello++") has been submitted.

You can't start a parallel job this way, as no mpirun is used. Do you get the
same behavior when you use the script you mentioned (where you used
mpirun -np $NSLOTS ...)?
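
For reference, the pattern Reuti describes is a job script that calls mpirun itself with the number of granted slots. This minimal sketch only prints such a command line, so it can be tried without MPICH installed. It assumes the usual MPICH -np and -machinefile options and the machines file that startmpi.sh writes into $TMPDIR (CHECK 3). mpirun_cmdline and the /tmp/job path are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Sketch: print the mpirun command line a tightly integrated MPICH job script
# would run. Assumes MPICH-style -np/-machinefile options and the machines
# file written by startmpi.sh. mpirun_cmdline is a hypothetical helper.
mpirun_cmdline() {
    nslots=$1   # inside an SGE job: "$NSLOTS" (number of granted slots)
    tmpdir=$2   # inside an SGE job: "$TMPDIR" (per-job directory)
    shift 2     # remaining arguments: the program and its options
    printf 'mpirun -np %s -machinefile %s/machines %s\n' "$nslots" "$tmpdir" "$*"
}

# In a job script: mpirun_cmdline "$NSLOTS" "$TMPDIR" ./hello++
mpirun_cmdline 4 /tmp/job ./hello++
```

Printing instead of executing keeps the sketch free of side effects; in a real job script the resulting command would be run directly.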

 

> This is the 2nd window SCHEDULER LOG
>
> [root at wems bin]# qconf -tsm
> [root at wems bin]# qconf -tsm
> [root at wems bin]# cat /WEMS/grid/default/common/schedd_runlog
> Wed Mar 23 06:08:55 2005|-------------START-SCHEDULER-RUN-------------
> Wed Mar 23 06:08:55 2005|queue instance "all.q at wems10.grid.wni.com" dropped because it is temporarily not available
> Wed Mar 23 06:08:55 2005|queue instance "Production.q at wems10.grid.wni.com" dropped because it is temporarily not available
> Wed Mar 23 06:08:55 2005|queues dropped because they are temporarily not available: all.q at wems10.grid.wni.com Production.q at wems10.grid.wni.com
> Wed Mar 23 06:08:55 2005|no pending jobs to perform scheduling on
> Wed Mar 23 06:08:55 2005|--------------STOP-SCHEDULER-RUN-------------
> Wed Mar 23 06:11:37 2005|-------------START-SCHEDULER-RUN-------------
> Wed Mar 23 06:11:37 2005|queue instance "all.q at wems10.grid.wni.com" dropped because it is temporarily not available
> Wed Mar 23 06:11:37 2005|queue instance "Production.q at wems10.grid.wni.com" dropped because it is temporarily not available
> Wed Mar 23 06:11:37 2005|queues dropped because they are temporarily not available: all.q at wems10.grid.wni.com Production.q at wems10.grid.wni.com
> Wed Mar 23 06:11:37 2005|no pending jobs to perform scheduling on
> Wed Mar 23 06:11:37 2005|--------------STOP-SCHEDULER-RUN-------------
> [root at wems bin]# qstat
> job-ID prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>     279 0.55500 hello++    wems         qw    03/23/2005 06:11:43                                    1
> [root at wems bin]#

Do you have an admin account for SGE? I'd prefer not to do anything in SGE as
root.

> BTW, that node wems10.grid.wni.com has connectivity issues and I have not
> removed it from the cluster queue.
>
> What causes N1GE to return "no pending jobs to perform scheduling on" in the
> schedd_runlog even though there are available slots ready to take jobs?
>
> I had no problem submitting serial jobs; only the parallel jobs behaved like
> this. Are there N1GE-Myrinet issues that I am not aware of? FYI, the same
> binary (hello++) runs with no problems from the command line.

If you just start hello++, it will not run in parallel, I think.

Not really an issue: you have to make a small change to the mpirun.ch_gm.pl
to make all jobs stay in the same process group, so that they are correctly
killed in case of a job abort:

http://gridengine.sunsource.net/howto/mpich-integration.html

> Since I generally run scripts from qsub instead of binaries, I created a
> script to run the mpich executable, but that yielded the same result.
>
> I have an additional question regarding the queue.conf parameter called
> "subordinate_list". How is it read from the result of qconf -mq
> <queue_name>?
>
> Example
>
>             i.e., subordinate_list     low_pri.q=5,small.q.

The queue "low_pri.q" will be suspended when 5 or more slots of
"<queue_name>" are filled. The "small.q" will be suspended if all slots of
"<queue_name>" are filled.
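
The rule Reuti states can be written as a small predicate. This is a sketch of the stated semantics, not SGE source code. is_suspended is a hypothetical name, and the 8-slot queue in the example is an assumption chosen for illustration.

```shell
#!/bin/sh
# Sketch of the subordinate_list rule as stated above (not SGE source code).
# A subordinate queue is suspended once the number of filled slots in
# <queue_name> reaches its threshold; with no "=N" given, the threshold is
# the total slot count of <queue_name>.
is_suspended() {
    used=$1        # filled slots in <queue_name>
    total=$2       # slots configured for <queue_name>
    threshold=$3   # the N in "queue=N", or empty if none was given
    [ -n "$threshold" ] || threshold=$total
    [ "$used" -ge "$threshold" ]
}

# subordinate_list low_pri.q=5,small.q on a hypothetical 8-slot queue
for used in 4 5 8; do
    is_suspended "$used" 8 5  && lp=suspended || lp=running
    is_suspended "$used" 8 "" && sq=suspended || sq=running
    echo "used=$used low_pri.q=$lp small.q=$sq"
done
```

With 4 filled slots both subordinates keep running, at 5 only low_pri.q is suspended, and at 8 (all slots) small.q is suspended as well.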

 

Cheers - Reuti
