DQS drops jobs on SuSE 6.3 cluster

Kris Thielemans kris.thielemans at csc.mrc.ac.uk
Thu Nov 2 03:52:58 PST 2000


Hi,

I'm trying to get DQS running on our cluster of 4 SuSE 6.3 systems. I tried
3 different versions of DQS:
- the RPM package on the original CD
- the RPM package provided on the SuSE website as an update to fix a y2k
problem (version 3.2.7)
- the newest version (3.3.1) from ftp.scri.fsu.edu (compiled from
sources)

All 3 versions have the same problem:
jobs are occasionally dropped from the queue, or never even started.

Symptoms:
qsub somejob.sh   -> works ok
qstat -f          -> lists job

(a little bit later)
qstat -f          -> job gone

This happens with the simple dqs.sh example script that they provide for
testing.

There is NO error message in the dqs err_file, or anything in the log_file.
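
Since nothing shows up in the logs, it may help to record exactly when the job vanishes, so the timestamp can be compared with whatever else is happening on the nodes at that moment. The following is only a rough sketch of such a watcher: it polls `qstat -f` (the command used above) and reports the first poll in which the job is no longer mentioned. The function names, the polling interval, and the plain substring match on the job name are all assumptions made for illustration; the real `qstat -f` output format may need a more careful match.

```python
import subprocess
import time
from datetime import datetime


def job_listed(qstat_output, job_name):
    """Return True if any line of the qstat output contains job_name.

    Assumption: a plain substring match is good enough to spot the job.
    """
    return any(job_name in line for line in qstat_output.splitlines())


def run_qstat():
    """Run `qstat -f` and return whatever it prints on stdout."""
    result = subprocess.run(["qstat", "-f"], capture_output=True, text=True)
    return result.stdout


def watch_job(job_name, fetch=run_qstat, interval=5.0, max_polls=120):
    """Poll until job_name is no longer listed.

    Returns the index of the first poll where the job was missing,
    or None if it was still listed after max_polls polls.
    """
    for i in range(max_polls):
        if not job_listed(fetch(), job_name):
            stamp = datetime.now().isoformat()
            print(f"{stamp} {job_name} no longer listed (poll {i})")
            return i
        time.sleep(interval)
    return None
```

Started right after `qsub somejob.sh` with something like `watch_job("somejob.sh")`, this prints a timestamp as soon as the job drops out of the listing. It cannot tell a dropped job from one that finished normally, so it is only useful together with the job's expected run time.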

This problem also occurs when I disable all queues except 1 (on the same
node as the qmaster).


Any ideas?

Thanks,

Kris Thielemans

MRC Cyclotron Unit,
Hammersmith Hospital,
DuCane Rd, London W12 0NN, United Kingdom
