[Beowulf] Please help to setup Beowulf
kilian.cavalotti.work at gmail.com
Thu Feb 19 01:33:00 PST 2009
On Wednesday 18 February 2009 16:30:37 Mark Hahn wrote:
> if SGE has to keep re-reading user files, this suggests to me that its
> design is poor. it's obvious to me that a scheduler should be based on
> a production-quality DB, for instance, and should clearly give some
> thought to performance when there are many runnable jobs.
IIRC, LSF, which to me qualifies as a production-quality scheduler, stores its
jobs' status in text files, parses them every time it needs to, and also reads
each job's definition right before starting it, because it has no way of
knowing whether the resource requirements are the same as the previous job's,
unless the jobs are part of an array.
(This was for versions 6.x, and may have changed since.)
> is it? an array job, in my experience at least, is just syntactic sugar
> for submission.
And also for managing them later on. It's much easier to issue a
and get details about the particular jobs you care about, in the logical
order that is relevant to your problem, rather than having to store an
arbitrary list of random job numbers, which only make sense to the scheduler.
> all the internals of job scheduling still have to operate,
Yes, but they all operate under a single job_ID, which is unique to your
array, rather than under an individual job number per sub-job. And you can
define the array's indexes however you want when you submit it.
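To make the "one ID, user-chosen indexes" point concrete, here is a toy model in Python. It is a sketch under my own assumptions, not how LSF or SGE store arrays internally: the class and field names are hypothetical, and the only idea it shows is that each element is addressed as ID[index] instead of having its own job number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical toy model of a job array (not any scheduler's real data
# structure): the whole array has ONE job ID, and each element is addressed
# by the index the user chose at submission time.
@dataclass
class JobArray:
    job_id: int
    name: str
    indexes: List[int]
    status: Dict[int, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every element starts out pending.
        for i in self.indexes:
            self.status.setdefault(i, "PEND")

    def element(self, index: int) -> str:
        # Elements are referred to as "ID[index]", so the user never has to
        # track a separate job number per sub-job.
        if index not in self.status:
            raise KeyError(f"{index} is not an index of array {self.job_id}")
        return f"{self.job_id}[{index}]"


# Indexes need not be contiguous: the user picks whatever fits the problem.
arr = JobArray(job_id=1234, name="sweep", indexes=[1, 2, 3, 10, 20])
print(arr.element(10))  # 1234[10]
```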
> though as you point out, the scheduler may take advantage of the fact that
> each sub-job has identical resource requirements. it still needs to
> dispatch each sub-job separately, each SJ may fail in unique ways, etc.
> in the end, the submission and resource-matching code of the scheduler has
> gotten a break, but everything else works just as hard. consider, for
> instance, that flat naming schemes are conceptually simpler, but array jobs
> break that.
I'm not sure what you mean by that. For LSF at least, jobs can be given a
job_name, including job arrays.
> remember that the user still has to write some sort of script
> to evaluate whether each SJ worked, and recover from the failures.
Sure, job arrays won't make errors go away. But they will be easier to track,
because it's easier to see that you need to resubmit the first 3 jobs of your
array than job 542323, job 542326 and job 543623. You just have one
job ID to track.
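As an illustration of the resubmission case, a small Python sketch that collects the failed element indexes of an array and compresses them into an index list like "1-3,7". The status values and the array name are made up for the example; I'm assuming the scheduler accepts comma-separated index ranges, as LSF's `-J "name[1-3,7]"` array syntax does.

```python
def failed_elements(status):
    """Return the indexes whose state is EXIT (hypothetical status map)."""
    return [i for i, state in status.items() if state == "EXIT"]


def compress_indexes(indexes):
    """Turn e.g. [1, 2, 3, 7] into the range string '1-3,7'."""
    ranges = []  # list of [start, end] pairs, built in ascending order
    for i in sorted(set(indexes)):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i  # extend the current run
        else:
            ranges.append([i, i])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Made-up example: elements 1-3 and 7 of the array "sweep" failed.
status = {1: "EXIT", 2: "EXIT", 3: "EXIT", 4: "DONE", 5: "DONE", 7: "EXIT"}
print(f"sweep[{compress_indexes(failed_elements(status))}]")  # sweep[1-3,7]
```

The point is that the retry list stays short and readable however many elements the array has, instead of being a list of unrelated job numbers.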
You can get a quick summarized view of the status of your jobs in an array
without needing to feed the scheduler with a list of arbitrary job numbers.
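The summarized view boils down to counting element states under one array ID. A toy Python version, with a made-up status map (the state names mirror LSF's PEND/RUN/DONE/EXIT, but the function itself is hypothetical):

```python
from collections import Counter


def summarize(status):
    """Count how many array elements are in each state, sorted by state name."""
    counts = Counter(status.values())
    return {state: counts[state] for state in sorted(counts)}


# Hypothetical array of five elements in various states.
status = {1: "DONE", 2: "DONE", 3: "RUN", 4: "PEND", 5: "EXIT"}
print(summarize(status))  # {'DONE': 2, 'EXIT': 1, 'PEND': 1, 'RUN': 1}
```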
You can also submit jobs with dependency conditions on the status of other
jobs, like "run job only if job succeeded". This is much harder to do
when you submit jobs individually, because you can't know job numbers in
advance.
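A toy Python sketch of why naming helps with dependencies: conditions refer to a job *name*, so they can be written at submission time before any job number exists. The "done" and "ended" conditions loosely mirror LSF's `done()` and `ended()` dependency conditions, but the functions and job names here are hypothetical.

```python
def condition_met(condition, state):
    if condition == "done":   # predecessor finished successfully
        return state == "DONE"
    if condition == "ended":  # predecessor finished, successfully or not
        return state in ("DONE", "EXIT")
    raise ValueError(f"unknown condition: {condition}")


def can_start(dependencies, states_by_name):
    # Dependencies are (condition, job_name) pairs. A predecessor that has
    # not been submitted yet (no state known) blocks the dependent job.
    return all(
        name in states_by_name and condition_met(cond, states_by_name[name])
        for cond, name in dependencies
    )


# Hypothetical job states, keyed by name rather than by job number.
states = {"preprocess": "DONE", "simulate": "EXIT"}
print(can_start([("done", "preprocess")], states))  # True
print(can_start([("done", "simulate")], states))    # False
print(can_start([("ended", "simulate")], states))   # True
```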
Well, all this to say that array jobs can prove useful. :)