[Beowulf] SGE + policy

Guy Coates gmpc at sanger.ac.uk
Thu May 27 09:22:17 PDT 2004


> Concern:  That long running jobs will get into the queue (probably SGE
> managed queue) and starve the short running jobs for either licenses or
> CPUs or both.  Students won't be able to finish their homework in a
> timely way because long running jobs de facto hog the resource once they
> are given a license/CPU.

You are screwed, but not for the reason you think.

This is not a scheduler issue; it's a FLEXlm limitation.  Most queueing
systems support pre-emption, where the scheduler suspends long-running
jobs to free up job slots/CPUs in favour of short, high-priority jobs.

Unfortunately, as far as FLEXlm is concerned, your suspended long job
still has a license checked out, so you get license starvation and your
short jobs will not be able to start.

The only way around this limitation would be to checkpoint and kill the
long job, so the license gets returned. (Can you checkpoint MATLAB...?)
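
To make the difference concrete, here is a rough toy model of the two
cases: suspending the long job versus checkpointing and killing it.  It is
only a sketch and does not talk to SGE or the license server; every name
in it (LicensePool, Cluster, checkpoint_and_kill and so on) is made up for
illustration, and it assumes one CPU slot and one license.

```python
# Toy model only: not the FLEXlm or SGE API. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class LicensePool:
    total: int
    holders: set = field(default_factory=set)

    def checkout(self, job: str) -> bool:
        """Grant a license if one is free; return whether it was granted."""
        if len(self.holders) >= self.total:
            return False
        self.holders.add(job)
        return True

    def checkin(self, job: str) -> None:
        """Return a license; in this model that only happens on process exit."""
        self.holders.discard(job)


@dataclass
class Cluster:
    slots: int
    pool: LicensePool
    running: set = field(default_factory=set)
    suspended: set = field(default_factory=set)

    def start(self, job: str) -> bool:
        """Start a job only if both a CPU slot and a license are available."""
        if len(self.running) >= self.slots:
            return False
        if not self.pool.checkout(job):
            return False
        self.running.add(job)
        return True

    def suspend(self, job: str) -> None:
        """Pre-empt: frees the CPU slot, but the job keeps its license."""
        self.running.remove(job)
        self.suspended.add(job)

    def checkpoint_and_kill(self, job: str) -> None:
        """The process exits, so both the slot and the license are released."""
        self.running.discard(job)
        self.suspended.discard(job)
        self.pool.checkin(job)


def demo() -> tuple:
    cluster = Cluster(slots=1, pool=LicensePool(total=1))
    assert cluster.start("long")
    cluster.suspend("long")
    after_suspend = cluster.start("short")  # slot is free, license is not
    cluster.checkpoint_and_kill("long")
    after_kill = cluster.start("short")     # license has been returned
    return after_suspend, after_kill


if __name__ == "__main__":
    print(demo())
```

In this model the short job fails to start after the suspend even though
the CPU slot is free, and only starts once the long job has actually
exited and handed its license back.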

Cheers,

Guy Coates

-- 
Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
Tel: +44 (0)1223 834244 ex 7199





