[Beowulf] Suggestions for scheduling software
smulcahy at aplpi.com
Tue Aug 19 02:20:50 PDT 2008
Up to now we've been working with a 20 node cluster where we've had the
luxury of working without any scheduling or queuing software - the
cluster is pretty much dedicated to running a single job, which is
invoked manually with mpirun.
We're moving to a much larger cluster in the near future and are keen to
keep the utilisation as high as possible. On the new cluster we have to
run two distinct jobs - one is a long-running job (weeks or possibly
months) and the other is a regular short-running job (a few hours)
which has to run at a specific time each day.
We're currently looking at using SLURM for queuing up jobs on the system
but I'm not sure if it will meet all of our needs here. Ideally, we'd
have some system that would allow us to queue up the long-running job
and a series of short-running jobs and the system would automatically
suspend the long-running job when the short-running job is due to start,
run the short-running job and then resume the long-running job.
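
To make the behaviour we're after concrete, here is a rough sketch in
Python of the policy described above. It is only a toy simulation, not
SLURM configuration: the names Event and simulate, the one-hour time
step and the assumption that the short job's window does not cross
midnight are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    hour: int     # hours elapsed since the long job was first started
    action: str   # "start", "suspend", "resume" or "finish"
    job: str      # "long" or "short"


def simulate(long_work_hours, short_start_hour, short_duration_hours):
    """Toy model of the desired policy (hypothetical, hour granularity).

    The long job runs whenever the daily short-job window is closed.
    Assumes the window fits inside one day, i.e.
    short_start_hour + short_duration_hours <= 24.
    """
    events = [Event(0, "start", "long")]
    remaining = long_work_hours
    hour = 0
    short_running = False
    while remaining > 0:
        hour_of_day = hour % 24
        window_end = short_start_hour + short_duration_hours
        in_window = short_start_hour <= hour_of_day < window_end
        if in_window and not short_running:
            # Short job is due: pause the long job and hand over the nodes.
            events.append(Event(hour, "suspend", "long"))
            events.append(Event(hour, "start", "short"))
            short_running = True
        elif not in_window and short_running:
            # Short job is done: give the nodes back to the long job.
            events.append(Event(hour, "finish", "short"))
            events.append(Event(hour, "resume", "long"))
            short_running = False
        if not short_running:
            remaining -= 1  # the long job made one hour of progress
        hour += 1
    events.append(Event(hour, "finish", "long"))
    return events


if __name__ == "__main__":
    # 30 hours of long-job work, short job daily at 06:00 for 3 hours.
    for event in simulate(30, 6, 3):
        print(event.hour, event.action, event.job)
```

Done by hand in SLURM, the suspend and resume steps would correspond to
"scontrol suspend <jobid>" and "scontrol resume <jobid>"; what we don't
know is whether the scheduler can be told to do this automatically at a
fixed time each day.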
I expect we're not the only ones in this situation. Is SLURM the right
tool for this job? If not, can anyone recommend other tools out there,
preferably open source?
Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center,
GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com
Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway)