[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Chris Samuel chris at csamuel.org
Mon Jun 11 21:33:20 PDT 2018


Hi Prentice!

On Tuesday, 12 June 2018 4:11:55 AM AEST Prentice Bisbal wrote:

> To make this work, I will be using job_submit.lua to apply this logic
> and assign a job to a partition. If a user requests a specific partition
> not in line with these specifications, job_submit.lua will reassign the
> job to the appropriate QOS.

Yeah, that's very much like what we do for GPU jobs (redirect them to the 
partition with access to all cores, and ensure non-GPU jobs go to the 
partition with fewer cores) via the submit filter at present.

I've already coded up something similar in Lua for our submit filter (at the 
moment it only affects my jobs, for testing purposes), but I still need to 
handle memory correctly: only pack jobs when the per-task memory request * 
tasks per node < node RAM. For now we'll let jobs where that's not the case 
go through to the keeper for Slurm to handle as it does now.
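
Roughly, I have something along these lines in mind (a sketch only; the 
job_desc field names like pn_min_memory and ntasks_per_node, the 192 GB 
node size and the "packed" partition name are assumptions that will differ 
between Slurm versions and sites, so check them against your own setup):

-- Sketch of the memory check in job_submit.lua; all names here are
-- assumptions, not a drop-in filter.
local NODE_RAM_MB = 192 * 1024  -- RAM of the nodes we want to pack onto

function slurm_job_submit(job_desc, part_list, submit_uid)
   local mem = job_desc.pn_min_memory      -- assumed: per-task memory request in MB
   local tasks = job_desc.ntasks_per_node

   -- Only repack when the request is fully specified and fits on a node;
   -- anything else goes through to the keeper for Slurm to handle as now.
   if mem ~= nil and tasks ~= nil and mem * tasks < NODE_RAM_MB then
      job_desc.partition = "packed"
      slurm.log_info("job_submit: packing job from uid %u", submit_uid)
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end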

However, I do think Scott's approach is potentially very useful: directing 
jobs that need less than a full node to one end of the node list and jobs 
that want full nodes to the other end (especially if you combine it with the 
partition idea so that not all nodes are accessible to small jobs).
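
In slurm.conf terms the partition side of that could look something like the 
fragment below (purely hypothetical; node names, counts and partition names 
are made up, and it only shows confining small jobs to a subset of nodes, 
not the full node-ordering trick):

NodeName=node[001-128] CPUs=32 RealMemory=196608 State=UNKNOWN
# Small jobs are only allowed on the first 16 nodes...
PartitionName=small Nodes=node[001-016] MaxNodes=1 State=UP
# ...while whole-node jobs can run anywhere.
PartitionName=whole Nodes=node[001-128] Default=YES State=UP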

cheers!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


