[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Skylar Thompson skylar.thompson at gmail.com
Sat Jun 9 08:48:18 PDT 2018


We're a Grid Engine shop, and we have the execd/shepherds place each job
in its own cgroup with CPU and memory limits in place. This lets our
users make efficient use of our HPC resources whether they're running
single-slot jobs or multi-node jobs. Unfortunately we don't have a
mechanism to limit network or local scratch usage, but the former is
becoming less of a problem with faster edge networking, and for the
latter we have an opt-in bookkeeping mechanism that isn't enforced but
works well enough to keep people happy.
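
(For anyone curious what that looks like under the hood, here's a rough
sketch of the cgroup-v1 plumbing in Python -- not Grid Engine's actual
shepherd code, just the kernel interface it drives; the "sge" hierarchy
name, job ID, and PID are all hypothetical:)

import os

CG_ROOT = "/sys/fs/cgroup"  # assumes cgroup v1 hierarchies mounted here

def limit_job(job_id, pid, ncores, mem_bytes):
    """Confine one job's process tree to ncores CPUs and mem_bytes of RAM."""
    # One directory per job under each controller ("sge" name is made up).
    cpu_dir = os.path.join(CG_ROOT, "cpu", "sge", job_id)
    mem_dir = os.path.join(CG_ROOT, "memory", "sge", job_id)
    for d in (cpu_dir, mem_dir):
        os.makedirs(d, exist_ok=True)

    # CPU: allow ncores worth of runtime per 100ms scheduling period.
    with open(os.path.join(cpu_dir, "cpu.cfs_period_us"), "w") as f:
        f.write("100000")
    with open(os.path.join(cpu_dir, "cpu.cfs_quota_us"), "w") as f:
        f.write(str(ncores * 100000))

    # Memory: hard limit; the OOM killer fires if the job exceeds it.
    with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(mem_bytes))

    # Move the job's top-level PID in; children inherit the cgroup.
    for d in (cpu_dir, mem_dir):
        with open(os.path.join(d, "tasks"), "w") as f:
            f.write(str(pid))

The nice property is that the limits apply to the whole process tree, so
a job that forks can't escape its allocation.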

On Fri, Jun 08, 2018 at 05:21:56PM +1000, Chris Samuel wrote:
> Hi all,
> 
> I'm curious to know what/how/where/if sites do to try and reduce the impact of 
> fragmentation of resources by small/narrow jobs on systems where you also have 
> to cope with large/wide parallel jobs?
> 
> For my purposes a small/narrow job is anything that will fit on one node 
> (whether a single core job, multi-threaded or MPI).
> 
> One thing we're considering is to use overlapping partitions in Slurm to have 
> a subset of nodes that are available to these types of jobs and then have 
> large parallel jobs use a partition that can access any node.
> 
> This has the added benefit of letting us set a higher priority on that 
> partition to let Slurm try and place those jobs first, before smaller ones.
> 
> We're already using a similar scheme for GPU jobs where they get put into a 
> partition that can access all 36 cores on a node whereas non-GPU jobs get put 
> into a partition that can only access 32 cores on a node, so effectively we 
> reserve 4 cores a node for GPU jobs.
> 
> But really I'm curious to know what people do about this, or do you not worry 
> about it at all and just let the scheduler do its best?
> 
> All the best,
> Chris
> -- 
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> 
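
For illustration, the overlapping-partition and core-reservation schemes
Chris describes above might look something like this in slurm.conf (node
names, counts, and priority values are all hypothetical):

# Wide jobs can reach every node and are considered first (higher tier).
PartitionName=large Nodes=node[001-128] PriorityTier=10 State=UP
# Small/narrow jobs are confined to a subset of nodes.
PartitionName=small Nodes=node[001-032] MaxNodes=1 PriorityTier=1 State=UP

# GPU nodes: non-GPU jobs see only 32 of 36 cores, reserving 4 for GPU jobs.
PartitionName=cpu Nodes=gpu[01-08] MaxCPUsPerNode=32 State=UP
PartitionName=gpu Nodes=gpu[01-08] State=UP

With the two tiers, Slurm schedules jobs in the higher-tier partition
ahead of jobs in the lower one when both want the same nodes, which is
the "place wide jobs first" behaviour Chris mentions.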

-- 
Skylar

