[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Scott Atchley e.scott.atchley at gmail.com
Sat Jun 9 08:22:07 PDT 2018


Hi Chris,

We have looked at this _a_ _lot_ on Titan:

A Multi-faceted Approach to Job Placement for Improved Performance on
Extreme-Scale Systems

https://ieeexplore.ieee.org/document/7877165/

The issue we have is small jobs "inside" large jobs interfering with the
larger jobs. The item that was easy to implement with our scheduler was
"Dual-Ended Scheduling". We set a threshold of 16 nodes to demarcate small
jobs. Jobs using more than 16 nodes are scheduled from the top/front of the
list, and smaller jobs are scheduled from the bottom/back of the list.
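
To make the idea concrete, here is a minimal sketch of dual-ended placement.
It is not our scheduler's code: the function names, and the assumption that
"the list" is a flat, ordered list of free node indices, are hypothetical and
only for illustration. Only the 16-node threshold comes from what we run.

```python
SMALL_JOB_MAX_NODES = 16  # threshold from the post; jobs above this are "large"


def pick_nodes(free_nodes, requested, threshold=SMALL_JOB_MAX_NODES):
    """Choose nodes for a job without modifying free_nodes.

    free_nodes is assumed to be a list of free node indices in allocation
    order. Returns the chosen nodes, or None if the request cannot be met.
    """
    if requested <= 0 or requested > len(free_nodes):
        return None
    if requested > threshold:
        # Large job: take nodes from the top/front of the list.
        return free_nodes[:requested]
    # Small job: take nodes from the bottom/back of the list.
    return free_nodes[-requested:]


def allocate(free_nodes, requested, threshold=SMALL_JOB_MAX_NODES):
    """Pick nodes and remove them from free_nodes in place."""
    chosen = pick_nodes(free_nodes, requested, threshold)
    if chosen is None:
        return None
    taken = set(chosen)
    free_nodes[:] = [n for n in free_nodes if n not in taken]
    return chosen


if __name__ == "__main__":
    free = list(range(100))      # hypothetical 100-node machine
    big = allocate(free, 32)     # more than 16 nodes: placed at the front
    small = allocate(free, 4)    # 16 nodes or fewer: placed at the back
    print(big[0], big[-1], small)
```

The point of the split is that small jobs pile up at one end of the list, so
the other end stays as contiguous as possible for large jobs.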

Scott

On Sat, Jun 9, 2018 at 2:56 AM, Chris Samuel <chris at csamuel.org> wrote:

> On Saturday, 9 June 2018 12:39:02 AM AEST Bill Abbott wrote:
>
> > We set PriorityFavorSmall=NO and PriorityWeightJobSize to some
> > appropriately large value in slurm.conf, which helps.
>
> I guess that helps getting jobs going (and we use something similar), but my
> question was more about placement. It's a hard one.
>
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>