[Beowulf] Avoiding/mitigating fragmentation of systems by small jobs?

Skylar Thompson skylar.thompson at gmail.com
Tue Jun 12 05:15:26 PDT 2018


On Tue, Jun 12, 2018 at 02:28:25PM +1000, Chris Samuel wrote:
> On Sunday, 10 June 2018 1:48:18 AM AEST Skylar Thompson wrote:
> 
> > Unfortunately we don't have a mechanism to limit
> > network usage or local scratch usage
> 
> Our trick in Slurm is to use the slurmd prolog script to set an XFS project
> quota for that job ID on the per-job directory (created by a plugin which
> also makes subdirectories there that it maps to /tmp and /var/tmp for the
> job) on the XFS partition used for local scratch on the node.
> 
> If they don't request an amount via the --tmp= option then they get a default
> of 100MB.  Snipping the relevant segments out of our prolog...
> 
> JOBSCRATCH=/jobfs/local/slurm/${SLURM_JOB_ID}.${SLURM_RESTART_COUNT}
> 
> if [ -d ${JOBSCRATCH} ]; then
>         QUOTA=$(/apps/slurm/latest/bin/scontrol show JobId=${SLURM_JOB_ID} | egrep 'MinTmpDiskNode=[0-9]' | awk -F= '{print $NF}')
>         if [ "${QUOTA}" == "0" ]; then
>                 QUOTA=100M
>         fi
>         /usr/sbin/xfs_quota -x -c "project -s -p ${JOBSCRATCH} ${SLURM_JOB_ID}" /jobfs/local
>         /usr/sbin/xfs_quota -x -c "limit -p bhard=${QUOTA} ${SLURM_JOB_ID}" /jobfs/local
> fi

Thanks, Chris! We've been considering doing this with Grid Engine prolog/epilog
scripts (plus boot-time logic to clean up if a node dies with scratch space
still allocated), but haven't gotten around to it. I think we'd also need to
get buy-in from some groups that are happy with the unenforced state right now.
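
For reference, here's roughly the sort of Grid Engine prolog we've been
sketching, adapting your XFS project-quota approach. It's untested, and the
"local_scratch" consumable and the /local/scratch mount point are placeholder
names rather than anything we actually have configured; it assumes $JOB_ID and
the per-job $TMPDIR that GE creates are available in the prolog environment.

#!/bin/bash
# Prolog sketch: cap the per-job $TMPDIR with an XFS project quota keyed
# on the job ID. "local_scratch" and /local/scratch are placeholder names.
SCRATCH_FS=/local/scratch

# Pull the requested amount out of the job's hard resource list;
# fall back to 100M if the user didn't request any.
QUOTA=$(qstat -j "${JOB_ID}" | awk -F'local_scratch=' '/hard resource_list/ {split($2, a, ","); print a[1]}')
QUOTA=${QUOTA:-100M}

if [ -d "${TMPDIR}" ]; then
        /usr/sbin/xfs_quota -x -c "project -s -p ${TMPDIR} ${JOB_ID}" ${SCRATCH_FS}
        /usr/sbin/xfs_quota -x -c "limit -p bhard=${QUOTA} ${JOB_ID}" ${SCRATCH_FS}
fi

The epilog would then drop the limit and remove the directory, and a boot-time
sweep over /local/scratch could do the same for anything left behind by a node
crash, something along the lines of:

/usr/sbin/xfs_quota -x -c "limit -p bhard=0 ${JOB_ID}" ${SCRATCH_FS}
rm -rf "${TMPDIR}"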

-- 
Skylar

