[Beowulf] Scheduler question -- non-uniform memory allocation to MPI

Tom Harvill unl at harvill.net
Thu Jul 30 08:34:44 PDT 2015



Hi,

We run SLURM with cgroups for memory containment of jobs.  When users request
resources on our cluster, they will often specify the number of (MPI) tasks
and the memory per task.  The reality of much of the software that runs is
that most of the memory is used by MPI rank 0, with much less used by the
other ranks.  This is wasteful and sometimes causes bad outcomes (OOM kills
and worse) during job runs.
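For context, a typical uniform request looks something like the sketch below
(a hypothetical job script; the task count, memory figures, and application
name are illustrative, not from a real job):

```shell
#!/bin/bash
# Illustrative SLURM batch script showing the uniform request model:
# every task gets the same cgroup memory cap, even though rank 0
# needs far more than the other ranks.
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=4G    # same 4 GB cap enforced on every task

# rank 0 may want most of the memory; ranks 1-15 much less,
# so 15 of the 16 caps are largely wasted
srun ./mpi_app
```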

AFAIK SLURM does not allow users to request a different amount of memory for
different processes in their MPI pool.  We used to run Maui/Torque, and I'm
fairly certain that feature is not present in that scheduler either.

Does anyone know of a scheduler that allows the user to request different
amounts of memory per process?  We know we can move to whole-node assignment
to remedy this problem, but there is resistance to that...
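For reference, the whole-node workaround could look roughly like this
(again a hypothetical sketch; flags shown are standard sbatch options, but
the node and task counts are made up for illustration):

```shell
#!/bin/bash
# Illustrative sketch of the whole-node assignment workaround:
# take entire nodes and all of their memory, so the per-task cap
# no longer constrains rank 0.
#SBATCH --nodes=2
#SBATCH --exclusive   # whole-node assignment, no sharing with other jobs
#SBATCH --mem=0       # request all available memory on each node

srun --ntasks=16 ./mpi_app
```

The obvious downside, and presumably the source of the resistance, is that
small jobs then tie up whole nodes that could otherwise be shared.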

Thank you!
Tom

Tom Harvill
Holland Computing Center
hcc.unl.edu
