Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)

Chris Samuel csamuel at vpac.org
Thu Aug 14 03:42:10 PDT 2008


----- "Paul Jackson" <pj at sgi.com> wrote:

Hi Paul,

> Let me see if I understand this.  Is the following right:
> 
>   Without the cpuset constraint, such a 'bad' job could tell the
>   cluster management software (PBS or Torque or ...)  it needed just
>   one CPU, which could end up putting it on a cluster node with
>   say eight CPUs, along with some other jobs that expect to use the
>   other seven CPUs.

It's the user that specifies how many CPUs their particular
problem will require, but effectively that's correct.

>   But then OpenMP code in that 'bad' job could notice it had eight
>   CPUs, think to itself 'wow - cool', and proceed to hog all eight
>   CPUs, messing up those other jobs.

That's correct. Some OpenMP codes automatically detect the
number of cores in a system and will (if not told otherwise)
use them all.

Alternatively, some users reduce the number of cores they
request for the job but forget that a larger number is
still specified elsewhere (in a config file, say).
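
To illustrate the difference between "CPUs on the node" and
"CPUs this job may use", here is a minimal sketch in Python
(not OpenMP, and not anything from Torque). It assumes Linux,
where the scheduler affinity mask of a process is restricted
by the cpuset it runs in. The function names are hypothetical
and exist only for this example.

```python
import os


def total_cpus():
    """CPUs reported for the whole node, regardless of any cpuset."""
    return os.cpu_count() or 1


def allowed_cpus():
    """CPUs this process may actually run on.

    Assumption: on Linux the affinity mask is limited by the cpuset
    the process was placed in. Where sched_getaffinity is missing,
    fall back to the node total.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return total_cpus()


def pick_thread_count(requested=None):
    """Cap a requested thread count at the CPUs the job is allowed."""
    limit = allowed_cpus()
    if requested is None:
        return limit
    return max(1, min(requested, limit))


if __name__ == "__main__":
    print("node has", total_cpus(), "CPUs; job may use", allowed_cpus())
    print("threads to start:", pick_thread_count())
```

A code that sizes itself from the node total is the 'bad' job
described above; one that sizes itself from the allowed set
(or from what the user requested, capped at the allowed set)
behaves the same with or without the cpuset constraint.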

>   With the cpuset constraint, that 'bad' job -will- only be able to
>   use that one CPU, and if OpenMP or other code in that job can't
>   deal reasonably with that circumstance, well, tough, the owner of
>   that job should fix something.

Well, "tough" might be a tad hard on them, but yes.

>  But at least the other jobs that were
>  hoping to use the other seven CPUs won't be bothered much by this.

Spot on.

> Did I say that right?

That's a pretty fair summary for our main use case, yes.

The memory locality possibilities are also important, but
they are not currently covered by Torque. Supporting them
will probably require more smarts in pbs_mom and in how it
detects core/socket relationships, locality,
hyperthreading/SMT, etc. It will also change how pbs_mom
reports that to the pbs_server, and then how the scheduler
(Maui or Moab) allocates resources to jobs based upon
policies and what the job has requested.
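
As a rough idea of what detecting core/socket relationships
could involve, here is a sketch in Python that reads the
Linux sysfs CPU topology files (physical_package_id and
core_id under each cpuN/topology directory). This is only an
illustration of the kind of information involved, not how
pbs_mom does or would do it; the function names are made up
for the example.

```python
import glob
import os
from collections import defaultdict


def read_int(path):
    """Read a single integer from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())


def cores_by_socket(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return {socket id: {core id: [logical CPU numbers]}}.

    CPUs without a topology directory (e.g. offline ones) are skipped.
    """
    layout = defaultdict(lambda: defaultdict(list))
    pattern = os.path.join(sysfs_cpu_dir, "cpu[0-9]*", "topology")
    for topo in glob.glob(pattern):
        # Directory name is "cpuN"; strip the "cpu" prefix to get N.
        cpu = int(os.path.basename(os.path.dirname(topo))[3:])
        socket = read_int(os.path.join(topo, "physical_package_id"))
        core = read_int(os.path.join(topo, "core_id"))
        layout[socket][core].append(cpu)
    return {
        s: {c: sorted(cpus) for c, cpus in cores.items()}
        for s, cores in layout.items()
    }


if __name__ == "__main__":
    for socket, cores in sorted(cores_by_socket().items()):
        for core, cpus in sorted(cores.items()):
            print("socket", socket, "core", core, "cpus", cpus)
```

A core that lists more than one logical CPU has
hyperthreading/SMT siblings, which is one of the things a
locality-aware allocation would need to know about.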

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


