[Beowulf] NUMA info request

Wed Mar 26 15:12:12 PDT 2008

----- "Mark Kosmowski" <mark.kosmowski at gmail.com> wrote:

> This discussion as well as reading about NUMA and affinity elsewhere
> leads to another question - what is the difference between using
> numactl or using the affinity options of my parallelization software
> (in my case openmpi)?

Our experience with MVAPICH2's affinity support was unpleasant
because of our workload (lots of users with their own codes
doing very different - and occasionally wrong - things).

Its affinity implementation is naive: if you have, say,
an 8-way system and run 2 x 4-way MPI jobs on it, then
this happens:

Job 1 starts and each MPI process sets its affinity, counting
up from core 0, so the job takes cores 0, 1, 2 and 3.

Job 2 starts and each MPI process sets its affinity, again
counting up from core 0, so it also takes cores 0, 1, 2 and 3,
leaving cores 4 to 7 idle.

This is a Bad Thing(tm).
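
To make the overlap concrete, here is a rough Python sketch that
only simulates the pinning decisions (it makes no real affinity
calls). The function names are made up for illustration and this
is not MVAPICH2's actual code; it just assumes each job pins
rank i to core i without knowing about other jobs on the node.

```python
from collections import Counter

NUM_CORES = 8  # the 8-way node from the example above


def naive_pinning(num_ranks):
    """Hypothetical naive policy: rank i pins itself to core i."""
    return list(range(num_ranks))


def core_load(jobs):
    """Count how many processes are pinned to each core across all jobs."""
    load = Counter()
    for cores in jobs:
        load.update(cores)
    return load


# Two independent 4-way jobs land on the same node.
jobs = [naive_pinning(4), naive_pinning(4)]
load = core_load(jobs)

oversubscribed = sorted(core for core, n in load.items() if n > 1)
idle = [core for core in range(NUM_CORES) if load[core] == 0]

print("oversubscribed cores:", oversubscribed)
print("idle cores:", idle)
```

Under those assumptions cores 0-3 each carry two processes while
cores 4-7 carry none.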

Suddenly the user wonders why their performance has just
halved.  The sysadmin looks at the node and wonders
what the code is doing wrong, given that the load average
is so high yet there's so much idle CPU time available.

This is why I'm a big fan of the queueing system handling
affinity for you, especially if it can (as Torque does when
you use a compatible MPI job launcher) allocate each
MPI process onto its own core.
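
As a rough illustration of why a node-wide view helps, here is a
toy Python allocator that hands out disjoint cores to each job.
NodeCoreAllocator and its methods are hypothetical names of my
own, not Torque's implementation or API; the point is only that
something which sees every job on the node can avoid the overlap.

```python
class NodeCoreAllocator:
    """Toy per-node bookkeeping: which cores are free, which job owns which."""

    def __init__(self, num_cores):
        self.free = list(range(num_cores))
        self.owned = {}  # job id -> list of cores held by that job

    def allocate(self, job_id, num_procs):
        """Reserve one distinct core per process, or fail if the node is full."""
        if num_procs > len(self.free):
            raise RuntimeError("not enough free cores for job %s" % job_id)
        cores = self.free[:num_procs]
        self.free = self.free[num_procs:]
        self.owned[job_id] = cores
        return cores

    def release(self, job_id):
        """Return a finished job's cores to the free pool."""
        self.free = sorted(self.free + self.owned.pop(job_id))


alloc = NodeCoreAllocator(8)
job1 = alloc.allocate("job1", 4)
job2 = alloc.allocate("job2", 4)
print("job1 cores:", job1)
print("job2 cores:", job2)
```

On Linux, each launched process could then be pinned to its
assigned core with something like os.sched_setaffinity(0, {core}),
but that step is left out of the sketch.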

Of course it's game over if you're using a legacy MPI
launcher that uses ssh or rsh, since the processes it starts
are outside the queueing system's control. :-(

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency