[Beowulf] Again about NUMA (numactl and taskset)
hahn at mcmaster.ca
Mon Jun 23 08:25:28 PDT 2008
> The questions are
> 1) Is there some way to distribute analogously the local memory of threads (I
> assume that it has the same size for each thread) using a "reasonable" NUMA
> allocation ?
that is, not surprisingly, the default. generally, on all NUMA machines,
the starting rule is that memory is allocated upon "first touch": the
first thread to touch a page causes a page fault, which triggers the
actual physical allocation on that thread's node. (if you allocate
memory but never touch it, it remains purely virtual, ignoring any
book-keeping done by your memory allocation library, if any.)
> 2) Is it right that using numactl for applications may give performance
> improvements in the following case:
> the number of application processes is equal to the number of cores of one
> CPU *AND* the RAM amount necessary for the application fits in one node's
> DIMMs (I assume that RAM is allocated "continuously").
you certainly don't want to _deliberately_ create imbalances.
"numactl --hardware" is interesting for seeing the state of memory
allocation per node. of course, it reports only total and free sizes
(where "free" means "wasted" to the kernel - not the same as "freeable").
> What will happen with performance (when using numactl) in the case where
> the RAM size required is higher than the RAM available on one node, and
> therefore the program cannot take advantage of (load-balanced)
> simultaneous use of the memory controllers on both CPUs ?
non-local memory is modestly slower than local - not dramatically.
> (I also assume that RAM is allocated
I'm not sure what that means - continuously in time? or contiguously?
the latter is definitely not true - the allocated memory map for a task
will normally be pretty chopped up, and the virtual addresses will have
little relation to physical addresses.
> 3) Is there some reason to use things like
> mpirun -np N /usr/bin/numactl <numactl_parameters> my_application ?
not that I know of.
> 4) If I use malloc() and don't use numactl, how can I tell from which
> node Linux will begin the real memory allocation ? (I remember that I assume
if there is free memory on the node where the thread is running,
that's where the physical page will be allocated.
> that all the RAM is free) And how can I tell where the DIMMs are placed
> that correspond to the higher or lower RAM addresses ?
I don't see why userspace would need to know that. the main question is
whether non-local allocations are allowed or not, and you set that policy
with numactl --localalloc (or override it with --preferred, etc).
> 5) In which cases is it reasonable to switch on "Node memory interleaving"
> (in BIOS) for the application which uses more memory than is presented on the
> node ?
I leave it off, since numactl --interleave lets you get the same effect
from user-space. I'm not sure I've ever seen it be a win.
> And BTW: if I use taskset -c CPU1,CPU2, ... <program_file>
> and the program_file creates some new processes, will all these processes
> run only on the CPUs defined in the taskset command ?
yes - scheduler settings like this are inherited across clone() and
fork(), and the affinity mask is also preserved across exec().