[Beowulf] Again about NUMA (numactl and taskset)
kus at free.net
Mon Jun 23 11:24:59 PDT 2008
In message from Vincent Diepeveen <diep at xs4all.nl> (Mon, 23 Jun 2008
>I would add to this:
>"how sure are we that a process (or thread) that allocated and
> initialized and writes to memory at a single specific memory node,
>also keeps getting scheduled at a core on that memory node?"
>It seems to me that sometimes (like every second or so) threads jump
> from one memory node to another. I could be wrong,
>but I certainly have that impression with the Linux kernels.
Do I understand you correctly that simply using taskset is not
enough to prevent a process from migrating to another core/node?
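
One way to check this empirically is sketched below. It assumes Linux with /proc mounted and Python 3; the helper names current_cpu() and pin_and_sample() are made up for this illustration. os.sched_setaffinity() wraps the same sched_setaffinity(2) call that taskset uses, and field 39 of /proc/self/stat reports the CPU the task last ran on. If the mask holds a single core, the sampled set should contain only that core; with several cores in the mask, the process may still move among them, including across nodes if the mask spans both sockets.

```python
import os

def current_cpu():
    """Return the CPU this process last ran on (Linux /proc only)."""
    with open("/proc/self/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces, so split after its ')'.
    fields = stat.rsplit(")", 1)[1].split()
    # fields[0] is field 3 of the full line, so field 39 ("processor") is index 36.
    return int(fields[36])

def pin_and_sample(cpu, samples=200):
    """Pin the calling process to one CPU and collect every CPU it is seen on."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    try:
        seen = set()
        for _ in range(samples):
            sum(range(10000))          # burn some CPU time between samples
            seen.add(current_cpu())
        return seen
    finally:
        os.sched_setaffinity(0, original)  # restore the previous mask

if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))
    print("pinned to", cpu, "observed on", sorted(pin_and_sample(cpu)))
```

Note that this only tests CPU placement. Pages that were already allocated stay on the node where they were first placed, so affinity alone says nothing about where the memory lives.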
>That said, it has improved a lot; now all we need is a better
> compiler for Linux. For my chess program, GCC generates an
>executable that searches 22% fewer positions per second than the
> one built with Visual C++ 2005.
>On Jun 23, 2008, at 4:01 PM, Mikhail Kuzminsky wrote:
>> I'm testing my 1st dual-socket quad-core Opteron 2350-based server.
>> Let me assume that the RAM used by kernel and system processes is
>> zero, there is no physical RAM fragmentation, and the affinity of
>> processes to CPU cores is maintained. I assume also that both the
>> nodes are populated w/equal number of the same DIMMs.
>> If I run a thread-parallelized (for example, w/OpenMP) application w/
>> 8 threads (8 = number of server CPU cores), the ideal case for all
>> the ("equal") threads is: the shared memory used by each of 2 CPUs
>> (by each of 2 processes "quads") should be divided equally between
>> 2 nodes, and the local memory used by each process should be mapped
>> Theoretically, such an ideal case may be realized if my application (8
>> threads) uses practically all the RAM and uses only shared memory
>> (I assume here also that all the RAM addresses have the same load,
>> and the size of program codes is zero :-) ).
>> The questions are
>> 1) Is there some way to distribute the local memory of threads in
>> the same way (I assume that it has the same size for each thread)
>> using "reasonable" NUMA allocation?
>> 2) Is it right that using numactl for applications may improve
>> performance in the following case:
>> the number of application processes is equal to the number of cores
>> of one CPU *AND* the necessary (for application) RAM amount may be
>> placed on one node DIMMs (I assume that RAM is allocated
>> What will happen to performance (when using numactl) if the RAM
>> size required is higher than the RAM available on one node, and
>> therefore the program will not benefit from (load-balanced)
>> simultaneous use of the memory controllers on both CPUs?
>> (I also assume that RAM is allocated contiguously.)
>> 3) Is there some reason to use things like
>> mpirun -np N /usr/bin/numactl <numactl_parameters> my_application
>> 4) If I use malloc() and don't use numactl, how can I find out
>> from which node Linux will begin the real memory allocation? (Recall
>> that I assume all the RAM is free.) And how can I find out where
>> the DIMMs are placed that correspond to higher or lower RAM
>> addresses?
>> 5) In which cases is it reasonable to switch on "Node memory
>> interleaving" (in BIOS) for an application that uses more memory
>> than is present on one node?
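
On question 4, one way to see where pages actually ended up is to read /proc/<pid>/numa_maps, which on NUMA-enabled Linux kernels lists per-mapping page counts as N<node>=<pages> fields. With the default policy, Linux places a page on the node of the CPU that first touches it, not at malloc() time. The sketch below only sums those fields; pages_per_node() is a name invented here, and the two sample lines are illustrative, not output from the Opteron server discussed in this thread.

```python
import os
import re
from collections import Counter

# Matches fields such as "N0=123" (node 0 holds 123 pages of this mapping).
NODE_FIELD = re.compile(r"\bN(\d+)=(\d+)\b")

def pages_per_node(lines):
    """Sum the N<node>=<pages> fields over numa_maps-style lines."""
    totals = Counter()
    for line in lines:
        for node, pages in NODE_FIELD.findall(line):
            totals[int(node)] += int(pages)
    return dict(totals)

if __name__ == "__main__":
    path = "/proc/self/numa_maps"   # present only on NUMA-enabled kernels
    if os.path.exists(path):
        with open(path) as f:
            print(pages_per_node(f))
    else:
        # Illustrative sample lines in the numa_maps format.
        sample = [
            "00400000 default file=/bin/cat mapped=3 N0=3",
            "7f0000000000 default anon=10 dirty=10 N0=4 N1=6",
        ]
        print(pages_per_node(sample))
```

Running this against the PID of a real job, before and after it has initialized its arrays, shows how first-touch placement spreads the pages across the two nodes.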
>> And BTW: if I use taskset -c CPU1,CPU2, ... <program_file>
>> and the program_file creates some new processes, will all these
>> processes run only on the CPUs defined in the taskset command?
>> Mikhail Kuzminsky
>> Computer Assistance to Chemical Research Center,
>> Zelinsky Institute of Organic Chemistry
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit