[Beowulf] NUMA info request
mark.kosmowski at gmail.com
Tue Mar 25 03:58:58 PDT 2008
On Tue, Mar 25, 2008 at 12:17 AM, Eric Thibodeau <kyron at neuralbs.com> wrote:
> Mark Hahn wrote:
> >> NUMA is an acronym meaning Non-Uniform Memory Access. This is a
> >> hardware constraint and is not a "performance" switch you turn on.
> >> Under the Linux
> > I don't agree. NUMA is indeed a description of hardware. I'm not
> > sure what you meant by "constraint" - NUMA is not some kind of
> > shortcoming.
> Mark is right, my choice of words is misleading. By constraint I meant
> that you have to be conscious of what ends up where (that was the point
> of the link I added in my e-mail ;P )
> >> kernel there is an option that is meant to tell the kernel to be
> >> conscious about that hardware fact and attempt to help it optimize
> >> the way it maps the memory allocation to a task vs. the processor the
> >> given task will be using (processor affinity; check out taskset in
> >> recent util-linux implementations, i.e. 2.13+).
> > The kernel has had various forms of NUMA and socket affinity for a
> > long time, and I suspect almost any distro will install a kernel
> > which has the appropriate support (surely any x86_64 kernel would
> > have NUMA support).
> My point of view on distro kernels is that they are to be scrutinized
> unless they are specifically meant to be used as computation nodes (i.e.
> don't expect CONFIG_HZ=100 to be set on "typical" distros).
> Also, NUMA is only applicable to Opteron architecture (internal MMU with
> HyperTransport), not the Intel flavor of multi-core CPUs (external MMU,
> which can be a single bus or any memory access scheme as dictated by the
> motherboard manufacturer).
> > I usually use numactl rather than taskset. I'm not sure of the
> > history of those tools. As far as I can tell, taskset only addresses
> > what numactl --cpubind does, though they obviously approach things
> > differently. If you're going to use taskset, you'll want to set CPU
> > affinity to multiple CPUs (those local to a socket, or 'node' in
> > numactl terms).
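Mark Hahn's advice (bind to all CPUs of a node, not just one) can be sketched in Python. This is a minimal sketch, assuming a Linux host that exposes `/sys/devices/system/node/node<N>/cpulist`; the helper names `parse_cpulist` and `bind_to_node` are hypothetical. It only controls CPU placement, roughly like `taskset -c <list>` or `numactl --cpubind`, and leaves memory policy alone.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8,10-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist (e.g. a CPU-less node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def bind_to_node(node: int, pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = the calling process) to every CPU local to NUMA `node`.

    Assumes the sysfs cpulist file exists (Linux kernel with NUMA support).
    os.sched_setaffinity raises OSError if the resulting set is empty/invalid.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)
    return cpus
```

On a dual-socket single-core Opteron box, `bind_to_node(0)` would typically pin the caller to CPU 0 only; on dual-core parts it would cover both cores of the first socket.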
> >> In your specific case, you would have 4Gigs per CPU and would want
> >> to make sure each task (assuming one per CPU) stays on the same CPU
> >> all the time and would want to make sure each task fits within the
> >> "local" 4Gig.
> > "numactl --localalloc".
> > But you should first verify that your machines actually do have the
> > 8 GB split across both nodes. It's not that uncommon to see an
> > inexperienced assembler fill up one node before going on to the next,
> > and there have even been some boards which provided no memory to the
> > second node.
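The per-node memory split can be checked from the kernel's sysfs files before relying on `numactl --localalloc`; `numactl --hardware` reports the same node sizes. This is a minimal Python sketch, assuming Linux exposes `/sys/devices/system/node/node<N>/meminfo` with lines of the form `Node 0 MemTotal:  4059840 kB`; the function names are hypothetical.

```python
import glob
import re

# Matches e.g. "Node 0 MemTotal:        4059840 kB" in a sysfs node meminfo file.
_MEMTOTAL = re.compile(r"^Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB", re.MULTILINE)


def node_memtotal_kb(meminfo_text: str) -> dict[int, int]:
    """Extract {node: MemTotal in kB} from the text of one node meminfo file."""
    return {int(n): int(kb) for n, kb in _MEMTOTAL.findall(meminfo_text)}


def memory_per_node() -> dict[int, int]:
    """Collect MemTotal for every NUMA node the kernel exposes (empty if none)."""
    totals: dict[int, int] = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
        with open(path) as f:
            totals.update(node_memtotal_kb(f.read()))
    return totals
```

On a correctly loaded dual-Opteron machine with 8 GB, this should show two nodes of a little under 4 GB each. A single node, or one node with almost no memory, points to the loading problem described above.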
> Mark (Hahn) is right (again !), I ASSumed the tech would load the memory
> banks appropriately, don't make that mistake ;) And numactl is indeed
> more appropriate in this case (thanks Mr. Hahn ;) ). Note that the
> kernel (configured with NUMA) _will_ attempt to allocate the memory to
> "local nodes" before offloading to memory "abroad".
I will install the memory myself, and correctly: that is, distributing
it according to CPU. However, it appears that one of my nodes (my first
Opteron machine) may well be one that has only a single bank of four
DIMM slots, attached to CPU 0 and shared by CPU 1. It uses a Tyan K8W
Tiger S2875 motherboard. My other two nodes use Arima HDAMA
motherboards with SATA support, where each CPU has its own bank of
4 DIMMs. The Tyan node is getting 4 @ 2 GB DIMMs and one of the HDAMA
nodes is getting 8 @ 1 GB (both fully populating the available DIMM
slots); the last machine is going to get 4 @ 1 GB DIMMs for one CPU
and 2 @ 2 GB for the other. It looks like I may want to upgrade that
Tyan motherboard before exploring NUMA / affinity, then.
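The per-CPU totals for the three machines can be worked out directly. This is a small Python sketch, assuming (as suspected above) that all four Tyan slots belong to CPU 0; the machine labels are illustrative only.

```python
# DIMM layouts described above, as (count, size in GB) per CPU.
# Assumption: all four Tyan S2875 slots hang off CPU 0, as suspected.
LAYOUTS = {
    "Tyan S2875": {0: [(4, 2)], 1: []},
    "HDAMA #1": {0: [(4, 1)], 1: [(4, 1)]},
    "HDAMA #2": {0: [(4, 1)], 1: [(2, 2)]},
}


def gb_per_cpu(layout: dict[int, list[tuple[int, int]]]) -> dict[int, int]:
    """Total GB of memory local to each CPU's memory controller."""
    return {cpu: sum(count * size for count, size in dimms)
            for cpu, dimms in layout.items()}


for name, layout in LAYOUTS.items():
    print(name, gb_per_cpu(layout))
# Tyan S2875 {0: 8, 1: 0}
# HDAMA #1 {0: 4, 1: 4}
# HDAMA #2 {0: 4, 1: 4}
```

Both HDAMA nodes end up balanced at 4 GB per CPU, while on the Tyan every memory access from CPU 1 would be remote.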
This discussion, as well as reading about NUMA and affinity elsewhere,
leads to another question: what is the difference between using
numactl and using the affinity options of my parallelization software
(in my case Open MPI)?
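To make the numactl side of the question concrete, here is a hypothetical per-rank wrapper sketched in Python. It assumes the MPI launcher exports `OMPI_COMM_WORLD_LOCAL_RANK` (newer Open MPI releases do; older ones may not). It also assumes a numactl that accepts `--cpunodebind`, the newer spelling of the `--cpubind` option mentioned above. The function names are illustrative, not part of either tool.

```python
import os
import sys


def numactl_argv(local_rank: int, num_nodes: int, program: list[str]) -> list[str]:
    """Build a numactl command keeping one rank's CPUs and memory on one node.

    Ranks are spread round-robin over nodes (rank 0 -> node 0, rank 1 -> node 1).
    --membind is strict; --localalloc would instead prefer the local node but
    fall back to remote memory when it fills up.
    """
    node = local_rank % num_nodes
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *program]


def launch(num_nodes: int = 2) -> None:
    """Replace this process with the numactl-wrapped program from the command line.

    Assumes OMPI_COMM_WORLD_LOCAL_RANK is set by the launcher; defaults to 0.
    """
    rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    argv = numactl_argv(rank, num_nodes, sys.argv[1:])
    os.execvp(argv[0], argv)
```

With this approach each process is placed from outside the application by a wrapper script whose entry point calls `launch()`. Open MPI's own affinity options instead let the launcher do the placement, since it already knows how many ranks share each host.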