[Beowulf] x86-64 NUMA vs SMP kernel: appl. performance?
mwill at penguincomputing.com
Fri Sep 24 11:21:00 PDT 2004
I benchmarked a neural network training program for a customer on a 2-CPU
Opteron system with SLES8. Running it over and over again produced two distinct
timing values, independent of which CPU it actually ran on. What really happened
was that whenever the code ran on CPU A while its memory had been allocated from
the RAM attached to CPU B (or vice versa), the additional access latency slowed
things down by 15%.
Instead of using a NUMA-aware kernel, we just used a BIOS feature of the mainboard
that configures 'node interleaved memory access', striping the address space
across both RAM banks rather than having two distinct contiguous blocks of RAM,
which averages out the effect.
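
To illustrate why interleaving averages things out, here is a minimal toy model
of the two address layouts. It is only a sketch: the 4 KiB interleave granule,
the 1 GiB per node, and all function names are assumptions made for the
illustration, not values read from the board or the BIOS.

```python
PAGE = 4096          # assumed interleave granule in bytes (illustrative)
NODE_RAM = 1 << 30   # assumed 1 GiB of RAM attached to each of the 2 CPUs

def node_contiguous(addr, node_ram=NODE_RAM):
    """Two distinct contiguous blocks: node 0 owns the low block, node 1 the high one."""
    return addr // node_ram

def node_interleaved(addr, granule=PAGE, nodes=2):
    """Address space striped across the nodes, one granule at a time."""
    return (addr // granule) % nodes

def remote_fraction(addrs, cpu_node, mapping):
    """Fraction of the touched addresses that live on the other CPU's RAM."""
    addrs = list(addrs)
    remote = sum(1 for a in addrs if mapping(a) != cpu_node)
    return remote / len(addrs)

# A 64 MiB buffer starting at physical address 0, touched once per page.
buf = range(0, 64 * 1024 * 1024, PAGE)

print(remote_fraction(buf, 0, node_contiguous))   # 0.0  (fast run)
print(remote_fraction(buf, 1, node_contiguous))   # 1.0  (slow run)
print(remote_fraction(buf, 0, node_interleaved))  # 0.5
print(remote_fraction(buf, 1, node_interleaved))  # 0.5
```

With contiguous blocks a run is either all-local or all-remote, which would
match the two distinct timings; with interleaving every run sees roughly half
of its accesses as remote, no matter which CPU it lands on.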
I too would be interested in newer experiments with the NUMA-enabled kernels,
since they could give you a 5-7% speed advantage over the simpler SMP assumption.
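
As a rough back-of-the-envelope check on that figure, the sketch below assumes
that runtime grows linearly with the fraction of remote accesses and that an
ideal NUMA-aware kernel keeps all memory local. Both are simplifying
assumptions, and the function name is made up for the example; only the 15%
penalty comes from the measurement above.

```python
def runtime(remote_fraction, remote_penalty=0.15, local_time=1.0):
    """Relative runtime, assuming a penalty linear in the remote-access fraction."""
    return local_time * (1.0 + remote_penalty * remote_fraction)

interleaved = runtime(0.5)   # node-interleaved: about half the accesses are remote
numa_local = runtime(0.0)    # idealized NUMA-aware placement: all accesses local

saving = 1.0 - numa_local / interleaved
print(f"{interleaved:.3f}")  # 1.075
print(f"{saving:.1%}")       # 7.0%
```

Under these assumptions the best case lands at the upper end of the 5-7% range;
a real kernel will not always place memory perfectly, so the actual gain could
well be smaller.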
> I have possible choice between using of SMP or NUMA-enabled x86_64
> kernels (2.4.21 from SuSE Linux 9.0 distributive). We use 2-way
> Opteron-based nodes w/2 GBytes RAM per node (symmetrical DIMMs
> Our applications are parallelized :-), so we have 2 "computing
> per each 2-way SMP node.
> Have somebody data about relative performance of applications working
> under NUMA vs SMP kernels ? Quantum chemical packages like
> Gaussian/Gamess-US/NWchem are the most interesting (their performance
> is "memory-bounded"), but at least direct STREAM results are
> Mikhail Kuzminsky
> Zelinsky Institute of Organic Chemistry
Michael Will, Linux Sales Engineer
NEWS: We have moved to a larger iceberg :-)
NEWS: 300 California St., San Francisco, CA.
Tel: 415-954-2822 Toll Free: 888-PENGUIN