[Beowulf] bizarre scaling behavior on a Nehalem
kus at free.net
Wed Aug 12 08:14:25 PDT 2009
In message from Craig Tierney <Craig.Tierney at noaa.gov> (Tue, 11 Aug
2009 11:40:03 -0600):
>Rahul Nabar wrote:
>> On Mon, Aug 10, 2009 at 12:48 PM, Bruno
>>Coutinho<coutinho at dcc.ufmg.br> wrote:
>>> This is often caused by cache competition or memory bandwidth
>>> If it was cache competition, rising from 4 to 6 threads would make
>>> As the code became faster with DDR3-1600 and much slower with Xeon
>>> this code is memory bandwidth bound.
>>> Tweaking CPU affinity to avoid thread jumping among cores of the
>>> help much, as the big bottleneck is memory bandwidth.
>>> To this code, CPU affinity will only help in NUMA machines to
>>> memory access in local memory.
>>> If the machine has enough bandwidth to feed the cores, it will
>> Exactly! But I thought this was the big advance with the Nehalem
>> it has removed the CPU<->Cache<->RAM bottleneck. So if the code
>> with the AMD Barcelona then it would continue to scale with the
>> Nehalem right?
>> I'm posting a copy of my scaling plot here if it helps.
>> To remove most possible confounding factors this particular Nehalem
>> plot is produced with the following settings:
>> Hyperthreading OFF
>> 24GB memory i.e. 6 banks of 4GB. i.e. optimum memory configuration
>> Even if we explained away the bizarre performance of the 4 node case
>> to the Turbo effect what is most confusing is how the 8 core data
>> point could be so much slower than the corresponding 8 core point on
>> old AMD Barcelona.
>> Something's wrong here that I just do not understand. BTW, any other
>> VASP users here? Anybody have any Nehalem experience?
>What are you doing to ensure that you have both memory and processor
As I mentioned here in the "numactl&SuSE11.1" thread, some kernels
behave wrongly on Nehalem (bad /sys/devices/system/node
directory content). This bug is present, in particular, in the default
OpenSuSE 11 kernels (126.96.36.199-9 and 2.6.29-6), and (as was written
in the corresponding thread discussion) in the FC11 2.6.29 kernel.
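
To see what a given kernel actually exports under
/sys/devices/system/node, something like the rough sketch below may
help. It only reads the standard node*/cpulist files and flags a few
patterns that look wrong. The function names, the checks themselves and
the expected_nodes=2 default (a dual-socket box) are assumptions made
for illustration; they do not describe the exact symptom of the bug
discussed in the other thread.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file yields one empty field
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} from the node*/cpulist files under root.

    A node whose cpulist cannot be read is mapped to None.
    """
    topology = {}
    for node_dir in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(os.path.basename(node_dir)[len("node"):])
        try:
            with open(os.path.join(node_dir, "cpulist")) as f:
                topology[node_id] = parse_cpulist(f.read())
        except OSError:
            topology[node_id] = None
    return topology


def find_problems(topology, expected_nodes=2):
    """Return human-readable warnings (hypothetical checks, see lead-in)."""
    problems = []
    if len(topology) < expected_nodes:
        problems.append(
            "only %d node(s) exported, expected %d"
            % (len(topology), expected_nodes)
        )
    seen = {}
    for node_id in sorted(topology):
        cpus = topology[node_id]
        if not cpus:
            problems.append("node%d has no readable CPU list" % node_id)
            continue
        for cpu in cpus:
            if cpu in seen:
                problems.append(
                    "cpu%d listed in both node%d and node%d"
                    % (cpu, seen[cpu], node_id)
                )
            seen[cpu] = node_id
    return problems


if __name__ == "__main__":
    topo = read_numa_topology()
    for node_id in sorted(topo):
        print("node%d: %s" % (node_id, topo[node_id]))
    for warning in find_problems(topo):
        print("WARNING:", warning)
```

Comparing the output on a suspect kernel and on one believed to work,
on the same hardware, should show whether the node directory content
really differs.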
I found that in this situation disabling NUMA in the BIOS only
increases STREAM throughput. Therefore I think Rahul's problem
is not due to BIOS settings. Unfortunately I have no data about VASP
It would be interesting to know: does anybody have kernels that work
normally with Nehalem, in the sense of NUMA? AFAIK older 2.6 kernels
(from SuSE 10.3) work OK, but I didn't check. Maybe an error in NUMA
support is the reason for Rahul's problem?
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> To change your subscription (digest mode or unsubscribe) visit
>Craig Tierney (craig.tierney at noaa.gov)