[Beowulf] bizarre scaling behavior on a Nehalem

Mikhail Kuzminsky kus at free.net
Wed Aug 12 08:14:25 PDT 2009

In message from Craig Tierney <Craig.Tierney at noaa.gov> (Tue, 11 Aug 
2009 11:40:03 -0600):
>Rahul Nabar wrote:
>> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho
>> <coutinho at dcc.ufmg.br> wrote:
>>> This is often caused by cache competition or memory bandwidth
>>> saturation.
>>> If it was cache competition, rising from 4 to 6 threads would make
>>> it worse.
>>> As the code became faster with DDR3-1600 and much slower with Xeon,
>>> this code is memory bandwidth bound.
>>> Tweaking CPU affinity to avoid thread jumping among cores of the
>>> same chip will not help much, as the big bottleneck is memory
>>> bandwidth.
>>> For this code, CPU affinity will only help in NUMA machines to keep
>>> memory accesses in local memory.
>>> If the machine has enough bandwidth to feed the cores, it will
>>> scale.
>> Exactly! But I thought this was the big advance with the Nehalem:
>> it has removed the CPU<->Cache<->RAM bottleneck. So if the code
>> scaled with the AMD Barcelona then it would continue to scale with
>> the Nehalem, right?
>> I'm posting a copy of my scaling plot here if it helps.
>> http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg
>> To remove most possible confounding factors, this particular Nehalem
>> plot is produced with the following settings:
>> Hyperthreading OFF
>> 24GB memory, i.e. 6 banks of 4GB (the optimum memory configuration)
>> X5550
>> Even if we explained away the bizarre performance of the 4 core case
>> by the Turbo effect, what is most confusing is how the 8 core data
>> point could be so much slower than the corresponding 8 core point on
>> the old AMD Barcelona.
>> Something's wrong here that I just do not understand. BTW, any other
>> VASP users here? Anybody have any Nehalem experience?
>What are you doing to ensure that you have both memory and processor
>affinity enabled?
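On Linux, one common way to enforce both at once is numactl. A minimal
sketch, assuming the numactl package is installed and "./vasp" stands
in for the real binary:

```shell
# Print the CPU affinity mask of the current shell (taskset is in util-linux)
taskset -p $$

# Launch a run pinned to core 0 ("true" stands in for the real binary)
taskset -c 0 true

# With the numactl package installed, CPU and memory affinity can be
# bound together per NUMA node, e.g.:
#   numactl --cpunodebind=0 --membind=0 ./vasp
```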

As I mentioned here in the "numactl & SuSE 11.1" thread, some kernels
show wrong behaviour on Nehalem (bad /sys/devices/system/node
directory content). This bug is present, in particular, in the default
OpenSuSE 11 kernels ( and 2.6.29-6) and, as was written in the
corresponding thread discussion, in the FC11 2.6.29 kernel.
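A quick way to check what the kernel thinks the topology is (the node0
path assumes at least one NUMA node is exposed):

```shell
# A kernel with working NUMA support exposes one nodeN directory per
# socket, each with its own cpulist and meminfo; the buggy kernels show
# missing or wrong entries here.
ls /sys/devices/system/node/
cat /sys/devices/system/node/node0/cpulist
# "numactl --hardware" (numactl package) prints the same data as a summary
```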

I found that in such a situation, disabling NUMA in the BIOS only
increases STREAM throughput. Therefore I think this (Rahul's) problem
is not due to BIOS settings. Unfortunately I have no data about VASP.
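One way to quantify the effect with STREAM is to run it once with
node-local memory and once interleaved across nodes; the delta is the
NUMA penalty. A sketch, assuming "./stream" is a compiled OpenMP STREAM
binary and numactl is installed:

```shell
# Threads per STREAM run; set to the number of cores being tested
export OMP_NUM_THREADS=8

# Node-local placement (the fast case when kernel NUMA support works):
#   numactl --cpunodebind=0 --membind=0 ./stream
# Interleaved across all nodes (approximates NUMA disabled in the BIOS):
#   numactl --interleave=all ./stream
# The difference in the Triad figures between the two runs is the NUMA penalty.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```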

It would be interesting to know whether somebody has kernels that work
normally with Nehalem, in the NUMA sense. AFAIK older 2.6 kernels
(from SuSE 10.3) work OK, but I didn't check. Maybe an error in NUMA
support is the reason for Rahul's problem?


>> --
>> Rahul
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit
>Craig Tierney (craig.tierney at noaa.gov)
