[Beowulf] bizarre scaling behavior on a Nehalem


Mikhail Kuzminsky kus at free.net
Wed Aug 12 08:14:25 PDT 2009


In message from Craig Tierney <Craig.Tierney at noaa.gov> (Tue, 11 Aug 
2009 11:40:03 -0600):
>Rahul Nabar wrote:
>> On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho <coutinho at dcc.ufmg.br> wrote:
>>> This is often caused by cache competition or memory bandwidth saturation.
>>> If it was cache competition, rising from 4 to 6 threads would make it worse.
>>> As the code became faster with DDR3-1600 and much slower with Xeon 5400,
>>> this code is memory bandwidth bound.
>>> Tweaking CPU affinity to avoid thread jumping among cores will not
>>> help much, as the big bottleneck is memory bandwidth.
>>> For this code, CPU affinity will only help in NUMA machines to maintain
>>> memory access in local memory.
>>>
>>>
>>> If the machine has enough bandwidth to feed the cores, it will scale.
>> 
>> Exactly! But I thought this was the big advance with the Nehalem that
>> it has removed the CPU<->Cache<->RAM bottleneck. So if the code scaled
>> with the AMD Barcelona then it would continue to scale with the
>> Nehalem right?
>> 
>> I'm posting a copy of my scaling plot here if it helps.
>> 
>> http://dl.getdropbox.com/u/118481/nehalem_scaling.jpg
>> 
>> To remove most possible confounding factors this particular Nehalem
>> plot is produced with the following settings:
>> 
>> Hyperthreading OFF
>> 24GB memory i.e. 6 banks of 4GB. i.e. optimum memory configuration
>> X5550
>> 
>> Even if we explained away the bizarre performance of the 4 node case
>> as the Turbo effect, what is most confusing is how the 8 core data
>> point could be so much slower than the corresponding 8 core point on
>> an old AMD Barcelona.
>> 
>> Something's wrong here that I just do not understand. BTW, any other
>> VASP users here? Anybody have any Nehalem experience?
>> 
>
>Rahul,
>What are you doing to ensure that you have both memory and processor
>affinity enabled?
>Craig

As I mentioned here in the "numactl&SuSE11.1" thread, some kernels 
behave incorrectly on Nehalem: the contents of the 
/sys/devices/system/node directory are wrong. This bug is present, in 
particular, in the default OpenSuSE 11 kernels (2.6.27.7-9 and 
2.6.29-6) and, as was written in that thread, in the FC11 2.6.29 
kernel.
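
A quick way to see what a given kernel reports under 
/sys/devices/system/node is sketched below in Python. It is only an 
illustration under assumptions: this message does not say what exactly 
is wrong in the directory, so the checks here (node count, empty nodes, 
a CPU listed under two nodes) are guesses at what bad content might 
look like. The function names and the expected_nodes=2 default (a 
dual-socket Nehalem box) are hypothetical. The sketch relies on the 
per-node cpulist files, which a particular kernel may not provide.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file yields a single empty field
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_numa_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]} from the nodeN/cpulist files under root."""
    topology = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if not match:
            continue
        try:
            with open(os.path.join(path, "cpulist")) as fh:
                cpus = parse_cpulist(fh.read())
        except FileNotFoundError:
            cpus = []  # this kernel does not expose cpulist for the node
        topology[int(match.group(1))] = cpus
    return topology


def find_problems(topology, expected_nodes=2):
    """List deviations from a sane layout; expected_nodes is an assumption."""
    problems = []
    if len(topology) != expected_nodes:
        problems.append(
            f"found {len(topology)} node(s), expected {expected_nodes}")
    owner = {}
    for node, cpus in sorted(topology.items()):
        if not cpus:
            problems.append(f"node {node} has no CPUs")
        for cpu in cpus:
            if cpu in owner:
                problems.append(
                    f"cpu {cpu} listed under node {owner[cpu]} and node {node}")
            owner[cpu] = node
    return problems


if __name__ == "__main__":
    topo = read_numa_topology()
    for node, cpus in sorted(topo.items()):
        print(f"node{node}: cpus {cpus}")
    for problem in find_problems(topo):
        print("WARNING:", problem)
```

Comparing the output on a suspect kernel with the output on a kernel 
believed to work would show whether the node directory differs between 
them.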

I found that in this situation disabling NUMA in the BIOS only 
increases STREAM throughput. Therefore I think Rahul's problem is not 
due to BIOS settings. Unfortunately I have no data about VASP itself.

I'd be interested to know: does anybody have kernels that work 
normally with Nehalem, in the sense of NUMA? AFAIK older 2.6 kernels 
(from SuSE 10.3) work OK, but I didn't check. Maybe an error in NUMA 
support is the reason for Rahul's problem?

Mikhail        

>
>
>> --
>> Rahul
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin 
>>Computing
>> To change your subscription (digest mode or unsubscribe) visit 
>>http://www.beowulf.org/mailman/listinfo/beowulf
>> 
>
>
>-- 
>Craig Tierney (craig.tierney at noaa.gov)



