
[Beowulf] performance tweaks and optimum memory configs for a Nehalem


Craig Tierney Craig.Tierney at noaa.gov
Mon Aug 10 13:20:36 PDT 2009


Joshua Baker-LePain wrote:
> On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote
> 
>> On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn<hahn at mcmaster.ca> wrote:
>>>> (a) I am seeing strange scaling behaviours with Nehalem cores. e.g. a
>>>> specific DFT (Density Functional Theory) code we use is maxing out
>>>> performance at 2, 4 cpus instead of 8. i.e. runs on 8 cores are
>>>> actually slower than 2 and 4 cores (depending on setup)
>>>
>>> this is on the machine which reports 16 cores, right?  I'm guessing
>>> that the kernel is compiled without numa and/or ht, so enumerates
>>> virtual cpus first.  that would mean that when otherwise idle, a 2-core
>>> proc will get virtual cores within the same physical core.  and that
>>> your 8c test is merely keeping the first socket busy.
>>
>> No. On both machines. The one reporting 16 cores and the other
>> reporting 8. i.e. one hyperthreaded and the other not. Both having 8
>> physical cores.
>>
>> What is bizarre is I tried using -np 16. That ought to definitely
>> utilize all cores, right? I'd have expected the 16 core performance to
>> be the best. But no, the performance peaks at a smaller number of
>> cores.
> 
> Well, as there are only 8 "real" cores, running a computationally
> intensive process across 16 should *definitely* do worse than across 8.
> However, it's not so surprising that you're seeing peak performance with
> 2-4 threads.  Nehalem can actually overclock itself when only some of
> the cores are busy -- it's called Turbo Mode.  That *could* be what
> you're seeing.
> 

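On Mark's point about enumeration order, one way to check which logical
CPUs share a physical core on Linux is to read the sysfs topology files.
The following is a minimal sketch, not something from the original thread:
it assumes /sys/devices/system/cpu/cpuN/topology/thread_siblings_list is
present, and the helper names (parse_cpu_list, group_by_core,
read_siblings) are made up for illustration.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,8" or "0-3,8" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_core(siblings):
    """Turn {logical_cpu: sibling_list_text} into sorted groups of siblings."""
    groups = {tuple(parse_cpu_list(text)) for text in siblings.values()}
    return sorted(groups)


def read_siblings(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every logical CPU (assumes Linux sysfs)."""
    siblings = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        # path looks like <root>/cpuN/topology/thread_siblings_list
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            siblings[int(cpu_dir[3:])] = f.read()
    return siblings


if __name__ == "__main__":
    for group in group_by_core(read_siblings()):
        print("logical cpus sharing one core:", list(group))
```

If a group has more than one entry, those logical CPUs are hyperthread
siblings, and a 2-process run pinned to them would be sharing a single
physical core.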
We are seeing that the chips will overclock themselves even with all cores
running.  The speed increase ranges from 2% to 10% per node.  I have never
had a single-node HPL run with Turbo on come in as slow as one with Turbo
turned off.  However, with all the variation per node, there isn't much
of a win for large jobs, since they generally run at the pace of the
slowest node.
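
To illustrate why the per-node variation eats the gain on large jobs, here
is a rough sketch.  The 2-10% range is from the numbers above; the uniform
distribution, the node counts, and the function names are assumptions made
only for illustration, not measurements.

```python
import random


def job_speedup(node_speedups):
    """A tightly synchronized job advances at the pace of its slowest node."""
    return min(node_speedups)


def simulate(num_nodes, low=1.02, high=1.10, seed=0):
    """Draw a hypothetical Turbo speedup per node; return the job-level speedup.

    The uniform spread between `low` and `high` is an assumption.
    """
    rng = random.Random(seed)
    return job_speedup([rng.uniform(low, high) for _ in range(num_nodes)])


if __name__ == "__main__":
    for n in (1, 8, 64, 512):
        print(n, "nodes ->", round(simulate(n), 3))
```

Under these assumptions the job-level speedup drifts toward the low end of
the range as the node count grows, because more nodes means a better chance
that at least one of them sits near the 2% end.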

Craig


> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
Craig Tierney (craig.tierney at noaa.gov)


