[Beowulf] performance tweaks and optimum memory configs for a Nehalem

Joshua Baker-LePain jlb17 at duke.edu
Mon Aug 10 12:09:48 PDT 2009

On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote

> On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn<hahn at mcmaster.ca> wrote:
>>> (a) I am seeing strange scaling behaviour with Nehalem cores. E.g., a
>>> specific DFT (Density Functional Theory) code we use peaks in
>>> performance at 2 or 4 CPUs instead of 8, i.e., runs on 8 cores are
>>> actually slower than runs on 2 and 4 cores (depending on setup).
>> This is on the machine which reports 16 cores, right?  I'm guessing
>> that the kernel is compiled without NUMA and/or HT support, so it
>> enumerates virtual CPUs first.  That would mean that, when otherwise idle,
>> a 2-process job will get virtual cores within the same physical core, and
>> that your 8-core test is merely keeping the first socket busy.
> No, it happens on both machines: the one reporting 16 cores and the one
> reporting 8, i.e., one with Hyper-Threading and one without. Both have 8
> physical cores.
> What is bizarre is that I tried using -np 16. That ought to definitely
> utilize all cores, right? I'd have expected the 16-process performance to
> be the best. But no, the performance peaks at a smaller number of
> cores.
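To pin down where scaling stops, it can help to express each run as speedup and parallel efficiency relative to the smallest process count measured. Below is a minimal Python sketch, assuming wall-clock times have already been collected for each `-np` value; `scaling_table` and `fastest` are hypothetical helper names, and the timings in the example are made-up placeholders, not measurements from either machine.

```python
def scaling_table(timings):
    """timings: {nprocs: wall_seconds}.
    Returns [(nprocs, speedup, efficiency)] relative to the smallest
    process count measured; efficiency 1.0 means ideal linear scaling."""
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        efficiency = speedup / (n / base_n)
        rows.append((n, speedup, efficiency))
    return rows


def fastest(timings):
    """Process count with the shortest wall-clock time."""
    return min(timings, key=timings.get)


if __name__ == "__main__":
    # Hypothetical placeholder timings (seconds), for illustration only.
    example = {1: 100.0, 2: 50.0, 4: 40.0, 8: 50.0}
    for n, s, e in scaling_table(example):
        print(f"np={n:2d}  speedup={s:5.2f}  efficiency={e:5.2f}")
    print("fastest at np =", fastest(example))
```

Efficiency that collapses past a few processes is consistent with the explanations in this thread, but the table alone does not say which cause is responsible.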

Well, as there are only 8 "real" cores, running a computationally 
intensive process across 16 should *definitely* do worse than across 8. 
However, it's not so surprising that you're seeing peak performance with 
2-4 threads.  Nehalem can actually overclock itself when only some of the 
cores are busy; it's called Turbo Mode.  That *could* be what you're seeing.

Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
