[Beowulf] bizarre scaling behavior on a Nehalem

Bill Broadley bill at cse.ucdavis.edu
Wed Aug 12 11:19:59 PDT 2009


Gus Correa wrote:
> Hi Bill, list
> 
> Bill:  This is very interesting indeed.  Thanks for sharing!
> 
> Bill's graph seems to show that Shanghai and Barcelona scale
> (almost) linearly with the number of cores, whereas Nehalem stops
> scaling and flattens out at 4 cores.

Right.  That's not really surprising since the Core i7 has only 4 cores.  I
wasn't testing a dual socket Nehalem.  So on the single socket Core i7 that I
tested, hyperthreading provided no additional performance.  Not too
surprising, since hyperthreading is about sharing idle functional units and
doesn't do much when the cache or memory system is saturated.
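
A rough way to picture that saturation argument is a toy min() model.  This
is only a sketch: the function name and every number in it are made-up
placeholders for illustration, not measurements from the benchmark discussed
in this thread, and it assumes a second hyperthread on a core adds no memory
bandwidth.

```python
# Toy saturation model. All numbers below are made-up placeholders,
# not measurements from the benchmark discussed in this thread.

def modeled_bandwidth(threads, cores, per_core_gbs, memory_limit_gbs):
    """Aggregate streaming bandwidth (GB/s) under a simple min() model."""
    # Assumption: a second hyperthread on a core adds no memory bandwidth,
    # since both threads share that core's path to cache and memory.
    busy_cores = min(threads, cores)
    # Assumption: bandwidth grows linearly per core until the shared
    # memory system saturates.
    return min(busy_cores * per_core_gbs, memory_limit_gbs)


if __name__ == "__main__":
    # Hypothetical 4-core, 8-thread part: 5 GB/s per core, 16 GB/s ceiling.
    for threads in (1, 2, 4, 8):
        print(threads, modeled_bandwidth(threads, 4, 5.0, 16.0))
```

In this model the 4-thread and 8-thread cases come out identical, which is
the shape of curve you would expect from a memory-bound test; a CPU-bound
code would not be capped by the memory ceiling in the same way.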

> The Nehalem 8 cores and 4 cores curves are virtually indistinguishable,

Yes, but it was 8 threads on 4 cores vs. 4 threads on 4 cores.  I'd expect
something less memory intensive and more CPU intensive to show a big
difference.  In fact many of the HPC codes I've tried see a benefit from
hyperthreading.

> and for very large arrays 4 cores is ahead.
> Only for huge arrays (>16M) Nehalem gets ahead
> of Shanghai and Barcelona.

Yes, impressive that a single socket Intel has more main memory bandwidth than
a dual socket Shanghai.

> Did I interpret the graph right?
> Wasn't this type of scaling problem that plagued
> the Clovertown and Harpertown?

Heh, the single socket Core i7 mentioned above has substantially more (2-4x)
memory bandwidth than the previous generation Intels.

> Any possibility that kernels, BIOS, etc, are not yet ready for Nehalem?

They look good to me.  I'm still trying to find out why I don't see better
performance inside L1, though.


