[Beowulf] bizarre scaling behavior on a Nehalem

Bill Broadley bill at cse.ucdavis.edu
Wed Aug 12 11:19:59 PDT 2009


Gus Correa wrote:
> Hi Bill, list
> 
> Bill:  This is very interesting indeed.  Thanks for sharing!
> 
> Bill's graph seem to show that Shanghai and Barcelona scale
> (almost) linearly with the number of cores, whereas Nehalem stops
> scaling and flattens out at 4 cores.

Right.  That's not really surprising since the Core i7 has only 4 cores.  I
wasn't testing a dual socket Nehalem.  So on the single socket Core i7 that I
tested, hyperthreading provided no additional performance.  Not too
surprising, since hyperthreading is about sharing idle functional units and
doesn't do much when the cache or memory system is saturated.
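
A toy model of that saturation argument, as a minimal sketch. It is not taken
from the benchmark itself: the function name, its parameters, and the numbers
in the example are hypothetical and chosen only for illustration. It assumes
each physical core can pull a fixed number of GB/s and that hyperthreads add
no memory demand of their own.

```python
def modeled_throughput(threads, cores, per_core_gbs, memory_limit_gbs):
    """Toy model: GB/s achieved by a purely memory-bound loop."""
    # Assumption: hyperthreads share a core's path to memory, so only
    # physical cores contribute additional demand.
    busy_cores = min(threads, cores)
    demand = busy_cores * per_core_gbs
    # The memory system caps whatever the cores ask for.
    return min(demand, memory_limit_gbs)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measurements.
    for t in (1, 2, 4, 8):
        print(t, modeled_throughput(t, cores=4, per_core_gbs=5.0,
                                    memory_limit_gbs=16.0))
```

In this model the curve flattens once the cores' combined demand reaches the
memory limit, so 8 threads on 4 cores land on the same value as 4 threads.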

> The Nehalem 8 cores and 4 cores curves are virtually indistinguishable,

Yes, but it was 8 threads on 4 cores, vs 4 threads on 4 cores.  I'd expect
something less memory intensive and more cpu intensive would show a big
difference.  In fact many of the HPC codes I've tried see a benefit.

> and for very large arrays 4 cores is ahead.
> Only for huge arrays (>16M) Nehalem gets ahead
> of Shanghai and Barcelona.

Yes, impressive that a single socket Intel has more main memory bandwidth than
a dual socket Shanghai.
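
For anyone who wants a rough bandwidth figure of their own, here is a minimal
single-threaded sketch. It is not the benchmark behind the graph: the helper
names are hypothetical, and it only times a plain buffer copy, counting one
read pass plus one write pass as the bytes moved. A buffer well beyond the
last-level cache is assumed if the goal is main memory rather than cache
bandwidth.

```python
import time


def bandwidth_gb_per_s(bytes_moved, seconds):
    """Convert bytes moved in a timed interval to GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def time_copy(n_bytes, repeats=5):
    """Best-of-N copy bandwidth for a buffer of n_bytes, in GB/s."""
    src = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # copies the buffer: one read pass, one write pass
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
    return bandwidth_gb_per_s(2 * n_bytes, best)


if __name__ == "__main__":
    # 256 MB buffer, assumed to be far larger than any cache on the machine.
    print(f"{time_copy(256 * 1024 * 1024):.1f} GB/s")
```

Each copy allocates a fresh destination, so the result includes allocation
and page-fault cost and should be read as a lower bound, not a precise number.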

> Did I interpret the graph right?
> Wasn't this type of scaling problem that plagued
> the Clovertown and Harpertown?

Heh, the single socket Core i7 mentioned has substantially more (2-4x) memory
bandwidth than the previous generation Intels.

> Any possibility that kernels, BIOS, etc, are not yet ready for Nehalem?

They look good to me.  I'm still trying to find out why I don't see better
performance inside L1, though.


