[Beowulf] bizarre scaling behavior on a Nehalem

Bruno Coutinho coutinho at dcc.ufmg.br
Tue Aug 11 16:27:55 PDT 2009


2009/8/11 Rahul Nabar <rpnabar at gmail.com>

> On Tue, Aug 11, 2009 at 5:57 PM, Bruno Coutinho<coutinho at dcc.ufmg.br>
> wrote:
> > Nehalem and Barcelona have the following cache architecture:
> >
> > L1 cache: 64KB (32KB data, 32KB instruction), per core
> > L2 cache: Barcelona: 512KB, Nehalem: 256KB, per core
> > L3 cache: Barcelona: 2MB, Nehalem: 8MB, shared among all cores.
> >
> >
> > In both Barcelona and Nehalem, the "uncore" (everything outside a core,
> > like L3 and the memory controllers) runs at a lower speed than the cores,
> > and all cores communicate through L3, so it must handle some coherence
> > signals too. This makes it impossible for L3 to feed all cores at full
> > speed if the L2 caches have big miss ratios.
> >
> > So, what is happening with your program is something like:
> >
> > The working set fits in Barcelona's 512KB L2 cache, so it has a 10% miss
> > rate, but it doesn't fit in Nehalem's 256KB L2 cache, so it has a 50%
> > miss rate. So on Nehalem the shared L3 cache has to handle many more
> > requests from all cores than on Barcelona, becoming a big bottleneck.
>
> Thanks Bruno! That makes a lot of sense now. Assuming that is what is
> happening is there any way of still using the Nehalems fruitfully for
> this code? Any smart tricks / hacks?


You can use profilers that monitor hardware performance counters, such as
OProfile or PAPI, to measure miss ratios and verify whether that is what is
happening. But solving it is a much larger problem. :)
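
Once you have the counter totals, the arithmetic is simple. Here is a rough sketch in Python of how the miss ratio and the resulting load on the shared L3 could be estimated. The function names, the core count, and the per-core access rate are all made up for illustration, and the 10% and 50% miss rates are just the guesses from the explanation above, not measurements. The real totals would come from whatever events your profiler exposes for your CPU.

```python
def miss_ratio(misses: int, accesses: int) -> float:
    """Fraction of cache accesses that missed (0.0 when there were none)."""
    if misses < 0 or accesses < 0 or misses > accesses:
        raise ValueError("need 0 <= misses <= accesses")
    return misses / accesses if accesses else 0.0


def l3_requests_per_sec(cores: int,
                        l2_accesses_per_sec_per_core: float,
                        l2_miss_ratio: float) -> float:
    """Rough L3 demand: every L2 miss on every core becomes an L3 request."""
    return cores * l2_accesses_per_sec_per_core * l2_miss_ratio


if __name__ == "__main__":
    # Hypothetical counter totals, chosen to reproduce the guessed 10% / 50%.
    barcelona = miss_ratio(misses=1_000_000, accesses=10_000_000)
    nehalem = miss_ratio(misses=5_000_000, accesses=10_000_000)

    # Assumed values: 4 cores, 1e8 L2 accesses per second per core.
    load_b = l3_requests_per_sec(4, 1e8, barcelona)
    load_n = l3_requests_per_sec(4, 1e8, nehalem)

    print(f"Barcelona L2 miss ratio: {barcelona:.0%}")
    print(f"Nehalem   L2 miss ratio: {nehalem:.0%}")
    print(f"L3 load ratio (Nehalem / Barcelona): {load_n / load_b:.1f}x")
```

With the same core count and access rate on both chips, the L3 load scales directly with the L2 miss ratio, so going from 10% to 50% means about five times as many requests hitting the slower shared L3.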



>
>
> The reason is that the Nehalems seem to scale and perform beautifully
> for my other codes.
>
> The only other option is to fall back to the AMDs. I believe the
> Shanghai would be a choice, or an Istanbul. I assume the cache
> structure there is as good as the Barcelona's if not better! Any
> experiences with these chips on the group?
>
> Funnily, I haven't heard of any such Nehalem (-ive) stories anywhere
> else. Am I the first one to hit this cache bottleneck? I doubt it. Any
> other cache heavy users?
>
> --
> Rahul
>