[Beowulf] bizarre scaling behavior on a Nehalem

Tue Aug 11 12:04:34 PDT 2009

On Tue, Aug 11, 2009 at 12:06 PM, Bill Broadley<bill at cse.ucdavis.edu> wrote:
> Looks to me like you fit in the barcelona 512KB L2 cache (and get good
> scaling) and do not fit in the nehalem 256KB L2 cache (and get poor scaling).

Thanks Bill! I never realized that the L2 cache of the Nehalem is
actually smaller than that of the Barcelona!

I have an E5520 and an X5550. Both have the 8 MB L3 cache, I believe.
The size of the L2 cache is fixed across the steppings of the Nehalem,
isn't it?
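
One way to check the cache sizes on each node directly, rather than
relying on spec sheets, is to read them from sysfs. The following is only
a minimal sketch, assuming a Linux kernel that exposes the usual
/sys/devices/system/cpu/cpu0/cache/index* entries; the helper names
parse_size and read_caches are made up for illustration.

```python
import glob
import os


def parse_size(text):
    """Convert a sysfs size string such as '256K' or '8M' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def read_caches(cpu_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return a list of (level, type, size_in_bytes) tuples for one CPU.

    Assumes the standard Linux sysfs layout; returns an empty list if
    the directory does not exist.
    """
    caches = []
    for index in sorted(glob.glob(os.path.join(cpu_dir, "index*"))):
        def field(name):
            with open(os.path.join(index, name)) as fh:
                return fh.read().strip()

        caches.append(
            (int(field("level")), field("type"), parse_size(field("size")))
        )
    return caches


if __name__ == "__main__":
    for level, kind, size in read_caches():
        print(f"L{level} {kind}: {size // 1024} KB")
```

Running this on both the Barcelona and the Nehalem nodes would show the
per-core L2 and shared L3 sizes side by side.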

> Were the binaries compiled specifically to target both architectures?  As a
> first guess I suggest trying pathscale (RIP) or open64 for amd, and intel's
> compiler for intel.  But portland group does a good job at both in most cases.

We used the intel compilers. One of my fellow grad students did the
actual compilation for VASP but I believe he used the "correct" [sic]
flags to the best of our knowledge. I could post them on the list
perhaps. There was no cross-compilation. We compiled a fresh binary
for the Nehalem.

> I'm curious about the hyperthreading on data point as well.

We haven't tested VASP yet, but for our other two DFT codes, i.e. DACAPO
and GPAW, hyperthreading "off" seems to be about 10% faster.

> A doubling of the can have that effect.  The Intel L3 cannot come anywhere
> close to feeding 4 cores running flat out.

Could you explain this more? I am a little lost with the processor
dynamics. Does this mean that using all four cores of a Nehalem for HPC
is not likely to scale well? Or is there something I could do to fix
this?
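
To put a number on "scaling well", speedup and parallel efficiency can be
computed from wall-clock times at each core count. This is a minimal
sketch; the function name parallel_efficiency is made up, and the timings
in the example are hypothetical placeholders, not measurements from our
runs.

```python
def parallel_efficiency(timings):
    """Map core count -> (speedup, efficiency) relative to the smallest run.

    timings: dict of {cores: wall_clock_seconds}.
    speedup    = T(base) / T(n)
    efficiency = speedup * base_cores / n
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        result[cores] = (speedup, speedup * base_cores / cores)
    return result


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    example = {1: 100.0, 2: 55.0, 4: 40.0}
    for cores, (speedup, eff) in parallel_efficiency(example).items():
        print(f"{cores} cores: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

An efficiency that drops sharply between 2 and 4 cores on one socket
would be consistent with the cores competing for cache or memory
bandwidth, though it does not prove that by itself.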

Thanks again!

-- 
Rahul