[Beowulf] performance tweaks and optimum memory configs for a Nehalem


Tom Elken tom.elken at qlogic.com
Mon Aug 10 14:07:23 PDT 2009


> Well, as there are only 8 "real" cores, running a computationally
> intensive process across 16 should *definitely* do worse than across 8.

Not typically.

The SPEC website has quite a few SPEC MPI2007 results on Nehalem (the benchmark score is an average across 13 HPC applications).

Summary:
IBM, SGI and Platform have published comparisons on clusters with SMT on, running 1 rank per core versus 2 ranks per core.  In general, at low core counts (up to about 32 cores), there is about an 8% advantage for running 2 ranks per core.  At larger core counts, IBM published a pair of results on 64 cores where the 64-rank performance was equal to the 128-rank performance.  Not all of these applications scale linearly, so on some of them you lose efficiency at 128 ranks compared to 64 ranks.

Details: Results from this year are mostly on Nehalem:
http://www.spec.org/mpi2007/results/res2009q3/ (IBM)
http://www.spec.org/mpi2007/results/res2009q2/ (Platform)
http://www.spec.org/mpi2007/results/res2009q1/ (SGI)
  (Intel has results with Turbo mode turned on and off
    in the q2 and q3 results, for a different comparison)

Or you can pick out the Xeon 'X5570' and 'X5560' results from the list of all results:
http://www.spec.org/mpi2007/results/mpi2007.html

In the result index, when "Compute Threads Enabled" = 2x "Compute Cores Enabled", you know SMT is turned on.
In these cases, when "MPI Ranks" = "Compute Threads Enabled", you are running 2 ranks per core.

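The two index checks above can be written down as a small Python sketch. The names `ResultRow`, `cores_enabled`, `threads_enabled` and `mpi_ranks` are made up for illustration; they simply stand in for the "Compute Cores Enabled", "Compute Threads Enabled" and "MPI Ranks" columns and are not identifiers from SPEC. The sample rows are hypothetical configurations, not published results.

```python
from dataclasses import dataclass


@dataclass
class ResultRow:
    """One row of the result index (hypothetical field names)."""
    cores_enabled: int    # "Compute Cores Enabled"
    threads_enabled: int  # "Compute Threads Enabled"
    mpi_ranks: int        # "MPI Ranks"


def smt_on(row: ResultRow) -> bool:
    # SMT is on when there are twice as many hardware threads as cores.
    return row.threads_enabled == 2 * row.cores_enabled


def classify(row: ResultRow) -> str:
    if not smt_on(row):
        return "SMT off"
    if row.mpi_ranks == row.threads_enabled:
        return "SMT on, 2 ranks per core"
    if row.mpi_ranks == row.cores_enabled:
        return "SMT on, 1 rank per core"
    return "SMT on, other rank layout"


if __name__ == "__main__":
    # Hypothetical configurations, for illustration only.
    for row in (ResultRow(64, 128, 128), ResultRow(64, 128, 64), ResultRow(64, 64, 64)):
        print(row, "->", classify(row))
```

Comparing a "2 ranks per core" row against a "1 rank per core" row on the same number of cores is the comparison summarized above.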

-Tom

> However, it's not so surprising that you're seeing peak performance
> with 2-4 threads.  Nehalem can actually overclock itself when only
> some of the cores are busy -- it's called Turbo Mode.  That *could*
> be what you're seeing.
> 
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF



