[Beowulf] bizarre scaling behavior on a Nehalem
Bill Broadley bill at cse.ucdavis.edu
Wed Aug 12 08:14:09 PDT 2009
- Previous message: [Beowulf] bizarre scaling behavior on a Nehalem
- Next message: [Beowulf] bizarre scaling behavior on a Nehalem
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I've been working on a pthread memory benchmark that is loosely modeled on McCalpin's stream. It's been quite a challenge to remove all the noise/lost performance from the benchmark to get close to the performance I expected.

Some of the obstacles:

* For the compilers that tend to be better at stream (open64 and pathscale), you lose the performance if you just replace double a[],b[],c[] with double *a,*b,*c. Patch[1] available. I don't have a workaround for this, suggestions welcome. Is it really necessary for dynamic arrays to be substantially slower than static?
* You have to be very careful with pointer alignment, both relative to cache lines and relative to each other
* cpu_affinity (by CPU id)
* NUMA (by socket id)

The results are relatively smooth graphs. Here's an example; it's uselessly busy until you toggle off a few graphs (by clicking on the key):
http://cse.ucdavis.edu/bill/pstream.svg

The biggest puzzle I have now is why the previous generation Intel quads, the current generation AMD quads, and numerous other CPUs show a big benefit in L1, while the Nehalem shows no benefit.

[1] http://cse.ucdavis.edu/bill/stream-malloc.patch
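For readers who want to experiment with the alignment and affinity points above, here is a minimal sketch in C. It is not the benchmark's code and not the contents of the patch. The helper names (alloc_aligned, pin_to_cpu, triad) and the 64-byte CACHE_LINE value are assumptions made for illustration, and the affinity call is Linux-specific. The restrict qualifiers are one commonly suggested way to tell a compiler that dynamically allocated arrays do not overlap; whether that recovers the static-array performance on open64 or pathscale is not something this post establishes. NUMA placement is not covered here.

```c
#define _GNU_SOURCE          /* for sched_setaffinity, sched_getcpu, CPU_* macros */
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64        /* assumed cache-line size in bytes */

/* Allocate n doubles starting on a cache-line boundary.
 * Returns NULL on failure; release with free(). */
double *alloc_aligned(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, CACHE_LINE, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* Pin the calling thread to one logical CPU (Linux only).
 * A pid of 0 means "the calling thread". Returns 0 on success. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* STREAM-style triad: a[i] = b[i] + s * c[i].
 * restrict promises the compiler that a, b and c do not alias. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

In a threaded benchmark each worker would call pin_to_cpu() first and then allocate and initialize its own arrays, so that timing loops run on a known CPU. To control alignment of the arrays relative to each other, one option is to allocate a single larger block and carve a, b and c out of it at chosen offsets.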
