[Beowulf] bizarre scaling behavior on a Nehalem
Mikhail Kuzminsky kus at free.net
Wed Aug 12 11:50:16 PDT 2009
In message from Gus Correa <gus at ldeo.columbia.edu> (Wed, 12 Aug 2009 14:09:04 -0400):
>Hi Bill, list
>
>Bill: This is very interesting indeed. Thanks for sharing!
>
>Bill's graph seems to show that Shanghai and Barcelona scale
>(almost) linearly with the number of cores, whereas Nehalem stops
>scaling and flattens out at 4 cores.
>The Nehalem 8 cores and 4 cores curves are virtually
>indistinguishable,
>and for very large arrays 4 cores is ahead.
>Only for huge arrays (>16M) Nehalem gets ahead
>of Shanghai and Barcelona.

IMHO, if the arrays are not "huge", they will fit in the L3 cache (8 MB!).
Or is the X axis in Mwords?

Mikhail

>
>Did I interpret the graph right?
>Wasn't this the type of scaling problem that plagued
>the Clovertown and Harpertown?
>Any possibility that kernels, BIOS, etc, are not yet ready for
>Nehalem?
>
>Thanks,
>Gus Correa
>---------------------------------------------------------------------
>Gustavo Correa
>Lamont-Doherty Earth Observatory - Columbia University
>Palisades, NY, 10964-8000 - USA
>---------------------------------------------------------------------
>
>Bill Broadley wrote:
>> I've been working on a pthread memory benchmark that is loosely
>> modeled on McCalpin's stream. It's been quite a challenge to remove
>> all the noise/lost performance from the benchmark to get close to the
>> performance I expected. Some of the obstacles:
>> * For the compilers that tend to be better at stream (open64 and
>>   pathscale), you lose the performance if you just replace
>>   double a[],b[],c[] with double *a,*b,*c. Patch[1] available.
>>   I don't have a workaround for this, suggestions welcome. Is it
>>   really necessary for dynamic arrays to be substantially slower
>>   than static?
>> * You have to be very careful with pointer alignment, both with
>>   cache lines and with each other
>> * cpu_affinity (by CPU id)
>> * NUMA (by socket id)
>>
>> The results are relatively smooth graphs. Here's an example; it's
>> uselessly busy until you toggle off a few graphs (by clicking on the
>> key):
>>
>> http://cse.ucdavis.edu/bill/pstream.svg
>>
>> The biggest puzzle I have now is why the previous generation Intel
>> quads, the current generation AMD quads, and numerous other CPUs show
>> a big benefit in L1, while the Nehalem shows no benefit.
>>
>> [1] http://cse.ucdavis.edu/bill/stream-malloc.patch
