[Beowulf] bizarre scaling behavior on a Nehalem
Bill Broadley bill at cse.ucdavis.edu
Wed Aug 12 19:42:30 PDT 2009
- Previous message: [Beowulf] bizarre scaling behavior on a Nehalem
- Next message: [Beowulf] PathScale (RIP) WAS: bizarre scaling behavior on a Nehalem
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Rahul Nabar wrote:
> On Tue, Aug 11, 2009 at 12:06 PM, Bill Broadley<bill at cse.ucdavis.edu> wrote:
>> Looks to me like you fit in the barcelona 512KB L2 cache (and get good
>> scaling) and do not fit in the nehalem 256KB L2 cache (and get poor scaling).
>
> Thanks Bill! I never realized that the L2 cache of the Nehalem is
> actually smaller than that of the Barcelona!

Indeed. Usually a doubling of cache size doesn't make a huge difference,
but of course there are the occasional times when it makes a big
difference.

> I have an E5520 and a X5550. Both have the 8 MB L3 cache I believe.
> The size of the L2 cache is fixed across the steppings of the Nehalem
> isn't it?

I believe so, at least so far.

>> Were the binaries compiled specifically to target both architectures? As a
>> first guess I suggest trying pathscale (RIP) or open64 for amd, and intel's
>> compiler for intel. But portland group does a good job at both in most cases.
>
> We used the intel compilers. One of my fellow grad students did the
> actual compilation for VASP but I believe he used the "correct" [sic]
> flags to the best of our knowledge. I could post them on the list
> perhaps. There was no cross-compilation. We compiled a fresh binary
> for the Nehalem.

I'd make sure the compiler is fairly current. I believe both the
barcelona/shanghai and the core i7/nehalem have some significant tweaks,
and if the compiler isn't aware of the new functionality you leave
significant performance on the table. In particular the newest SSE
features won't be of any benefit without direct compiler support.

>> A doubling of the [cache] can have that effect. The Intel L3 cannot come
>> anywhere close to feeding 4 cores running flat out.
>
> Could you explain this more? I am a little lost with the processor
> dynamics.

In general each step through the memory hierarchy (registers, l1, l2, l3,
and main memory) approximately doubles the latency and halves the
bandwidth available.

So for instance if you fit in the L1 caches you might well be able to
enjoy 160GB/sec, but if you use more than 1MB on a nehalem chip you will
be in L3 with only 48GB/sec or so.

Check out (the slightly updated):
http://cse.ucdavis.edu/bill/pstream.svg

So if you compare the 2MB lines, the core i7 with 4 threads running can
handle 47GB/sec. The dual socket barcelona or shanghai system can handle
128GB/sec. So even with one of the faster clocks (I tested 2.6 GHz) and
perfect scaling, a dual socket nehalem would only get 95GB/sec, still well
below the amd score.

Of course there are many other things going on, and it might well be other
differences in the architecture that are responsible for the difference.
Even if it was memory bandwidth, there were many other parts of the graph
where the single socket intel does substantially better than half the AMD,
and in the case of accessing main memory the single socket intel is faster
than the dual socket AMD.

So basically it comes down to fun handwaving about the architecture, but
if you are making a price/performance decision, collect a bunch of
production runs and get out a stop watch. Your vasp difference in
performance and scaling might well disappear with different inputs.

> Does this mean using a quad core for HPC on the Nehalem is
> not likely to work well for scaling? Or do you imply a solution so
> that I could fix this somehow?

I didn't test a dual socket nehalem because I didn't have access; I hope
to have numbers soonish. In the mean time contact me off list if you want
the code to try it yourself.
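
For anyone who wants to redo the back-of-the-envelope reasoning above, here
is a rough sketch in Python. It is not the pstream code. The function names
are made up for illustration, and the 8 threads on the dual socket barcelona
and its 2MB L3 are assumptions on my part; the 256KB/512KB L2 sizes, the 8MB
nehalem L3, and the 47 and 128 GB/sec figures are the ones quoted in this
thread. It only asks which cache level an evenly split working set lands in
and what perfect dual socket scaling would give.

```python
KB = 1024
MB = 1024 * KB


def smallest_level_that_fits(working_set_bytes, threads, l2_per_core, l3_shared):
    """Return 'L2', 'L3', or 'memory' for a working set split evenly over threads.

    Simplification (an assumption): each thread owns one private L2, and the
    whole working set must fit in the shared L3 to count as an L3 fit.
    """
    per_thread = working_set_bytes / threads
    if per_thread <= l2_per_core:
        return "L2"
    if working_set_bytes <= l3_shared:
        return "L3"
    return "memory"


def ideal_dual_socket(single_socket_gb_per_s):
    """Best case: a second socket exactly doubles the aggregate bandwidth."""
    return 2 * single_socket_gb_per_s


# 2MB working set, single socket nehalem, 4 threads: 512KB per thread,
# which does not fit in a 256KB L2 but does fit in the 8MB L3.
nehalem_level = smallest_level_that_fits(2 * MB, 4, 256 * KB, 8 * MB)

# Same 2MB on a dual socket barcelona, assuming 8 threads: 256KB per thread,
# which fits in the 512KB L2.
barcelona_level = smallest_level_that_fits(2 * MB, 8, 512 * KB, 2 * MB)

# Perfect scaling from the measured 47 GB/sec single socket figure.
nehalem_dual_best_case = ideal_dual_socket(47)

print(nehalem_level, barcelona_level, nehalem_dual_best_case)
```

Under those assumptions the 2MB case sits in L3 on the nehalem and in L2 on
the barcelona, and the best-case dual nehalem number comes out at about twice
47 GB/sec, in line with the roughly 95 GB/sec above and still under the
128 GB/sec measured on the AMD box. Real runs will not split this cleanly,
so treat it as a sanity check, not a prediction.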
