[Beowulf] Strange Opteron 2350 performance: Gaussian-03
Mikhail Kuzminsky kus at free.net
Sat Jun 28 15:30:54 PDT 2008
In message from Joe Landman <landman at scalableinformatics.com> (Sat, 28 Jun 2008 14:48:02 -0400):
> This is possible, depending upon the compiler used. Though I have to
> admit that I find it odd that it would be the case within the Opteron
> family and not between Opteron and Xeon.
>
> Intel compilers used to (haven't checked 10.1) switch between fast
> (SSE*) and slow (x87 FP) paths as a function of a processor version
> string. If this is an old Intel compiler built code, this is
> possible that the code paths may be different, though as noted, I
> would find that surprising if this were the case within the Opteron
> family.

Well, I thought about the (absence of) use of SSE in the binary Gaussian 03 Rev. C02 version I used. But even if x87 code really was generated by pgf77, why does this x87-based code give such "high" performance on the Opteron 246 in comparison with an Opteron 2350 core? I ran the same Gaussian binaries on both CPUs!

> Modern PGI compilers (suggested default for Gaussian-03 last I
> checked) have the ability to do this as well, though I don't know how
> they implement it (capability testing hopefully?)
>
> Out of curiosity, how does streams run on both systems?

I ran stream on Opteron 242 and 244 a few years ago. The scalability and the throughput itself were OK. Now I have run stream on my Opteron 2350-based dual-socket server. As expected with the faster DDR2-667, I obtained higher throughput. In particular, I reproduced the 8-core result presented in McCalpin's table (sent from AMD), and some data presented earlier on our Beowulf mailing list.

(BTW, there is one bad thing for stream on this server, and the corresponding data are absent from McCalpin's table: the throughput scales well from 1 to 2 OpenMP threads and gives a good result for 8 threads, but the throughput for 4 threads is about the same as for 2 threads. The reason is, IMHO, that for 8 threads the kernel allocates RAM on both nodes, but for 4 threads the allocated RAM is placed on one node, and the 4 threads compete badly for memory access.)

Gaussian-03 performed badly on an Opteron 2350 core in a sequential run, so memory should not be the cause: there the Opteron 2350's RAM gives it only advantages in comparison with the Opteron 246. I didn't run stream on the Opteron 246, but that is clear to me.

> Also, it is
> possible, with a larger cache, that you might be running into some
> odd cache effects (tlb/page thrashing). But DFTs are usually "small"
> and thus "sensitive" to cache size.
>
> You might be able to instrument the run within a papi wrapper, and
> see if you observe a large number of cache/tlb flushes for some
> reason.
>
> On a related note: are you using a stepping before B3 of 2350?
> That could impact performance, if you have the patch in place or have the
> tlb/cache turned off in bios (some MB makers created a patch to do
> this).

Gaussian-03 fails in link302 on Barcelona B2 because of this error. I use stepping B3.

Yours
Mikhail

> Joe
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
>        http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
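
The explanation given above for the 4-thread stream result (all memory on one node, so the threads share one memory controller) can be illustrated with a toy model. This is only a sketch of the hypothesis, not a measurement: the bandwidth figures and the thread placements are made-up assumptions, and the model ignores remote-access penalties entirely.

```python
# Toy model only: the bandwidth figures below are made-up placeholders,
# not measurements from the Opteron 2350 server discussed in this thread.
PER_THREAD_GBS = 3.0   # hypothetical bandwidth a single thread can draw
NODE_CAP_GBS = 6.0     # hypothetical ceiling of one node's memory controller


def aggregate_bandwidth(threads_per_node):
    """Sum each node's throughput, capped by that node's memory controller.

    threads_per_node[i] is the number of threads whose data lives on node i.
    """
    return sum(min(n * PER_THREAD_GBS, NODE_CAP_GBS) for n in threads_per_node)


# Placement assumed by the hypothesis: up to 4 threads have all their
# data on node 0; with 8 threads the data is spread over both nodes.
placements = {1: [1, 0], 2: [2, 0], 4: [4, 0], 8: [4, 4]}

for threads, layout in placements.items():
    print(threads, "threads:", aggregate_bandwidth(layout), "GB/s (model)")
```

In this model the 2-thread and 4-thread cases give the same aggregate figure because both saturate the single node, while the 8-thread case doubles it by using both nodes. That matches the shape of the reported behaviour, but it does not prove the cause. On Linux, one way to check the hypothesis would be to rerun the 4-thread case with memory interleaved across nodes, for example under `numactl --interleave=all`.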
