[Beowulf] x86-64 NUMA vs SMP kernel: appl. performance?
Bill Broadley bill at cse.ucdavis.edu
Fri Sep 24 15:31:25 PDT 2004
- Previous message: [Beowulf] x86-64 NUMA vs SMP kernel: appl. performance?
- Next message: [Beowulf] x86-64 NUMA vs SMP kernel: appl. performance?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, Sep 24, 2004 at 05:47:40PM -0400, Robert G. Brown wrote:
> On Fri, 24 Sep 2004, Greg Lindahl wrote:
>
> > On Fri, Sep 24, 2004 at 03:04:05PM -0400, Robert G. Brown wrote:
> >
> > > What compilers have you tried, and what improvements do they
> > > produce?
> >
> > Robert,
> >
> > As you might recall, I do work for a compiler company, so obviously that
> > should be kept in mind. The 3 apps mentioned by the original poster
>
> Impressive results nonetheless (and besides, I trust your honesty).
> Since your customers are reporting them, I would assume that they are
> just swapping the compilers in and out and not necessarily doing lots of
> compiler specific tuning.

I posted very similar numbers (if not the same ones) to the NWChem list and was contacted by someone at PathScale. The comparison I was doing involved an NWChem simulation that a user wanted to purchase a cluster for. Under consideration were the G5, the Itanium 2, and the Opteron (Nocona wasn't out yet). As it turned out, IBM xlf on the G5 won the price/performance comparison against PGI on the Opteron. I then got the PathScale 30-day evaluation, and it tipped the balance toward the Opteron on cluster price/performance.

There were already what looked to me like reasonable compiler flags for both, so I used them. Normally I read the compiler docs, browse the settings used for SPEC benchmark runs, and try to search the optimization space for at least the low-hanging fruit. Besides, I had no way to test correctness, so I figured I'd stick with the (hopefully) well-tested flags.

I was impressed that the PathScale compiler (a newcomer to the market) managed to compile a wide range of codes with no problems and produce impressively fast binaries. Amusingly, one of the codes that took just 10-20 minutes to compile on the other platforms usually took 3 days to compile on the Itanium 2 with the Intel compiler, and no, it wasn't a particularly fast binary either.
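A minimal sketch of what "searching the optimization space for the low-hanging fruit" can look like in practice, assuming a caller-supplied, hypothetical `benchmark` function that compiles the code with a given flag list, runs it, checks the answer, and returns wall-clock seconds. It simply tries every subset of a short list of optional flags on top of a base set. The flag names in the demo are illustrative GCC-style flags, not PathScale's or anyone's recommended settings, and the timings come from a fake benchmark, not real measurements.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def search_flag_space(
    base_flags: Sequence[str],
    optional_flags: Sequence[str],
    benchmark: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Try every subset of optional_flags on top of base_flags.

    benchmark(flags) is a hypothetical, caller-supplied function: it should
    compile the code with `flags`, run it, verify the output (an aggressive
    flag can silently produce wrong answers), and return run time in seconds.
    Returns the fastest flag list found and its time.
    """
    best_flags: List[str] = list(base_flags)
    best_time = float("inf")
    # 2**len(optional_flags) combinations, so keep the optional list short.
    for r in range(len(optional_flags) + 1):
        for subset in itertools.combinations(optional_flags, r):
            flags = list(base_flags) + list(subset)
            elapsed = benchmark(flags)
            if elapsed < best_time:
                best_flags, best_time = flags, elapsed
    return best_flags, best_time


if __name__ == "__main__":
    # Fake benchmark standing in for a real compile-and-run cycle.
    def fake_benchmark(flags: List[str]) -> float:
        t = 10.0
        if "-O3" in flags:
            t -= 2.0
        if "-ffast-math" in flags:
            t -= 1.0
        if "-funroll-loops" in flags:
            t += 0.5
        return t

    # prints (['-O2', '-O3', '-ffast-math'], 7.0)
    print(search_flag_space(["-O2"], ["-O3", "-ffast-math", "-funroll-loops"], fake_benchmark))
```

Exhaustive search doubles in cost with every optional flag, which is why it only suits a handful of candidates; with more flags a greedy one-at-a-time search is the usual compromise. Without a correctness check inside `benchmark`, the fastest result may simply be a wrong one.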
NWChem seems rather bandwidth- or latency-sensitive depending on the size and nature of the simulation, so I'm waiting a bit to see how PCI Express and HyperTransport-connected interconnects play out. A few motherboard manufacturers and interconnect companies have been making very interesting noises lately. It seems that for communications-intensive codes SGI will have some competition from OctigaBay and similar designs.

I have not yet done a similar comparison against the 3.4 GHz Nocona with Intel's 8.1 compiler. I'm not sure whether the 8.1 compiler will cripple x86-64 code the way the previous version did. Nor have I tested PathScale 1.3. On a smaller custom production code I did try to search the compiler optimization space, and the PathScale advantage was even larger.

> Are these fortran or C results (or do you know)? And how much do the
> compilers cost (and how do the costs scale over a cluster)?

My understanding (possibly flawed) is that the binaries it produces can be run on the entire cluster, so a fixed or a dynamic license would allow a single user to compile on the head node. The licensing, I believe, was "sticky" for 10 minutes or so. I believe the pricing is available on the PathScale website.

--
Bill Broadley
Computational Science and Engineering
UC Davis
More information about the Beowulf mailing list
