[Beowulf] Benchmark between Dell Poweredge 1950 And 1435
Bill Broadley bill at cse.ucdavis.edu
Thu Mar 8 14:40:37 PST 2007
As Robert Brown (and others) so eloquently said, nothing is better than your actual application with your actual input files in an actual production run. Results vary widely, and any kind of general statement could easily be proven significantly wrong in your specific case.

Additional things to keep in mind:

* Compilers can make a huge difference. Intel, for instance, used to penalize AMD chips on the order of 5-15% with their compiler. This was proven by removing the if (running_on_amd()) check and seeing the improved performance. Other compilers will deliver different performance because they achieve a different percentage of peak performance. Pathscale in particular seems to sometimes achieve great performance and at other times just average performance; it is highly code dependent.

* Intel and AMD have dramatically different cache and memory architectures. Make sure your runs are as close to real world usage as possible. In particular, single thread performance on a dual socket dual core node can behave dramatically differently than running 4 threads on the same node.

* Performance of a single application can change radically based on the dataset. Intel, for instance, might win on your application with a "benchmark" dataset that runs quickly, but run more poorly on a real dataset that is more memory intensive. Then again, some production codes/datasets will run dramatically better on the Intel chips.

In general Intel wins many floating point single thread codes; their 4MB of L2 (vs 1MB on AMD) and 7-9GB/sec memory system can keep up with the demands of a single thread well enough to leverage the generally higher floating point performance. SpecFP2000 isn't a terrible way to measure this (again, not nearly as nice as running your own application).

In the 4 thread case several factors cause the Intel chip to scale poorly: the L2 cache is shared, so you get 2MB per core (instead of 4), AND the cache can't meet the needs of 2 cores hitting L2 flat out.
Then as you fall out of (the smaller) cache the memory system doesn't scale. I've yet to identify why, but the advertised "dual frontside bus" seems to improve bandwidth by about 0% compared to the rather poor throughput of the last generation netburst shared FSB. So despite a significant gain in cores (double) and work done per cycle (about double), the current generation Intel chips have no more memory bandwidth than the previous generation. I played with various BIOS settings (cache snooping and related) with zero improvement in the observed numbers. If Intel has somehow fixed this please post to the list; despite having 2 128 bit memory interfaces and 2 frontside busses, I've yet to see a case where the bandwidth improves (let alone doubles).

If you look at the Spec2000 FP Rate benchmarks you'll see that, despite a substantial lead in single thread performance, the system performance is just about dead even with the Opteron. Spec2000 isn't exactly a current benchmark and was intended for systems with relatively little RAM (256 or 512MB if memory serves); any number of real world applications could be significantly more memory intensive than the old Spec.

So all the above is just so much handwaving. Any of dozens of factors could double or halve performance on your application, so get out a stopwatch and run it. I suspect any number of vendors or even fellow Beowulf list folks would either run your application code or allow you to run it.

For a wide mix of applications in the past I've leaned towards AMD because my real world testing showed AMD usually won. The gap has closed significantly in the last year (it used to be so embarrassing). Today I'd call it mostly a wash.

Things are shaping up to be pretty interesting. AMD has the opportunity to take a commanding lead with their next generation chip, which rumors claim will be shipping this summer.
The bad news is that while AMD's next generation promises dramatically better work done per cycle, the memory system doesn't look like it's going to get much (if any) more memory bandwidth.
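
The "get out a stopwatch" advice, together with the single thread vs 4 thread point, can be sketched roughly as below. This is only a minimal illustration, not a tool from this thread: the function names time_concurrent and scaling_report are made up for the example, and it assumes your application is a serial command line program, so that N concurrent copies stand in for an N-way production load. For a threaded or MPI code you would vary the thread or rank count instead.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, copies):
    """Start `copies` instances of cmd at once and return the
    wall-clock seconds until the last one finishes."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
             for _ in range(copies)]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(codes):
        raise RuntimeError(f"a copy exited with non-zero status: {codes}")
    return elapsed


def scaling_report(cmd, counts=(1, 2, 4)):
    """Return (copies, seconds, throughput relative to the first count)
    for each entry in counts. With counts starting at 1, perfect
    scaling would give a relative throughput equal to `copies`."""
    results = []
    base_rate = None
    for n in counts:
        secs = time_concurrent(cmd, n)
        rate = n / secs  # jobs completed per second
        if base_rate is None:
            base_rate = rate
        results.append((n, secs, rate / base_rate))
    return results


# Hypothetical usage: python scaling.py ./my_app real_input.dat
if __name__ == "__main__" and len(sys.argv) > 1:
    for n, secs, rel in scaling_report(sys.argv[1:]):
        print(f"{n} copies: {secs:8.2f} s  relative throughput {rel:.2f}")
```

Run it with your real input files on each candidate node. If the relative throughput at 4 copies is well below 4, the node is running out of shared cache or memory bandwidth on that workload, which is exactly the effect a quick single thread benchmark will hide.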
