[Beowulf] Benchmark between Dell Poweredge 1950 And 1435
Mark Hahn hahn at mcmaster.ca
Thu Mar 8 10:26:30 PST 2007
> Great thanks. That was clear and the takeaway is that I should pay attention
> to the number of memory channels per core (which may be less than 1.0)

I think the takeaway is a bit more acute: if your code is cache-friendly,
simply pay attention to cores * clock * flops/cycle. otherwise (i.e., when
your models are large), pay attention to the "balance" between observed
memory bandwidth and peak flops. the stream benchmark is a great way to do
this, and has traditionally promulgated the "balance" argument.

here's an example:
http://www.cs.virginia.edu/stream/stream_mail/2007/0001.html

basically, 13 GB/s for a 2x2 opteron/2.8 system (peak flops would be
2*2*2*2.8 = 22.4 GFlops), so you need 1.7 flops per byte to be happy.

I don't have a report handy for core2, but iirc, people report hitting
a wall of around 9 GB/s for any dual-FSB core2 system. assuming dual-core
parts like the paper, peak theoretical flops is 37 GFlops, for a balance
of just over 4. that ratio should really be called "imbalance" ;)
quad-core would be worse, of course.
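The balance arithmetic in the message can be sketched in a few lines of Python. This is only an illustration of the figures quoted above: the function names `peak_gflops` and `balance` are hypothetical, the 13 GB/s and 9 GB/s bandwidths are the numbers reported in the message rather than new measurements, and the 37 GFlops core2 peak is taken as stated because the message does not give the clock it was derived from.

```python
def peak_gflops(sockets, cores_per_socket, flops_per_cycle, clock_ghz):
    """Peak theoretical GFlops: sockets * cores * flops/cycle * clock (GHz)."""
    return sockets * cores_per_socket * flops_per_cycle * clock_ghz


def balance(peak, stream_gb_per_s):
    """Flops available per byte of observed memory bandwidth.

    Higher values mean the machine is more starved for bandwidth
    when the working set does not fit in cache.
    """
    return peak / stream_gb_per_s


# 2-socket, dual-core Opteron at 2.8 GHz, 2 flops/cycle, 13 GB/s observed.
opteron_peak = peak_gflops(2, 2, 2, 2.8)
opteron_balance = balance(opteron_peak, 13.0)

# Dual-FSB core2 system: 37 GFlops peak as quoted, about 9 GB/s observed.
core2_balance = balance(37.0, 9.0)

print(f"opteron: {opteron_peak:.1f} GFlops, balance {opteron_balance:.2f} flops/byte")
print(f"core2:   37.0 GFlops, balance {core2_balance:.2f} flops/byte")
```

A code that performs fewer flops per byte fetched from memory than the balance figure will tend to be limited by bandwidth rather than by peak flops, which is the point of the comparison.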
