[Beowulf] Benchmark between Dell Poweredge 1950 And 1435
Peter St. John peter.st.john at gmail.com
Thu Mar 8 11:25:29 PST 2007
Mark,

Thanks, that led me (with a bit of wandering) to e.g.
http://www.cs.virginia.edu/stream/top20/Balance.html.

My immediate concern is for an app that is worse than embarrassingly
parallel; it can't (currently) trade memory for time, and can't really use
any memory or network effectively, by the list's standards. Basically I want
a zillion CPUs and they can communicate by crayon on postcard. That's not
practical, but my initial valuator is just GHz/$. I care about the memory
sharing and message passing efficiency issues only in that I want to smarten
up my app to take advantage of other economies.

Peter

On 3/8/07, Mark Hahn <hahn at mcmaster.ca> wrote:
>
> > Great thanks. That was clear and the takeaway is that I should pay
> > attention to the number of memory channels per core (which may be
> > less than 1.0)
>
> I think the takeaway is a bit more acute: if your code is cache-friendly,
> simply pay attention to cores * clock * flops/cycle.
>
> otherwise (ie, when your models are large), pay attention to the "balance"
> between observed memory bandwidth and peak flops.
>
> the stream benchmark is a great way to do this, and has traditionally
> promulgated the "balance" argument. here's an example:
>
> http://www.cs.virginia.edu/stream/stream_mail/2007/0001.html
>
> basically, 13 GB/s for a 2x2 opteron/2.8 system (peak flops would
> be 2*2*2*2.8=22.4 GFlops, so you need 1.7 flops per byte to be happy).
>
> I don't have a report handy for core2, but iirc, people report hitting
> a wall of around 9 GB/s for any dual-FSB core2 system. assuming dual-core
> parts like the paper, peak theoretical flops is 37 GFlops, for a balance
> of just over 4. that ratio should really be called "imbalance" ;)
> quad-core would be worse, of course.
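
For anyone who wants to redo the balance arithmetic quoted above, here is a minimal sketch. The function and variable names are made up for illustration, and reading 2*2*2*2.8 as sockets * cores per socket * flops per cycle * GHz is an assumption about how the factors were meant. The bandwidth and core2 peak numbers are simply the ones quoted in the message, not new measurements.

```python
# Sketch of the "balance" arithmetic: peak flops divided by observed
# STREAM bandwidth. Names here are hypothetical, chosen for illustration.

def peak_gflops(sockets, cores_per_socket, flops_per_cycle, clock_ghz):
    """Theoretical peak in GFlops: cores * clock * flops/cycle."""
    return sockets * cores_per_socket * flops_per_cycle * clock_ghz


def balance(peak, stream_gb_per_s):
    """Flops the code must do per byte moved to keep the cores busy."""
    return peak / stream_gb_per_s


# 2x2 Opteron at 2.8 GHz, 13 GB/s observed (figures from the quoted post).
# Assumed reading of 2*2*2*2.8: 2 sockets, 2 cores each, 2 flops/cycle.
opteron_peak = peak_gflops(2, 2, 2, 2.8)
opteron_balance = balance(opteron_peak, 13.0)

# Dual-FSB core2: 37 GFlops peak and ~9 GB/s, both as quoted ("iirc").
core2_balance = balance(37.0, 9.0)

print(f"opteron: peak {opteron_peak:.1f} GFlops, "
      f"balance {opteron_balance:.2f} flops/byte")
print(f"core2:   peak 37.0 GFlops, balance {core2_balance:.2f} flops/byte")
```

A higher number means the machine is more starved for memory bandwidth, so a code that does fewer flops per byte than this will sit waiting on memory. For a cache-friendly code the ratio matters much less, and cores * clock * flops/cycle is the figure to compare.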
