[Beowulf] Benchmark between Dell Poweredge 1950 And 1435
Peter St. John
peter.st.john at gmail.com
Thu Mar 8 11:25:29 PST 2007
Thanks, that led me (with a bit of wandering) to e.g.
My immediate concern is for an app that is worse than embarrassingly
parallel; it can't (currently) trade memory for time, and can't really use
any memory or network effectively, by the list's standards. Basically I want
a zillion CPUs and they can communicate by crayon on postcard. That's not
practical, but my initial valuator is just GHz/$.
I care about the memory sharing and message passing efficiency issues only
in that I want to smarten up my app to take advantage of other economies.
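
To make the GHz/$ valuator concrete, here is a minimal sketch in Python. The function names and the node figures in the usage lines are hypothetical placeholders, not real quotes or specs, and the metric deliberately ignores memory and interconnect.

```python
def ghz_per_dollar(cores, clock_ghz, price_usd):
    """Aggregate clock (cores * GHz) bought per dollar; ignores memory and network."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    return cores * clock_ghz / price_usd


def rank_by_ghz_per_dollar(nodes):
    """Sort (name, cores, clock_ghz, price_usd) tuples, best value first."""
    return sorted(nodes, key=lambda n: ghz_per_dollar(n[1], n[2], n[3]), reverse=True)


# Placeholder candidates: every number here is made up for illustration only.
candidates = [
    ("node_a", 4, 2.0, 2000.0),
    ("node_b", 4, 3.0, 4000.0),
]
ranked = rank_by_ghz_per_dollar(candidates)
for name, cores, clock_ghz, price_usd in ranked:
    print(name, ghz_per_dollar(cores, clock_ghz, price_usd))
```

Real prices and clocks would replace the placeholders; the ranking only makes sense while the app stays compute-bound and cache-friendly.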
On 3/8/07, Mark Hahn <hahn at mcmaster.ca> wrote:
> > Great thanks. That was clear and the takeaway is that I should pay
> > attention to the number of memory channels per core (which may be less
> > than 1.0)
> I think the takeaway is a bit more acute: if your code is cache-friendly,
> simply pay attention to cores * clock * flops/cycle.
> otherwise (ie, when your models are large), pay attention to the "balance"
> between observed memory bandwidth and peak flops.
> the stream benchmark is a great way to do this, and has traditionally
> promulgated the "balance" argument. here's an example:
> basically, 13 GB/s for a 2x2 opteron/2.8 system (peak flops would
> be 2*2*2*2.8=22.4 GFlops), so you need 1.7 flops per byte to be happy.
> I don't have a report handy for core2, but iirc, people report hitting
> a wall of around 9 GB/s for any dual-FSB core2 system. assuming dual-core
> parts like the paper, peak theoretical flops is 37 GFlops, for a balance
> of just over 4. that ratio should really be called "imbalance" ;)
> quad-core would be worse, of course.
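
The balance arithmetic quoted above can be checked with a short Python sketch. The function and variable names are mine, and reading 2*2*2*2.8 as sockets * cores per socket * flops per cycle * GHz is an assumption based on the "cores * clock * flops/cycle" rule in the same message. The 13 GB/s, 9 GB/s and 37 GFlops figures are taken straight from the quoted text.

```python
def peak_gflops(sockets, cores_per_socket, flops_per_cycle, clock_ghz):
    """Theoretical peak in GFlops: total cores * flops per cycle * clock (GHz)."""
    return sockets * cores_per_socket * flops_per_cycle * clock_ghz


def balance(peak_gflops_value, stream_gb_per_s):
    """Flops the code must perform per byte of memory traffic to keep the FPUs busy."""
    return peak_gflops_value / stream_gb_per_s


# 2x2 Opteron at 2.8 GHz, 13 GB/s observed stream bandwidth (figures from the quote).
opteron_peak = peak_gflops(2, 2, 2, 2.8)
opteron_balance = balance(opteron_peak, 13.0)

# Dual-FSB Core2: 37 GFlops peak and roughly 9 GB/s, both as quoted.
core2_balance = balance(37.0, 9.0)

print(f"opteron: {opteron_peak:.1f} GFlops peak, {opteron_balance:.2f} flops/byte")
print(f"core2:   {core2_balance:.2f} flops/byte")
```

A lower ratio means memory bandwidth keeps up better with the cores; a code that does fewer flops per byte than this ratio is likely to be bandwidth-bound on that machine.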