energy costs
Mark Hahn hahn at physics.mcmaster.ca
Wed Mar 12 07:23:50 PST 2003
> >> PS Pentium 4 sustained performance from memory is about
> >> 5% of peak (stream triad).
> >
> >that should be 50%, I think.
>
> Nope ... not "from memory".
>
> A 2.8 GHz P4 using SSE2 instructions can deliver two
> 64-bit floating point results per clock or 5.6 Gflops
> peak performance at this clock. The stream triad (a
> from-memory, multiply-add operation) for a 2.8 GHz
> P4 produces only 200 Mflops (see stream website). The
> arithmetic is then:
>
> 200/5600 = .0357 or 3.57% (so 5% is a gift)

oh, I see. to me, that's a strange definition of "peak", since stream
is, by intention, always bottlenecked on memory bandwidth. the P4's FSB
is either 3.2 or 4.3 GB/s, and it'll deliver roughly 50% of that to stream.

> As you suggest, the P4 will (as does the Cray X1) do
> significantly better when cache use/re-use is a
> significant factor.

no, it's not a matter of reuse, but what you consider "peak". I think
the real take-home message is that this sort of fraction-of-theoretical-peak
is useless, and you need to look at the actual numbers, possibly scaled
by price.

as a matter of fact, I'm always slightly puzzled by this sort of
conversation. yes, crays and vector computers in general are big/wide
memory systems with a light scattering of ALUs. a much different ratio
than the "cache-based" computing world. but if your data is huge and
uniform, don't you win big by partitioning (data or work grows as dim^2,
but communication at partition boundaries scales much more slowly)?

that would argue, for instance, that you should run on a cluster of
e7205 machines, where each node delivers a bit more than the 200 Mflops
above for under $2k, and should scale quite nicely until your interconnect
runs out of steam, say, several hundred CPUs. the point is really that
stream-like codes are almost embarrassingly parallel.

so what's the cost per stream-triad gflop from Cray?
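
for anyone who wants to replay the arithmetic in this thread, here is a
minimal sketch in Python. the clock rate, flops per clock, the 200 Mflops
triad figure, and the $2k node price are the numbers quoted in the thread.
the function names are made up for illustration, and the traffic model is
an assumption: triad (a[i] = b[i] + q*c[i]) is counted as 2 flops and 24
bytes per element (three 8-byte doubles), ignoring any write-allocate
traffic.

```python
# Numbers quoted in the thread.
CLOCK_HZ = 2.8e9             # 2.8 GHz P4
FLOPS_PER_CLOCK = 2          # two 64-bit results per clock with SSE2
STREAM_TRIAD_MFLOPS = 200.0  # triad result quoted for that CPU
NODE_PRICE_USD = 2000.0      # "under $2k", so treat this as an upper bound

# Assumed traffic model for triad, a[i] = b[i] + q * c[i]:
# one multiply and one add, two 8-byte loads and one 8-byte store.
TRIAD_FLOPS_PER_ELEM = 2
TRIAD_BYTES_PER_ELEM = 24


def peak_mflops(clock_hz, flops_per_clock):
    """Theoretical peak in Mflops."""
    return clock_hz * flops_per_clock / 1e6


def fraction_of_peak(sustained_mflops, peak):
    """Sustained rate as a fraction of theoretical peak."""
    return sustained_mflops / peak


def triad_bandwidth_gbs(mflops):
    """Memory traffic (GB/s) implied by a triad rate, under the model above."""
    elems_per_s = mflops * 1e6 / TRIAD_FLOPS_PER_ELEM
    return elems_per_s * TRIAD_BYTES_PER_ELEM / 1e9


def usd_per_triad_gflop(node_price_usd, node_mflops):
    """Price per sustained triad Gflop for a single node."""
    return node_price_usd / (node_mflops / 1000.0)


if __name__ == "__main__":
    peak = peak_mflops(CLOCK_HZ, FLOPS_PER_CLOCK)
    print("peak Mflops:       ", peak)
    print("fraction of peak:  ", fraction_of_peak(STREAM_TRIAD_MFLOPS, peak))
    print("implied GB/s:      ", triad_bandwidth_gbs(STREAM_TRIAD_MFLOPS))
    print("USD per triad Gflop:", usd_per_triad_gflop(NODE_PRICE_USD,
                                                      STREAM_TRIAD_MFLOPS))
```

under that model, 200 Mflops of triad corresponds to about 2.4 GB/s of
memory traffic, which is why the result looks small against the 5.6 Gflops
ALU peak but reasonable against a 3.2 or 4.3 GB/s FSB. the price figure is
only an upper bound for one node and says nothing about interconnect cost.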
More information about the Beowulf mailing list
