energy costs

Richard Walsh rbw at
Wed Mar 12 08:34:38 PST 2003

Mark H. wrote:

>> >> PS Pentium 4 sustained performance from memory is about
>> >>    5% of peak (stream triad).
>> >
>> >that should be 50%, I think.
>> Nope ... not "from memory".
>> A 2.8 GHz P4 using SSE2 instructions can deliver two
>> 64-bit floating point results per clock or 5.6 Gflops
>> peak performance at this clock.  The stream triad (a 
>> from-memory, multiply-add operation) for a 2.8 GHz 
>> P4 produces only 200 Mflops (see stream website). The 
>> arithmetic is then:
>> 200/5600 = .0357 or 3.57% (so 5% is a gift)
>oh, I see.  to me, that's a strange definition of "peak",
>since stream is, by intention, always bottlenecked on 
>memory bandwidth, since its FSB is either 3.2 or 4.3 GB/s.
>it'll deliver roughly 50% of that to stream.

 Not strange, reliable and consistent. Peak is always what 
 the processor's floating-point core can deliver without 
 data delivery bottlenecks. 

 It is also as you suggest a "marketing number".  The 
 stream triad performance defines another pole (a sort 
 of sea-level, far beneath peak, as you like) within which 
 the real-world performance of most real code will sit.
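
 For anyone who wants to redo the arithmetic behind the two poles,
 here is a minimal Python sketch. The function names are made up for
 illustration; the inputs (2.8 GHz clock, two 64-bit results per
 clock, 200 Mflops triad) are the figures quoted earlier in this
 message, not new measurements.

```python
def peak_mflops(clock_ghz, results_per_clock):
    """Theoretical peak in Mflops: clock rate times results per clock."""
    return clock_ghz * 1000.0 * results_per_clock


def fraction_of_peak(sustained_mflops, peak):
    """Sustained rate expressed as a fraction of theoretical peak."""
    return sustained_mflops / peak


# Figures quoted in the thread: 2.8 GHz P4, SSE2, 200 Mflops stream triad.
p4_peak = peak_mflops(2.8, 2)
triad_fraction = fraction_of_peak(200.0, p4_peak)

print(f"peak = {p4_peak:.0f} Mflops, triad = {triad_fraction:.2%} of peak")
```

 Real codes should land somewhere between that fraction and 1.0,
 depending on how much cache reuse they get.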

>> As you suggest, the P4 will (as does the Cray X1) do 
>> significantly better when cache use/re-use is a 
>> significant factor.
>no, it's not a matter of reuse, but what you consider "peak".
>I think the real take-home message is that this sort of 
>fraction-of-theoretical-peak is useless, and you need to look
>at the actual numbers, possibly scaled by price.

 "useless", I like the ring of that ;-) ... not so if the ratio 
 of flops to mops in your kernels is low (stream triad is 
 .667). It sets a floor for out-of-the-box performance, from
 which you may be able to raise your particular code's performance.
 You can almost always get more with cache twiddling/blocking. 
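
 A rough sketch of where the .667 and the floor come from, in Python.
 It assumes 8-byte doubles, counts only the two loads and one store
 per element (no write-allocate traffic), and takes the "roughly 50%"
 of FSB bandwidth from the quoted text at face value; all names are
 illustrative.

```python
BYTES_PER_DOUBLE = 8
FLOPS_PER_ELEMENT = 2    # one multiply, one add
MEMOPS_PER_ELEMENT = 3   # load b[i], load c[i], store a[i]


def triad(b, c, q):
    """STREAM-triad-style kernel: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def flops_per_memop():
    """Ratio of floating-point operations to memory operations."""
    return FLOPS_PER_ELEMENT / MEMOPS_PER_ELEMENT


def bandwidth_bound_mflops(sustained_gb_per_s):
    """Upper bound on triad Mflops when every operand comes from memory."""
    bytes_per_element = MEMOPS_PER_ELEMENT * BYTES_PER_DOUBLE
    elements_per_second = sustained_gb_per_s * 1e9 / bytes_per_element
    return elements_per_second * FLOPS_PER_ELEMENT / 1e6


print(f"flops per memop: {flops_per_memop():.3f}")
for fsb_gb_per_s in (3.2, 4.3):
    sustained = 0.5 * fsb_gb_per_s  # assumed: about half the FSB rate
    bound = bandwidth_bound_mflops(sustained)
    print(f"FSB {fsb_gb_per_s} GB/s -> about {bound:.0f} Mflops from memory")
```

 Under those assumptions the bound comes out between roughly 130 and
 180 Mflops, the same ballpark as the 200 Mflops triad figure.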

>as a matter of fact, I'm always slightly puzzled by this sort
>of conversation.  yes, crays and vector computers in general 
>are big/wide memory systems with a light scattering of ALUs.
>a much different ratio than the "cache-based" computing world.
>but if your data is huge and uniform, don't you win big by 
>partitioning (data or work grows as dim^2, but communication
>at partitions scaling much slower)?  that would argue, for instance,
>that you should run on a cluster of e7205 machines, where each node
>delivers a bit more than the 200 Mflops above for under $2k, and should 
>scale quite nicely until your interconnect runs out of steam, 
>say, several hundred CPUs.  the point is really that stream-like 
>codes are almost embarrassingly parallel.

 Right. This is a key (perhaps last, along with SSI) point of impact 
 between custom and commodity HPC system products. By clustering you 
 are buying distributed bandwidth ... whether it is usable for your 
 code ... depends ... on whether your code has already been modified 
 for message passing, its message-passing cycles can be hidden behind
 computation, it has a nicely blockable footprint, it is not too 
 latency dependent, and it does not run so long that it needs 
 checkpointing to ensure its completion in a cluster environment.  
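
 The partitioning argument quoted above can be put in numbers with a
 small Python sketch. It is a toy model under stated assumptions: an
 n x n grid split into p equal square tiles, with a one-cell-deep
 nearest-neighbor halo exchanged per step. The function name and the
 grid size are hypothetical, and a pure stream-like code would have
 no halo at all.

```python
import math


def tile_costs(n, p):
    """Return (work, halo) per node for an n x n grid on p square tiles.

    work: grid points updated per node per step (grows as side**2)
    halo: boundary points exchanged per node per step (grows as side)
    """
    root = math.isqrt(p)
    if root * root != p or n % root != 0:
        raise ValueError("p must be a perfect square whose root divides n")
    side = n // root
    work = side * side
    halo = 4 * side  # four edges, one cell deep (assumption)
    return work, halo


for nodes in (4, 16, 64, 256):
    work, halo = tile_costs(4096, nodes)
    print(f"{nodes:4d} nodes: work={work:8d} halo={halo:5d} "
          f"work/halo={work / halo:7.1f}")
```

 The work-to-halo ratio shrinks as nodes are added, which is one way
 to see where the interconnect eventually runs out of steam.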

 There are folks with the cash who are in these situations. 

>so what's the cost per stream-triad gflop from Cray?

 Le coup de grâce ... oui? ... but 3-year TCOs of very large
 clusters with non-COTS interconnects and utilization factored
 in (large, à la PNNL) are closer to those of the Cray X1 than 
 you might think. Doing 3-year TCO calculations is like massaging 
 the fat lady (finding a true global minimum at a given site ain't 
 that simple) and ends up being driven by local politics, so I am not 
 going to give you our site-specific/prejudiced numbers here ;-), 
 but I think that in certain markets and at certain sites Cray 
 likes their odds ... so does the bubble-wary stock market ...
 a 300% gain on their stock in the last year. (I don't
 own any ;-) )


# Richard Walsh
# Project Manager, Cluster Computing, Computational
#                  Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX:    612-337-3467
# FAX:    612-337-3400
# EMAIL:  rbw at, richard.walsh at
#         rbw at
# "Beware, the shifting center of one's solar system.
#  Today's religion/truth is some tomorrow's historical 
#  footnote."
#                                  -Max Headroom
