energy costs

Richard Walsh rbw at
Wed Mar 12 06:42:50 PST 2003

Mark Hahn wrote:

>> PS Pentium 4 sustained performance from memory is about
>>    5% of peak (stream triad).
>that should be 50%, I think.

Nope ... not "from memory".

A 2.8 GHz P4 using SSE2 instructions can deliver two
64-bit floating point results per clock, or 5.6 Gflops
peak at this clock. The stream triad (a from-memory
multiply-add operation) on a 2.8 GHz P4 delivers
only 200 Mflops (see the STREAM website). The
arithmetic is then:

200/5600 = .0357 or 3.57% (so 5% is a gift)
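For anyone who wants to rerun the numbers, here is the
same arithmetic as a small Python sketch. The constants
are just the figures quoted above (2.8 GHz, two SSE2
results per clock, about 200 Mflops on triad); the
function names are my own, not from any benchmark package.

```python
# Percent-of-peak arithmetic for the 2.8 GHz P4 figures quoted above.
# Function names are illustrative only (not from STREAM or any library).

CLOCK_HZ = 2.8e9        # 2.8 GHz Pentium 4
FLOPS_PER_CLOCK = 2     # two 64-bit results per clock with SSE2
TRIAD_FLOPS = 200e6     # STREAM triad rate quoted above


def peak_flops(clock_hz, flops_per_clock):
    """Theoretical peak: clock rate times results per clock."""
    return clock_hz * flops_per_clock


def fraction_of_peak(sustained, peak):
    """Sustained rate as a fraction of theoretical peak."""
    return sustained / peak


peak = peak_flops(CLOCK_HZ, FLOPS_PER_CLOCK)
frac = fraction_of_peak(TRIAD_FLOPS, peak)
print(f"peak  = {peak / 1e9:.1f} Gflops")   # peak  = 5.6 Gflops
print(f"triad = {frac:.2%} of peak")        # triad = 3.57% of peak
```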

This is a worst-case, from-memory scenario with
little or no cache re-use. It asks the question,
"What fraction of peak can my memory subsystem
sustain?" On the Cray X1 (and most vector machines),
the same worst-case, from-memory scenario yields 25%
of peak. This is why Cray is still making custom vector
machines.

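To make the "from memory" point concrete, here is a
rough Python sketch of the triad loop and of what
200 Mflops implies for memory traffic. It assumes
STREAM's own counting convention for triad: 24 bytes
moved (two 8-byte loads and one 8-byte store) and 2 flops
per iteration; any extra write-allocate traffic is not
counted. The helper names are hypothetical, and the
pure-Python loop only shows the kernel; it is not a
benchmark.

```python
# Sketch of the STREAM triad kernel and the bandwidth implied by a
# given triad flop rate. Assumes STREAM's counting convention:
# 24 bytes and 2 flops per iteration. Helper names are hypothetical.

BYTES_PER_ITER = 3 * 8   # load b[i], load c[i], store a[i]; 8 bytes each
FLOPS_PER_ITER = 2       # one multiply, one add


def triad(a, b, c, q):
    """a[i] = b[i] + q * c[i], the STREAM triad kernel (for illustration)."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def bandwidth_for(flops_per_sec):
    """Memory bandwidth (bytes/s) implied by a triad flop rate."""
    return flops_per_sec / FLOPS_PER_ITER * BYTES_PER_ITER


a = [0.0] * 4
triad(a, [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 3.0)
print(a)                             # [4.0, 5.0, 6.0, 7.0]
print(bandwidth_for(200e6) / 1e9)    # 2.4  (GB/s sustained at 200 Mflops)
print(bandwidth_for(5.6e9) / 1e9)    # 67.2 (GB/s needed to run at peak)
```

By this count the P4 was sustaining roughly 2.4 GB/s
on triad, while feeding the floating point units at the
full 5.6 Gflops peak would take about 67 GB/s. That gap
is what the 3.57% figure reflects.
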
As you suggest, the P4 (like the Cray X1) does
significantly better when cache re-use is a major
factor.


# Richard Walsh
# Project Manager, Cluster Computing, Computational
#                  Chemistry and Finance
# netASPx, Inc.
# 1200 Washington Ave. So.
# Minneapolis, MN 55415
# VOX:    612-337-3467
# FAX:    612-337-3400
# EMAIL:  rbw at, richard.walsh at
#         rbw at
# "Without mystery, there can be no authority."
#                                  -Charles DeGaulle

More information about the Beowulf mailing list