energy costs


Mark Hahn hahn at physics.mcmaster.ca
Wed Mar 12 07:23:50 PST 2003


> >> PS Pentium 4 sustained performance from memory is about
> >>    5% of peak (stream triad).
> >
> >that should be 50%, I think.
> 
> Nope ... not "from memory".
> 
> A 2.8 GHz P4 using SSE2 instructions can deliver two
> 64-bit floating point results per clock or 5.6 Gflops
> peak performance at this clock.  The stream triad (a 
> from-memory, multiply-add operation) for a 2.8 GHz 
> P4 produces only 200 Mflops (see stream website). The 
> arithmetic is then:
> 
> 200/5600 = .0357 or 3.57% (so 5% is a gift)

oh, I see.  to me, that's a strange definition of "peak",
since stream is, by design, always bottlenecked on memory
bandwidth.  the P4's FSB is either 3.2 or 4.3 GB/s, and
it'll deliver roughly 50% of that to stream.
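
to make the two notions of "peak" concrete, here is a rough sketch of
the arithmetic.  it assumes the usual stream counting for triad
(a[i] = b[i] + q*c[i]: 2 flops and 24 bytes per iteration, ignoring any
write-allocate traffic); the function names are made up for
illustration, and the inputs are just the figures quoted in this thread.

```python
# Sketch: compare "fraction of flop peak" with "fraction of memory peak"
# for STREAM triad.  Assumes STREAM's own counting convention:
# a[i] = b[i] + q*c[i] moves three 8-byte doubles and does 2 flops.
BYTES_PER_ITER = 3 * 8
FLOPS_PER_ITER = 2


def triad_mflops_to_gbs(mflops):
    """Memory traffic (GB/s) implied by a triad rate given in Mflops."""
    iters_per_second = mflops * 1e6 / FLOPS_PER_ITER
    return iters_per_second * BYTES_PER_ITER / 1e9


def fraction(achieved, peak):
    """Plain ratio; units must match."""
    return achieved / peak


if __name__ == "__main__":
    triad_mflops = 200.0        # figure quoted in the thread
    flop_peak_mflops = 5600.0   # 2.8 GHz * 2 results/clock, as quoted
    print("fraction of flop peak: %.4f" % fraction(triad_mflops, flop_peak_mflops))

    traffic = triad_mflops_to_gbs(triad_mflops)
    for fsb_gbs in (3.2, 4.3):  # the two FSB bandwidths mentioned above
        print("FSB %.1f GB/s: triad uses %.2f GB/s, fraction %.2f"
              % (fsb_gbs, traffic, fraction(traffic, fsb_gbs)))
```

measured against the flop peak the number looks tiny; measured against
the memory system that actually limits triad, it is a large fraction.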

> As you suggest, the P4 will (as does the Cray X1) do 
> significantly better when cache use/re-use is a 
> significant factor.

no, it's not a matter of reuse, but what you consider "peak".

I think the real take-home message is that this sort of 
fraction-of-theoretical-peak is useless, and you need to look
at the actual numbers, possibly scaled by price.

as a matter of fact, I'm always slightly puzzled by this sort
of conversation.  yes, crays and vector computers in general 
are big/wide memory systems with a light scattering of ALUs.
that's a much different ratio than in the "cache-based" computing world.

but if your data is huge and uniform, don't you win big by 
partitioning (data or work grows as dim^2, but communication
at partition boundaries scales much more slowly)?  that would argue,
for instance, that you should run on a cluster of e7205 machines,
where each node delivers a bit more than the 200 Mflops above for
under $2k, and should scale quite nicely until your interconnect runs
out of steam at, say, several hundred CPUs.  the point is really that
stream-like codes are almost embarrassingly parallel.
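
a minimal sketch of that scaling argument, under assumptions that are
mine rather than anything measured: a square n x n grid split into p
square tiles, with each node exchanging only a one-cell-wide halo with
its neighbours.  the function name is hypothetical.

```python
import math


def halo_to_work_ratio(n, p):
    """Ratio of boundary cells exchanged to cells computed per node.

    Assumes an n x n grid split into p equal square tiles (p a perfect
    square) and a one-cell halo on each of the four tile edges.
    """
    side = n / math.sqrt(p)   # tile edge length
    work = side * side        # cells computed per node: grows as dim^2
    halo = 4 * side           # cells communicated per node: grows as dim
    return halo / work


if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        print(n, halo_to_work_ratio(n, 100))
```

for a fixed node count the ratio falls as 1/n, which is why big uniform
problems tolerate a modest interconnect.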

so what's the cost per stream-triad gflop from Cray?



