Top 500 trends

Wed Nov 27 11:42:42 PST 2002

Mark Hahn:
> stream is a piece of source code.  how the compiler/runtime actually
> implements daxpy is completely free, and certainly does not require a
> single address space.  therefore, it's quite reasonable to talk about the
> Stream score for a loosely coupled cluster.  stream is almost the worst
> possible kind of code to run on a cluster, though, simply because it has
> such a low work:bandwidth ratio.

The numbers don't change if you do this because I quoted the per-CPU numbers 
of a fully-loaded node.

> IMO, a benchmark appropriate for SMP would necessarily measure inter-CPU
> latency, somehow, and stream does not.  I always ignore multiprocessor
> stream results, or else look strictly at the scaling of their per-cpu
> scores as the machine gets bigger.

I don't understand what you mean by "inter-CPU latency". If you mean MPI 
message latency for intra-node communication, that is about the same (a few 
us) for all SMPs with a decent MPI implementation.

And again: the per-CPU numbers I quoted *are* for fully loaded nodes (8 CPUs 
on SX-6 node, 2 CPUs on Xeon node).

> a "cutting edge chicken" would be a uniprocessor P4/fsb533/dual-PC2700,
> delivering (as a guess) a little under 3 GBps/CPU.

I would rather measure than guess. I'd be surprised to see bandwidth 
increase by a factor of three in less than a year's time.

> > The SX-5 had even higher memory bandwidth, but in turn, the SX-6 has
> > become more cost- and energy-efficient.
>
> the 3 Gflop chicken would dissipate around 200W; I am guessing the SX-6
> dissipates more than 25/3*200=1.7 KW, no?

I was comparing the cost- and energy-efficiency of the SX-5 and the SX-6 
with each other. And you shouldn't mix Gflop/s with GB/s: at 8 bytes per 
double-precision operand, 3 GB/s gives you at most (!) 3/8 Gflop/s.
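
To spell out the arithmetic behind that bound, here is a minimal sketch in 
Python. The function names are made up for illustration, and the traffic 
counts are assumptions: the first case charges one 8-byte operand from memory 
per flop (the bound above), the second counts daxpy as 2 flops per element 
against two loads and one store of 8 bytes each, ignoring any extra 
write-allocate traffic.

```python
def peak_gflops(bandwidth_gbytes_per_s, bytes_per_flop=8.0):
    """Upper bound on Gflop/s for a code limited purely by memory bandwidth.

    bytes_per_flop is an assumption about the kernel: 8.0 means one
    double-precision operand has to come from memory for every flop.
    """
    return bandwidth_gbytes_per_s / bytes_per_flop


def daxpy_bytes_per_flop():
    """Memory traffic per flop for y[i] = a*x[i] + y[i] (assumed counts)."""
    flops_per_element = 2          # one multiply, one add
    bytes_per_element = 3 * 8      # load x[i], load y[i], store y[i]
    return bytes_per_element / flops_per_element


if __name__ == "__main__":
    bw = 3.0  # GB/s, the guessed figure from the quoted text
    print("one operand per flop:", peak_gflops(bw), "Gflop/s")
    print("daxpy traffic model: ",
          peak_gflops(bw, daxpy_bytes_per_flop()), "Gflop/s")
```

Under the daxpy traffic model the ceiling is lower still than 3/8 Gflop/s, 
which is why the 3/8 figure is an "at most".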

Please don't get me wrong: I am not saying everybody should buy vector 
machines. But it is important to understand that (and why) certain codes run 
with such poor efficiency on PC clusters, even though these clusters do a 
nice job for many applications and are affordable for many more people than 
a vector machine. I use them and develop for them as well.

  Joachim

-- 
Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de