Joachim Worringen joachim at ccrl-nece.de
Wed Nov 27 11:42:42 PST 2002


Mark Hahn:
> stream is a piece of source code.  how the compiler/runtime actually
> implements daxpy is completely free, and certainly does not require a
> single address space.  therefore, it's quite reasonable to talk about the
> Stream score for a loosely coupled cluster.  stream is almost the worst
> possible kind of code to run on a cluster, though, simply because it has
> such a low work:bandwidth ratio.

The numbers don't change if you do this because I quoted the per-CPU numbers 
of a fully-loaded node.

> IMO, a benchmark appropriate for SMP would necessarily measure inter-CPU
> latency, somehow, and stream does not.  I always ignore multiprocessor
> stream results, or else look strictly at the scaling of their per-cpu
> scores as the machine gets bigger.

I don't understand what you mean by "inter-CPU latency". MPI message latency 
for intra-node communication is about the same for all SMPs with a decent MPI 
implementation (a few us), if that is what you mean.

And again: the per-CPU numbers I quoted *are* for fully loaded nodes (8 CPUs 
on SX-6 node, 2 CPUs on Xeon node).

> a "cutting edge chicken" would be a uniprocessor P4/fsb533/dual-PC2700,
> delivering (as a guess) a little under 3 GBps/CPU.

I would rather measure than guess. I'd be surprised to see a bandwidth 
increase by a factor of three in less than a year's time.

> > The SX-5 had even higher memory bandwidth, but in turn, the SX-6 has
> > become more cost- and energy-efficient.
>
> the 3 Gflop chicken would dissipate around 200W; I am guessing the SX-6
> dissipates more than 25/3*200=1.7 KW, no?

I compared the cost- and energy-efficiency of SX-5 and SX-6. And you shouldn't 
mix Gflop with GBps: 3 GBps gives you at most (!) 3/8 Gflop/s.
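
To make that arithmetic explicit, here is a minimal sketch in Python. It 
assumes the 3/8 figure comes from moving one 8-byte double per flop; the 
function name and the daxpy traffic count (2 flops per 24 bytes) are 
illustrative assumptions, not measurements.

```python
def bandwidth_bound_gflops(bandwidth_gbytes_per_s, bytes_per_flop):
    """Upper bound on Gflop/s when each flop must move bytes_per_flop bytes."""
    return bandwidth_gbytes_per_s / bytes_per_flop

# Assumption: one 8-byte double moved per flop, which reproduces 3/8.
optimistic = bandwidth_bound_gflops(3.0, 8.0)

# Assumption: daxpy, y[i] = a*x[i] + y[i], does 2 flops per element and
# moves 3 doubles (load x, load y, store y) = 24 bytes, i.e. 12 bytes/flop.
daxpy = bandwidth_bound_gflops(3.0, 24.0 / 2.0)

print(f"one double per flop: {optimistic:.3f} Gflop/s")
print(f"daxpy traffic:       {daxpy:.3f} Gflop/s")
```

Under these assumptions the daxpy bound comes out below 3/8 Gflop/s, which 
is why 3/8 is an upper limit rather than an expected rate.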

Please don't get me wrong: I am not saying everybody should buy vector machines. 
But it is important to understand that (and why) certain codes run with such 
poor efficiency on PC clusters, even though these clusters surely do a nice job 
for many applications and are affordable for many more people than a vector 
machine. I use them and develop for them as well.

  Joachim
 
-- 
Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de



