Mark Hahn hahn at physics.mcmaster.ca
Wed Nov 27 11:26:24 PST 2002


> STREAM bandwidth is a performance characteristic: it's the bandwidth that a 
> single processor achieves with the STREAM benchmark. It's not an application.

stream is a piece of source code.  how the compiler/runtime actually 
implements daxpy is completely free, and certainly does not require a single
address space.  therefore, it's quite reasonable to talk about the Stream
score for a loosely coupled cluster.  stream is almost the worst possible 
kind of code to run on a cluster, though, simply because it has such a low 
work:bandwidth ratio.
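
To put a rough number on that work:bandwidth ratio, here is a minimal sketch of the daxpy-like "triad" kernel and its flops-per-byte figure. It assumes 8-byte doubles and counts only the three array streams (read b, read c, write a), ignoring any write-allocate traffic; the function names are made up for illustration and are not taken from the STREAM source.

```python
# Sketch of a daxpy-like "triad" kernel and its work:bandwidth ratio.
# Assumptions: 8-byte doubles, no write-allocate traffic counted.

BYTES_PER_DOUBLE = 8


def triad(b, c, scalar):
    """Return a where a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_flops_per_byte():
    """Two flops (multiply, add) per element; three doubles moved per element."""
    flops = 2
    bytes_moved = 3 * BYTES_PER_DOUBLE
    return flops / bytes_moved


a = triad([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 2.0)
intensity = triad_flops_per_byte()
print(a)
print(f"{intensity:.4f} flops/byte")
```

Under these assumptions the kernel does one flop per 12 bytes moved, so any link slower than local memory dominates the run time almost immediately.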

IMO, a benchmark appropriate for SMP would necessarily measure inter-CPU 
latency, somehow, and stream does not.  I always ignore multiprocessor stream
results, or else look strictly at the scaling of their per-cpu scores as the
machine gets bigger. 
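
One way to look at per-cpu scaling is sketched below. The helper name and the input numbers are hypothetical and for illustration only; they are not measurements from any machine.

```python
def per_cpu_scaling(results):
    """results maps CPU count -> aggregate bandwidth in GB/s.

    Returns a dict mapping CPU count -> (per-CPU GB/s, fraction of the
    single-CPU score). Requires an entry for 1 CPU as the baseline.
    """
    base = results[1]
    return {n: (bw / n, (bw / n) / base) for n, bw in sorted(results.items())}


# Hypothetical aggregate scores, not real data.
example = {1: 1.0, 2: 1.6, 4: 2.0}
scaling = per_cpu_scaling(example)
for n, (per_cpu, frac) in scaling.items():
    print(f"{n} CPUs: {per_cpu:.2f} GB/s per CPU, {frac:.0%} of single-CPU score")
```

A per-CPU fraction that drops quickly as CPUs are added suggests the processors are contending for the same memory path.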

> To illustrate: on an SX-6, this is in the range of 25 GB/s/CPU on an 8-CPU 
> node. A Pentium-4/Xeon Dual-SMP node gets about 0.5 GB/s/CPU (E7500 chipset 
> - which has dual channel RAM, IIRC). This alone gives a performance advantage 
> of about a factor 20-40 if not inside the caches, which shows in the MFLOPS 
> efficiency (achieved vs. peak) of many codes (the ones which can be 
> vectorized). 

a "cutting edge chicken" would be a uniprocessor P4/fsb533/dual-PC2700,
delivering (as a guess) a little under 3 GBps/CPU.

> The SX-5 had even higher memory bandwidth, but in turn, the SX-6 has become 
> more cost- and energy-efficient.

the 3 Gflop chicken would dissipate around 200W; I am guessing the SX-6
dissipates more than 25/3*200=1.7 kW, no?
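
The arithmetic behind that figure, as a small sketch: it scales the commodity box's power by the per-CPU bandwidth ratio, giving the power at which the two would tie on bandwidth per watt. The inputs are the guesses quoted in this thread (25 GB/s, about 3 GB/s, about 200 W), not measured values, and the function name is made up for illustration.

```python
def breakeven_power_watts(big_bw, small_bw, small_power_w):
    """Power at which the big machine matches the small one in bandwidth per watt."""
    return big_bw / small_bw * small_power_w


# Guessed figures from the discussion: GB/s per CPU and watts.
sx6_breakeven = breakeven_power_watts(25.0, 3.0, 200.0)
print(f"{sx6_breakeven:.0f} W")
```

If an SX-6 CPU draws more than this, the commodity box comes out ahead per watt on this one metric; if less, the vector machine does.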




More information about the Beowulf mailing list