[Beowulf] Re: vectors vs. loops

Josip Loncaric josip at lanl.gov
Wed May 4 08:19:35 PDT 2005

Vincent Diepeveen wrote:
> You shift the bandwidth problem of the expensive network in that case to
> the processor itself.

That may work for games, but not for everyone.  A common operation like

C = A + B

is very fast when A, B, and C are small enough to fit into the cache
simultaneously.  However, for scientific computing, the size of these
vectors could be 1 GB each (per CPU!), and the problem is then
memory-bandwidth bound.  Today's memory bandwidths cannot support full
CPU speed on a problem like this.
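
A rough back-of-envelope sketch of why this is so, assuming
double-precision vectors of 1 GB each.  The bandwidth and peak-rate
figures are purely illustrative assumptions, not measurements of any
particular machine, and the function names are made up for this example.
Extra traffic such as write-allocate on the store is ignored.

```python
# Back-of-envelope estimate for C = A + B when nothing fits in cache.
# All hardware figures are hypothetical, chosen only for illustration.

BYTES_PER_DOUBLE = 8


def bytes_per_element():
    """Memory traffic per element: load A[i], load B[i], store C[i]."""
    return 3 * BYTES_PER_DOUBLE


def bandwidth_limited_flops(mem_bandwidth):
    """Upper bound on additions per second if every operand streams from memory."""
    return mem_bandwidth / bytes_per_element()


def vector_add_time(n_elements, mem_bandwidth):
    """Lower bound on the time (seconds) for one pass of C = A + B."""
    return n_elements * bytes_per_element() / mem_bandwidth


ASSUMED_BANDWIDTH = 6.4e9   # bytes/s, hypothetical memory bandwidth
ASSUMED_PEAK = 4.0e9        # flop/s, hypothetical CPU peak rate

n = 10**9 // BYTES_PER_DOUBLE   # elements in one 1 GB vector of doubles

rate = bandwidth_limited_flops(ASSUMED_BANDWIDTH)
fraction_of_peak = rate / ASSUMED_PEAK

print(f"elements per vector:    {n}")
print(f"traffic per element:    {bytes_per_element()} bytes for 1 flop")
print(f"bandwidth-limited rate: {rate:.3e} flop/s")
print(f"fraction of peak:       {fraction_of_peak:.1%}")
print(f"time for one pass:      {vector_add_time(n, ASSUMED_BANDWIDTH):.3f} s")
```

Each addition needs 24 bytes of memory traffic, so the achievable rate
is set by the memory system rather than by the arithmetic units,
whatever the actual figures are.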

A fact of life in scientific computing, e.g. CFD, is that the workload 
resembles "C=A+B".  People try to get better reuse of data in cache, but 
there is only so much that an algorithm will allow.  Thus, memory (and 
network) bandwidths remain the main bottleneck.
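
To illustrate the limits of data reuse, here is a small sketch that
counts memory traffic for a hypothetical two-step update E = (A + B) * D,
done either as two separate vector passes or as one fused loop.  The
operation and the function names are assumptions made up for this
example, and the vectors are assumed to be far too large for cache.

```python
# Memory traffic for E = (A + B) * D on vectors much larger than cache.
# The operation itself is a hypothetical example, not from a real code.

BYTES_PER_DOUBLE = 8


def traffic_unfused(n):
    """Two passes with a temporary vector T.

    Pass 1: T = A + B  -> read A, read B, write T
    Pass 2: E = T * D  -> read T, read D, write E
    """
    return 6 * n * BYTES_PER_DOUBLE


def traffic_fused(n):
    """One pass: read A, B, D and write E; A[i] + B[i] stays in a register."""
    return 4 * n * BYTES_PER_DOUBLE


def flops_per_byte(flops, traffic_bytes):
    """Arithmetic intensity: useful operations per byte moved."""
    return flops / traffic_bytes


n = 125_000_000     # elements in a 1 GB vector of doubles
flops = 2 * n       # one add and one multiply per element

print(f"unfused: {traffic_unfused(n) / 1e9:.1f} GB moved, "
      f"{flops_per_byte(flops, traffic_unfused(n)):.4f} flop/byte")
print(f"fused:   {traffic_fused(n) / 1e9:.1f} GB moved, "
      f"{flops_per_byte(flops, traffic_fused(n)):.4f} flop/byte")
```

Fusing the loops removes a third of the traffic in this example, but
the result still moves 16 bytes per flop, so it stays bandwidth bound.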


