[Beowulf] Re: vectors vs. loops
josip at lanl.gov
Wed May 4 08:19:35 PDT 2005
Vincent Diepeveen wrote:
> You shift the bandwidth problem of the expensive network in that case to
> the processor itself.
That may work for games, but not for everyone. A common operation like
C = A + B
is very fast when A, B, and C are small enough to fit into the cache
simultaneously. However, in scientific computing these vectors can be
1 GB each (per CPU!), and the problem becomes memory-bandwidth bound.
Today's memory bandwidths cannot support full CPU speed on a problem
like this.
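To put rough numbers on this, here is a minimal back-of-the-envelope sketch in Python. It assumes 8-byte doubles, two loads and one store per element, and ignores write-allocate traffic. The peak rate and bandwidth figures are hypothetical placeholders, not measurements of any particular machine, and the function names are made up for illustration.

```python
def bytes_per_element(word_bytes=8, loads=2, stores=1):
    """Memory traffic for one element of C = A + B.

    Assumption: read A[i] and B[i], write C[i]; write-allocate is ignored.
    """
    return word_bytes * (loads + stores)


def required_bandwidth(adds_per_sec, word_bytes=8):
    """Bytes/s the memory system must deliver to sustain the given add rate."""
    return adds_per_sec * bytes_per_element(word_bytes)


def attainable_adds(mem_bandwidth, peak_adds, word_bytes=8):
    """Sustained add rate: capped by the CPU peak or by memory traffic."""
    return min(peak_adds, mem_bandwidth / bytes_per_element(word_bytes))


if __name__ == "__main__":
    peak = 4e9        # hypothetical CPU peak, adds per second
    bandwidth = 6e9   # hypothetical memory bandwidth, bytes per second

    need = required_bandwidth(peak)
    got = attainable_adds(bandwidth, peak)
    print(f"bandwidth needed for peak: {need / 1e9:.1f} GB/s")
    print(f"attainable rate:           {got / 1e9:.3f} Gflop/s")
    print(f"fraction of peak:          {got / peak:.1%}")
```

With these made-up figures, each add moves 24 bytes, so running at peak would need 96 GB/s, while 6 GB/s feeds only about 6% of peak. The exact numbers will differ from machine to machine; the point is only that one flop per 24 bytes leaves the CPU waiting on memory once the vectors no longer fit in cache.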
A fact of life in scientific computing, e.g. CFD, is that the workload
resembles "C = A + B". People try to get better reuse of data in cache, but
there is only so much that an algorithm will allow. Thus, memory (and
network) bandwidths remain the main bottleneck.