[Beowulf] Cell

Wed Apr 27 15:46:54 PDT 2005

Vincent Diepeveen wrote:
> A raid5 array of 2 terabyte costs like $2000-$3000 and it can deliver
> 400-600MB/s i/o hands down when attached to a single machine. So if you
> make the 1 tflop processor, there is no need to worry!

I need to find out where you are getting your raids...

[...]

> Anything that has to do with huge calculations is in the first place cpu
> power limited. Not anything else.

There is a statement I like to make when I see comments like this.

"Gross generalizations tend to be incorrect".

If you think about it long enough, you can see the recursive humor.

There are many different factors that will affect the overall 
performance of a machine on a particular code/data set.  To illustrate 
this, I often suggest the following gedankenexperiment.

Imagine you have a CPU that is infinitely fast, coupled to resources 
that are not infinitely fast.  This means that while operations take 
exactly 0 time on the CPU, we haven't done a thing to make the memory or 
IO faster.  In this gedankenexperiment, how much of a speedup do you get 
from an infinitely fast CPU?  Memory moves still take time.  Data 
loading and storing still take time.  Data motion is quickly becoming 
one of the most critical aspects (if not the most critical) of 
performance for a fair number of calculations.  So unless all parts are 
infinitely fast, you still have to pay for the data motion time, the IO 
time, the memory->memory time, and the memory->CPU time (and CPU->memory 
time).

In short, an infinitely fast CPU would reduce (possibly significantly) 
the execution time of the class of applications that are purely CPU 
bound (say, operating out of internal cache only).

It will do very little for a code which is IO or memory bandwidth or 
latency bound.
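
To put rough numbers on the gedankenexperiment, here is a minimal Python 
sketch.  It assumes the run time splits cleanly into CPU, memory, and IO 
parts that do not overlap (real codes overlap them to some degree), and 
the function names and timings are made up for illustration.

```python
INF = float("inf")


def run_time(cpu_s, mem_s, io_s, cpu_speedup=INF):
    """Run time when only the CPU part gets faster (assumes no overlap)."""
    # Dividing by infinity gives 0.0: operations take exactly 0 time.
    return cpu_s / cpu_speedup + mem_s + io_s


def speedup(cpu_s, mem_s, io_s, cpu_speedup=INF):
    """Overall speedup relative to the unaccelerated run."""
    base = cpu_s + mem_s + io_s
    fast = run_time(cpu_s, mem_s, io_s, cpu_speedup)
    return INF if fast == 0 else base / fast


if __name__ == "__main__":
    # Hypothetical timings in seconds: (CPU, memory, IO).
    print(speedup(95, 4, 1))    # mostly CPU bound  -> 20.0
    print(speedup(10, 30, 60))  # IO/memory bound   -> about 1.11
```

Even with an infinitely fast CPU, the second case gains about 11%, 
because the 90 seconds of data motion are untouched.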

> Big RAM is nice to have for most clever algorithms, but it is second most
> important. CPU power is most important. If there is some bottleneck that
> limits the RAM we have, do not worry!
> 
> We will find a solution!
> 
> The real bottleneck is in the end the number of instructions a cpu can
> process a second.

Not really.  The bottleneck in performance is how full you can keep the 
multiple pipelines of the processor.  Branch statements tend to force 
pipeline flushes.  You can "handle" this with speculative execution. 
Real memory accesses can bottleneck the memory subsystem, so real 
processors allow specific mixtures of instructions in flight at once to 
reduce resource contention.  If you overflow any of the fixed CPU 
resources, you can stall a pipeline while waiting for the contention to 
be eliminated, or you can stall the entire CPU while flushing the TLB 
and other shared resources.  Basically you have multiple simultaneous 
zero-sum games (a fixed number of operations per unit time, specific 
mixtures of operations that maximize the performance of instructions in 
flight).  Compilers are, as I indicated before, not particularly smart 
in most cases, and they generate code locally that might not make sense 
globally.  Moreover, how instructions are ordered and presented to the 
CPU will fundamentally impact the overall performance.  Code optimizers 
are, in a large sense, an attempt to better fit the emitted instructions 
to the processor architecture by rewriting loops, mathematical 
constructs, and the like.  Optimizers are not perfect.
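
A back-of-the-envelope way to see why raw instruction rate is not the 
whole story is the textbook "effective cycles per instruction" model, 
sketched below in Python.  This is a simplification, not a description 
of any particular processor: it assumes stalls simply add up, and every 
number in it (instruction mix, miss rates, penalties, clock rates) is 
hypothetical.

```python
def effective_cpi(base_cpi, branch_frac, mispredict_rate, flush_cycles,
                  mem_frac, miss_rate, miss_cycles):
    """Average cycles per instruction including stall cycles.

    Assumes branch and memory stalls are independent and simply add.
    """
    branch_stalls = branch_frac * mispredict_rate * flush_cycles
    memory_stalls = mem_frac * miss_rate * miss_cycles
    return base_cpi + branch_stalls + memory_stalls


def instr_per_second(clock_hz, cpi):
    """Sustained instruction rate for a given clock and effective CPI."""
    return clock_hz / cpi


if __name__ == "__main__":
    # Hypothetical 2 GHz part: a cache miss costs 200 cycles (100 ns).
    slow = instr_per_second(
        2e9, effective_cpi(0.5, 0.2, 0.05, 20, 0.3, 0.02, 200))
    # Double the clock; memory is no faster, so the same 100 ns miss
    # now costs 400 cycles.
    fast = instr_per_second(
        4e9, effective_cpi(0.5, 0.2, 0.05, 20, 0.3, 0.02, 400))
    print(f"gain from doubling the clock: {fast / slow:.2f}x")
```

With these made-up numbers the doubled clock buys well under 2x, 
because the memory stalls grow in cycle terms as the core gets faster.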

For some architectures, writing optimal code is pretty much impossible 
(optimal instruction scheduling turns out to be NP-hard), and you have 
to accept a set of compromises at some point to avoid having your 
compilation take 24 hours (my MD codes used to take about 24 hours to 
build on a Multiflow Trace, a VLIW architecture).

The overall point of this is:

a) writing good code is hard
b) writing fast code is harder
c) CPUs don't automagically make things faster; compilers are implicated 
in this mess
d) some optimizers are better left off :(

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615