[Beowulf] Cell

Wed Apr 27 03:51:36 PDT 2005

At 06:45 PM 4/26/2005 -0400, Mark Hahn wrote:
>> Obviously clever governments, who currently have giant supercomputers
>> which cost several million, will conclude they can buy a few cheapo cell
>> processor machines which do more work than the entire system currently.
>
>this is ridiculous.  the Cell is basically a GPU - slightly more general
>than the current-gen GPUs from Nvidia and ATI, but not drastically different.

From my viewpoint, Cell is a vector floating point processor whose only
disadvantage is executing branchy code.

Just like Cray machines were vector processors in the past.

>similar to GPUs, it is of interest to small niches of people who do

A GPU effectively doing 256 gflop for just a few dollars would be nice.

>non-graphics computing.  for instance, Cell might be very good for people
>doing some kinds of MD or dense-cluster cosmology.  it's *NOT* going to 
>change the supercomputing industry.
>
>actually, I'm pretty disappointed with Cell.  from the rumors it's gotten
>in preceding years, I thought they were going to do something truly
>interesting, scalable, novel.  but it's just replicated func-units treated
>as a coprocessor to otherwise off-the-shelf parts.  yes, some slightly 
>exotic (the rambus angle - but NV/ATI use gddr3 today which is also exotic
>but technically off-the-shelf.)  no processor-in-memory innovation,
>no clever scalable network, etc.

The advantage of Cell might be that it is not a new design idea. Its
technology has been thoroughly tested in the past.

>> To give an example, the Dutch government has a 2.2 tflop, 416-processor
>> Itanium2 1.3GHz machine. I do not argue either that it is 2.2 tflop, just
>> like I do not argue that a cell processor delivers > 0.25 tflop on its own.
>
>tflops (alone) is a stupid way to measure computers.  that's the real point:
>tflops need to be balanced with good memory bandwidth, latency, capacity,
>good interconnect, good IO.  not to mention good tools and libraries.

I quoted HARD statistics showing that across Europe, all those expensive
supercomputers spend far over 50% of their system time on ideal,
embarrassingly parallel, vectorizable floating point, especially matrix
calculations and invariant calculations. That is just a matter of a
multiply and an add at most. All floating point.

See the European supercomputer reports.
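To make "a multiply and an add" concrete, here is a minimal sketch, assuming a plain dense matrix-vector product as a stand-in for that kind of workload. The function name and the convention of counting 2 flops per element are my own illustration and do not come from the reports.

```python
def matvec_with_flops(matrix, vector):
    """Dense matrix-vector product; returns (result, flop_count)."""
    result = []
    flops = 0
    for row in matrix:
        acc = 0.0
        for a, x in zip(row, vector):
            acc += a * x  # one multiply and one add per element
            flops += 2
        result.append(acc)
    return result, flops


if __name__ == "__main__":
    y, flops = matvec_with_flops([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
    print(y, flops)
```

Every row is computed independently of the others and the inner loop has no branches, which is why this sort of work maps well onto vector hardware.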

So there is a BIG need for a CHEAP vector processor doing floating point
there.

Obviously I would love even more an 8-core Opteron doing 4 instructions a
cycle for my own Diep chess program.

>besides, Cell appears to be more like 25 Gflops (doubles, of course):
>http://www.mdronline.com/mpr_public/editorials/edit19_09.html

You must compare Intel, AMD and also Cell in the same way. If the Intel
C++ compiler by default optimizes floats as single precision, not
following the ANSI C standard (as I have shown in different hardware groups
already, and which I discovered to my disgust when I was busy optimizing
some floating point software on Itanium2), then obviously the Cell
processor must be measured by the same standard.

It then gets 256 gflop single precision.

Do not measure them by standards other than the ones your own benchmarks
are measured by.

>> Just 10 cell processors are faster than the theoretical peak a 416
>> processor machine delivers. This for far under 1/1000 of the price that
>> this machine has cost.
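The peak numbers in this exchange can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this thread; the one outside assumption is that an Itanium2 can retire two fused multiply-adds per cycle, i.e. 4 flops per cycle. The variable names are mine.

```python
# Figures quoted in this thread.
ITANIUM2_CPUS = 416
ITANIUM2_GHZ = 1.3
CELL_SP_GFLOPS = 256.0  # single precision claim
CELL_DP_GFLOPS = 25.0   # double precision figure from the quoted editorial
N_CELLS = 10

# Assumption: 2 fused multiply-adds per cycle = 4 flops per cycle.
ITANIUM2_FLOPS_PER_CYCLE = 4


def peak_gflops(cpus, ghz, flops_per_cycle):
    """Theoretical peak in gflop: processors x clock (GHz) x flops per cycle."""
    return cpus * ghz * flops_per_cycle


itanium_peak = peak_gflops(ITANIUM2_CPUS, ITANIUM2_GHZ, ITANIUM2_FLOPS_PER_CYCLE)
cells_sp = N_CELLS * CELL_SP_GFLOPS
cells_dp = N_CELLS * CELL_DP_GFLOPS

if __name__ == "__main__":
    print(f"416 x Itanium2 peak:   {itanium_peak:.1f} gflop")
    print(f"10 x Cell, single:     {cells_sp:.1f} gflop")
    print(f"10 x Cell, double:     {cells_dp:.1f} gflop")
```

Under these assumptions the "10 Cells beat the 416-processor machine" claim holds for the single precision figure only; with the 25 gflop double precision figure, 10 Cells fall well short of the Itanium2 peak. That gap is why both sides must be measured at the same precision.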

>yeah, I bet SGI is just kicking themselves for not realizing they could just
>clamp a few numalink4 cables onto a Cell chip...

Nah, they will have to wait for Intel's Cell, I guess.

Show me objective test data on numalink4, by the way; I can't find any on
the internet.

I want its one-way ping-pong latencies from this processor to a remote
processor 512, not a theoretical bandwidth that doesn't take into account
the practical limitations of one or more components in the network. :)
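For clarity on what I mean by a one-way ping-pong latency: bounce a tiny message back and forth many times, and take half of the average round-trip time. The sketch below only shows the method. It assumes a local socket pair between two threads as a stand-in for the interconnect, so its numbers say nothing about numalink4; on a real machine the same loop would run between two processors using the machine's own message passing layer. The function names are mine.

```python
import socket
import threading
import time


def echo(sock, rounds):
    """Remote side: send every received byte straight back."""
    for _ in range(rounds):
        sock.sendall(sock.recv(1))


def one_way_latency(rounds=1000):
    """Return the estimated one-way latency in seconds (round trip / 2)."""
    near, far = socket.socketpair()
    worker = threading.Thread(target=echo, args=(far, rounds))
    worker.start()
    start = time.perf_counter()
    for _ in range(rounds):
        near.sendall(b"x")  # ping
        near.recv(1)        # pong
    elapsed = time.perf_counter() - start
    worker.join()
    near.close()
    far.close()
    return elapsed / rounds / 2.0


if __name__ == "__main__":
    print(f"one-way latency: {one_way_latency() * 1e6:.2f} microseconds")
```

A 1-byte message keeps bandwidth out of the picture, so the result reflects the practical latency of every component on the path instead of a datasheet figure.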