[Beowulf] Cell

Vincent Diepeveen diep at xs4all.nl
Wed Apr 27 14:39:21 PDT 2005


A 2-terabyte RAID5 array costs around $2000-$3000 and can easily deliver
400-600 MB/s of I/O when attached to a single machine. So if you
make the 1 Tflop processor, there is no need to worry!

I have yet to see a network that can deliver that to every processor in a
Beowulf simultaneously (and, more importantly, at what price).

On the Origin 3800 I managed to get an I/O streaming speed of 10 MB/s. I
guess that was not because the hard disks couldn't deliver data faster, but
because a single 500 MHz processor couldn't handle more data per second.

Let me give an example. If I multiply a 10-million-bit number by another
10-million-bit number on an Opteron, using the GMP library's FFT
multiplication, that takes 30 seconds.

Now you can argue that GMP is a slow library and that some clever programmer
might be able to speed it up by a factor of 3. I have heard that before.
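
A minimal sketch for timing such a multiplication yourself. It uses Python's
built-in arbitrary-precision integers rather than GMP (an assumption, made only
to keep the example self-contained), so the absolute numbers will not match the
30 seconds quoted above. The function name time_big_multiply and the operand
size in the usage part are hypothetical.

```python
import random
import time


def time_big_multiply(bits, seed=0):
    """Multiply two random `bits`-bit integers and time the multiplication.

    Returns (elapsed_seconds, bit_length_of_product).
    Assumption: Python's built-in int stands in for GMP here.
    """
    rng = random.Random(seed)
    # Set the top bit so each operand is exactly `bits` bits long.
    a = rng.getrandbits(bits) | (1 << (bits - 1))
    b = rng.getrandbits(bits) | (1 << (bits - 1))

    start = time.perf_counter()
    product = a * b
    elapsed = time.perf_counter() - start
    return elapsed, product.bit_length()


if __name__ == "__main__":
    # 1 million bits keeps the run short; raise it to 10_000_000 to match
    # the operand size discussed in the post.
    elapsed, out_bits = time_big_multiply(1_000_000)
    print(f"product has {out_bits} bits, multiply took {elapsed:.3f} s")
```

Only the multiplication itself is timed, not the generation of the operands,
so the result reflects pure CPU work on data that already sits in memory.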

My point is that 10 million bits is less than 2 megabytes.

So a single calculation on about 2 megabytes of data takes 30 seconds.
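
The arithmetic behind that comparison can be sketched as follows. The 10
million bits and the 30 seconds come from the example above, and 400 MB/s is
the low end of the RAID5 figure. Counting two operands per multiplication and
using decimal megabytes are assumptions, and all names are illustrative only.

```python
BITS_PER_OPERAND = 10_000_000   # operand size from the example
MULTIPLY_SECONDS = 30.0         # reported time for one multiplication
RAID_MB_PER_S = 400.0           # low end of the quoted RAID5 throughput
OPERANDS = 2                    # assumption: both inputs count as data read


def operand_megabytes(bits):
    """Size of one operand in decimal megabytes (10**6 bytes)."""
    return bits / 8 / 1_000_000


def consumed_mb_per_s(bits, operands, seconds):
    """Input data the CPU actually gets through per second."""
    return operand_megabytes(bits) * operands / seconds


operand_mb = operand_megabytes(BITS_PER_OPERAND)
cpu_rate = consumed_mb_per_s(BITS_PER_OPERAND, OPERANDS, MULTIPLY_SECONDS)
headroom = RAID_MB_PER_S / cpu_rate

print(f"one operand:         {operand_mb:.2f} MB")
print(f"CPU consumes:        {cpu_rate:.4f} MB/s")
print(f"RAID5 vs CPU demand: about {headroom:.0f}x")
```

Under these assumptions the disk array can supply data thousands of times
faster than this particular calculation can use it.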

Anything that involves huge calculations is first and foremost limited by
CPU power, not by anything else.

Big RAM is nice to have for most clever algorithms, but it is only second in
importance. CPU power is most important. If there is some bottleneck that
limits the RAM we have, do not worry!

We will find a solution!

In the end, the real bottleneck is the number of instructions a CPU can
process per second.

Only once that is solved do the other parameters become interesting to
optimize :)

Vincent

B.t.w., in high-end I am always very disappointed when I browse homepages
and find no prices attached. How do you guys plan to sell products without
quoting the price online? If I did that I would lose 95% of my potential
clients. When there is no price they say: "skip".

At 11:29 AM 4/27/2005 -0700, Michael Will wrote:
>Vincent Diepeveen wrote:
>
>>At 11:07 AM 4/27/2005 -0500, Ben Mayer wrote:
>>>On 4/27/05, Vincent Diepeveen <diep at xs4all.nl> wrote:
>>>>At 06:45 PM 4/26/2005 -0400, Mark Hahn wrote:
>>>>>>Obviously clever governments, who currently have giants of
>>>>>>supercomputers which cost several million, will conclude they can buy
>>>>>>a few cheapo cell processor machines which do more work than the
>>>>>>entire system currently.
>>>>>
>>>>>this is ridiculous.  the Cell is basically a GPU - slightly more general
>>>>>than the current-gen GPUs from Nvidia and ATI, but not drastically
>>>>>different.
>>>>
>>>>Cell is from my viewpoint a vector floating point processor whose only
>>>>disadvantage is executing branchy code.
>>>>
>>>>Just like Cray machines were in the past vector processors.
>>>
>>>Their current machine (X1E) is a vector machine. The problem on that
>>>machine is that the code needs to vectorize. You can do it with the
>>>compiler or libraries, but it HAS to vectorize to get that
>>>performance. Cray's current machines depend on the compiler and highly
>>>trained humans writing code (they have some libraries for specific
>>>things like sequence alignment) to make things run faster than a
>>>Pentium 4. Granted when they do run faster, it is a 32-64x speed up
>>>*per CPU*.
>>>
>>>The people writing code for the PS3 (Cell) are going to have some
>>>experience writing parallel vector code because that is what the PS2
>>>was. But I will be very surprised if they can consistently get more
>>>than 10% of peak.
>>>
>>>>A gpu doing effectively 256 gflop for just a few dollars would be nice.
>>>
>>>GPUs are often doing calcs at half precision.
>>>
>>>>See supercomputer reports europe.
>>>>
>>>>So there is a BIG need for a CHEAP vector processor doing floating point
>>>>there.
>>>
>>>Is it processors that they need or bandwidth?
>>
>>If you can deliver 1 processor that can do 1 tflop, there is no need for
>>bandwidth anymore, everything happens on that chip in such a case :)
>
>And of course it has an infinite amount of memory in there too, with a
>telepathic connection to the storage device to glean what might be the
>next thing to compute. It won't send anything out either, but we assume
>the answer is 42.
>
>Michael
>
>>"If you were plowing a field, which would you rather use? Two strong oxen
>>or 1024 chickens?"
>>  Seymour Cray
>>
>>Vincent


