[Beowulf] Cell

Vincent Diepeveen diep at xs4all.nl
Wed Apr 27 16:07:35 PDT 2005


At 06:46 PM 4/27/2005 -0400, Joe Landman wrote:
>
>
>Vincent Diepeveen wrote:
>> A 2 terabyte RAID 5 array costs around $2000-$3000 and can deliver
>> 400-600MB/s of I/O hands down when attached to a single machine. So if you
>> make the 1 Tflop processor, there is no need to worry!
>
>I need to find out where you are getting your RAIDs...

I'm no major expert here, but I plan to buy a cheap 400 euro
RAID 5 card from 3ware. The shop has no other RAID cards than that, so perhaps
there are better brands out there, but even with really cheap S-ATA disks it
gets 400MB/s read speed on a slow 2.4GHz P4, according to their homepage.

If you look further, perhaps you'll find something a tad faster.

I'm not sure about you, but for me only read speed matters; write speed is
less important. Use RAID 0 for that and back up the data now and then :)
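As a rough back-of-envelope sketch (not a benchmark), the streaming read rate of a striped array is often estimated as the number of data disks times the per-disk sequential rate, capped by what the controller and bus can sustain. The per-disk rate and controller cap below are assumed round numbers, not specifications of the 3ware card mentioned above, and `striped_read_mb_s` is a hypothetical helper.

```python
def striped_read_mb_s(n_disks: int, per_disk_mb_s: float,
                      controller_cap_mb_s: float, parity_disks: int = 0) -> float:
    """First-order estimate of large sequential read bandwidth (MB/s).

    Assumes reads are large enough to spread evenly over every disk.
    parity_disks=0 models RAID 0; parity_disks=1 models RAID 5, where
    one disk's worth of each stripe holds parity rather than data.
    """
    data_disks = n_disks - parity_disks
    if data_disks <= 0:
        raise ValueError("the array needs at least one data disk")
    # Aggregate disk bandwidth, unless the controller/bus is the bottleneck.
    return min(data_disks * per_disk_mb_s, controller_cap_mb_s)


if __name__ == "__main__":
    # Assumed figures: 8 S-ATA disks at ~60 MB/s each, 800 MB/s controller cap.
    print(striped_read_mb_s(8, 60.0, 800.0, parity_disks=1))  # RAID 5: 420.0
    print(striped_read_mb_s(8, 60.0, 800.0))                  # RAID 0: 480.0
```

Under these assumptions, a 400MB/s read figure needs roughly seven data disks. Writes to RAID 5 are slower than reads because parity has to be updated as well.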

So my viewpoint still hasn't changed. 

Get a cheap single-CPU machine. Get a cheapo 400 euro S-ATA RAID controller.
Get a bunch of cheap S-ATA disks.

Get 400MB/s+ of I/O bandwidth from it, and then all you need is a few
banks of expensive memory and the virtual 1 THz processor.
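To put that I/O rate next to a hypothetical 1 Tflop/s processor, a roofline-style bound is useful: attainable performance is the smaller of peak compute and bandwidth times arithmetic intensity (flops performed per byte fetched). The sketch below plugs in the two figures from this thread, 1 Tflop/s and 400 MB/s; the function names are illustrative, not from any library.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: performance is capped by compute or by data supply."""
    return min(peak_flops, bandwidth_bytes * flops_per_byte)


def breakeven_intensity(peak_flops: float, bandwidth_bytes: float) -> float:
    """Flops that must be done per byte fetched to keep the CPU busy."""
    return peak_flops / bandwidth_bytes


if __name__ == "__main__":
    peak = 1e12   # the hypothetical 1 Tflop/s processor from this thread
    io = 400e6    # 400 MB/s streamed from the RAID array
    print(breakeven_intensity(peak, io))           # 2500.0 flops per byte
    print(attainable_flops(peak, io, 1.0) / peak)  # 0.0004 of peak
```

So a code streaming its data from that array must do about 2500 floating-point operations per byte read before the processor, rather than the disks, becomes the limit; at one flop per byte it runs at 0.04% of peak. Data held in RAM or cache and reused raises the effective bandwidth, which is the data-motion point made in the quoted reply.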

Oh, and all I want it to output is 42 anyway :)

>[...]
>
>> Anything that involves huge calculations is, first and foremost, limited
>> by CPU power, not by anything else.
>
>There is a statement I like to make when I see comments like this.
>
>"Gross generalizations tend to be incorrect".
>
>If you think about it long enough, you can see the recursive humor.
>
>There are many different factors that will affect the overall 
>performance of a machine on a particular code/data set.  To illustrate 
>this, I have often suggested the following gedankenexperiment.
>
>Imagine you have a CPU that is infinitely fast, coupled to resources 
>that are not infinitely fast.  This means that while operations take 
>exactly 0 time on the CPU, we haven't done a thing to make the memory or 
>IO faster.  In this gedankenexperiment, how much of a speedup do you get 
>from an infinitely fast CPU?  Memory moves still take time.  Data 
>loading and storing still take time.  Data motion is quickly becoming 
>one of the most critical aspects of performance (if not the most 
>critical) for a fair number of calculations.  So unless all parts are 
>infinitely fast, you still have to pay for the data motion time, the IO 
>time, the memory->memory time, and the memory->CPU (and CPU->memory) time.
>
>In short, an infinitely fast CPU would (possibly significantly) reduce 
>the execution time of the class of applications that are purely CPU 
>bound (say, operating out of internal cache only).
>
>It will do very little for a code which is IO or memory bandwidth or 
>latency bound.
>
>> Big RAM is nice to have for most clever algorithms, but it is of secondary
>> importance. CPU power is most important. If there is some bottleneck that
>> limits the RAM we have, do not worry!
>> 
>> We will find a solution!
>> 
>> The real bottleneck is in the end the number of instructions a cpu can
>> process a second.
>
>Not really.  The bottleneck in performance is how full you can keep the 
>multiple pipelines of the processor.  Branch statements tend to force 
>pipeline flushes.  You can "handle" this with speculative execution. 
>Real memory accesses can bottleneck the memory subsystem, so real 
>processors allow specific mixtures of instructions in flight at once to 
>reduce resource contention.  If you overflow any of the fixed CPU 
>resources, you can stall a pipeline while waiting for the contention to 
>be eliminated, or you can stall the entire CPU while flushing the TLB and 
>other shared resources.  Basically you have multiple simultaneous 
>zero-sum games (a fixed number of operations per unit time, and specific 
>mixtures of operations that maximize the performance of instructions in 
>flight).  Compilers are, as I indicated before, not particularly smart in 
>most cases, and they generate code locally that might not make sense 
>globally.  Moreover, how instructions are ordered and presented to the 
>CPU will fundamentally impact the overall performance.  Code optimizers 
>are, in a large sense, an attempt to better fit the emitted instructions 
>to the processor architecture by rewriting loops, mathematical 
>constructs, and the like.  Optimizers are not perfect.
>
>Some architectures are pretty much impossible to write optimal code for 
>(it turns out to be NP-hard), and you have to accept a set of compromises 
>at some point to avoid having your compilation take 24 hours (my MD 
>codes used to take about 24 hours to build on a Multiflow Trace, a VLIW 
>architecture).
>
>The overall point of this is
>
>a) writing good code is hard
>b) writing fast code is harder
>c) CPUs don't automagically make things faster; compilers are implicated 
>in this mess
>d) some optimizers are better left off :(
>
>-- 
>Joseph Landman, Ph.D
>Founder and CEO
>Scalable Informatics LLC,
>email: landman at scalableinformatics.com
>web  : http://www.scalableinformatics.com
>phone: +1 734 786 8423
>fax  : +1 734 786 8452
>cell : +1 734 612 4615

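The infinitely-fast-CPU thought experiment in the quoted message is essentially Amdahl's law: only the compute portion of the run shrinks, so the time spent moving data bounds the overall speedup. A minimal sketch, assuming compute, memory traffic, and I/O run as separate, non-overlapping phases (real machines overlap them, which changes the numbers but not the limit); the 60/30/10 second split is invented for illustration.

```python
import math


def speedup(t_cpu: float, t_memory: float, t_io: float, cpu_factor: float) -> float:
    """Overall speedup when only the CPU phase runs cpu_factor times faster.

    Assumes the three phases do not overlap. Pass cpu_factor=math.inf to
    model an infinitely fast CPU.
    """
    before = t_cpu + t_memory + t_io
    after = t_cpu / cpu_factor + t_memory + t_io  # t_cpu / inf == 0.0
    return math.inf if after == 0 else before / after


if __name__ == "__main__":
    # Hypothetical run: 60 s compute, 30 s memory traffic, 10 s I/O.
    print(speedup(60, 30, 10, math.inf))  # 2.5: data motion caps the gain
    print(speedup(100, 0, 0, math.inf))   # inf: purely CPU (cache) bound code
```

Even an infinitely fast CPU only turns this 100 second run into a 40 second run; only the code that never leaves the cache gets an unbounded speedup.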

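The pipeline argument can be made concrete with the standard textbook cycles-per-instruction model: stalls from mispredicted branches and cache misses add to the ideal CPI, so the instruction rate a CPU actually sustains falls well below what its clock and issue width suggest. Every rate and penalty below is an assumed value for illustration, not a measurement of any particular processor.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float, mispredict_rate: float, branch_penalty: float,
                  misses_per_instr: float, miss_penalty: float) -> float:
    """Average cycles per instruction, including pipeline stall cycles."""
    branch_stalls = branch_fraction * mispredict_rate * branch_penalty
    memory_stalls = misses_per_instr * miss_penalty
    return base_cpi + branch_stalls + memory_stalls


def instructions_per_second(clock_hz: float, cpi: float) -> float:
    return clock_hz / cpi


if __name__ == "__main__":
    clock = 2.4e9  # assumed 2.4 GHz core
    ideal = 0.5    # assumed 2-wide issue with no stalls
    real = effective_cpi(ideal,
                         branch_fraction=0.2, mispredict_rate=0.05, branch_penalty=20,
                         misses_per_instr=0.01, miss_penalty=200)
    print(f"CPI {real:.2f}")  # CPI 2.70
    print(f"ideal {instructions_per_second(clock, ideal) / 1e9:.2f} G instr/s")       # 4.80
    print(f"with stalls {instructions_per_second(clock, real) / 1e9:.2f} G instr/s")  # 0.89
```

With these assumptions, memory stalls rather than the clock dominate: 2.0 of the 2.7 cycles per instruction are spent waiting on cache misses.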

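Because optimal instruction scheduling is intractable in general, compilers settle for heuristics. Greedy list scheduling is a common textbook compromise: each cycle, issue up to `width` instructions whose operands are ready, preferring those on the longest remaining dependency chain. This is a minimal sketch of that heuristic, not a reconstruction of the Multiflow compiler or any other; the instruction names, latencies, and issue width are hypothetical.

```python
from typing import Dict, List


def list_schedule(latency: Dict[str, int], deps: Dict[str, List[str]],
                  width: int) -> Dict[str, int]:
    """Greedy list scheduling of a dependency DAG onto `width` issue slots.

    latency[i]: cycles until instruction i's result can be used.
    deps[i]: instructions whose results i consumes (assumed acyclic).
    Returns the cycle in which each instruction is issued.
    """
    succs: Dict[str, List[str]] = {i: [] for i in latency}
    for i, producers in deps.items():
        for p in producers:
            succs[p].append(i)

    # Priority = latency-weighted length of the longest path to the end.
    height: Dict[str, int] = {}

    def path_len(i: str) -> int:
        if i not in height:
            height[i] = latency[i] + max((path_len(s) for s in succs[i]), default=0)
        return height[i]

    for i in latency:
        path_len(i)

    issued: Dict[str, int] = {}
    cycle = 0
    while len(issued) < len(latency):
        # Ready = not yet issued, and every producer's result is available.
        ready = [i for i in latency if i not in issued and all(
            p in issued and issued[p] + latency[p] <= cycle
            for p in deps.get(i, []))]
        ready.sort(key=lambda i: (-height[i], i))  # critical path first, then name
        for i in ready[:width]:
            issued[i] = cycle
        cycle += 1
    return issued


if __name__ == "__main__":
    lat = {"ld_a": 3, "ld_b": 3, "ld_c": 3, "mul": 2, "add": 1}
    dep = {"mul": ["ld_a", "ld_b"], "add": ["mul", "ld_c"]}
    print(list_schedule(lat, dep, width=2))
    # {'ld_a': 0, 'ld_b': 0, 'ld_c': 1, 'mul': 3, 'add': 5}
```

A heuristic like this runs quickly but can miss the optimal schedule on harder dependency graphs, which is the kind of compromise the quoted message describes.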