[Beowulf] Re: vectors vs. loops

Tim Mattox tmattox at gmail.com
Thu May 5 09:22:25 PDT 2005


Well said RGB!

I view the success of Beowulf clusters in this light:
The ability to use parallel processing essentially turns the problem
of maximizing performance into the problem of maximizing the
performance/dollar ratio of the building blocks of your parallel machine.
(Replace "dollar" with "watt" or "square feet" depending on your
true budget limits.)
It also helps to have a cool name for this... Spareparticus didn't
catch on... ;-)
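
To put rough numbers on that framing, here is a minimal sketch in
Python of the performance/dollar comparison.  The node specs, prices,
and budget below are made-up placeholders, not real benchmarks or
quotes -- plug in your own measurements:

  # Candidate building blocks: (name, sustained GFLOPS/box, dollars/box).
  # Every number here is a hypothetical placeholder.
  candidates = [
      ("big-iron SMP",  40.0, 100000),
      ("dual Opteron",   8.0,   3000),
      ("cheap Celeron",  2.0,    600),
  ]

  budget = 100000  # total hardware budget, in dollars

  for name, gflops, cost in candidates:
      boxes = budget // cost   # how many boxes the budget buys
      total = boxes * gflops   # aggregate throughput, assuming perfect scaling
      print("%-14s %4d boxes  %7.1f GFLOPS total  %.5f GFLOPS/$"
            % (name, boxes, total, gflops / cost))

For a perfectly parallel job the cheapest box wins outright; real jobs
pay per-box interconnect, space, and management costs, which is exactly
the tradeoff a real design tool has to capture.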

And for a neat (but imperfect) tool for seeing what kind of COTS
cluster fits your application needs and various budget constraints,
check out my colleague's Cluster Design Rules expert system:
http://aggregate.org/CDR/

Make sure you pick a more recent parameter/price data set than
the default.  (The "Default Parameter Set" menu choice has prices
from 2003 when I checked it today... that's supposed to be updated
soon, but there is a new set from a few weeks ago you can pick.)

P.S. - Googling for Spareparticus will find, among other things, this
article:
http://www.research.uky.edu/odyssey/spring02/supercomputers.html

On 5/5/05, Robert G. Brown <rgb at phy.duke.edu> wrote:
> On Thu, 5 May 2005, Vincent Diepeveen wrote:
> 
> > The point they seem to ignore is that a single chip can be cheap.
> 
> Who are "they" and why do you think that they ignore this?  This is the
> whole point of the list.  Not a single person on this list is unaware of
> this; most are here BECAUSE of this.  What is the purpose of such a
> provocative and obviously untrue statement?
> 
> > A beowulf network with 1024 nodes always will cost millions.
> 
> It CAN cost millions, or it CAN cost under a million, in US dollars or
> Euros (hardware cost, exclusive of the infrastructure or human
> management costs).  It depends on the design used and purpose.  In fact
> I'd guess that "most" clusters (gotta stop using that word:-) cost
> roughly $1000-2000 per CPU when all is said and done.  I personally
> think that dual CPU systems (not infrequently armed with dual network
> interfaces) dominate, so you have to decide whether the word "node"
> refers to boxes, network interfaces, or CPUs in the cost estimate, and
> have to decide how to split up infrastructure costs fairly in cost
> comparisons that go beyond just the hardware (a point of considerable,
> occasionally spirited, debate).
> 
> > So creating faster chips is most important to keep prices cheap.
> > Complaining about memory-bandwidth is just a side effect and not
> > a convincing reason to not produce faster chips.
> >
> > Or do you disagree with that?
> 
> Look, clearly we're not communicating here.  A modern compute cluster
> (be it beowulf, NOW, Grid style) is this thing some Very Smart People
> (quite a few of them on this list) have developed collectively that
> permits people to harness COTS hardware, which by definition is the mass
> market sweet spot in price/performance.  That is the whole point.  The
> point was made explicitly and can still be read in the original beowulf
> documentation from close to ten years ago.  The point motivated the
> creation of PVM by a team of some of the world's best computer
> scientists (led by Jack Dongarra) back in the early 90's, which in turn
> motivated things like MPICH and LAM/MPI and set the stage for the
> beowulf project and the "cluster revolution".  The point has been
> independently "discovered" by hundreds of individuals and groups on
> their own ever since the fifties or sixties, thousands of groups in
> modern times with the help of this list and the memetic resource base it
> helps to self-organize.
> 
> NOBODY on this list thinks that it isn't worthwhile to produce, dream
> of, discuss, design faster chips at ANY level and for nearly any
> purpose.  We are ALL hardware junkies on this list.  It is a requirement
> for membership.  In order to join the beowulf list you have to sign a
> form that says "I am a hardware addict and am not currently enrolled in
> a twelve-step program to kick my habit and solemnly promise to drool over
> every successive Moore's Law generation of hardware in its turn as I
> harness it cheaply in my own personal cluster environment".  Did you
> somehow miss this?
> 
> Most of us are ALSO here because we are de facto ECONOMICS BOUND.  We
> have this horrible lust for unlimited compute power, but only a tiny
> budget.  (Where "tiny" for a true addict equals "my current budget" no
> matter what it is.:-) When somebody announces that they've built a 27
> million dollar computer (cluster or otherwise) somewhere, we have
> momentary, wistful visions of somehow fitting the whole damn thing into
> our garages, or lustful dreams of what WE'D do if WE were given that
> kind of money, just once, because OUR budgets are more likely to be $27
> THOUSAND, or for a lucky few, $270 thousand, or for an
> unlucky few, $2700.  Lots of us -- MOST of us -- are orders of
> magnitude short of a thousand chickens.
> 
> This tends to keep the list discussion focused on the sweet spot of
> mass market sanity.  We don't waste time talking about $100K CPUs,
> because for nearly all of us $100K is likely to be all of our computer
> budgets for any given year and then some, and besides, if we HAD $100K
> we would NEVER buy an ox with it, as we can get far, far more work done
> with the chickens the same money would buy.  Not to mention the risk of
> mad ox disease...
> 
> Besides, those of us who have been in the game for a LONG time have
> learned (in my own case, through bitter experience and very much the
> hard way) that if you DO spend that $100K on a single big-iron system
> that IS for the moment the world's fastest $100K investment you can make
> for your problem, three years later mass market hardware can do even
> more work even faster for a third of the cost, sometimes even a tenth of
> the cost.  I personally have watched $75K single-piece-of-hardware
> systems be sold as "junk" for $3000 when they are four years old because
> one could buy brand new systems that were twice as fast for less than
> the MAINTENANCE cost of the expensive refrigerator-sized hardware.  I
> have lived through Duke spending a huge amount of money on a CM5,
> watched a friend of mine devote two YEARS learning to program the damn
> thing and porting a big project to it to run optimally fast, only to
> have the CM5 sold for a couple of thousand a year or so later by a firm
> interested only in recovering the gold in its contacts -- so little that
> I and several friends were seriously tempted to buy it and PUT it into a
> garage, only we couldn't find a garage with the right kind of electrical
> outlet...;-) Still regret it -- it would have made WAY cool yard art up
> on blocks out front, if nothing else.
> 
> So sure, we'd all LOVE to see 80 gazillion-flop CPUs invented with
> eighteen SIMD pipelines and eight-way branch prediction, with a GB of
> 0.1 nsec SRAM cache, with total power requirements that would let them
> be used in personal digital assistants, as long as they are sold in the
> mass market at price points between $50 and $1000 (preferably on
> the lower end of that range) just like the rest of the mass market CPUs.
> I have personally enjoyed each successive generation of Intel CPU from
> the 8088 and IBM PC up through the P4; as well as several excursions
> along the way into efforts by Motorola, by Cyrix, by AMD, by MIPS, by
> Sun.  The Opteron has been lovely, and I currently await with great
> interest multicores and some of the exotic new "building block" mediated
> designs, if any of them prove to be economically viable and make it down
> into the mass market price range with viable linux distributions.
> 
> We'd ALL love to see newer better memory invented (even those of us who
> don't really need it).  For me personally, each new generation of memory
> has been mostly a bust because MY tasks tend to be CPU bound and scale
> strictly with CPU speed, almost independently of the speed of the
> underlying memory subsystem, but even I see advantages for certain
> non-research tasks.  Note well:  I personally have historically observed
> no significant advantage even to DDR vs EDO vs SDRAM or to any cache
> size above the measly 128 KB associated with the Celeron I'm typing
> this on -- my task has scaled with Intel P6-class clock across the
> entire P6 family from the 200 MHz Pentium Pro through latest generation
> Celerons and PIII's -- the P4 actually LOSES a bit of ground relative to
> clock because of changes made in the processor balance.  The last few
> generations of AMD processor (Athlon, Opteron, Athlon 64) have WON ground
> relative to Intel and relative to pure clock, indicating real advances
> in internal technology.  All for mass market cheap.  What's not to love
> about this?
> 
> However, there are others for whom the task balance is completely
> different.  They drool not over a faster CPU per se, but over those
> faster memory subsystems.  For them faster memory means faster task
> completion times much more directly than a faster CPU.  When they adopt
> a DDR system vs an SDRAM system at constant CPU clock, they see an
> immediate benefit.  There are still others for whom neither one makes
> that big a difference, because their task is network bound.  If they
> spend too much on bleeding edge CPUs and memory, those resources just
> have to wait even longer on the network to do each incremental chunk of
> work; their task scales to even fewer systems than before.  They'd
> rather get CHEAPER CPU/memory systems and spend MORE of their money on
> faster networks, given a fixed budget.  Still others care more about
> having huge fast hardware graphics adapters (game junkies, real time
> renderers).  Still others need bleeding edge STORAGE because their HPC
> tasks are DISK bound.  Keep your expensive Opterons, they say, except
> insofar as they help me pump bytes quickly out of an enormous and very
> expensive disk array with a bleeding edge network all its own.
> 
> We are, I promise, ALL hardware junkies (and signed the form to join the
> list;-) but we RESPECT THE DIFFERENCES in our individual requirements
> for cost-benefit optimal design under the common designation of "cluster
> computing".  So in spite of the fact that I tend to run EP CPU bound
> code, I don't insist that EVERYBODY should ALWAYS design clusters that
> would solve MY problem optimally, or that anybody is stupid if they do
> otherwise.  The people on this BEOWULF list, which was originally set up
> primarily to support the design of REAL cluster supercomputers where the
> tasks are not EP but are partitioned and have a nontrivial IPC burden, and
> where the model is running "a job" on a virtual SMP machine are very
> kind in both permitting folks like me with different interests to
> participate and in not insisting that all clusters discussed have to be
> beowulfs per se.
> 
> The list tends to be tolerant as well of discussions of big iron, of
> "oxen", of vector processors, of storage arrays, of networks, of shared
> memory designs, of message passing software, of all sorts of HPC issues
> even though the list address is NOT "hpc at hpc.org".  It has come to be
> more or less synonymous with HPC because of the enormous economic
> success of cluster computing, to the extent that it now dominates HPC,
> but the two aren't really the same thing.  It tends to be LESS tolerant
> the further out there one goes away from the COTS part (especially if a
> point is made obnoxiously), though, as the bulk of us are here because
> of that mix of hardware addiction, infinite needs, and finite budgets,
> and appreciate that this list is ABOUT COTS clusters if it is about any
> single unifying factor.  It has been known to (gently) ridicule even
> COTS clusters that were built for the wrong reasons or with a silly
> design process (usually as a showcase piece for some major vendor in
> cahoots with some perhaps understandably greedy University department
> somewhere, a hammer in search of a nail and a top 10 slot on the Top 500
> list).
> 
> I suspect that some of the ridicule is sheer jealousy, mind you.  I know
> that mine is....;-)
> 
> To summarize (again) my own feelings (letting Andrew and Greg speak far
> more tersely for themselves) -- lighten up, and RESPECT the collective
> abilities of the people on the list (and I don't mean me personally, as
> I'm only a physicist and glorified amateur at cluster computing).
> Remember that they include Gordon Bell Prize winners, Ph.D.-wielding
> computer scientists and electrical engineers who know far more than you
> or me about hardware, the directors, chief administrators, designers of
> major computer clusters all over the world, the toplevel technical
> employees, founders and owners, or customers of cluster oriented
> businesses, scientists and engineers and students and faculty who USE
> clusters every day to do their research -- really, you'd probably have
> to look long and hard to find a technical list this large whose
> participants have a higher mean or median IQ and/or a broader base of
> practical knowledge and experience in computer hardware.  I myself may
> be a bozo (sorry, ancient joke:-) but they collectively are Not Stupid.
> At all.  Nor are they ignorant.
> 
> I'm sure you aren't stupid either -- expertise in chess AND computers
> suggests a degree of dangerous, deranged brilliance that sucks up facts
> like water and turns them into intuitive structures of great beauty and
> complexity -- but cut the rest of us some slack and allow for a) our
> just maybe being not complete idiots and b) the DIFFERENCES in our basic
> requirements and the REALITIES of the marketplace that fuels the
> possibility of COTS clustering in the first place.  Statements like your
> original one that suggest that "they" are ignoring anything as
> elementary as the general desirability of cheap powerful chips (whoever
> "they" might be) are pretty arrogant and in this crowd will cause most
> people to just shrug and, if they persist, reach for their .procmailrc.
> 
>    rgb
> 
> >
> > Vincent
> >
> > At 06:14 AM 5/5/2005 -0400, Andrew Piskorski wrote:
> > >On Wed, May 04, 2005 at 04:14:42PM +0200, Vincent Diepeveen wrote:
> > >
> > >> Those bottlenecks are derived because they use 1000s of chickens.
> > >
> > >Vincent, however expert you might be in your own field, you clearly
> > >have no real idea what you're talking about in this one, and I suspect
> > >all you've accomplished in this thread is convincing 90+% of the list
> > >readers to simply ignore your posts as a waste of their time.
> > >
> > >I recommend reading more and talking less.  Especially when folks like
> > >RGB and Greg Lindahl have already pointed this out to you in Not So
> > >Subtle terms.
> > >
> > >--
> > >Andrew Piskorski <atp at piskorski.com>
> > >http://www.piskorski.com/
> 
> --
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 


-- 
Tim Mattox - tmattox at gmail.com - http://homepage.mac.com/tmattox/



