[Beowulf] Re: vectors vs. loops

Thu May 5 08:15:09 PDT 2005

On Thu, 5 May 2005, Vincent Diepeveen wrote:

> The point they seem to ignore is that a single chip can be cheap.

Who are "they" and why do you think that they ignore this?  This is the
whole point of the list.  Not a single person on this list is unaware of
this; most are here BECAUSE of this.  What is the purpose of such a
provocative and obviously untrue statement?

> A beowulf network with 1024 nodes always will cost millions.

It CAN cost millions, or it CAN cost under a million, in US dollars or
Euros (hardware cost, exclusive of the infrastructure or human
management costs).  It depends on the design used and purpose.  In fact
I'd guess that "most" clusters (gotta stop using that word:-) cost
roughly $1000-2000 per CPU when all is said and done.  I personally
think that dual CPU systems (not infrequently armed with dual network
interfaces) dominate so you have to decide whether the word "node"
refers to boxes, network interfaces, or CPUs in the cost estimate, and
have to decide how to split up infrastructure costs fairly in cost
comparisons that go beyond just the hardware (a point of considerable,
occasionally spirited, debate).
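
To make the arithmetic concrete, here is a rough sketch (not a quote for
any real cluster) that applies the $1000-2000 per CPU guess to 1024
"nodes", counting a node first as one CPU and then as a dual CPU box.
The function name and its parameters are made up for illustration; the
only numbers taken from the discussion are the per-CPU range and the
node count.

```python
def cluster_hardware_cost(nodes, cpus_per_node,
                          cost_per_cpu_low=1000, cost_per_cpu_high=2000):
    """Return a (low, high) hardware cost estimate in dollars.

    Assumes cost scales linearly with CPU count, which ignores
    infrastructure and human management costs entirely.
    """
    cpus = nodes * cpus_per_node
    return cpus * cost_per_cpu_low, cpus * cost_per_cpu_high


if __name__ == "__main__":
    # "node" read as a single CPU, then as a dual CPU box
    for cpus_per_node in (1, 2):
        low, high = cluster_hardware_cost(1024, cpus_per_node)
        print(f"{cpus_per_node} CPU(s)/node: ${low:,} to ${high:,}")
```

With these assumptions 1024 single-CPU nodes come to roughly $1.0-2.0
million and 1024 dual-CPU boxes to roughly $2.0-4.1 million, so the
answer to "does it cost millions" swings by a factor of two on the
definition of "node" alone, before design choices are even considered.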

> So creating faster chips is most important to keep prices cheap.
> Complaining about memory-bandwidth is just a side effect and not 
> a convincing reason to not produce faster chips.
> 
> Or do you disagree with that?

Look, clearly we're not communicating here.  A modern compute cluster
(be it beowulf, NOW, Grid style) is this thing some Very Smart People
(quite a few of them on this list) have developed collectively that
permits people to harness COTS hardware, which by definition is the mass
market sweet spot in price/performance.  That is the whole point.  The
point was made explicitly and can still be read in the original beowulf
documentation from close to ten years ago.  The point motivated the
creation of PVM by a team of some of the world's best computer
scientists (led by Jack Dongarra) back in the early 90's, which in turn
motivated things like MPICH and LAM/MPI and set the stage for the
beowulf project and the "cluster revolution".  The point has been
independently "discovered" by hundreds of individuals and groups on
their own ever since the fifties or sixties, thousands of groups in
modern times with the help of this list and the memetic resource base it
helps to self-organize.

NOBODY on this list thinks that it isn't worthwhile to produce, dream
of, discuss, design faster chips at ANY level and for nearly any
purpose.  We are ALL hardware junkies on this list.  It is a requirement
for membership.  In order to join the beowulf list you have to sign a
form that says "I am a hardware addict and am not currently enrolled in
a twelve point plan to kick my habit and solemnly promise to drool over
every successive Moore's Law generation of hardware in its turn as I
harness it cheaply in my own personal cluster environment".  Did you
somehow miss this?

Most of us are ALSO here because we are de facto ECONOMICS BOUND.  We
have this horrible lust for unlimited compute power, but only a tiny
budget.  (Where "tiny" for a true addict equals "my current budget" no
matter what it is.:-) When somebody announces that they've built a 27
million dollar computer (cluster or otherwise) somewhere, we have
momentary, wistful visions of somehow fitting the whole damn thing into
our garages, or lustful dreams of what WE'D do if WE were given that
kind of money, just once, because OUR budgets are more likely to be $27
THOUSAND, or for a lucky few, $270 thousand, or for an unlucky few,
$2700.  Lots of us -- MOST of us -- are orders of
magnitude short of a thousand chickens.

This tends to keep the list discussion focussed on the sweet spot of
mass market sanity.  We don't waste time talking about $100K CPUs,
because for nearly all of us $100K is likely to be all of our computer
budgets for any given year and then some, and besides, if we HAD $100K
we would NEVER buy an ox with it as we can get far, far more work done
with the chickens the same money would buy.  Not to mention the risk of
mad oxen disease...

Besides, those of us who have been in the game for a LONG time have
learned (in my own case, through bitter experience and very much the
hard way) that if you DO spend that $100K on a single big-iron system
that IS for the moment the world's fastest $100K investment you can make
for your problem, three years later mass market hardware can do even
more work even faster for a third of the cost, sometimes even a tenth of
the cost.  I personally have watched $75K single-piece-of-hardware
systems be sold as "junk" for $3000 when they are four years old because
one could buy brand new systems that were twice as fast for less than
the MAINTENANCE cost of the expensive refrigerator-sized hardware.  I
have lived through Duke spending a huge amount of money on a CM5,
watched a friend of mine devote two YEARS learning to program the damn
thing and porting a big project to it to run optimally fast, only to
have the CM5 sold for a couple of thousand a year or so later by a firm
interested only in recovering the gold in its contacts -- so little that
I and several friends were seriously tempted to buy it and PUT it into a
garage, only we couldn't find a garage with the right kind of electrical
outlet...;-) Still regret it -- it would have made WAY cool yard art up
on blocks out front, if nothing else.

So sure, we'd all LOVE to see 80 gazillion-flop CPUs invented with
eighteen SIMD pipelines and eight-way branch prediction, with a GB of
0.1 nsec SRAM cache, with total power requirements that would let them
be used in personal digital assistants, as long as they are sold in the
mass market at price points in between $50 and $1000 (preferably on
the lower end of that range), just like the rest of the mass market CPUs.
I have personally enjoyed each successive generation of Intel CPU from
the 8088 and IBM PC up through the P4; as well as several excursions
along the way into efforts by Motorola, by Cyrix, by AMD, by MIPS, by
Sun.  The Opteron has been lovely, and I currently await with great
interest multicores and some of the exotic new "building block" mediated
designs, if any of them prove to be economically viable and make it down
into the mass market price range with viable linux distributions.

We'd ALL love to see newer better memory invented (even those of us who
don't really need it).  For me personally, each new generation of memory
has been mostly a bust because MY tasks tend to be CPU bound and scale
strictly with CPU speed almost independent of the speed of the
underlying memory subsystem, but even I see advantages for certain
non-research tasks.  Note well:  I personally have historically observed
no significant advantage even to DDR vs EDO vs SDRAM or to any cache
size above the measly 128 KB associated with the Celeron I'm typing
this on -- my task has scaled with Intel P6-class clock across the
entire P6 family from the 200 MHz Pentium Pro through latest generation
Celerons and PIII's -- the P4 actually LOSES a bit of ground relative to
clock because of changes made in the processor balance.  The last few
generations of AMD processor (Athlon, Opteron, AMD 64) have WON ground
relative to Intel and relative to pure clock, indicating real advances
in internal technology.  All for mass market cheap.  What's not to love
about this?

However, there are others for whom the task balance is completely
different.  They drool not over a faster CPU per se, but over those
faster memory subsystems.  For them faster memory means faster task
completion times much more directly than a faster CPU.  When they adopt
a DDR system vs an SDRAM system at constant CPU clock, they see an
immediate benefit.  There are still others for whom neither one makes
that big a difference, because their task is network bound.  If they
spend too much on bleeding edge CPUs and memory, those resources just
have to wait even longer on the network to do each incremental chunk of
work; their task scales to even fewer systems than before.  They'd
rather get CHEAPER CPU/memory systems and spend MORE of their money on
faster networks, given a fixed budget.  Still others care more about
having huge fast hardware graphics adapters (game junkies, real time
renderers).  Still others need bleeding edge STORAGE because their HPC
tasks are DISK bound.  Keep your expensive Opterons, they say, except
insofar as they help me pump bytes quickly out of an enormous and very
expensive disk array with a bleeding edge network all its own.
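
One hedged way to picture these differences is an Amdahl's law style
toy model: split a task's runtime into the fractions spent waiting on
each resource, then see what happens when only one resource gets
faster.  The fractions below are invented purely for illustration (they
are not measurements of anybody's code), and the model ignores overlap
between resources and any change in how many systems a task scales to.

```python
def speedup(fractions, upgrades):
    """Overall speedup when some resources get faster.

    fractions: resource name -> fraction of original runtime (sums to 1).
    upgrades:  resource name -> factor by which that resource speeds up;
               resources not listed are left unchanged.
    """
    new_time = sum(frac / upgrades.get(name, 1.0)
                   for name, frac in fractions.items())
    return 1.0 / new_time


# Hypothetical task profiles, chosen only to show the contrast.
cpu_bound = {"cpu": 0.90, "memory": 0.05, "network": 0.05}
network_bound = {"cpu": 0.20, "memory": 0.10, "network": 0.70}

if __name__ == "__main__":
    for name, profile in (("cpu_bound", cpu_bound),
                          ("network_bound", network_bound)):
        for resource in ("cpu", "network"):
            gain = speedup(profile, {resource: 2.0})
            print(f"{name}: 2x faster {resource} -> {gain:.2f}x overall")
```

In this sketch doubling CPU speed buys the CPU bound profile about 1.8x
but the network bound profile only about 1.1x, while doubling network
speed buys the network bound profile about 1.5x.  Under a fixed budget
that is the whole argument for spending the money where the task
actually waits.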

We are, I promise, ALL hardware junkies (and signed the form to join the
list;-) but we RESPECT THE DIFFERENCES in our individual requirements
for cost-benefit optimal design under the common designation of "cluster
computing".  So in spite of the fact that I tend to run EP CPU bound
code, I don't insist that EVERYBODY should ALWAYS design clusters that
would solve MY problem optimally and that anybody is stupid if they do
otherwise.  The people on this BEOWULF list, which was originally set up
primarily to support the design of REAL cluster supercomputers where the
tasks are not EP but are partitioned, have a nontrivial IPC burden, and
where the model is running "a job" on a virtual SMP machine, are very
kind in both permitting folks like me with different interests to
participate and in not insisting that all clusters discussed have to be
beowulfs per se.

The list tends to be tolerant as well of discussions of big iron, of
"oxen", of vector processors, of storage arrays, of networks, of shared
memory designs, of message passing software, of all sorts of HPC issues
even though the list address is NOT "hpc at hpc.org".  It has come to be
more or less synonymous with HPC because of the enormous success of
cluster computing economically to the extent that it now dominates HPC,
but the two aren't really the same thing.  It tends to be LESS tolerant
the further out there one goes away from the COTS part (especially if a
point is made obnoxiously), though, as the bulk of us are here because
of that mix of hardware addiction, infinite needs, and finite budgets,
and appreciate that this list is ABOUT COTS clusters if it is about any
single unifying factor.  It has been known to (gently) ridicule even
COTS clusters that were built for the wrong reasons or with a silly
design process (usually as a showcase piece for some major vendor in
cahoots with some perhaps understandably greedy University department
somewhere, a hammer in search of a nail and a top 10 slot on the Top 500
list).  

I suspect that some of the ridicule is sheer jealousy, mind you.  I know
that mine is....;-)

To summarize (again) my own feelings (letting Andrew and Greg speak far
more tersely for themselves) -- lighten up, and RESPECT the collective
abilities of the people on the list (and I don't mean me personally, as
I'm only a physicist and glorified amateur at cluster computing).
Remember that they include Gordon Bell Prize winners, Ph.D.-wielding
computer scientists and electrical engineers who know far more than you
or me about hardware, the directors, chief administrators, designers of
major computer clusters all over the world, the top-level technical
employees, founders and owners, or customers of cluster oriented
businesses, scientists and engineers and students and faculty who USE
clusters every day to do their research -- really, you'd probably have
to look long and hard to find a technical list this large whose
participants have a higher mean or median IQ and/or a broader base of
practical knowledge and experience in computer hardware.  I myself may
be a bozo (sorry, ancient joke:-) but they collectively are Not Stupid.
At all.  Nor are they ignorant.

I'm sure you aren't stupid either -- expertise in chess AND computers
suggests a degree of dangerous, deranged brilliance that sucks up facts
like water and turns them into intuitive structures of great beauty and
complexity -- but cut the rest of us some slack and allow for a) our
just maybe being not complete idiots and b) the DIFFERENCES in our basic
requirements and the REALITIES of the marketplace that fuels the
possibility of COTS clustering in the first place.  Statements like your
original one that suggest that "they" are ignoring anything as
elementary as the general desirability of cheap powerful chips (whoever
"they" might be) are pretty arrogant and in this crowd will cause most
people to just shrug and, if they persist, reach for their .procmailrc.

   rgb

> 
> Vincent
> 
> At 06:14 AM 5/5/2005 -0400, Andrew Piskorski wrote:
> >On Wed, May 04, 2005 at 04:14:42PM +0200, Vincent Diepeveen wrote:
> >
> >> Those bottlenecks are derived because they use 1000s of chickens.
> >
> >Vincent, however expert you might be in your own field, you clearly
> >have no real idea what you're talking about in this one, and I suspect
> >all you've accomplished in this thread is convincing 90+% of the list
> >readers to simply ignore your posts as a waste of their time.
> >
> >I recommend reading more and talking less.  Especially when folks like
> >RGB and Greg Lindahl have already pointed this out to you in Not So
> >Subtle terms.
> >
> >-- 
> >Andrew Piskorski <atp at piskorski.com>
> >http://www.piskorski.com/
> >_______________________________________________
> >Beowulf mailing list, Beowulf at beowulf.org
> >To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> >
> >

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu