Top 500 trends

Ken Chase math at velocet.ca
Tue Nov 26 14:05:16 PST 2002


On Tue, Nov 26, 2002 at 08:24:04AM -0500, Robert G. Brown's all...
> On Mon, 25 Nov 2002, Ken Chase wrote:
> 
> > Anyone have stats on the top 500 in FLOPS per $?
> > 
> > Does anyone care? ("no.") Shouldn't we? Isn't that what Beowulf (but
> > not HPC itself, obviously) is about?
> 
> Yes. Yes.  Yes.
> 
> The really interesting issue is the overall lack of consideration of
> Moore's Law in cluster purchase design and long term economic strategy
> by the granting agencies.  For example, let's pick a project "X", which
> accomplishes some worthy HPC goal, whether it be computing properties of
> a quark-gluon plasma, studying critical phenomena, simulating nuclear
> devices, computing cosmological evolution.

> It would be lovely if time, CBA, and Moore's Law were prominently
> associated with project proposal and approval, IF our goal was really to
> Probe the Secrets of the Universe most efficiently.  Of course, our REAL
> purpose is to Have A Job probing those secrets, and imagine the pain if
> I were told "sorry, we can do your Monte Carlo computations now at a
> cost of $50K/year for eight years or wait until year seven and complete
> them in year eight for $100K total, and you'll just have to wait".
> Multiplied by thousands of the nation's top University researchers.

This unfortunately doesn't consider a couple of things: one that makes
your point stronger, and one caveat.

The stronger point is investing the money. If you invest that money instead
of spending it on cluster gear, a low-risk, well-researched investment returns
some 8-10% per year (or perhaps less now). That gets pretty hefty once it
compounds over a number of years.

The caveat of course is that having the cluster and designing it, as well as
running it and using it, increases your knowledge of how to do so, as well as
of the problem you are trying to model with it. Hopefully you achieve a more
than 10% compounded per annum increase in knowledge of the problem and how to
attack it by using the cluster properly and investigating new algorithms and
the like.

That 10% only offsets the return you gave up by not investing the money.  You
also have to beat the 100% return every 18 months that Moore's law pays for not
buying clusters :) (~60% a year; I know it's not exactly 100% every 18 months,
it's up and down with release cycles, but on avg). Together this gives us
something like a 66 or 70% return a year for those waiting to spend the money
at the end of the project instead.
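
A rough check of that arithmetic, as a minimal sketch: it assumes a clean
18-month doubling and an 8-10% investment return, and the function names are
just for illustration. Compounding the two rates rather than adding them comes
out a little above the 66-70% ballpark, at a bit over 70%.

```python
def annual_rate(doubling_months: float) -> float:
    """Yearly growth rate implied by doubling every `doubling_months` months."""
    return 2.0 ** (12.0 / doubling_months) - 1.0


def combined_rate(moore: float, invest: float) -> float:
    """Compound two independent yearly rates into a single yearly rate."""
    return (1.0 + moore) * (1.0 + invest) - 1.0


# Assumption: Moore's law as a steady doubling every 18 months.
moore = annual_rate(18.0)
for invest in (0.08, 0.10):
    total = combined_rate(moore, invest)
    print(f"invest {invest:.0%}: Moore {moore:.1%}, combined {total:.1%}")
```

So the "wait and buy later" return is, if anything, slightly understated by
simply adding the two percentages.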

Hopefully intelligent people can spend a year working on stuff such
that the intellectual capital gained outweighs the 65-70% returns otherwise
granted by waiting. (And if people started waiting around all the time,
would that adversely affect Moore's law? I suppose WinTel causes Moore's
law, not beowulves :)

Furthermore, 'discovering' something first is a one-time gain that
could be worth 5 or 10 or more years of compounded 65%+ returns - say you
discover the new Prozac first... (Of course, development of a drug or
other product is almost never related only to the cost of acquiring or
even operating the cluster - there are dozens of other costs that
would dwarf it, even for much more theoretical computational-only concerns.)

Perhaps that's why people install Beowulfs now, instead of waiting for
some distant future optimal time, say, when Moore's law is broken,
or the cycles get longer apart. (Suddenly MTBF figures will matter
a hell of a lot more!)

Obviously people handing out funding would like to see the money used
immediately - playing the markets with it for 4 years really isn't going to fly
well with them. Perhaps a grant designed to sustain a group for 5 years
would be better spent with a larger share in the early years going
towards algorithm and cluster design research, and in later years going
towards cluster gear. Of course, it's hard to budget like that, since
you have to pay people a constant (or slightly increasing) amount.
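
To put a number on "later years", here is a toy model, not anyone's actual
budgeting method. It assumes gear gets twice as fast per dollar every 18
months, that whatever you buy runs flat out until the grant ends, and it
ignores power, admin and interest. The names `compute_delivered` and
`best_buy_year` are made up for this sketch.

```python
import math

DOUBLING_YEARS = 1.5  # assumed Moore's-law doubling period


def compute_delivered(buy_year: float, grant_years: float,
                      budget: float = 1.0) -> float:
    """Work done by gear bought at `buy_year` and run until the grant ends.

    Units are "year-0 machine-years" per unit of budget.
    """
    speed_per_dollar = 2.0 ** (buy_year / DOUBLING_YEARS)
    return budget * speed_per_dollar * (grant_years - buy_year)


def best_buy_year(grant_years: float) -> float:
    """Purchase time maximising compute_delivered.

    Setting the derivative to zero leaves DOUBLING_YEARS / ln(2) years
    of runtime after the purchase.
    """
    return max(0.0, grant_years - DOUBLING_YEARS / math.log(2.0))


grant = 5.0
for year in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"buy at year {year:.0f}: {compute_delivered(year, grant):.2f}")
print(f"best purchase time: year {best_buy_year(grant):.2f}")
```

Under these assumptions the best time to buy on a 5 year grant is a bit
before year 3, leaving roughly two years of runtime. Any value from having a
cluster to learn on early is not in the model, which is exactly the caveat
above.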

As always, it depends on the job at hand, and on whether research/algorithm
development can be done without a large test cluster in the first place, etc.
IBM didn't seem to be right off the starting blocks with a huge cluster on day
1 for Blue Gene - perhaps they're following this model in some (customized)
way. (Then again they've got their own fabs, so all bets are off! :)

> I personally think that we'd be a bit better off as a country if we sort
> of split the difference -- recognize that our real goal is supporting
> the people and the infrastructure, and spending a bit less on ginormous
> supercomputers whose primary benefit is that they provide us with the
> answers to questions of interest to (and understandable by) perhaps a
> few hundred of the humans on the planet.  That is NOT to say that those
> answers aren't important to pursue -- I am a scientist and in hot
> pursuit myself -- but I question the wisdom of spending quite as much as
> we do to get these answers "right now" without the perspective
> associated with Moore's law and the real goal of stretching out human
> support anyway.

Interesting. The problem is the technocratic society we live in - more
technology is good! People advertise "the most complex (service name) network
in the industry!" all the time - as if that's a good thing. (This was related
to a massive cell carrier's network failure over a weekend last summer -- the
article referred to the carrier's ads that stated they had the most complex
network -- I mused that perhaps that's why it took so long to fix. I'd rather
see my carrier have a simple network that's easy to fix. :)

People are enamoured with big technology in snazzy cases with lots of
blinkenlights. And as you are stating here, it's not even fiscally/socially
responsible in many cases to invest in it as a means to conduct research -
however, these clusters have become their own end - no questions asked. That's
why there's a top 500. It's a d*ck waving contest. And people don't even ask the
annoying questions of 'how many flops per $ is that', or 'how many jobs per
hour can this chunk of $, properly turned into a cluster, give us' -- or even
more appropriate - 'what is this cluster being used for and how is the
community/society going to benefit from it'. It just IS and IT is good.
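
The 'flops per $' question is cheap to ask once someone publishes the price.
A minimal sketch of the ranking: the `Cluster` type and both entries are
placeholders invented for illustration, NOT real Top 500 figures.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    sustained_gflops: float  # measured on the real workload, not peak
    cost_dollars: float      # purchase price; add running costs if known


def gflops_per_kilodollar(c: Cluster) -> float:
    """Sustained GFLOPS bought per $1000 spent."""
    return c.sustained_gflops / (c.cost_dollars / 1000.0)


def rank_by_value(clusters):
    """Best value first, instead of biggest machine first."""
    return sorted(clusters, key=gflops_per_kilodollar, reverse=True)


# Placeholder entries only; these are NOT real Top 500 numbers.
examples = [
    Cluster("big_iron", sustained_gflops=1000.0, cost_dollars=10_000_000.0),
    Cluster("small_beowulf", sustained_gflops=50.0, cost_dollars=100_000.0),
]
for c in rank_by_value(examples):
    print(f"{c.name}: {gflops_per_kilodollar(c):.2f} GFLOPS per $1K")
```

Swapping `sustained_gflops` for jobs per hour gives the second question the
same way; the third one does not reduce to a sort key.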

Furthermore, even the way the clusters are used should be in dispute. A
research group I work with has spent perhaps 5-10x the average time and effort
in constructing their particular types of jobs, and a whole filtering system
(involving human effort/time) to weed out inappropriate jobs, or jobs with
solution spaces that do not need to be rendered at high levels of theory (thus
costing lots of cpu time to no advantage).

After having done this for many years, they have surpassed the expense curve
('humans are expensive'): instead of just throwing large but inappropriate
jobs into huge clusters of computers, they are achieving 10-100x the speedup
(or increase in the size of jobs they can get relevant results for) because
they don't see the technology as some huge magic black box - they
use it as a tool (a means).  They have had limited funding, which is partially
responsible for this methodology, and as they are starting to get some
recognition, their methods are giving them an advantage over similar research
groups. In fact, they're cognisant of their methods and their origins, and
have instilled a semi-rigid structure for new researchers to follow so that
they pick up the same learning/investigative attitude that the original
leaders were forced to adopt over the years.

I think this is probably simply referred to as 'good research with
appropriate tools'. And it's also led my design of clusters for them
into some interesting areas that I would otherwise not have gone into
were I not trying to do the impossible with a limited amount of funding. ;)

Obviously governments are accused of lacking this type of mentality all
the time - 'deep pockets' replace 'frugal and intelligent allocation', along
with many other stereotypes that are applied to almost any western government's
funding policies for certain high profile high tech endeavours - to the
detriment of lower profile but probably much more innovative work going on.
However, there's obviously no easy way to determine where money should go -
and even less of an idea of how to determine whose research is going to benefit
the public the fastest/most. (What's better for the public?  fastest or
mostest? :)

I suppose I've not yet seen the other adage most often touted for large
investments of capital into the economy stated alongside a cluster install
- "this new extremely large cluster built entirely out of non HA boxes
will create 200 jobs for magic little cluster elves for 5 years." I suppose
it's only a matter of time before we start hearing that ;)

/kc



-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 


