Athlon SDR/DDR stats for *specific* gaussian98 jobs

Velocet math at velocet.ca
Thu May 3 11:15:50 PDT 2001


On Thu, May 03, 2001 at 08:23:56AM -0400, Robert G. Brown's all...

> > Bursts of CPU usage? Aren't all our clusters hammering our CPUs as much as
> > possible? And if ATLAS is really doing its job, aren't we hammering all parts
> > of the CPU as much as possible? :)
> 
> Not variations in the CPU/memory load, which as you note is nearly
> constant (and not really all THAT different between idle and loaded -- a
> lot of juice is expended just keeping it and the memory running in idle
> mode), variations in the peripheral load -- using the disk(s), the
> network and so forth.  In your configuration (very stripped, if I
> recall) you don't think you'll see much variation.

nope ;)

> I'm going to see if I can get Duke to spring for some tools to measure
> power draw properly.  There are all sorts of peak vs rms issues that

We have power measuring tools for our colocation customers. I'm going
to plug in a bunch of these boards and see what I get.

> In my case the best solution to optimize these parameters seems to be do
> business with a reliable local vendor, missing the absolute best deals
> available anywhere by an easy 10% or so but getting the warm fuzzies of
> a place where I'm on a first name basis with the staff (who of course

Oh I agree. Our provider will come in and sit on the floor and fix things
for 5-20 minutes if he can, or take it with him to his lab. If he's
going to take more than a day he usually just gets us a drop in replacement
part and RMAs the thing for himself to get a new board for some future
customer. Works out great. It's not the cheapest, but a 10% premium on a
design which is 80% cheaper than most designs is ok by me if it
saves a lot of time.

In fact, I find that having all the nodes boot off a central NFS
server makes management easier as well. With each one booting off the
same mount point and set of directories, upgrading stuff is hyper
simple, and only the unique portions of disk that I need (any scratch
disk) need to be separate per machine.

In fact, the RAID server is actually being shared with another project
already in operation, so we get to save effort there too. Yay, I don't
have any work to do! :)
 
> This last point is worth examining.  The way Moore's Law works it is
> amusing but true that if you take a fixed three year budget of 3A and
> spend it all at once, you get (3A)*(3 years) = 9 work units done over
> three years.  If you spend it A per year, you get (A)*(3 years) +
> 2*(A)*(2 years) + 4*(A)*(1 year) = 11 work units done over the same time
> (the numbers reflecting the approximate annual doubling in speed from
> ML).  That is, you break even in work done between years 2 and 3 and
> thereafter accumulate work units at A+2A+4A = 7A per year vs 3A.  Also
> note that you break even in the RATE at which work is done at the
> BEGINNING of the second year -- by spending your money incrementally
> (likely matching the ramp-up in work load, unless your users are "ready"
> to jump in and simply crank up to full speed immediately) you get almost
> as much work done in the third year alone as one would in three spending
> everything all at once.

IIRC, Moore's law is at 18 months now. From everything2.com (because I knew
I'd find it there, not because it's authoritative):

  The observation that the logic density of silicon integrated circuits has
  closely followed the curve (bits per square inch) = 2^(t - 1962) where t is
  time in years; that is, the amount of information storable on a given amount
  of silicon has roughly doubled every year since the technology was invented.
  This relation, first uttered in 1964 by semiconductor engineer Gordon Moore
  (who co-founded Intel four years later) held until the late 1970s, at which
  point the doubling period slowed to 18 months. The doubling period remained
  at that value through time of writing (late 1999).

This doesn't talk about the speed of the chips. Assuming it applies,
however, as you have:

At 1.587x/year (2^(12/18), i.e. doubling every 18 months), the schedule
over 3 years would be:

all at once              3A*3 =                            9 A
in stages once a year    3*A + 2*1.587*A + 1.587^2*A =     8.7 A
all in the last year     3*1.587^2*A =                     7.56 A
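
A quick way to check these sums, and to compare them against the
2x/year figures quoted above, is a few lines of Python. This is only a
sketch: the function name work_units and the schedule lists are made up
for illustration, and it assumes hardware bought at the start of year i
is growth**i times faster and runs until the end of year 3.

```python
def work_units(purchases, growth, years=3):
    """Total work done over `years` for a purchase schedule.

    purchases[i] is the budget (in units of A) spent at the start of
    year i. Hardware bought in year i is assumed to be growth**i times
    faster and to run for the remaining (years - i) years.
    """
    return sum(spend * growth**i * (years - i)
               for i, spend in enumerate(purchases))


# Doubling every 18 months expressed as a per-year factor (about 1.587).
MOORE_18_MONTHS = 2 ** (12 / 18)

# Hypothetical schedules matching the three rows of the table.
schedules = {
    "all at once": [3, 0, 0],
    "in stages once a year": [1, 1, 1],
    "all in the last year": [0, 0, 3],
}

for name, purchases in schedules.items():
    at_18 = work_units(purchases, MOORE_18_MONTHS)
    at_12 = work_units(purchases, 2.0)
    print(f"{name:24s} 18-month: {at_18:5.2f} A   12-month: {at_12:5.2f} A")
```

With growth set to 2.0 the staged schedule comes out ahead of buying
everything up front; with the 18-month factor it falls slightly behind.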

So it's close, but slightly losing. There are a few caveats here of course.

- The people controlling the money may want to see a fair number of results
early on, instead of waiting the full 3 years.

- Hopefully the techniques themselves, as well as the software, will
be improving because of the previous research/results. The more early
results you get, the better. Even at a 'compounded advancement' rate of
10% 'better technology' per year, this favours implementing early.
I'm not just talking about you waiting around for others
to improve their software - you yourself will use your own results
to improve/guide your research. Humans thinking for years is a lot of
valuable input and will change and hopefully improve the research.

(a possible counterargument is that investing the money on the market
at 10% would cancel out this 10% edge for 'early implementation' ;)

- The other caveat is that Moore's law is a smooth curve that approximates
the increase in speed/performance, but the actual advancements are done
when Intel and AMD release stuff according to marketing schedules, etc.
So if you catch the wave at the beginning or end of a cycle, you could
ride the value of the cheaper components dropping in price suddenly
and drastically in one shot, and jump ahead of the curve for a few 
months. You can also get screwed by the same effect, and it's hard to tell
where you are in the cycle too.

> So my one piece of parting advice is to worry less about getting "the"
> absolute best (most cost effective) hardware as it exists right now --
> your cost-benefit optimization calculation may not survive literally
> from week to week anyway.  Last week the bleeding edge Tbird dropped by
> almost 25% of its price (so my concern about NIC prices in the cluster

Yes I agree to some extent. The thing is, a number of results have to be
produced within a couple of months. The speed and cost/performance of the cluster
over the next three years is a declining concern, compared with getting a
number of things done over the next 6 months.

Also, if we always buy stuff that costs twice as much as the best
price performance, then we never win by buying in stages/not worrying
about it. I buy the best price performance now, and I buy it again in
6 months, which is some totally different architecture by that time. And
6 months after that, again. Eventually I end up with Tbird 1.3GHz in my
cluster, but 6 months after everyone else. In the meantime, I've always
had the best price performance.

Also, for me the money is structured differently. This is a grant to
the group and must be ALL allocated NOW. There's no option of waiting
at all. So I might as well spend it all now, on the best possible
performance for the money.
 
> [...a clever person perhaps have noting that if one did nothing but bask
> in the rays of the Caribbean sun for two years and mosey back north to
> buy 3A*4-speed systems at the start of the third year, one would get 12
> work units done in the third year alone and thus beat out even my A per
> year purchase schedule and get a nice tan besides.  Or worse, waiting
> one MORE year gets 3A*8 = 24 work units done in the fourth year alone.

If Moore's was 2x per year, ya, but it's 18 months ;)

> In the limit, NOBODY should EVER buy computers to do numerical
> calculations now, as the longer they wait the less time it will take to
> complete them once they start and if we all just waited long enough a
> single desktop unit would get more work done than all the beowulf units
> currently in existence put together...  hmmm, something wrong with this
> logic, head hurts, must seek solution -- oh hell, might as well go get
> tickets to Jamaica...:-)
> 
> Forgive my morning ramblings...

Heh, they're fun, keep us sane, remind us why we're doing our work. Luckily
Moore's law isn't so quick; now we know why we research things NOW instead
of waiting 200 years for someone to figure it all out ;)

/kc

> 
>      rgb
> 
> -- 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 
> 
> 

-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 



