Athlon SDR/DDR stats for *specific* gaussian98 jobs

Velocet math at
Wed May 2 12:35:34 PDT 2001

On Wed, May 02, 2001 at 12:53:17PM -0400, Mark Hahn's all...
> > the same duron on a DDR board w/256Mb DDR RAM (talk about a waste! :)
> well, a duron has the same dram performance as a tbird at the same FSB.
> so in that sense, it's actually a better match for a lot of computational
> codes that sneer at cache.
> otoh, tbirds are dirt cheap, not much more than durons.

The thing is, at the cost levels I'm working with, the price difference between
a Duron and a Thunderbird does matter. So does whether a video card
is needed just to get a system to boot.

For gaussian98, for my jobs (specifically), the cache seems to
matter quite a lot. That's the only thing I can think of that accounts
for the very high speed of my K7-700, which performs, for most jobs,
very close to a Tbird 900.

I really wish I could find some old K7s that people are tossing out for
cheap. Problem is the board ends up being huge; it certainly won't fit
into a 1U case.

> > I havent had a chance to run any of these on a P4 with SSE3 (is that the
> sse2.  gcc 3.0 snapshots apparently can generate sse2 code...

Wonder if g77 can generate sse2 code.

> does ATLAS include prefetching?  it's fairly astonishing how big a 
> difference prefetching (and movntq) can make on duron/athlon code.
> for an extreme case (Arjan van de Ven's optimized page-copy and -zero):

I'm not too up on the internals of ATLAS. Others on the list probably

> > There are some caveats and other environmental factors discussed on
> > the page as well.
> error bars would be nice.

Heh, my stats are so unprofessionally done, you flatter me by even
asking. I don't know what the sources of error are at all: does
/usr/bin/time have problems? Is my clock accurate? Like I said, I just
ran each job a minimum of 5 times (in one case 90 times) and took
the median performance.
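
For anyone wanting to repeat the "run it N times, take the median" approach with at least a crude spread attached, here is a minimal sketch. It is not the script used for the numbers on the page; the function names `time_command` and `summarize` are made up for illustration, and `cmd` stands for whatever command line launches one job (the post does not give one). It measures wall-clock time from Python rather than calling /usr/bin/time.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` (a list of arguments) `runs` times; return wall-clock seconds per run.

    Hypothetical helper: the original timings were taken with /usr/bin/time.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts on a failed run so a crash is not counted as a fast job.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Median plus min/max, as a poor man's error bar."""
    return {
        "runs": len(timings),
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }
```

The median is insensitive to one or two runs disturbed by something else on the box, which is presumably why it works tolerably with only 5 runs; the min/max pair at least shows how far apart the runs were.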

> > I am not trying to start a jihad here against high speed Thunderbird
> > and Dual Thunderbird proponents; this is just what i've found for *MY*
> maybe I'm being dense, but how would these results be interpreted 
> jihadically?  in general, they show two things: high-end machines

Because my mentioning that "perhaps buying an assload of Durons instead
of dual Athlon DDR boards would give more bang for the buck overall"
caused some consternation. Most of the objections were about increased switch costs
from having so many more nodes, or possibly the cost of having a boot
hard drive for each box (which would increase the costs quite heavily).
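
The bang-for-the-buck argument, including the per-node overhead people objected to, can be sketched roughly as below. Every number here is a placeholder invented for illustration, not a measured price or throughput, and `dollars_per_job_per_day` is a hypothetical helper name.

```python
def dollars_per_job_per_day(node_cost, overhead_per_node, jobs_per_day):
    """Cost of one node (board, CPU, RAM plus per-node overhead such as a
    switch port or boot disk) divided by how many jobs it finishes per day."""
    if jobs_per_day <= 0:
        raise ValueError("jobs_per_day must be positive")
    return (node_cost + overhead_per_node) / jobs_per_day


# Placeholder figures only: substitute real quotes and measured job rates.
cheap_node = dollars_per_job_per_day(node_cost=300.0, overhead_per_node=50.0,
                                     jobs_per_day=7.0)
fast_node = dollars_per_job_per_day(node_cost=900.0, overhead_per_node=50.0,
                                    jobs_per_day=12.0)
```

With these made-up inputs the cheaper node wins, but raising `overhead_per_node` (say, adding a boot disk to every box) shifts the balance toward fewer, faster nodes, which is exactly the objection raised.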

My advantages that probably apply to few others:

- I don't need any high speed parallelism for this cluster. All jobs run
  singly by themselves on one node.

- quality isn't even a massive factor: if a board crashes, the job is
  rescheduled.  Obviously there's an acceptable threshold, and we're
  way beyond that: two boards running for 12 days didn't crash on either
  OS. I am now putting together a mini prototype to start before the shipment
  of the rest of the parts comes and I finalize the cabinet design: I'm gonna
  run 10 boards within a few inches of each other in a stack (with proper
  cooling) and see how they fare. I'm mainly curious about RF interference
  and cooling problems. I am pretty sure they'll be fine (a friend ran
  4 of these boards even closer than I plan for 4 weeks with no problems).

- we are running diskless nodes over NFS and we don't read or write to it
  often (256kbps/node average during calculations).
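
To put the NFS figure in perspective, a back-of-the-envelope sketch follows. It assumes "kbps" means kilobits per second and uses a 100 Mbit/s server link purely as an example value (the post does not say what the server is connected with). It only covers the stated average, not bursts at job start or end.

```python
def aggregate_mbit(nodes, kbit_per_node=256):
    """Total average NFS traffic in Mbit/s for `nodes` diskless nodes."""
    return nodes * kbit_per_node / 1000.0


def link_utilization(nodes, link_mbit=100.0, kbit_per_node=256):
    """Fraction of the server's link used on average.

    link_mbit=100.0 is an assumed example, not a figure from the post.
    """
    return aggregate_mbit(nodes, kbit_per_node) / link_mbit
```

For the 10-board prototype this works out to about 2.56 Mbit/s in total, so on average the network is nowhere near being the bottleneck under these assumptions.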

So I don't need a big switch, I don't need super reliable BrandName equipment
with a service contract and I don't even need high performance network
I'm actually somewhat interested in the power usage stats for different
Athlon systems. Having fewer but faster nodes may well not save all
that much power. Does anyone have wattage stats handy?

> bear a price-premium that decreases their speed/cost merit, and 
> that freebsd's page coloring sometimes has a measurable benefit.

Wonder how many people are using FreeBSD on their clusters instead of

> I'm dubious about further interpretation, though.  for instance,
> you seem to show a significant benefit to tbird's larger cache
> (384 vs 192K), but surely you chose this workload to be bandwidth
> intensive, didn't you?  if not, then the DDR comparison is rather
> specious...

I didn't. I chose it to be related to what we need the cluster for. I'm
trying to justify my design because it may come into question. Which is
partly why I really need to check out a P4's stats, but I'm pretty sure
the price/performance is going to be worse than we can afford. The only
question I'm really trying to head off is why we didn't use the fastest
Tbirds available and DDR RAM.

I am not going to read the G98 code; it's horrid spaghetti ;) and my
Fortran isn't that great. And there's A LOT of code. It's not worth my
time. It's much faster to just run the jobs on different boards and
see what the results are than to predict them by reading the code.

Actually I have about 3x as many numbers for non-ATLAS jobs, but they're
kind of useless. However, they do indicate the speedup provided by
the Thunderbirds, as I managed to get both a Tbird and a Duron at 750, 800 and
850. I can dig up those stats if people care, but then again these are
stats for *my* particular jobs running on non-optimal (non ATLAS)

> thanks for posting the numbers!

No problem, sorry it wasn't more professionally done ;) I also apologize
for not running standard G98 tests (I'm not aware what would constitute
such, or if there's a preset package of benchmarks available).

Ken Chase, math at  *  Velocet Communications Inc.  *  Toronto, CANADA 
