ATHLON vs XEON: number crunching

Thu Jun 20 01:01:03 PDT 2002

On Wed, Jun 19, 2002 at 05:54:01PM -0700, Bill Broadley's all...
> #5 Bottlenecks outside the CPU often do not scale AT ALL with clock speed.
>    So if it's memory bandwidth, a 2.0 GHz CPU can be 0% faster than a 1.5 GHz
>    CPU.
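To put rough numbers on Bill's point, here is a back-of-the-envelope sketch (a model, not a measurement). It assumes some fraction of the runtime is purely memory-bound and gains nothing from a faster clock, while the rest scales linearly with clock speed. The function name `clock_speedup` and the fractions tried are illustrative assumptions.

```python
def clock_speedup(mem_fraction: float, old_ghz: float, new_ghz: float) -> float:
    """Speedup from a faster clock when part of the runtime is memory-bound.

    Assumption: the memory-bound fraction does not change with clock speed
    at all, and the compute-bound remainder scales linearly with clock.
    """
    compute = (1.0 - mem_fraction) * old_ghz / new_ghz  # shrinks as the clock rises
    memory = mem_fraction                               # fixed, clock-independent cost
    return 1.0 / (compute + memory)


for f in (0.0, 0.5, 1.0):
    print(f"memory-bound fraction {f:.1f}: {clock_speedup(f, 1.5, 2.0):.3f}x")
```

With everything compute-bound the 2.0 GHz part is 1.333x faster; with everything memory-bound it is 1.000x, which is the 0% case above.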

Another big bottleneck outside the CPU is the network; in fact, this one is huge and often ignored.

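For codes like Gromacs, which are incredibly sensitive to network latency,
speeding up the CPU by a factor of 1.2 can trash your scaling entirely: each
node finishes its share of the work sooner, but the network latency stays the
same, so communication eats a bigger fraction of every step.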

In fact, it's nearly ridiculous to compare benchmarks on 1.33 GHz Thunderbird
Athlons and 1.6 GHz Xeons with the results posted on the Gromacs site. Their
single-CPU runs are around half the speed, and they scale so well with Scali
and Myrinet. Well, gee, no wonder: at those speeds they can tolerate a lot
more latency! I once slowed my clock down to see how the same network
scaled: quite well! I don't think the better scaling made up for the lost
clock speed, but I was certainly getting a lot more work done per watt of
heat dissipated!
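A toy model of why a faster CPU can hurt scaling: assume each timestep's work is split evenly across the nodes and then pays a fixed communication cost that does not shrink with clock speed. The function names, the 16 nodes, the 10 ms per-step communication cost, and the 1.2x speed factor are all hypothetical, not Gromacs measurements.

```python
def step_time(work_s: float, nodes: int, cpu_speed: float, comm_s: float) -> float:
    """Wall time for one timestep: compute is split across nodes, comm cost is fixed."""
    return work_s / (nodes * cpu_speed) + comm_s


def efficiency(work_s: float, nodes: int, cpu_speed: float, comm_s: float) -> float:
    """Parallel efficiency relative to one node at the same CPU speed (no comm)."""
    serial = work_s / cpu_speed
    return serial / (nodes * step_time(work_s, nodes, cpu_speed, comm_s))


# Hypothetical numbers: 1 s of work per step at baseline speed, 16 nodes,
# 10 ms of latency-bound communication per step.
for speed in (1.0, 1.2):
    t = step_time(1.0, 16, speed, 0.01)
    e = efficiency(1.0, 16, speed, 0.01)
    print(f"cpu x{speed:.1f}: step {t:.4f} s, efficiency {e:.3f}")
```

In this model the faster CPU still finishes each step sooner, but efficiency drops from about 0.862 to 0.839. Algebraically the efficiency is 1/(1 + N·s·c/W), so anything that raises the CPU speed s, the node count N, or the communication cost c relative to the work W pushes it down.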

> The golden rule of benchmarking:
> #1 Use the application that justifies the purchase of the machine to
>    compare price/performance.  Only then can you be assured of getting
>    the most performance for your money.

And beware of what happens when you think, "We've had this quote sitting here
for so long that the 1.6 GHz parts are out; we can get those instead of the
1.4 GHz ones we were quoted, for the same price!" Next thing you know, your
scaling isn't anything like you expected. (Do you get more performance in the
end, even with worse scaling? Not always. If you snag a nasty side effect, you
may end up running slower in the long run, and paying more for cooling for
nothing!)

> > Has anyone encountered such a strange pattern, and what could be the
> > source of this behavior?
> 
> For codes with certain memory access patterns, P4s enjoy a significant
> advantage in memory bandwidth.  On the other hand, the Athlon enjoys
> a performance lead on many other types of scientific codes.
> 
> Take SPECfp2000, a collection of 14 scientific applications (NOT
> microbenchmarks).  The 2.0 GHz P4 scores 669 and the Athlon MP2000 scores
> 642.  So they are very similar, right?  Nope: on some parts of the
> SPECfp2000 suite the P4 is 1.64 times faster than the Athlon, while on
> other benchmarks the Athlon is 1.56 times faster than the P4.
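Near-identical totals can hide large per-benchmark differences because the overall SPECfp2000 score is a geometric mean of the per-benchmark ratios, so a big win on one code and a big loss on another can cancel out. Here is a sketch with made-up scores for three hypothetical benchmarks on two hypothetical machines (these are NOT real SPEC results).

```python
from math import prod


def geomean(xs: list[float]) -> float:
    """Geometric mean: the n-th root of the product of n values."""
    return prod(xs) ** (1.0 / len(xs))


# Made-up per-benchmark ratios for two hypothetical machines (NOT real SPEC data).
machine_a = [820, 500, 640]
machine_b = [500, 820, 640]

print(f"geomean A: {geomean(machine_a):.1f}, geomean B: {geomean(machine_b):.1f}")
# Speed of A relative to B on each benchmark: >1 means A is faster on that code.
print([round(a / b, 2) for a, b in zip(machine_a, machine_b)])
```

Both machines get exactly the same overall score, even though A is 1.64 times faster on the first code and B is 1.64 times faster on the second. The single number tells you nothing about which one wins on *your* code.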

You've got to benchmark, benchmark, benchmark, using the EXACT application
you're going to run in production. That's all you can do. This is why, before
we put together an 80-node Athlon cluster, we spec'd it all out with 4 dual
Tyan boards just to mess around. We also had a few dual P3 boards to test,
among a couple of others (we settled on the Athlons mainly due to price). And
we ran exactly what was going to be used in production. There's almost no
other way to do this. You cannot guess by looking at a spec sheet, or even by
testing a different motherboard with the same CPU and clock, or a slower clock
on the same motherboard. It just doesn't work. Subtle effects multiply when
you start doing latency-sensitive computations over the network.
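A minimal timing harness in that spirit, assuming you can launch the production job as a single command line: it runs the same command several times on the candidate hardware and reports the fastest and median wall-clock times. `time_command` is a hypothetical helper, not part of any benchmarking tool.

```python
import statistics
import subprocess
import time


def time_command(cmd: list[str], runs: int = 5) -> dict[str, float]:
    """Run cmd `runs` times; return min and median wall-clock seconds.

    check=True raises CalledProcessError if any run exits non-zero, so a
    crashed run is never recorded as a suspiciously fast one.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    return {"min": min(times), "median": statistics.median(times)}
```

For example, `time_command(["./run_production_job.sh"], runs=3)`, where the script name is a placeholder for your real job and input deck. Compare the medians across boards with the same input and the same node count. A spec sheet can't stand in for that.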

/kc

> 
> -- 
> Bill Broadley
> Mathematics/Institute of Theoretical Dynamics
> UC Davis
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org

-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA