[Beowulf] AMD64 results...

Bill Broadley bill at cse.ucdavis.edu
Wed Dec 15 18:29:56 PST 2004


Group reply:

On Wed, Dec 15, 2004 at 05:49:09PM -0500, Robert G. Brown wrote:
> Just for those of you who were asking after AMD64's as viable compute
> platforms, I just ran stream and the bogomflops benchmark in my renamed
> "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+

Is that a s754 (Socket 754) amd64?

> They are all below.  Executive summary is that the AMD barely beats
> (real) clock speed scaling compared to the P2 for stream.  I suspect
> that this is not yet the end of the story, though, as I see little
> difference between the i386 benchmark results and the x86_64 results
> when running the program compiled both ways on metatron.

Double registers only help if you need them.  Most codes won't
automatically use native 64-bit ints or pointers to any
significant advantage.

> The INTERESTING story is in bogomflops, which includes division.  There
> metatron was a whopping 2.8x faster than lucifer, while its clock is
> only 1.33x faster.  It more than doubled its relative clockspeed
> advantage, so to speak.  One can see how having 64 bits would really
> speed up 64 bit division compared to doing it in software across
> multiple 32 bit registers...

Interesting data point.
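
A quick check of the arithmetic in the quoted paragraph, as a minimal sketch. The 2.8x and 1.33x figures come from the quote above; the function name `per_clock_speedup` is made up for illustration.

```python
def per_clock_speedup(measured_speedup, clock_ratio):
    """Speedup left over after dividing out the clock-speed advantage."""
    return measured_speedup / clock_ratio

# Figures quoted above: 2.8x faster on bogomflops, 1.33x faster clock.
print(f"{per_clock_speedup(2.8, 1.33):.2f}x per clock")
```

That is roughly 2.1x per clock, which matches the "more than doubled" remark.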

> Hope this is interesting/useful to somebody.  I put "real stream" at the
> very end.  "real stream" uses the best time where benchmaster uses the
> average time so benchmaster results are typically a few percent lower
> (and likely just that much more realistic as well).

Similar data points for a dual Opteron at 2.2 GHz (stream using 1 CPU)
with PC3200 memory (915.5MB array).  Not sure why the timer is so lousy;
I had to make the array large to get a reasonably accurate time.
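
A rough sketch of how the rates in the tables below relate to the min times. It rests on assumptions that are not stated in this post: three arrays of 40,000,000 doubles each (which works out to about 915.5 MB in total), rates in units of 10^6 bytes/s, and a timer tick of 0.01 s (guessed from every time being a multiple of 0.01). The names `N`, `stream_rate_mb_s`, `total_mib` and `tick_error_pct` are made up for illustration.

```python
# Assumptions (not stated in the post): N doubles per array, 3 arrays,
# "MB/s" meaning 1e6 bytes per second, timer tick of 0.01 s.
N = 40_000_000
BYTES_PER_DOUBLE = 8

# Number of arrays each STREAM kernel reads or writes per element.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def stream_rate_mb_s(kernel, min_time_s):
    """Bytes moved by one pass of the kernel divided by the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * N
    return bytes_moved / min_time_s / 1e6

def total_mib():
    """Total footprint of the three arrays in units of 2**20 bytes."""
    return 3 * N * BYTES_PER_DOUBLE / 2**20

def tick_error_pct(min_time_s, tick_s=0.01):
    """Relative uncertainty from one timer tick, in percent."""
    return 100.0 * tick_s / min_time_s

print(f"footprint: {total_mib():.1f} MB")
print(f"Copy at 0.29 s: {stream_rate_mb_s('Copy', 0.29):.1f} MB/s")
print(f"one tick on a 0.17 s run: {tick_error_pct(0.17):.1f}%")
```

Under these assumptions a Copy min time of 0.29 s gives roughly 2207 MB/s, close to the 2206.8823 reported for gcc-3.2.3 -O1, and a single tick on a 0.17 s run is already about a 6% uncertainty. That coarse tick would also explain why many rates in the tables are nearly identical.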

I suspect the numbers below would be higher on a uniprocessor system
(which never has a remote memory access or waits on memory coherency)
or with a 2.6 kernel (which is better about ensuring that a page and the
process acting on it are on the same CPU).

Kudos to the pathscale-1.4 compiler with -O3, which is well ahead of
both gcc versions on every kernel.

gcc-3.2.3 -O1:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2206.8823       0.3010       0.2900       0.3800
Scale:       2285.7067       0.2880       0.2800       0.3700
Add:         2285.7087       0.4140       0.4200       0.5300
Triad:       2285.7152       0.3910       0.4200       0.4700

gcc-3.2.3 -O2:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1777.7736       0.3240       0.3600       0.3600
Scale:       1777.7783       0.3240       0.3600       0.3600
Add:         1882.3495       0.4590       0.5100       0.5100
Triad:       1882.3530       0.4590       0.5100       0.5100

gcc-3.2.3 -O3:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1777.7924       0.3260       0.3600       0.3700
Scale:       1828.4723       0.3230       0.3500       0.3600
Add:         1882.3679       0.4640       0.5100       0.5200
Triad:       1846.1717       0.4720       0.5200       0.5300

gcc-3.4.3 -O1:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1729.6823       0.3330       0.3700       0.3700
Scale:       1828.5184       0.3230       0.3500       0.3600
Add:         1846.1048       0.4680       0.5200       0.5200
Triad:       1846.1040       0.4680       0.5200       0.5200

gcc-3.4.3 -O2:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2133.3337       0.2960       0.3000       0.3500
Scale:       2133.3337       0.2980       0.3000       0.3500
Add:         2232.5578       0.4270       0.4300       0.5100
Triad:       2181.8132       0.4310       0.4400       0.5100

gcc-3.4.3 -O3:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2285.6561       0.2630       0.2800       0.3600
Scale:       2285.6581       0.2580       0.2800       0.3100
Add:         2341.4071       0.3800       0.4100       0.4700
Triad:       2285.6555       0.3880       0.4200       0.5200

Pathscale-1.4 -O1:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1999.9498       0.2880       0.3200       0.3200
Scale:       2064.4625       0.2840       0.3100       0.3200
Add:         2232.5009       0.3950       0.4300       0.4400
Triad:       2232.4910       0.3930       0.4300       0.4400

Pathscale-1.4 -O2:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        2461.5205       0.2410       0.2600       0.2700
Scale:       2285.6970       0.2530       0.2800       0.2900
Add:         2341.4466       0.3730       0.4100       0.4200
Triad:       2399.9765       0.3670       0.4000       0.4100

Pathscale-1.4 -O3:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        3764.6831       0.1540       0.1700       0.1800
Scale:       3764.6831       0.1530       0.1700       0.1700
Add:         4173.8781       0.2080       0.2300       0.2400
Triad:       4173.8781       0.2110       0.2300       0.2400
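
To put the -O3 Triad results side by side, here is a small sketch that computes the ratios. The rates are copied from the tables above; the names `TRIAD_O3` and `speedup_over` are made up for illustration.

```python
# Triad rates (MB/s) at -O3, copied from the tables above.
TRIAD_O3 = {
    "gcc-3.2.3": 1846.1717,
    "gcc-3.4.3": 2285.6555,
    "Pathscale-1.4": 4173.8781,
}

def speedup_over(baseline, compiler="Pathscale-1.4", rates=TRIAD_O3):
    """Ratio of one compiler's rate to a baseline compiler's rate."""
    return rates[compiler] / rates[baseline]

for name in ("gcc-3.2.3", "gcc-3.4.3"):
    print(f"Pathscale-1.4 vs {name}: {speedup_over(name):.2f}x")
```

That works out to about 2.26x over gcc-3.2.3 and about 1.83x over gcc-3.4.3 on Triad.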

-- 
Bill Broadley
Computational Science and Engineering
UC Davis


