[Beowulf] AMD64 results...

Robert G. Brown rgb at phy.duke.edu
Wed Dec 15 14:49:09 PST 2004


Just for those of you who were asking about AMD64s as viable compute
platforms: I just ran stream and the bogomflops benchmark in my renamed
"benchmaster" (formerly cpu_rate) shell on both a 2.4 GHz AMD64 3400+
(metatron) and a 1.8 GHz P4 (lucifer).  Of course, I also ran stream by
hand on both, just to make sure benchmaster was giving correct results.
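For readers who have not looked at the "Test Description" lines in the output below, here is a minimal Python sketch of the five kernels and of one way a per-element time turns into the "megarate" column. This is not the benchmaster source, and the function names are hypothetical. The work counted per element (16 bytes for copy and scale, 24 bytes for add and triad, 4 flops for bogomflops) is an assumption following the usual STREAM convention, although it does reproduce the logged megarates.

```python
# Illustrative sketch only: this is NOT the benchmaster source.
# Kernels follow the "Test Description" lines in the output below.

def stream_copy(d, a):
    for i in range(len(d)):
        d[i] = a[i]

def stream_scale(d, xtest):
    for i in range(len(d)):
        d[i] = xtest * d[i]

def stream_add(d, a, b):
    for i in range(len(d)):
        d[i] = a[i] + b[i]

def stream_triad(d, a, b, xtest):
    for i in range(len(d)):
        d[i] = a[i] + xtest * b[i]

def bogomflops(d, ad, bd):
    # add, subtract, multiply and divide: 4 flops per element
    for i in range(len(d)):
        d[i] = (ad + d[i]) * (bd - d[i]) / d[i]

# ASSUMED work per element (8-byte doubles): copy/scale do one read and
# one write per element; add/triad do two reads and one write.
BYTES_PER_ELEMENT = {"stream copy": 16, "stream scale": 16,
                     "stream add": 24, "stream triad": 24}
FLOPS_PER_ELEMENT = {"bogomflops": 4}

def megarate(test, time_full_ns, vlen):
    """MB/s (1 MB = 1e6 bytes) for the stream tests, MFLOPS for bogomflops."""
    ns_per_element = time_full_ns / vlen
    if test in BYTES_PER_ELEMENT:
        work = BYTES_PER_ELEMENT[test]
    else:
        work = FLOPS_PER_ELEMENT[test]
    # (bytes or flops) per nanosecond, times 1000, is mega-units per second
    return work / ns_per_element * 1e3

print(megarate("stream copy", 21476487.571592, 2000000))
print(megarate("bogomflops", 17716108.582773, 2000000))
```

With the metatron "time full" values, the two calls print numbers that round to the logged 1.490e+03 and 4.516e+02.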

The results are all below.  The executive summary is that, for stream,
the AMD barely beats (real) clock-speed scaling relative to the P4.  I
suspect that this is not yet the end of the story, though, as I see
little difference between the i386 and x86_64 results when running the
program compiled both ways on metatron.
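To make "barely beats clock-speed scaling" concrete, here is a small Python calculation using only the benchmaster stream megarates and the CPU clocks reported in the output below. A per-clock value above 1.0 means metatron did better than its clock advantage alone would predict. The dictionary layout and function name are just for illustration.

```python
# Benchmaster stream megarates (MB/s), copied from the output below.
metatron = {"copy": 1490.0, "scale": 1446.0, "add": 1642.0, "triad": 1633.0}
lucifer = {"copy": 1041.0, "scale": 1056.0, "add": 1214.0, "triad": 1207.0}

# Clock speeds in MHz, as reported in the output headers.
CLOCK_RATIO = 2411.773 / 1804.509

def per_clock_advantage(fast, slow, clock_ratio):
    """Speedup of `fast` over `slow`, divided by the clock ratio."""
    return {test: (fast[test] / slow[test]) / clock_ratio for test in fast}

for test, adv in per_clock_advantage(metatron, lucifer, CLOCK_RATIO).items():
    print(f"{test:6s} {adv:.3f}")
```

All four stream kernels land between about 1.01 and 1.07, i.e. only 1 to 7 percent better than pure clock scaling.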

The INTERESTING story is in bogomflops, which includes division.  There
metatron was a whopping 2.8x faster than lucifer, while its clock is
only 1.33x faster.  It more than doubled its relative clock-speed
advantage, so to speak.  One can see how having 64 bits would really
speed up 64-bit division compared to doing it in software across
multiple 32-bit registers...
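The bogomflops figures can be checked the same way, and converting time per element into CPU cycles shows the "more than doubled" point directly. A minimal Python sketch; the 4 flops per element (one add, one subtract, one multiply, one divide) is an assumed reading of the test description, though it reproduces the logged megarates.

```python
VLEN = 2_000_000          # vector length from the output headers
FLOPS_PER_ELEMENT = 4     # ASSUMED: one add, subtract, multiply, divide

# (time full in nsec, CPU clock in MHz) from the bogomflops blocks below
runs = {
    "metatron": (17716108.582773, 2411.773),
    "lucifer": (49586377.513218, 1804.509),
}

def bogomflops_summary(time_full_ns, clock_mhz, vlen=VLEN):
    """Return (MFLOPS, CPU cycles spent per vector element)."""
    ns_per_element = time_full_ns / vlen
    mflops = FLOPS_PER_ELEMENT / ns_per_element * 1e3
    cycles_per_element = ns_per_element * clock_mhz / 1e3  # ns * GHz
    return mflops, cycles_per_element

m_mflops, m_cycles = bogomflops_summary(*runs["metatron"])
l_mflops, l_cycles = bogomflops_summary(*runs["lucifer"])

print(f"wall-clock speedup: {m_mflops / l_mflops:.2f}x")
print(f"clock ratio:        {2411.773 / 1804.509:.2f}x")
print(f"cycles per element: {m_cycles:.1f} (metatron) vs {l_cycles:.1f} (lucifer)")
```

Metatron needs roughly 21 cycles per element against roughly 45 on lucifer, the same factor of about 2.1 that 2.8/1.34 gives.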

It should also be carefully noted that metatron is running Fedora Core
3, x86_64.  In other words, blood is dripping down the installation.  I
wouldn't be terribly surprised to learn that I've screwed up the
libraries (or they were conservative with the package binaries) or
something so that I'm not getting full 64-bit speed out of it.  I'd
really expect to see a bit more of an advantage on stream relative to
clock from AMD's wide data path and faster memory (PC3200).
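For a rough sense of the headroom PC3200 leaves, here is the textbook peak for DDR-400 (200 MHz clock, two transfers per clock, 8-byte bus) compared with the best-time stream rates from the x86_64 run at the end. The post does not say whether this 3400+ board runs one memory channel or two, so `channels` is a parameter and the single-channel default is an assumption. Note also that STREAM does not count write-allocate traffic, so the real bus load is somewhat higher than the reported rate.

```python
def ddr_peak_mb_s(clock_mhz, bus_bytes=8, channels=1):
    """Theoretical DDR SDRAM peak: two transfers per clock per channel."""
    return clock_mhz * 2 * bus_bytes * channels

# PC3200 = DDR-400 with a 200 MHz clock. channels=1 is an ASSUMPTION.
peak = ddr_peak_mb_s(200, channels=1)

# Best-time "real stream" rates (MB/s) from the metatron x86_64 run below.
measured = {"Copy": 1563.8717, "Scale": 1540.5368,
            "Add": 1729.2831, "Triad": 1731.3502}

for name, rate in measured.items():
    print(f"{name:6s} {rate:7.1f} MB/s = {100 * rate / peak:.1f}% of {peak} MB/s")
```

Under the single-channel assumption the four kernels reach roughly 48 to 54 percent of the 3200 MB/s peak; a dual-channel setup would halve those fractions.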

Hope this is interesting/useful to somebody.  I put the "real stream"
results at the very end.  "Real stream" uses the best time, whereas
benchmaster uses the average time, so benchmaster results are typically
a few percent lower (and likely just that much more realistic as well).
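The size of that best-time versus average-time gap can be read off the numbers in this post. A minimal Python sketch comparing the "real stream" rates with the benchmaster megarates, kernel by kernel (Copy, Scale, Add, Triad). Which metatron binary the benchmaster run corresponds to is not stated, so using the x86_64 stream table for metatron is an assumption; and since these are two different programs, the gap reflects more than just min versus mean.

```python
# Best-time "real stream" rates vs average-time benchmaster megarates
# (MB/s), in the order Copy, Scale, Add, Triad.
real_stream = {
    "metatron": [1563.8717, 1540.5368, 1729.2831, 1731.3502],  # ASSUMED x86_64
    "lucifer": [1076.4977, 1078.9293, 1231.8450, 1230.7681],
}
benchmaster = {
    "metatron": [1490.0, 1446.0, 1642.0, 1633.0],
    "lucifer": [1041.0, 1056.0, 1214.0, 1207.0],
}

def shortfall_pct(best, avg):
    """Percent by which each average-time rate falls below the best-time rate."""
    return [100 * (b - a) / b for b, a in zip(best, avg)]

for host in real_stream:
    pcts = shortfall_pct(real_stream[host], benchmaster[host])
    print(host, " ".join(f"{p:.1f}%" for p in pcts))
```

On these numbers benchmaster comes in roughly 4.7 to 6.1 percent below real stream on metatron and roughly 1.4 to 3.3 percent below on lucifer.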

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU:  AuthenticAMD  AMD Athlon(tm) 64 Processor 3400+ at  2411.773 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) =  3.009
# Test: stream copy
# Test Description: d[i] = a[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 524288
# time full = 21476487.571592 (nsec)   time empty = 3.335250 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream copy"          2000000     1  1.07e+01  1.35e-02      1.490e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU:  AuthenticAMD  AMD Athlon(tm) 64 Processor 3400+ at  2411.773 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) =  3.009
# Test: stream scale
# Test Description: d[i] = xtest*d[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 524288
# time full = 22124394.180132 (nsec)   time empty = 3.336800 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream scale"         2000000     1  1.11e+01  1.52e-02      1.446e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU:  AuthenticAMD  AMD Athlon(tm) 64 Processor 3400+ at  2411.773 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) =  3.022
# Test: stream add
# Test Description: d[i] = a[i] + b[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 524288
# time full = 29229924.717210 (nsec)   time empty = 3.334787 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream add"           2000000     1  1.46e+01  1.42e-02      1.642e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU:  AuthenticAMD  AMD Athlon(tm) 64 Processor 3400+ at  2411.773 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) =  2.837
# Test: stream triad
# Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 524288
# time full = 29402273.082914 (nsec)   time empty = 3.334403 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream triad"         2000000     1  1.47e+01  1.40e-02      1.633e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU:  AuthenticAMD  AMD Athlon(tm) 64 Processor 3400+ at  2411.773 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) =  2.940
# Test: bogomflops
# Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 524288
# time full = 17716108.582773 (nsec)   time empty = 3.333979 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"bogomflops"           2000000     1  2.21e+00  2.36e-03      4.516e+02



..........................................................................
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU:  GenuineIntel  Intel(R) Pentium(R) 4 CPU 1.80GHz at  1804.509 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.392
# Test: stream copy
# Test Description: d[i] = a[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 131072
# time full = 30751936.233069 (nsec)   time empty = 14.403535 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream copy"          2000000     1  1.54e+01  4.97e-02      1.041e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU:  GenuineIntel  Intel(R) Pentium(R) 4 CPU 1.80GHz at  1804.509 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.363
# Test: stream scale
# Test Description: d[i] = xtest*d[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 131072
# time full = 30298036.984022 (nsec)   time empty = 12.929858 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream scale"         2000000     1  1.51e+01  3.61e-02      1.056e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU:  GenuineIntel  Intel(R) Pentium(R) 4 CPU 1.80GHz at  1804.509 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.270
# Test: stream add
# Test Description: d[i] = a[i] + b[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 131072
# time full = 39540020.016525 (nsec)   time empty = 13.573450 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream add"           2000000     1  1.98e+01  3.55e-02      1.214e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU:  GenuineIntel  Intel(R) Pentium(R) 4 CPU 1.80GHz at  1804.509 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.315
# Test: stream triad
# Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 131072
# time full = 39778004.853398 (nsec)   time empty = 12.493777 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream triad"         2000000     1  1.99e+01  5.81e-02      1.207e+03
#========================================================================
#                          Microtimer 1.0.0
#                    Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU:  GenuineIntel  Intel(R) Pentium(R) 4 CPU 1.80GHz at  1804.509 (MHz) 
# CPU: L2 cache: 512 KB    bogomips:  3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.160
# Test: bogomflops
# Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector)
#
# full iterations = 2    empty iterations = 131072
# time full = 49586377.513218 (nsec)   time empty = 12.461900 (nsec)
#
#    test name           vlen   stride  time +/- sigma (nsec)  megarate
#========================================================================
"bogomflops"           2000000     1  6.20e+00  5.03e-03      1.613e+02


(metatron x86_64 binary)
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        1563.8717       0.0205       0.0205       0.0207
Scale:       1540.5368       0.0209       0.0208       0.0213
Add:         1729.2831       0.0283       0.0278       0.0290
Triad:       1731.3502       0.0278       0.0277       0.0280

(metatron i386 binary)
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        1542.3957       0.0209       0.0207       0.0213
Scale:       1525.1148       0.0213       0.0210       0.0218
Add:         1732.2291       0.0280       0.0277       0.0286
Triad:       1698.8726       0.0284       0.0283       0.0286

(lucifer i386 binary)
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        1076.4977       0.0300       0.0297       0.0314
Scale:       1078.9293       0.0298       0.0297       0.0302
Add:         1231.8450       0.0392       0.0390       0.0401
Triad:       1230.7681       0.0392       0.0390       0.0403
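A quick way to quantify the "little difference" between the two metatron builds mentioned at the top is the percent change per kernel between the x86_64 and i386 tables above. A minimal Python sketch; the names are illustrative only.

```python
# Best-time "real stream" rates (MB/s) on metatron, from the tables above.
x86_64 = {"Copy": 1563.8717, "Scale": 1540.5368,
          "Add": 1729.2831, "Triad": 1731.3502}
i386 = {"Copy": 1542.3957, "Scale": 1525.1148,
        "Add": 1732.2291, "Triad": 1698.8726}

def pct_change(new, old):
    """Percent change of each rate in `new` relative to `old`."""
    return {k: 100 * (new[k] - old[k]) / old[k] for k in new}

for name, pct in pct_change(x86_64, i386).items():
    print(f"{name:6s} {pct:+.2f}%")
```

Every kernel is within about 2 percent, and Add is actually marginally slower as a 64-bit binary.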


