[Beowulf] AMD64 results...
Robert G. Brown rgb at phy.duke.edu
Wed Dec 15 14:49:09 PST 2004
Just for those of you who were asking after AMD64s as viable compute platforms, I just ran stream and the bogomflops benchmark in my renamed "benchmaster" (was cpu_rate) shell on both a 2.4 GHz AMD64 3400+ (metatron) and a 1.8 GHz P4 (lucifer). Of course I also ran stream by hand on them just to make sure it was giving correct results. They are all below.

Executive summary is that the AMD barely beats (real) clock speed scaling compared to the P4 for stream. I suspect that this is not yet the end of the story, though, as I see little difference between the i386 benchmark results and the x86_64 results when running the program compiled both ways on metatron.

The INTERESTING story is in bogomflops, which includes division. There metatron was a whopping 2.8x faster than lucifer, while its clock is only 1.33x faster. It more than doubled its relative clockspeed advantage, so to speak. One can see how having 64 bits would really speed up 64 bit division compared to doing it in software across multiple 32 bit registers...

It should also be carefully noted that metatron is running Fedora Core 3, x86_64. In other words, blood is dripping down the installation. I wouldn't be terribly surprised to learn that I've screwed up the libraries (or they were conservative with the package binaries) or something so that I'm not getting full 64 bit speed out of it. I'd really expect to see a bit more of an advantage on stream relative to clock from AMD's wide data path and faster memory (PC3200).

Hope this is interesting/useful to somebody. I put "real stream" at the very end. "real stream" uses the best time where benchmaster uses the average time, so benchmaster results are typically a few percent lower (and likely just that much more realistic as well).

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz)
# CPU: L2 cache: 512 KB bogomips: 4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.009
# Test: stream copy
# Test Description: d[i] = a[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 524288
# time full = 21476487.571592 (nsec) time empty = 3.335250 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream copy"   2000000  1       1.07e+01 1.35e-02      1.490e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz)
# CPU: L2 cache: 512 KB bogomips: 4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.009
# Test: stream scale
# Test Description: d[i] = xtest*d[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 524288
# time full = 22124394.180132 (nsec) time empty = 3.336800 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream scale"  2000000  1       1.11e+01 1.52e-02      1.446e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz)
# CPU: L2 cache: 512 KB bogomips: 4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 3.022
# Test: stream add
# Test Description: d[i] = a[i] + b[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 524288
# time full = 29229924.717210 (nsec) time empty = 3.334787 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream add"    2000000  1       1.46e+01 1.42e-02      1.642e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz)
# CPU: L2 cache: 512 KB bogomips: 4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 2.837
# Test: stream triad
# Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 524288
# time full = 29402273.082914 (nsec) time empty = 3.334403 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream triad"  2000000  1       1.47e+01 1.40e-02      1.633e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: metatron
# CPU: AuthenticAMD AMD Athlon(tm) 64 Processor 3400+ at 2411.773 (MHz)
# CPU: L2 cache: 512 KB bogomips: 4767.74
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 2.940
# Test: bogomflops
# Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 524288
# time full = 17716108.582773 (nsec) time empty = 3.333979 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"bogomflops"    2000000  1       2.21e+00 2.36e-03      4.516e+02

..........................................................................

#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)
# CPU: L2 cache: 512 KB bogomips: 3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.392
# Test: stream copy
# Test Description: d[i] = a[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 131072
# time full = 30751936.233069 (nsec) time empty = 14.403535 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream copy"   2000000  1       1.54e+01 4.97e-02      1.041e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)
# CPU: L2 cache: 512 KB bogomips: 3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.363
# Test: stream scale
# Test Description: d[i] = xtest*d[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 131072
# time full = 30298036.984022 (nsec) time empty = 12.929858 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream scale"  2000000  1       1.51e+01 3.61e-02      1.056e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)
# CPU: L2 cache: 512 KB bogomips: 3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.270
# Test: stream add
# Test Description: d[i] = a[i] + b[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 131072
# time full = 39540020.016525 (nsec) time empty = 13.573450 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream add"    2000000  1       1.98e+01 3.55e-02      1.214e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)
# CPU: L2 cache: 512 KB bogomips: 3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.315
# Test: stream triad
# Test Description: d[i] = a[i] + xtest*b[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 131072
# time full = 39778004.853398 (nsec) time empty = 12.493777 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"stream triad"  2000000  1       1.99e+01 5.81e-02      1.207e+03
#========================================================================
# Microtimer 1.0.0
# Copyright 2004 Robert G. Brown
#
# hostname: lucifer
# CPU: GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz at 1804.509 (MHz)
# CPU: L2 cache: 512 KB bogomips: 3555.32
# Memory: 0
# cpu cycle counter nanotimer: clock granularity (nsec/cycle) = 98.160
# Test: bogomflops
# Test Description: d[i] = (ad + d[i])*(bd - d[i])/d[i] (8 byte double vector)
#
# full iterations = 2 empty iterations = 131072
# time full = 49586377.513218 (nsec) time empty = 12.461900 (nsec)
#
# test name     vlen     stride  time +/- sigma (nsec)  megarate
#========================================================================
"bogomflops"    2000000  1       6.20e+00 5.03e-03      1.613e+02

(metatron x86_64 binary)
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        1563.8717     0.0205     0.0205     0.0207
Scale:       1540.5368     0.0209     0.0208     0.0213
Add:         1729.2831     0.0283     0.0278     0.0290
Triad:       1731.3502     0.0278     0.0277     0.0280

(metatron i386 binary)
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        1542.3957     0.0209     0.0207     0.0213
Scale:       1525.1148     0.0213     0.0210     0.0218
Add:         1732.2291     0.0280     0.0277     0.0286
Triad:       1698.8726     0.0284     0.0283     0.0286

(lucifer i386 binary)
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        1076.4977     0.0300     0.0297     0.0314
Scale:       1078.9293     0.0298     0.0297     0.0302
Add:         1231.8450     0.0392     0.0390     0.0401
Triad:       1230.7681     0.0392     0.0390     0.0403
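
For anyone who wants to redo the clock-scaling comparison from the summary, here is a minimal sketch in Python. The rates are copied from the megarate column of the benchmaster output above and the clocks from the CPU lines. The names (metatron, lucifer, speedups, clock_ratio) are made up for this illustration and are not part of benchmaster.

```python
# Megarate column from the benchmaster runs above, per test.
metatron = {"copy": 1490.0, "scale": 1446.0, "add": 1642.0,
            "triad": 1633.0, "bogomflops": 451.6}
lucifer = {"copy": 1041.0, "scale": 1056.0, "add": 1214.0,
           "triad": 1207.0, "bogomflops": 161.3}

# Measured clocks (MHz) as reported in the CPU header lines.
clock_mhz = {"metatron": 2411.773, "lucifer": 1804.509}


def speedups(fast, slow):
    """Return the per-test rate ratio fast/slow."""
    return {name: fast[name] / slow[name] for name in fast}


clock_ratio = clock_mhz["metatron"] / clock_mhz["lucifer"]
ratios = speedups(metatron, lucifer)

print(f"clock ratio: {clock_ratio:.2f}")
for name, ratio in ratios.items():
    # Values above 1.00 in the last column mean the test beat clock scaling.
    print(f"{name:11s} {ratio:.2f}x  ({ratio / clock_ratio:.2f}x relative to clock)")
```

The last column is the speedup left over once the clock difference is divided out: close to 1 for the stream tests, roughly 2 for bogomflops.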
