Pentium IV Xeon memory bandwidth. Any experience?
Robert G. Brown rgb at phy.duke.edu
Mon Jun 25 09:35:10 PDT 2001
On Mon, 25 Jun 2001, Greg Lindahl wrote:

> On Mon, Jun 25, 2001 at 02:19:15PM +0200, Thomas Guignon wrote:
>
> > We have tested and 1.2 Ghz with PC2100 DDR with Level 1 Blas and results are
> > quite nice:
> > -dnrm2: (one vector read)
> > 1450 10^6 B/s
> > -ddot: (two vector read)
> > 1040 10^6 B/s
> > -daxpy (one vector read and one vector read/write)
> > 1150 10^6 B/s
> > - copy (one vector read and one vector write)
> > 990 10^6 B/s
>
> These numbers look like they are for vectors that fit into cache.
>
> What does the STREAM benchmark report for this board? I bet it's
> substantially slower, and the person asking the question wanted main
> memory bandwidth, not cached bandwidth.

Greg,

Here is stream for a 1.33 GHz Tbird with PC2100:

rgb at ganesh|T:108>stream_gcc
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        608.4571      0.0619     0.0263     0.1465
Scale:       497.4804      0.0778     0.0322     0.1523
Add:         658.1838      0.0779     0.0365     0.1368
Triad:       587.9196      0.1086     0.0408     0.1614

For comparison here is stream for a 1.33 GHz Tbird with PC133

rgb at g15|T:104>stream_gcc
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        392.4357      0.0408     0.0408     0.0412
Scale:       395.3837      0.0405     0.0405     0.0405
Add:         431.7950      0.0557     0.0556     0.0560
Triad:       430.1768      0.0559     0.0558     0.0562

About what you'd expect:

200 (2 x 100 MHz)/133 = 1.5

608/392 = 1.55
497/395 = 1.26
658/431 = 1.53
587/430 = 1.37

The PC2100 is about 50% faster than PC133 and hence so are the
streaming float rates out where memory is the bottleneck.

This doesn't use the Athlon prefetch, though. This might be responsible
for the higher numbers seen above.

Which leads to the very practical question: How does one use the Athlon
prefetch? Is there a compiler option? Or does one have to code in
assembler? Where would one find out -- my local AMD rep was learning
about this from me instead of the other way around, so clearly "ask your
AMD rep" is a bad answer to this...;-)

rgb

P.S. For further comparison, stream run on a 933 MHz PIII with PC133

rgb at parvati|T:103>stream_gcc
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        289.6763      0.0553     0.0552     0.0556
Scale:       325.9658      0.0492     0.0491     0.0494
Add:         360.1278      0.0667     0.0666     0.0667
Triad:       304.0168      0.0790     0.0789     0.0793

showing clearly inferior performance even though it is equipped with the
same speed of memory (note that CPU clock is basically irrelevant to
stream). A 933 MHz PIII equipped with RDRAM yields:

rgb at b16|T:105>stream_gcc
# Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:        441.1835      0.0369     0.0363     0.0372
Scale:       441.1966      0.0363     0.0363     0.0363
Add:         577.6035      0.0416     0.0416     0.0416
Triad:       360.9077      0.0666     0.0665     0.0667

which is a bit better than a PC133 Tbird but not as good (or anywhere
near as cheap) as a PC2100 equipped Tbird.

Hope somebody finds this useful.

rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
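
For anyone who wants to recheck the PC2100 vs. PC133 ratios quoted in the message, here is a minimal Python sketch. The rates are copied from the two Tbird runs above; the names RATES and speedups are made up for illustration and are not part of STREAM. Because it uses the full reported rates rather than the truncated integers in the message, the Add ratio may come out a hair lower than the 1.53 quoted.

```python
# STREAM rates in MB/s, copied from the two 1.33 GHz Tbird runs in the message.
# RATES and speedups() are illustrative names, not part of the STREAM benchmark.
RATES = {
    "Tbird PC2100": {"Copy": 608.4571, "Scale": 497.4804,
                     "Add": 658.1838, "Triad": 587.9196},
    "Tbird PC133": {"Copy": 392.4357, "Scale": 395.3837,
                    "Add": 431.7950, "Triad": 430.1768},
}


def speedups(fast, slow):
    """Return the per-kernel rate ratio fast/slow, rounded to two decimals."""
    return {kernel: round(fast[kernel] / slow[kernel], 2) for kernel in fast}


ratios = speedups(RATES["Tbird PC2100"], RATES["Tbird PC133"])

# Print one line per STREAM kernel: name and PC2100/PC133 ratio.
for kernel, ratio in ratios.items():
    print(f"{kernel:6s} {ratio:.2f}")
```

The same function can be pointed at the PIII numbers by adding those runs to RATES in the same form.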
More information about the Beowulf mailing list
