Pentium IV Xeon memory bandwidth. Any experience?

Robert G. Brown rgb at phy.duke.edu
Mon Jun 25 09:35:10 PDT 2001


On Mon, 25 Jun 2001, Greg Lindahl wrote:

> On Mon, Jun 25, 2001 at 02:19:15PM +0200, Thomas Guignon wrote:
>
> > We have tested a 1.2 GHz machine with PC2100 DDR using Level 1 BLAS, and the
> > results are quite nice:
> > - dnrm2 (one vector read): 1450 x 10^6 B/s
> > - ddot (two vector reads): 1040 x 10^6 B/s
> > - daxpy (one vector read and one vector read/write): 1150 x 10^6 B/s
> > - copy (one vector read and one vector write): 990 x 10^6 B/s
>
> These numbers look like they are for vectors that fit into cache.
>
> What does the STREAM benchmark report for this board? I bet it's
> substantially slower, and the person asking the question wanted main
> memory bandwidth, not cached bandwidth.
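Whether figures like these reflect cache or main memory depends on how many bytes each kernel moves per element and on the total working set. Below is a minimal sketch of that accounting. It assumes 8-byte doubles and counts a read/write vector as one read plus one write, following the convention in the list above. `blas1_bytes`, `mb_per_s`, and `min_len_out_of_cache` are hypothetical helpers, not BLAS routines. The cache size and safety factor passed to the last one are the caller's assumptions.

```c
#include <stddef.h>

/* Hypothetical labels for the four Level 1 BLAS kernels quoted above. */
enum blas1_kernel { K_DNRM2, K_DDOT, K_DAXPY, K_DCOPY };

/* Bytes of memory traffic per call for vectors of n doubles,
 * counting a read/write vector as one read plus one write. */
size_t blas1_bytes(enum blas1_kernel k, size_t n)
{
    size_t v = n * sizeof(double);   /* bytes in one vector */
    switch (k) {
    case K_DNRM2: return v;          /* read x */
    case K_DDOT:  return 2 * v;      /* read x, read y */
    case K_DAXPY: return 3 * v;      /* read x, read y, write y */
    case K_DCOPY: return 2 * v;      /* read x, write y */
    }
    return 0;
}

/* Rate in units of 10^6 bytes/s, the unit used in the figures above. */
double mb_per_s(size_t bytes, double seconds)
{
    return (double)bytes / seconds / 1e6;
}

/* Smallest vector length whose working set (nvec vectors of doubles)
 * is at least `factor` times cache_bytes, so that a timing loop
 * measures main memory rather than cache. cache_bytes and factor are
 * assumptions supplied by the caller. */
size_t min_len_out_of_cache(size_t cache_bytes, size_t nvec, size_t factor)
{
    size_t per_elem = nvec * sizeof(double);
    return (factor * cache_bytes + per_elem - 1) / per_elem;  /* round up */
}
```

For example, with an assumed 256 KB cache, three vectors (as in daxpy, counting y once), and a factor of 4, `min_len_out_of_cache` returns 43691 elements. A daxpy timing on vectors much shorter than that is likely measuring cache.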

Greg,

Here are STREAM results for a 1.33 GHz Tbird with PC2100:

rgb at ganesh|T:108>stream_gcc
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         608.4571       0.0619       0.0263       0.1465
Scale:        497.4804       0.0778       0.0322       0.1523
Add:          658.1838       0.0779       0.0365       0.1368
Triad:        587.9196       0.1086       0.0408       0.1614

For comparison, here are STREAM results for a 1.33 GHz Tbird with PC133:

rgb at g15|T:104>stream_gcc
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         392.4357       0.0408       0.0408       0.0412
Scale:        395.3837       0.0405       0.0405       0.0405
Add:          431.7950       0.0557       0.0556       0.0560
Triad:        430.1768       0.0559       0.0558       0.0562

About what you'd expect:  200 (2 x 100 MHz)/133 = 1.5.  The measured PC2100/PC133 ratios are:

608/392 = 1.55
497/395 = 1.26
658/431 = 1.53
587/430 = 1.37

The PC2100 is about 50% faster than PC133, and hence so are the streaming
floating-point rates (26% to 55% faster in these runs) out where memory is
the bottleneck.
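For reference, each STREAM kernel is a simple loop over arrays of doubles. STREAM credits Copy and Scale with two arrays' worth of traffic per element (16 bytes) and Add and Triad with three (24 bytes). The Rate column in the tables is consistent with that byte count divided by the Min time column. For example, 16 x 10^6 bytes / 0.0263 s is about 608 MB/s, which matches the PC2100 Copy line and suggests arrays of 10^6 elements. Below is a minimal sketch of the kernels and the rate calculation, written for illustration rather than copied from stream.c. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes STREAM credits per element for each kernel (8-byte doubles):
 * Copy and Scale touch two arrays, Add and Triad touch three. */
static const size_t stream_bytes_per_elem[4] = {
    2 * sizeof(double), 2 * sizeof(double),
    3 * sizeof(double), 3 * sizeof(double)
};

/* Copy: c = a */
void stream_copy(double *c, const double *a, size_t n)
{
    for (size_t j = 0; j < n; j++) c[j] = a[j];
}

/* Scale: b = q * c */
void stream_scale(double *b, const double *c, double q, size_t n)
{
    for (size_t j = 0; j < n; j++) b[j] = q * c[j];
}

/* Add: c = a + b */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t j = 0; j < n; j++) c[j] = a[j] + b[j];
}

/* Triad: a = b + q * c */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t j = 0; j < n; j++) a[j] = b[j] + q * c[j];
}

/* Rate in 10^6 bytes/s for one pass over n elements taking `seconds`.
 * kernel: 0 = Copy, 1 = Scale, 2 = Add, 3 = Triad. Pass the minimum
 * time, as the Rate column in the tables appears to. */
double stream_rate(int kernel, size_t n, double seconds)
{
    return (double)(stream_bytes_per_elem[kernel] * n) / seconds / 1e6;
}
```

Each kernel performs at most two floating-point operations per 16 to 24 bytes moved. The loop body is therefore cheap, and once the arrays are much larger than cache the rate is set by memory bandwidth.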

These STREAM runs don't use the Athlon prefetch instructions, though.
Prefetch might be responsible for the higher numbers Thomas reported above.

Which leads to the very practical question:  How does one use the Athlon
prefetch?  Is there a compiler option, or does one have to code in
assembler?  Where would one find out?  My local AMD rep was learning
about this from me instead of the other way around, so clearly "ask your
AMD rep" is a bad answer to this...;-)
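One possible route, offered as a sketch rather than AMD's recommended method: later GCC releases provide a `__builtin_prefetch(addr, rw, locality)` builtin. On targets with data prefetch support it emits a prefetch instruction, and on other targets it generates no prefetch code. GCC also has an `-fprefetch-loop-arrays` option that tries to insert prefetches into array loops automatically. The loop below applies the builtin by hand to a daxpy-style kernel. The prefetch distance `PREFETCH_AHEAD` is a hypothetical tuning parameter, and `DOUBLES_PER_LINE` assumes 64-byte cache lines. Whether either helps on a given Athlon has to be measured.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; tune by measurement. */
#define PREFETCH_AHEAD 64
/* Assumes 64-byte cache lines, i.e. 8 doubles per line. */
#define DOUBLES_PER_LINE 8

/* y = a*x + y, requesting x and y PREFETCH_AHEAD elements before use.
 * Issues one prefetch pair per assumed cache line rather than per element.
 * Requires a compiler that accepts GCC's __builtin_prefetch. */
void daxpy_prefetch(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the arrays so the prefetch address is a valid pointer. */
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            /* rw = 0: x will only be read; locality = 0: little reuse. */
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 0);
            /* rw = 1: y will be written. */
            __builtin_prefetch(&y[i + PREFETCH_AHEAD], 1, 0);
        }
        y[i] += a * x[i];
    }
}
```

Compile with optimization and an Athlon target (e.g. `gcc -O2 -march=athlon`) so the builtin can become a real prefetch instruction. Without prefetch support the function is an ordinary daxpy, so timing it against a plain loop with STREAM-sized arrays shows whether prefetch buys anything.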

   rgb

P.S.  For further comparison, here is STREAM run on a 933 MHz PIII with PC133:

rgb at parvati|T:103>stream_gcc
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         289.6763       0.0553       0.0552       0.0556
Scale:        325.9658       0.0492       0.0491       0.0494
Add:          360.1278       0.0667       0.0666       0.0667
Triad:        304.0168       0.0790       0.0789       0.0793

This is clearly inferior to the PC133 Tbird even though it is equipped with
the same speed of memory (note that CPU clock is basically irrelevant to
STREAM).  A 933 MHz PIII equipped with RDRAM yields:

rgb at b16|T:105>stream_gcc
# Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         441.1835       0.0369       0.0363       0.0372
Scale:        441.1966       0.0363       0.0363       0.0363
Add:          577.6035       0.0416       0.0416       0.0416
Triad:        360.9077       0.0666       0.0665       0.0667

which is a bit better than the PC133 Tbird on Copy, Scale, and Add (though
slower on Triad), but not as good as (or anywhere near as cheap as) the
PC2100-equipped Tbird.

Hope somebody finds this useful.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

More information about the Beowulf mailing list