[Beowulf] AMD64 results...

Kozin, I (Igor) i.kozin at dl.ac.uk
Thu Dec 16 05:27:33 PST 2004


Hi Bill, very interesting results.
 
> Ah, got icc-8.1 to cooperate, dual 2.2 Ghz opteron+pc3200+2.4 kernel,
> 915.5MB array:
> -O1
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2285.8039       0.2640       0.2800       0.3200
> Scale:       2206.9798       0.2690       0.2900       0.3000
> Add:         2341.5554       0.3740       0.4100       0.4200
> Triad:       2181.9031       0.4060       0.4400       0.4800
> 
> -O2
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2370.4856       0.2570       0.2700       0.3400
> Scale:       2285.8280       0.2670       0.2800       0.3400
> Add:         2461.6513       0.3710       0.3900       0.4600
> Triad:       2285.8229       0.3920       0.4200       0.5000

Please note that in your -O1 and -O2 runs the "Avg time" is less than the "Min time".
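A quick way to check this is to test every row for min <= avg <= max. The sketch below is only an illustration: the numbers are copied from the quoted tables, and the helper name inconsistent_rows is made up for this example.

```python
# Timings copied from the quoted icc-8.1 output: (avg, min, max) in seconds.
runs = {
    "-O1": {"Copy": (0.2640, 0.2800, 0.3200), "Scale": (0.2690, 0.2900, 0.3000),
            "Add": (0.3740, 0.4100, 0.4200), "Triad": (0.4060, 0.4400, 0.4800)},
    "-O2": {"Copy": (0.2570, 0.2700, 0.3400), "Scale": (0.2670, 0.2800, 0.3400),
            "Add": (0.3710, 0.3900, 0.4600), "Triad": (0.3920, 0.4200, 0.5000)},
    "-O3": {"Copy": (0.2730, 0.2600, 0.3400), "Scale": (0.2910, 0.2700, 0.3500),
            "Add": (0.4050, 0.3800, 0.4800), "Triad": (0.4320, 0.4100, 0.5100)},
}


def inconsistent_rows(rows):
    """Return the kernels whose timings violate min <= avg <= max."""
    return [name for name, (avg, tmin, tmax) in rows.items()
            if not (tmin <= avg <= tmax)]


for flag, rows in runs.items():
    print(flag, inconsistent_rows(rows))
```

With the numbers above, every row of the -O1 and -O2 tables fails the check, while the -O3 table passes.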
 
> -O3 
> Function      Rate (MB/s)   Avg time     Min time     Max time
> Copy:        2461.5867       0.2730       0.2600       0.3400
> Scale:       2370.4237       0.2910       0.2700       0.3500
> Add:         2526.3684       0.4050       0.3800       0.4800
> Triad:       2341.5151       0.4320       0.4100       0.5100
> 
> The strange thing is they are 32 bit binaries, despite being built
> on a 64 bit os on a 64 bit hardware.

How do you know they are not 64-bit? From what I can see, they are.

quad 2.2 Opteron, 9 GB, SLES 9, 2.4.21
it seems my memory is a bit slower than yours.
      PARAMETER (n=32000000,offset=0,ndim=n+offset,ntimes=50)
i.e. using 732 MB for the three double-precision arrays
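For reference, the 732 MB figure can be reproduced as below. This is a sketch that assumes the usual STREAM layout of three arrays (a, b, c) with 8 bytes per element, and that "MB" here means 2**20 bytes. The n=40,000,000 used for the second call is my inference from the quoted 915.5MB; it is not stated in Bill's message.

```python
BYTES_PER_ELEMENT = 8   # assumption: DOUBLE PRECISION arrays
NUM_ARRAYS = 3          # assumption: the standard STREAM arrays a, b, c


def footprint_mib(n, offset=0):
    """Total size of the STREAM arrays in MiB for ndim = n + offset."""
    ndim = n + offset
    return NUM_ARRAYS * BYTES_PER_ELEMENT * ndim / 2**20


print(f"n=32000000: {footprint_mib(32_000_000):.1f} MiB")
print(f"n=40000000: {footprint_mib(40_000_000):.1f} MiB")  # inferred n, see above
```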

pathscale 1.4 -O3
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       3555.5778      0.1444      0.1440      0.1450
Scale:      3483.0084      0.1473      0.1470      0.1480
Add:        3588.8372      0.2142      0.2140      0.2150
Triad:      3605.6772      0.2134      0.2130      0.2140

ifort -O3 -xW
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       3657.1588      0.1458      0.1400      0.1500
Scale:      3657.1588      0.1475      0.1400      0.1500
Add:        3490.9503      0.2273      0.2200      0.2300
Triad:      3339.1509      0.2317      0.2300      0.2400

> Not sure why the timer is so lousy,
> I had to make the array large to get a reasonably accurate time:

This is indeed another interesting point. I'd really like to understand it.
In addition, when I re-run STREAM the rates vary quite a bit from run to run,
despite the high loop count (ntimes=50) and the very small spread within each
run (min and max are pretty close).

E.g. two more runs with ifort -O3 -xW:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       2560.0146      0.2094      0.2000      0.2200
Scale:      2560.0146      0.2094      0.2000      0.2200
Add:        2477.4389      0.3219      0.3100      0.3300
Triad:      2400.0023      0.3285      0.3200      0.3300

Copy:       3657.1588      0.1454      0.1400      0.1500
Scale:      3657.1588      0.1473      0.1400      0.1500
Add:        3490.9503      0.2256      0.2200      0.2300
Triad:      3339.1371      0.2300      0.2300      0.2300
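The reported rates look like they are computed from the "Min time" column, and in the ifort runs those min times are all multiples of 0.01 s. So a coarse timer alone quantizes the rates quite heavily. Below is a rough sketch, assuming the usual STREAM byte counts (Copy and Scale touch two arrays, Add and Triad touch three, 8 bytes per element). The 10 ms tick is only inferred from the numbers above.

```python
N = 32_000_000          # from the PARAMETER line above
BYTES_PER_ELEMENT = 8   # assumption: DOUBLE PRECISION arrays
# Assumption: the usual STREAM accounting of arrays read plus arrays written.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_s(kernel, min_time, n=N):
    """Rate in MB/s (1e6 bytes/s) for a kernel, given its minimum time in seconds."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n
    return bytes_moved / min_time / 1e6


# Copy rate at a few min times that differ by whole 10 ms ticks.
for min_time in (0.14, 0.15, 0.20):
    print(f"Copy, min time {min_time:.2f} s: {rate_mb_s('Copy', min_time):.1f} MB/s")
```

A single tick from 0.14 s to 0.15 s already moves the Copy rate from about 3657 to about 3413 MB/s. This does not explain the jump to 0.20 s in the middle run, which is several ticks and therefore looks like a real difference between runs.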

Igor
 
> I played around with various optimizations mentioned in the man page
> (including -xW), but I never managed a 64 bit binary with icc-8.1.
> The man page has numerous i32em and em64t references.
> -- 
> Bill Broadley
> Computational Science and Engineering
> UC Davis

I. Kozin  (i.kozin at dl.ac.uk)
CCLRC Daresbury Laboratory
tel: 01925 603308
http://www.cse.clrc.ac.uk/disco



