[Beowulf] AMD64 results...
Richard Walsh rbw at ahpcrc.org
Thu Dec 16 07:16:45 PST 2004
- Previous message: [Beowulf] AMD64 results...
- Next message: [Beowulf] AMD64 results...
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
All,

Here are the data again, comparing the gcc, PGI, and Pathscale compilers on our cluster and Bill's Opteron, this time with prefetching turned on in PGI and gcc as well. Our system has the same clock as Bill's, 2.2 GHz, but slower memory (PC2700). I have thrown in some X1 single-SSP timings as well.

The numbers demonstrate the importance of explicitly asking for prefetching with the non-Pathscale compilers. Pathscale still comes out on top (at about half the X1 SSP rate), but the numbers are now much closer, and the remaining differences may be partly accounted for by the faster memory in Bill's system (PC3200 versus PC2700 in ours). Of course, if you are focused on raw bandwidth, you should get numbers both with and without prefetching; otherwise you are silently including cache effects.

The equivalent one-processor megaflop ratings for the triad data below are:

gcc (noprefetch):      186 MFLOPs
gcc (prefetch):        279 MFLOPs
pgcc (prefetch):       300 MFLOPs
pscalecc (prefetch):   347 MFLOPs
x1cc (vector, 1ssp):   780 MFLOPs

Dual-processor ratings should be close to double these on the Opteron, so I expect one node (two CPUs) on the Opteron to be almost equal to one SSP on the X1.

Enjoy and prefetch!
rbw

gcc-3.2.3 -O4 -Wall -pedantic:

Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:      2004.8056     0.0095     0.0080     0.0099
Scale:     2044.7551     0.0099     0.0078     0.0105
Add:       2272.3092     0.0133     0.0106     0.0137
Triad:     2237.3599     0.0134     0.0107     0.0137

gcc-3.2.3 -O4 -fprefetch-loop-arrays -Wall -pedantic:

Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:      3259.9273     0.0049     0.0049     0.0052
Scale:     3294.9803     0.0049     0.0049     0.0049
Add:       3306.7241     0.0073     0.0073     0.0073
Triad:     3349.1914     0.0072     0.0072     0.0072

pgcc -fast -Mvect=sse -Mnontemporal

Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:      3227.6291     0.0050     0.0050     0.0052
Scale:     3210.1824     0.0050     0.0050     0.0050
Add:       3571.3935     0.0067     0.0067     0.0068
Triad:     3604.1280     0.0067     0.0067     0.0068

Pathscale-1.4 -O3

Function   Rate (MB/s)   Avg time   Min time   Max time
Copy:      3764.6831     0.1540     0.1700     0.1800
Scale:     3764.6831     0.1530     0.1700     0.1700
Add:       4173.8781     0.2080     0.2300     0.2400
Triad:     4173.8781     0.2110     0.2300     0.2400

X1cc -c -h inline3,scalar3,vector3 -h stream0

Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:      7600.2280     0.0022     0.0021     0.0022
Scale:     7600.5529     0.0024     0.0021     0.0030
Add:       9259.1164     0.0026     0.0026     0.0027
Triad:     9360.5935     0.0026     0.0026     0.0026

Greg Lindahl wrote:

>On Wed, Dec 15, 2004 at 06:29:56PM -0800, Bill Broadley wrote:
>
>>Kudos for the pathscale-1.4 compiler with -O3.
>
>Thank you! The not-so-secret secret is to use non-temporal stores,
>which we do automagically where needed with plain -O3.
>
>-- greg
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
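
For anyone who wants to reproduce the one-processor MFLOP ratings from the Triad rows, here is a minimal sketch of the conversion. It assumes the usual STREAM accounting with 8-byte doubles: each Triad iteration, a[i] = b[i] + q*c[i], counts 24 bytes of traffic and 2 floating-point operations, so MFLOPs comes out as MB/s divided by 12. The function and variable names are illustrative only and are not taken from the STREAM source.

```python
# Sketch: convert a STREAM Triad bandwidth (MB/s) into MFLOPs.
# Assumption: 8-byte doubles, and Triad counted as 3 array accesses
# (2 loads + 1 store) and 2 flops (1 multiply + 1 add) per iteration.

BYTES_PER_ITER = 3 * 8   # bytes counted per Triad iteration
FLOPS_PER_ITER = 2       # floating-point operations per Triad iteration


def triad_mflops(rate_mb_s):
    """Return the MFLOP rate implied by a Triad bandwidth in MB/s."""
    return rate_mb_s * FLOPS_PER_ITER / BYTES_PER_ITER


# Triad rates copied from the result tables in this message.
triad_rates = {
    "gcc (noprefetch)": 2237.3599,
    "gcc (prefetch)": 3349.1914,
    "pgcc (prefetch)": 3604.1280,
    "pscalecc (prefetch)": 4173.8781,
    "x1cc (vector, 1ssp)": 9360.5935,
}

for name, rate in triad_rates.items():
    print(f"{name:22s} {triad_mflops(rate):7.1f} MFLOPs")
```

Truncating the results to whole numbers gives the figures quoted in the message. Doubling the Opteron values gives the rough dual-processor estimate, assuming both CPUs sustain the single-processor rate.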
