Athlon SDR/DDR stats for *specific* gaussian98 jobs

Josip Loncaric josip at
Thu May 3 07:49:57 PDT 2001

"Robert G. Brown" wrote:
> IIRC, somebody on the list (Josip Loncaric?) inserted prefetching into
> at least parts of ATLAS for use with athlons back when they were first
> released.  It apparently made a quite significant difference in
> performance.

It was not me (we have Pentiums).  However, prefetching and SSE
instructions should make a significant difference.  For example,
Portland Group suggests compiling LAPACK and BLAS with the following
switches (using PGI compilers release 3.2-4 and an SSE-enabled Linux
kernel, i.e. version 2.2.10 or later with the appropriate patches):

Pentium III: -fast -pc 64 -Mvect=sse -Mcache_align -Kieee
Athlon:      -fast -pc 64 -Mvect=prefetch -Kieee

The only exceptions are slamch.f and dlamch.f (LAPACK's machine-parameter
routines), which must be compiled using '-O0'.  Also, the main program
should be compiled using '-pc 64' (64-bit double precision format).
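
A rough sketch of how the per-file exception could be scripted follows.  It
only builds command lines and does not invoke the compiler.  The flags are
the ones quoted above; the driver name 'pgf77', the helper name
compile_command, and the example file names are my assumptions for
illustration, and passing '-O0' alone for the two exceptions is likewise an
assumption.

```python
# Flags quoted in the message for PGI release 3.2-4.
FLAGS = {
    "pentium3": ["-fast", "-pc", "64", "-Mvect=sse", "-Mcache_align", "-Kieee"],
    "athlon": ["-fast", "-pc", "64", "-Mvect=prefetch", "-Kieee"],
}

# LAPACK's machine-parameter routines (SLAMCH/DLAMCH) must not be optimized.
NO_OPT_FILES = {"slamch.f", "dlamch.f"}


def compile_command(source, cpu, compiler="pgf77"):
    """Return the argument list for compiling one Fortran source file.

    Hypothetical helper: 'compiler' defaults to an assumed driver name.
    """
    name = source.rsplit("/", 1)[-1]  # strip any directory prefix
    # Assumption: the exception files get '-O0' and nothing else.
    flags = ["-O0"] if name in NO_OPT_FILES else FLAGS[cpu]
    return [compiler, *flags, "-c", source]


if __name__ == "__main__":
    for src in ["dgemm.f", "dlamch.f"]:
        print(" ".join(compile_command(src, "athlon")))
```

Keeping the exception list in one place means the same helper can drive
either the Pentium III or the Athlon build by changing a single argument.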

PGI says that in some cases a 23% performance benefit can be obtained
when prefetch instructions are used.  This helps with both single- and
double-precision codes.

For single-precision codes only, the Pentium III SSE instructions can
deliver about 33% benefit.  Since SSE instructions operate only on
single-precision data that is aligned on cache-line boundaries,
enforcing this alignment with '-Mcache_align' produces an even better
61% gain over the original non-SSE code (says PGI).
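
To separate the effect of alignment from the effect of SSE itself, here is a
small back-of-the-envelope calculation.  It assumes both PGI figures are
speedups measured against the same non-SSE baseline, which the message
states for the 61% figure but not explicitly for the 33% one; the function
name relative_gain is mine.

```python
def relative_gain(gain_a, gain_b):
    """Gain of configuration B over configuration A.

    Both inputs are fractional speedups over a common baseline
    (an assumption; e.g. 0.33 means 33% faster than the baseline).
    """
    return (1.0 + gain_b) / (1.0 + gain_a) - 1.0


sse_only = 0.33      # SSE without forced alignment (PGI figure)
sse_aligned = 0.61   # SSE with -Mcache_align (PGI figure)

extra = relative_gain(sse_only, sse_aligned)
print(f"alignment adds about {extra:.1%} on top of SSE alone")
```

Under that assumption, forcing alignment is worth roughly another fifth on
top of what SSE already delivers.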

Finally, the PGI release 3.2-4 also supports Pentium 4 SSE2 instructions
(-tp piv -Mvect=sse ...).


Dr. Josip Loncaric, Research Fellow               mailto:josip at
ICASE, Mail Stop 132C           PGP key at
NASA Langley Research Center             mailto:j.loncaric at
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134
