BLAS-1, AMD, Pentium, gcc
Don Holmgren djholm at fnal.gov
Fri Apr 12 10:36:22 PDT 2002
On Fri, 12 Apr 2002, Hung Jung Lu wrote:
> Hi,
>
> I am thinking in migrating some calculation programs
> from Windows to Linux, maybe eventually using a
> Beowulf cluster. However, I am kind of worried after I
> read in the mailing list archive about lack of
> CPU-optimized BLAS-1 code in Linux systems. Currently
> I run on a Wintel (Windows+Pentium) machine, and I
> know it's substantially faster than equivalent AMD
> machine, because I use the Intel's BLAS (MKL) library.
> (I apologize for any misapprehensions in what
> follows... I am only starting to explore in this
> arena.)
>
> (1) Does anyone know when gcc will have memory
> prefetching features? Any time frame? I can notice
> very significant performance improvement on my Wintel
> machine, and I think it's due to memory prefetching.
If you mean, "when will gcc's optimizer do automatic prefetching?", I
have no idea. But, many programmers have been doing manual prefetching
with gcc for quite a while. If you don't mind defining and using
assembler macros, gcc handles it just fine now. Here's an example:
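/* Prefetch the 128-byte-aligned block containing addr (non-temporal hint).
   The cast uses unsigned long so the pointer is not truncated on 64-bit Linux. */
#define prefetch_loc(addr) \
__asm__ __volatile__ ("prefetchnta %0" \
: \
: \
"m" (*(((char*)(((unsigned long)(addr))&~0x7fUL)))))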
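As a rough illustration of how manual prefetching might be applied to a BLAS-1 style loop, here is a minimal sketch. It uses GCC's `__builtin_prefetch` builtin rather than the assembler macro above so that it is not tied to x86. The function name `daxpy_prefetch`, the 64-byte cache line, and the prefetch distance `PREFETCH_AHEAD` are all assumptions made for the example, not tuned or measured values.

```c
#include <stddef.h>

/* Assumed prefetch distance in elements; a real value would need tuning. */
#define PREFETCH_AHEAD 64

/* Computes y[i] += a * x[i] for i in [0, n), issuing prefetch hints for
 * data that will be needed PREFETCH_AHEAD elements later.
 * daxpy_prefetch is a hypothetical name, not a routine from any BLAS. */
void daxpy_prefetch(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        /* One hint per 8 doubles (64 bytes, an assumed cache-line size),
         * and only while the target address is still inside the arrays. */
        if ((i & 7) == 0 && i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&x[i + PREFETCH_AHEAD], 0, 0); /* read, low temporal locality */
            __builtin_prefetch(&y[i + PREFETCH_AHEAD], 1, 0); /* write, low temporal locality */
        }
        y[i] += a * x[i];
    }
}
```

Whether this helps at all depends on the processor, the array sizes, and the chosen distance, so any gain would have to be confirmed by timing.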
> (2) I am a bit confused on the following issue: Intel
> does release MKL for Linux. So, does this mean that if
> I use Pentium, I still get full benefit of the
> CPU-optimized features in BLAS-1, despite of gcc does
> not do memory prefetching? How is this possible?
The Intel compiler produces object files compatible with gcc, and vice
versa. I would assume they implemented the library with the Intel
compiler, which has full SSE/SSE2 support (including prefetching). They
list the MKL for Linux as compatible with both GNU and Intel compilers.
> (3) Related to the above: for general linear algebra
> operations, is Pentium processor then better than AMD,
> since Intel has the machine-optimized BLAS library? I
> get contradictory information sometimes... I've seen
> somewhere that Pentium-4 compares unfavorably with AMD
> chips in calculation speed... Any opinions?
>
> thanks,
>
> Hung Jung Lu
For the very simple SU3 linear algebra (3x3 complex matrices and 3x1
complex vectors) used in our codes, the Pentium 4 outperforms the Athlon
on most of our SSE-assisted routines. See the table near the bottom of
http://qcdhome.fnal.gov/sse/inline.html
for Mflops per gigahertz on various routines for P-III, P4, and Athlon.
Perhaps re-coding in 3DNow! would give the Athlon a boost.
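For readers unfamiliar with the operation, the basic kernel (a 3x3 complex matrix times a 3x1 complex vector) can be sketched in plain C as below. This is only an illustrative scalar version using C99 `<complex.h>`; the type and function names (`su3_matrix`, `su3_vector`, `su3_mat_vec`) are hypothetical and are not taken from the SSE-assisted routines referenced above.

```c
#include <complex.h>

/* Hypothetical types for illustration only. */
typedef struct { double complex e[3][3]; } su3_matrix;
typedef struct { double complex c[3]; } su3_vector;

/* out = m * v for a 3x3 complex matrix and a 3x1 complex vector.
 * out must not alias v, since v is read after out is first written. */
void su3_mat_vec(const su3_matrix *m, const su3_vector *v, su3_vector *out)
{
    for (int i = 0; i < 3; i++) {
        double complex sum = 0.0;
        for (int j = 0; j < 3; j++)
            sum += m->e[i][j] * v->c[j];   /* complex multiply-accumulate */
        out->c[i] = sum;
    }
}
```

An SSE version would typically unroll these loops and operate on packed real and imaginary parts, which is where the per-processor differences show up.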
For our codes, which are bound by memory bandwidth, P4's do
significantly better than Athlons because of the faster front side bus
(400 MHz effective). See
http://qcdhome.fnal.gov/qcdstream/compare.qcdstream
for a table comparing memory bandwidth and SU3 linear algebra
performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P4 (see
http://qcdhome.fnal.gov/qcdstream/
for information about this benchmark).
Don Holmgren
Fermilab
More information about the Beowulf mailing list
