rather unfortunate article on Mac

W Bauske wsb at paralleldata.com
Fri Feb 1 12:09:06 PST 2002


Bill Broadley wrote:
> 
> Just figured after hearing the 15 Gflop number I'd do a reality check
> with STREAM.  I happen to have a dual G4-800 around, so I ran STREAM:
> 
>           copy  scale    add  triad
> cc -O1     313    307    341    342
> cc -O2     319    306    341    342
> cc -O3     321    307    341    342
> cc -O4     319    305    341    342
> 
> I happen to have a 1.2 GHz Athlon (pretty slow for these days) on
> a $65 motherboard:
> gcc -O1    677    660    760    680
> 
> At this rate the "15 GFlop" G4 can add 2 arrays at 28 Mflops single
> precision, or 14 Mflops double precision.  About half of a low-end
> budget Athlon.
> 
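
For anyone checking the quoted 28/14 Mflops figures, the arithmetic can be
sketched roughly as below. This assumes the numbers in the table are STREAM
bandwidths in MB/s (10^6 bytes per second) and that the "add" kernel moves
three elements per flop (two loads, one store). The function name
stream_add_mflops is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * STREAM "add" computes c[i] = a[i] + b[i]: one flop per element,
 * with three elements of memory traffic (load a, load b, store c).
 * Assumption: mb_per_s is in units of 10^6 bytes per second, so
 * dividing by bytes-per-flop yields Mflops directly.
 */
double stream_add_mflops(double mb_per_s, size_t elem_size)
{
    assert(elem_size > 0);
    double bytes_per_flop = 3.0 * (double)elem_size;
    return mb_per_s / bytes_per_flop;
}
```

With the G4's 341 on add, stream_add_mflops(341.0, sizeof(double)) comes out
near 14.2 and stream_add_mflops(341.0, sizeof(float)) near 28.4. The Athlon's
760 gives about 31.7 in double precision by the same formula.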

That's what I meant about using AltiVec with a compiler. Your test likely
did not use that part of the chip. AltiVec includes prefetch operations
that can speed up memory access substantially, and they go unused here.
Everything I've seen, though, indicates that (on Linux at least) you have
to use either macros or assembler to get at them. It's similar to the
situation with SSE/SSE2 before Intel's compilers were available. (GCC
supports those now too, I think.)
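
To give a rough idea of what explicit prefetching looks like in source, here
is a minimal sketch of the STREAM-style add loop with software prefetch. It
does not use the AltiVec data-stream instructions themselves; it swaps in
GCC's generic __builtin_prefetch as a portable stand-in. The function name
add_with_prefetch and the PREFETCH_AHEAD distance are assumptions chosen for
illustration, and whether this helps at all depends on the chip and memory
system.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tuning knob: how many elements ahead of the current one to
 * request. The right value is hardware dependent. */
#define PREFETCH_AHEAD 64

/* c[i] = a[i] + b[i], hinting upcoming reads of a and b to the cache. */
void add_with_prefetch(double *c, const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, 0 = prefetch for read,
             * 0 = no temporal locality (streaming data). */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 0);
        }
        c[i] = a[i] + b[i];
    }
}
```

Issuing a hint on every iteration is wasteful, since one per cache line is
enough; a tuned version would unroll the loop accordingly. The sketch only
shows where the hints go relative to the arithmetic.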

Wes


