[Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460


Vincent Diepeveen diep at xs4all.nl
Wed Sep 17 17:37:07 PDT 2008


How does all this change when you use a PGO-optimized (profile-guided
optimization) executable on both sides?

Vincent

On Sep 18, 2008, at 2:34 AM, Eric Thibodeau wrote:

> Vincent Diepeveen wrote:
>> Nah,
>>
>> I guess he's referring to the fact that it sometimes uses single-precision
>> floating point to get something done instead of double precision, and
>> that it sometimes keeps stuff in registers.
>>
>> That isn't necessarily a problem, but if I remember correctly, the
>> floating-point state could get wiped out when switching to SSE2.
>>
>> Sometimes you lose your FPU register set in that case.
>>
>> The main problem is that there are so many dangerous optimizations
>> possible to speed up benchmark test sets, because floating point is
>> inherently slow in hardware.
>>
>> Yet in general, the last few generations of Intel compilers have
>> improved a lot.
> Well, running the same code, here is the result discrepancy I got:
> FLOPs:
>    my code has to do 7,975,847,125,000 (~8 x 10^12 floating-point
> operations) ... it takes 15 minutes on an 8 x 2-core Opteron with
> 32 GB of RAM (thank you OpenMP ;)
>
> The running times (I ran it a _few_ times... but not the statistical
> minimum of 30):
>    ICC -> runtime == 689.249  ; summed error == 1651.78
>    GCC -> runtime == 1134.404 ; summed error == 0.883501
>
> Compiler Flags:
>    icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP
>    gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC
>
> No trickery, no smoke and mirrors ;) Just a _huge_ kick-ASS k-means
> parallelized with OpenMP (thank gawd, otherwise it takes hours to
> run) and a rather big 1.4 GB database.
>
> ... So this is what I meant by floating-point errors. Yes, ICC cut the
> runtime by roughly 40% (and this is on an *Opteron*-based system, a Tyan
> VX50). But the running time wasn't what I was actually looking at; it
> was the precision skew, and that's where I fell off my chair.
>
> For those itching for a few more specs:
>
> eric at einstein ~ $ icc -V
> Intel(R) C Compiler for applications running on Intel(R) 64, Version 10.1    Build 20080602
> Copyright (C) 1985-2008 Intel Corporation.  All rights reserved.
> FOR NON-COMMERCIAL USE ONLY
>
> eric at einstein ~ $ gcc -v
> Using built-in specs.
> Target: x86_64-pc-linux-gnu
> Configured with: /dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure
> --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1
> --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include
> --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1
> --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man
> --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info
> --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4
> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec
> --enable-nls --without-included-gettext --with-system-zlib
> --disable-checking --disable-werror --enable-secureplt --enable-multilib
> --enable-libmudflap --disable-libssp --enable-cld --disable-libgcj
> --enable-languages=c,c++,treelang,fortran --enable-shared
> --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.3.1-r1 p1.1'
> Thread model: posix
> gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)
>>
>> Vincent
>>
>> On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:
>>
>>> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:
>>>
>>>> Also, note that I've had issues with icc generating really fast but
>>>> inaccurate code (the fp model is not IEEE *by default*; I am sure
>>>> _everyone_ knows this and I am stating the obvious here).
>>>
>>> All modern, high-performance compilers default that way. It's certainly
>>> the case that sometimes it goes more horribly wrong than necessary, but
>>> I wouldn't ding icc for this default. Compare results with IEEE mode.
>>>
>>> -- greg
>>>
>
>



