[Beowulf] quad-core SPECfp2006: where are 4 FP results/cycle?

Mikhail Kuzminsky kus at free.net
Sat Oct 13 07:57:05 PDT 2007


In message from Mark Hahn <hahn at mcmaster.ca> (Fri, 12 Oct 2007 
16:09:05 -0400 (EDT)):
>> This means that 2 additional FP results per cycle in the 
>>microarchitecture give 
>> only about a 7% performance increase :-(
>
>the 4 flops/cycle is really for linpack-like code: it assumes you are 
>executing packed double SIMD.

Yes, but AFAIK most modern optimizing Fortran 9x compilers for x86 can 
generate code with SSEx instructions (instead of x87). And I assume that 
many real-world codes, including some in the SPECfp2006 suite, include 
work on floating-point vectors. The vectors need not be very long, 
since an SSE vector of 64-bit (double-precision) elements has 
length 2.
Such code could theoretically give a 2x speedup!
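A rough sketch of the per-core peak arithmetic behind the "4 FP results/cycle" figure. It assumes (this is an assumption, not stated in the thread) one FP add pipeline and one FP multiply pipeline, each issuing one SSE instruction per cycle, with 128-bit SSE registers holding two 64-bit doubles. The constant and function names are hypothetical.

```python
# Assumed layout: 128-bit SSE registers, 64-bit doubles, and two FP
# pipelines (one add, one multiply), each retiring one instruction per cycle.
SSE_REGISTER_BITS = 128
DOUBLE_BITS = 64


def flops_per_cycle(packed: bool, units: int = 2) -> int:
    """Peak FP results per cycle per core: pipelines x doubles per instruction."""
    lanes = SSE_REGISTER_BITS // DOUBLE_BITS if packed else 1
    return units * lanes


scalar = flops_per_cycle(packed=False)  # e.g. addsd + mulsd: 1 double each
packed = flops_per_cycle(packed=True)   # e.g. addpd + mulpd: 2 doubles each
print(scalar, packed, packed / scalar)  # 2 4 2.0
```

The 2x ratio is only a peak: it is reached only when every FP operation in the hot loop can be issued as a packed double instruction.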

>just that not all FP is SIMD-friendly, I think.
Yes, I agree with "not all". But doesn't a 7% speedup mean, I believe, 
that such SIMD-friendly FP code is very rare?
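The inference from a 7% speedup can be made concrete with Amdahl's law. This is a minimal sketch, assuming the entire gain comes from SIMD-friendly code running exactly 2x faster and that nothing else (clock, caches, memory bandwidth) differs between the two processors being compared; `simd_fraction` is a hypothetical helper name.

```python
def simd_fraction(observed_speedup: float, lane_speedup: float = 2.0) -> float:
    """Fraction f of the original run time that was accelerated, from Amdahl's law.

    observed_speedup = 1 / ((1 - f) + f / lane_speedup), solved for f.
    """
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / lane_speedup)


f = simd_fraction(1.07)
print(f"{f:.3f}")  # 0.131
```

Under those assumptions, only about 13% of the original run time would have been spent in code that benefited from the wider FP units.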

Yours
Mikhail

>  if your code spends 
>a lot of time in blas/lapack functions, I would expect it to see good 
>speedup.
>
>regards, mark hahn.



