[Beowulf] quad-core SPECfp2006: where are the 4 FP results/cycle?

Mark Hahn hahn at mcmaster.ca
Fri Oct 12 13:09:05 PDT 2007

> This means that 2 additional FP results per cycle in the microarchitecture
> give only about a 7% performance increase :-(

the 4 flops/cycle figure is really a peak for linpack-like code: it assumes 
you are executing packed double-precision SIMD instructions.
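
a rough sketch of where that number comes from. the register width and the
"one add plus one multiply per cycle" issue rate below are my assumptions
about the hardware, not figures taken from this thread, and the function name
is made up for illustration:

```python
# Sketch: peak double-precision FP results per cycle per core.
# Assumptions (not stated in the thread): 128-bit SIMD registers, and one
# FP add plus one FP multiply instruction issued per cycle.

REGISTER_BITS = 128
DOUBLE_BITS = 64
FP_INSTRUCTIONS_PER_CYCLE = 2  # one add + one multiply (assumed)


def peak_flops_per_cycle(packed: bool) -> int:
    """Return peak FP results per cycle for packed or scalar double code."""
    # A packed instruction works on every double that fits in the register;
    # a scalar instruction works on a single double.
    lanes = REGISTER_BITS // DOUBLE_BITS if packed else 1
    return FP_INSTRUCTIONS_PER_CYCLE * lanes


print("packed:", peak_flops_per_cycle(packed=True))
print("scalar:", peak_flops_per_cycle(packed=False))
```

under those assumptions, only packed code can reach 4 results per cycle;
scalar double code tops out at 2 no matter how wide the FP units are.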

> The question is: should we wait for better results from the upcoming
> optimizing compiler versions? Or is it the reality that 2 additional FP
> results per cycle give (on average) a relatively small performance increase?

I think it's just that not all FP code is SIMD-friendly.  if your code spends 
a lot of time in BLAS/LAPACK functions, I would expect it to see a good speedup.
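
a back-of-the-envelope way to see this is an Amdahl-style estimate. this is
only a sketch: it assumes the SIMD-friendly part of the run time gets exactly
2x faster and everything else is unchanged, and the function names are
hypothetical:

```python
# Sketch: Amdahl-style estimate of the overall gain from doubling FP width.
# Assumption (mine, not from the thread): the SIMD-friendly fraction of the
# run time speeds up by exactly `simd_speedup`; the rest does not change.


def overall_speedup(simd_fraction: float, simd_speedup: float = 2.0) -> float:
    """Overall speedup when only `simd_fraction` of run time gets faster."""
    return 1.0 / ((1.0 - simd_fraction) + simd_fraction / simd_speedup)


def simd_fraction_for(overall: float, simd_speedup: float = 2.0) -> float:
    """Invert the formula: which fraction would explain an observed speedup?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / simd_speedup)


# The ~7% gain quoted above, expressed as a speedup factor of 1.07.
print(f"implied SIMD-friendly fraction: {simd_fraction_for(1.07):.2f}")
# A hypothetical code that spends 90% of its time in packed-double kernels.
print(f"speedup at 90% SIMD-friendly:   {overall_speedup(0.90):.2f}")
```

under this model a 7% gain corresponds to only about 13% of the run time
being in packed-double code, while something dominated by BLAS/LAPACK-style
kernels would get much closer to the full 2x.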

regards, mark hahn.

