[Beowulf] quad-core SPECfp2006: where are 4 FP results/cycle?
hahn at mcmaster.ca
Fri Oct 12 13:09:05 PDT 2007
> This means that 2 additional FP results per cycle in microarchitecture gives
> only about 7% of performance increase :-(
the 4 flops/cycle figure is a peak rate, really only reached by linpack-like
code: it assumes you are executing packed double-precision SIMD instructions.
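
a rough sketch of where a "4 per cycle" peak could come from. the register width and the number of FP pipes below are my assumptions for illustration (128-bit SSE registers, one add pipe plus one multiply pipe); the post itself does not spell them out, and `peak_flops_per_cycle` is a hypothetical helper name.

```python
# Back-of-envelope peak-rate arithmetic.
# ASSUMPTIONS (not stated in the post): 128-bit SIMD registers and
# two FP pipes (one add, one multiply) that can each issue every cycle.
SIMD_WIDTH_BITS = 128
DOUBLE_BITS = 64
FP_PIPES = 2


def peak_flops_per_cycle(packed: bool) -> int:
    """Peak double-precision results per cycle for packed vs scalar code."""
    # Packed code fills every lane of the register; scalar code uses one lane.
    lanes = SIMD_WIDTH_BITS // DOUBLE_BITS if packed else 1
    return lanes * FP_PIPES


print(peak_flops_per_cycle(packed=True))   # packed double SIMD
print(peak_flops_per_cycle(packed=False))  # scalar double
```

under those assumptions, scalar code tops out at half the packed rate no matter how wide the units are, so the headline number only applies to code the compiler (or a tuned library) can actually vectorize.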
> The question is - should we wait some better results for new incoming
> optimizing compilers versions ? Or it is the reality - that 2 additional FP
> results per cycle gives (in average) relative small performance increase ?
I think it's just that not all FP code is SIMD-friendly. if your code spends
a lot of its time in blas/lapack functions, I would expect it to see a good
speedup.
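
to put a number on that, here is a minimal Amdahl-style estimate. it assumes the SIMD-friendly part of the runtime gets exactly 2x faster and everything else (scalar FP, memory traffic, clock) is unchanged; that is a simplification, and both function names are hypothetical.

```python
def overall_speedup(simd_fraction: float, simd_gain: float = 2.0) -> float:
    """Amdahl-style speedup when only simd_fraction of the runtime gets faster."""
    return 1.0 / ((1.0 - simd_fraction) + simd_fraction / simd_gain)


def fraction_for_speedup(speedup: float, simd_gain: float = 2.0) -> float:
    """Invert overall_speedup: what runtime fraction must be SIMD-friendly?"""
    # From 1/s = 1 - f * (1 - 1/g), solve for f.
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / simd_gain)


# The ~7% gain quoted above, read as an overall speedup of 1.07.
print(round(fraction_for_speedup(1.07), 2))
# A code that spends 90% of its time in packed-double kernels.
print(round(overall_speedup(0.90), 2))
```

read this way, a 7% average gain would mean only about 13% of the original runtime was in code that benefits from the wider units, while a blas-heavy code at 90% would land near 1.8x.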
regards, mark hahn.