[Beowulf] quad-core SPECfp2006: where are 4 FP results/cycle ?
    Mark Hahn
    hahn at mcmaster.ca
    Fri Oct 12 13:09:05 PDT 2007

> This means that 2 additional FP results per cycle in the microarchitecture give
> only about a 7% performance increase :-(
the 4 flops/cycle figure is a peak rate that really only applies to
linpack-like code: it assumes you are executing packed double-precision SIMD.
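
to make the arithmetic behind that peak concrete, here is a rough sketch. the register width and the "one add plus one multiply per cycle" issue rate are my assumptions about how a 4 flops/cycle number is typically reached, not vendor-confirmed figures for any particular chip:

```python
# Sketch of where a "4 flops/cycle" peak can come from.
# ASSUMPTIONS (illustrative, not taken from a datasheet):
#   - 128-bit SIMD registers holding 64-bit doubles
#   - one add and one multiply instruction can issue per cycle
REGISTER_BITS = 128
DOUBLE_BITS = 64
FP_INSTRUCTIONS_PER_CYCLE = 2  # one add + one multiply (assumed)


def peak_flops_per_cycle(packed: bool) -> int:
    """Peak double-precision results per cycle for packed vs scalar code."""
    # Packed instructions fill every lane; scalar instructions use one lane.
    lanes = REGISTER_BITS // DOUBLE_BITS if packed else 1
    return lanes * FP_INSTRUCTIONS_PER_CYCLE


print(peak_flops_per_cycle(packed=True))   # 4
print(peak_flops_per_cycle(packed=False))  # 2
```

under those assumptions, code that only issues scalar FP instructions tops out at 2 results/cycle no matter how wide the units are.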
> The question is: should we wait for better results from newer versions of
> optimizing compilers? Or is this the reality, that 2 additional FP
> results per cycle give (on average) a relatively small performance increase?
I think it's just that not all FP code is SIMD-friendly.  if your code spends
a lot of time in BLAS/LAPACK functions, I would expect it to see a good speedup.
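
a back-of-envelope way to see this is Amdahl's law. the sketch below assumes the SIMD-friendly part of a run gets exactly 2x faster and everything else is unchanged; that is a simplification (it ignores memory bandwidth, clock and compiler differences), and the function names are just illustrative:

```python
# Amdahl's-law sketch: how much of the runtime must be SIMD-friendly
# to explain a given overall speedup?
# ASSUMPTION: the SIMD-friendly fraction speeds up by `simd_gain` (2x here,
# i.e. 2 -> 4 FP results/cycle) and the rest of the runtime does not change.


def overall_speedup(simd_fraction: float, simd_gain: float = 2.0) -> float:
    """Overall speedup when only `simd_fraction` of the runtime gets faster."""
    return 1.0 / ((1.0 - simd_fraction) + simd_fraction / simd_gain)


def implied_simd_fraction(observed_speedup: float, simd_gain: float = 2.0) -> float:
    """Invert Amdahl's law: 1/S = 1 - f*(1 - 1/g)  =>  f = (1 - 1/S) / (1 - 1/g)."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / simd_gain)


# The ~7% gain quoted above, read as a speedup of 1.07:
f = implied_simd_fraction(1.07)
print(round(f, 3))  # 0.131
```

so under those assumptions a ~7% gain is what you would see if only about 13% of the original runtime were in code that benefits from the wider units, whereas a code dominated by BLAS/LAPACK calls would sit much closer to the 2x end.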
regards, mark hahn.