> But if I'll compare SPECfp2006 results w/x86-64 microarchitecture 
> w/2*64 bit FP results per cycle - previous Opteron generation - I'll 
> see some strange (IMHO) result. So, for Opteron 2222SE/3 Ghz, AMD 
> SPECfp2006 values are 15.2/14.3. But Xeon 5160, having 4 FP results 
> per cycle, w/same 3.0 Ghz gives very close values - 15.6/15.4 ! 
> This means that 2 additional FP results per cycle in microarchitecture 
> gives only about 7% of performance increase :-( 

I am not sure I fully understand what you are presenting here, but I would say that yes, at the FPU level the 2222 series AMD Opteron/Barcelona and the Intel Core2/Clovertown (and also Harpertown at 45 nm) are now largely equivalent; that is, both can execute two double-wide (2x64-bit) floating-point operations in certain multiply-add situations, simultaneously and/or pipelined. If you work in the marketing department, you can use this feature to quote a peak of clock x 4 64-bit flops.  This was not true of the earlier Opteron, which had to serialize each 64-bit piece of a 128-bit floating-point operation.  
You might therefore conclude that, working from registers, the two processors should perform equally at the same clock, but there are other issues.  One big one is instruction width and issue rate.  The Opteron (both socket 940 and 1207) is only a three-wide processor, while the Core2 is four-wide, giving it a wider aperture through which to schedule two 128-bit SSE operations side by side.  Different compilers or older revisions could also make a difference, as you suggest.
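To put rough numbers on the gap between "marketing peak" and measured results, here is a small back-of-the-envelope sketch in Python. It uses only the figures from the quoted message (clock, FP results per cycle as the original poster counted them, and the two SPECfp2006 values, which I am assuming are listed as higher/lower, presumably peak/base). The dictionary layout and function names are my own, purely for illustration.

```python
# Back-of-the-envelope comparison: theoretical peak vs. quoted SPECfp2006.
# All figures are copied from the quoted message; the FP-results-per-cycle
# counts are the original poster's, not independently verified here.
CHIPS = {
    "Opteron 2222SE": {"ghz": 3.0, "fp_per_cycle": 2, "spec_hi": 15.2, "spec_lo": 14.3},
    "Xeon 5160":      {"ghz": 3.0, "fp_per_cycle": 4, "spec_hi": 15.6, "spec_lo": 15.4},
}


def peak_gflops(ghz, fp_per_cycle):
    """Marketing-style peak: clock (GHz) times 64-bit FP results per cycle."""
    return ghz * fp_per_cycle


def gain_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


opteron = CHIPS["Opteron 2222SE"]
xeon = CHIPS["Xeon 5160"]

# On paper the Xeon has twice the peak at the same clock ...
paper_gain = gain_percent(
    peak_gflops(xeon["ghz"], xeon["fp_per_cycle"]),
    peak_gflops(opteron["ghz"], opteron["fp_per_cycle"]),
)

# ... but the quoted SPECfp2006 numbers differ by only a few percent.
measured_gain_hi = gain_percent(xeon["spec_hi"], opteron["spec_hi"])
measured_gain_lo = gain_percent(xeon["spec_lo"], opteron["spec_lo"])

print(f"paper peak gain:        {paper_gain:.1f}%")
print(f"SPECfp gain (higher):   {measured_gain_hi:.1f}%")
print(f"SPECfp gain (lower):    {measured_gain_lo:.1f}%")
```

The point of the sketch is only that a 2x difference in clock-times-width peak shows up as a single-digit percentage in the benchmark, which is consistent with issue width, memory, and compilers mattering at least as much as raw FPU width.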
The philosophical message is of course that there are no two apples alike, or even more radically that the concept of identity is fundamentally flawed ... ;-) ...

"Making predictions is hard, especially about the future." 

Niels Bohr 


