[Beowulf] quad-core SPECfp2006: where are 4 FPresults/cycle ?
Mikhail Kuzminsky kus at free.net
Sat Oct 13 07:57:05 PDT 2007
In message from Mark Hahn <hahn at mcmaster.ca> (Fri, 12 Oct 2007 16:09:05 -0400 (EDT)):
>> This means that 2 additional FP results per cycle in
>> microarchitecture gives
>> only about 7% of performance increase :-(
>
> the 4 flops/cycle is really for linpack-like code: it assumes you are
> executing packed double SIMD.

Yes, but AFAIK most modern optimizing F9x compilers for x86 can generate code with SSEx instructions (instead of x87). And I assume that many real-world codes, including some from the SPECfp2006 set, work with floating-point vectors. The vectors need not be very long, given that an SSE vector of 64-bit values has length 2. Such code may theoretically get a 2x speedup!

> just that not all FP is SIMD-friendly, I think.

Yes, I agree with "not all". But a 7% speedup would mean, I believe, that FP codes are SIMD-friendly only "very seldom"?

Yours
Mikhail

> if your code spends
> a lot of time in blas/lapack functions, I would expect it to see good
> speedup.
>
> regards, mark hahn.
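As a rough back-of-the-envelope check on the "7% versus 2x" question, one can apply Amdahl's law. This is only a sketch under strong assumptions that are not from the thread: the SIMD-friendly part of the run time speeds up by exactly 2x, everything else is unchanged, and memory bandwidth or other effects play no role. The function names are hypothetical.

```python
def simd_fraction(overall_speedup, simd_speedup=2.0):
    """Solve Amdahl's law for the fraction f of the original run time
    that must be sped up by simd_speedup to give overall_speedup.

    Amdahl: overall = 1 / ((1 - f) + f / simd_speedup)
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / simd_speedup)


def overall_speedup(fraction, simd_speedup=2.0):
    """Forward form of Amdahl's law, used here as a sanity check."""
    return 1.0 / ((1.0 - fraction) + fraction / simd_speedup)


# Assumed inputs: the 7% overall gain quoted above and an ideal 2x SIMD gain.
f = simd_fraction(1.07)
print(f"{f:.3f}")  # prints 0.131
```

Under these assumptions, a 7% overall gain would correspond to only about 13% of the original run time being spent in code that actually doubles in speed, which is one way to quantify "very seldom".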
