Intel performance with loops vs. vectors
David Lombard david.lombard at mscsoftware.com
Fri Aug 11 11:36:01 PDT 2000
- Previous message: Intel performance with loops vs. vectors -
- Next message: BWBUG Meeting Aug 22nd. New Location!
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Lechner, David" wrote: > > Has anyone done any direct comparison of performance of modern Intel or AMD > processors running floating point operations as both vector operations and > as simple loops? Does it make a difference any more? Some of the results > posted to this list seem to imply that loop calculations now run as fast as > vector op.s on today's microprocessors. We are considering some tests but > would appreciate any insight or comment people would be willing to provide. > I'm not quite sure what your question is. A "vector operation" implies the hardware has some sort of vector instructions. Intel has MMX and KNI instructions that provide specific operations for very short vectors, i.e., 4x32-bit, &etc. But, neither Intel nor AMD have a general purpose vector capability such as found on Cray or NEC systems. By "vector operations", do you mean, for example, calling a BLAS operation vs simply coding the loop directly? If you do, then the first point is that you've badly abused the terminology. At any rate, there could be an advantage using BLAS or other library routines if the library routines have advanced coding, either by guiding the compiler, such as is done by Atlas, or by writing the function in assembly language (as we do for our MSC.Nastran product), or both. Another possible interpretation is: are today's Intel and AMD processors as fast as vector systems? For that, one must apply the standard answer, "it depends". If you have heavy integer and scalar operations or other "poor" vector situations, then Intel can beat both Cray and NEC. If you have something that vectorizes well, you could hit 97% peak theoretical on a T90 (as do we on a matrix-matrix multiply), and Intel is very much slower. -- David N. Lombard MSC.Software
More information about the Beowulf mailing list
