[Fwd: Benchmarking Beowulf]
Greg Lindahl
lindahl@cs.virginia.edu
Thu, 24 Jun 1999 15:27:14 -0400
> I think that the term "peak performance" has come to mean
> "theoretical performance". As an example this would be 400 Mflop/s for
> a PII 400MHz because, in _theory_, it can do one floating point
> operation per clock cycle.
Yes. It's handy to call that number "MachoFLOPS" in order to make this
clear. "BogoMIPS" are similar.
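For concreteness, the MachoFLOPS number is just clock rate times floating-point operations per cycle. Below is a minimal Python sketch of that arithmetic. The function names peak_mflops and fraction_of_peak are hypothetical and made up for illustration.

```python
def peak_mflops(clock_mhz: float, flops_per_cycle: float) -> float:
    """Theoretical ("MachoFLOPS") peak in Mflop/s: clock rate times flops per cycle."""
    return clock_mhz * flops_per_cycle


def fraction_of_peak(measured_mflops: float, clock_mhz: float, flops_per_cycle: float) -> float:
    """Fraction of the theoretical peak that a measured rate represents."""
    return measured_mflops / peak_mflops(clock_mhz, flops_per_cycle)


# The PII example from the quoted message: 400 MHz, one flop per cycle.
print(peak_mflops(400.0, 1.0))  # 400.0
```

Dividing a measured rate by this number is how a "fraction of peak" figure is obtained. The argument in this thread is about which measured rate belongs in the numerator.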
> I think it is generally agreed that a more relevant measure of
> performance is the number of flops/s obtained by an optimized
> matrix-matrix multiply routine (such as _GEMM in the BLAS) on large
> matrices.
I disagree. Few real programs use a lot of GEMM, and the rates you get
for GEMM are often 3x or 4x the performance that you get on real
problems. You usually use assembly-language coded routines for GEMM,
but you use a compiler for real codes. If you used GEMM as your
measure, you'd think that the new Alphas are only as fast as the old
ones. No, they're often 2x faster on real problems.
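The GEMM rate under discussion is usually computed by counting the conventional 2n^3 floating-point operations (n^3 multiply-add pairs) for an n-by-n matrix multiply and dividing that count by elapsed time. The sketch below shows only that bookkeeping. naive_matmul and gemm_mflops are hypothetical names. A pure-Python triple loop will run far below any tuned, assembly-coded BLAS routine, so the absolute numbers it reports mean little.

```python
import time


def naive_matmul(a, b):
    """Plain triple-loop C = A * B for square lists of lists (no blocking, no BLAS)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for k in range(n):
            aik = a[i][k]
            bk = b[k]
            for j in range(n):
                ci[j] += aik * bk[j]  # one multiply and one add per innermost step
    return c


def gemm_mflops(n: int, matmul=naive_matmul) -> float:
    """Time one n x n multiply and report Mflop/s, counting 2*n**3 flops."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    start = time.perf_counter()
    matmul(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return 2.0 * n**3 / elapsed / 1e6
```

Passing a different matmul function into gemm_mflops is one way to compare a tuned routine against compiled loop code under the same flop count.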
It is the case that GEMM performance is frequently used to compare
systems, but I would argue that this measure is almost as meaningless
as MachoFLOPS. The only nifty thing about the large Linpack is the
problem size at which 1/2 of peak performance is achieved. That's a
very interesting number.
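Estimating that half-performance problem size from a series of runs at increasing n might be sketched as follows. This assumes you already have (n, Mflop/s) pairs from your own benchmark runs. n_half is a hypothetical helper that linearly interpolates between the two runs that bracket half of the reference rate. The reference rate is a parameter, so it can be the theoretical peak, as in the post, or a measured maximum.

```python
def n_half(measurements, reference_mflops):
    """Estimate the problem size at which the rate first reaches half the reference rate.

    measurements: iterable of (n, mflops) pairs from runs at various problem sizes.
    Returns an interpolated n, or None if no run reaches half the reference rate.
    """
    target = reference_mflops / 2.0
    prev_n, prev_rate = None, None
    for n, rate in sorted(measurements):
        if rate >= target:
            if prev_n is None:
                # Even the smallest run already reaches half the reference rate.
                return float(n)
            # prev_rate < target <= rate here, so the denominator is positive.
            frac = (target - prev_rate) / (rate - prev_rate)
            return prev_n + frac * (n - prev_n)
        prev_n, prev_rate = n, rate
    return None
```

Linear interpolation is only a rough choice. Runs spaced more closely around the crossover give a more trustworthy estimate.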
> The reason is that _GEMM is the fundamental building
> block in modern software for numerical linear algebra, such as LAPACK
> / ScaLAPACK, which is the workhorse of many scientific codes.
Uh-huh. In my years of scientific computing, I've never worked on a code
for which this was the case. I suppose there are fields for which
large-size GEMM performance is important, but I'm hard pressed to say
which...
-- greg