[Fwd: Benchmarking Beowulf]
R. Munk Larsen
rmunk@quake.Stanford.EDU
Thu, 24 Jun 1999 15:18:08 -0400
Dear Beowulfers,
I think the term "peak performance" has come to mean
"theoretical performance". For example, a 400 MHz PII would be rated at
400 Mflop/s because, in _theory_, it can do one floating-point
operation per clock cycle. However, I suppose most of us know that
this has nothing to do with the performance that a real program can
obtain. In short: the term "peak performance" has been abused to
the point where it has become meaningless.
I think it is generally agreed that a more relevant measure of
performance is the Mflop/s rate obtained by an optimized
matrix-matrix multiply routine (such as _GEMM in the BLAS) on large
matrices. This number indicates how well the combined
memory<->bus<->cache<->CPU subsystems perform, and it also gives a
realistic upper limit on the performance that a real program can
obtain. The reason is that _GEMM is the fundamental building
block of modern software for numerical linear algebra, such as LAPACK
and ScaLAPACK, which are the workhorses of many scientific codes.
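To make the "Mflop/s from _GEMM" figure concrete, here is a minimal Python sketch of how such a number is usually computed, assuming the conventional count of 2*m*n*k floating-point operations for C = A*B (one multiply and one add per inner-product term). The helper names (`gemm_flops`, `mflops`, `naive_matmul`, `benchmark`) are hypothetical. The triple loop is only a stand-in: to get a figure comparable to the one discussed here you would time a call to an optimized BLAS DGEMM instead, since pure Python measures the interpreter, not the memory/cache/CPU subsystems.

```python
import random
import time


def gemm_flops(m: int, n: int, k: int) -> int:
    """Conventional flop count for C = A*B with A m-by-k and B k-by-n:
    one multiply and one add per inner-product term, i.e. 2*m*n*k."""
    return 2 * m * n * k


def mflops(flops: int, seconds: float) -> float:
    """Convert an operation count and elapsed wall time to Mflop/s."""
    return flops / seconds / 1e6


def naive_matmul(a, b):
    """Triple-loop C = A*B on lists of lists. Illustrative only: this is NOT
    an optimized _GEMM and runs far below what the hardware can do."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        ci, ai = c[i], a[i]
        for p in range(k):
            aip, bp = ai[p], b[p]
            for j in range(n):
                ci[j] += aip * bp[j]  # accumulate a[i][p] * b[p][j]
    return c


def benchmark(n: int = 100, repeats: int = 3) -> float:
    """Time an n-by-n multiply, keep the best of several runs, report Mflop/s."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        naive_matmul(a, b)
        best = min(best, time.perf_counter() - t0)
    return mflops(gemm_flops(n, n, n), best)


if __name__ == "__main__":
    print(f"naive 100x100 multiply: {benchmark(100):.1f} Mflop/s")
```

Taking the best of several runs is a common way to reduce timing noise; the flop count stays the same whichever multiply routine is timed, so swapping in a real DGEMM changes only the elapsed time.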
Just as an example, I measure approximately 310 Mflop/s when
computing a 1000-by-1000 matrix multiply in double precision on our
PII 450 nodes, using Greg Henry's ASCI Red BLAS routines
(http://www.cs.utk.edu/~ghenry/distrib). In other words, the actual
performance is approx. 70% of the theoretical 450 Mflop/s, which is
not bad compared with other platforms.
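The 70% figure can be checked with a few lines of arithmetic. This sketch assumes, as in the post, one floating-point operation per clock cycle (so 450 Mflop/s theoretical for a 450 MHz PII) and the conventional 2*n^3 flop count for an n-by-n multiply; the run time it prints is derived from those numbers, not measured. The function names are hypothetical.

```python
def gemm_flops(n: int) -> int:
    """Conventional flop count for an n-by-n times n-by-n multiply: 2*n^3."""
    return 2 * n ** 3


def theoretical_mflops(clock_mhz: float, flops_per_cycle: float = 1.0) -> float:
    """'Peak' rate in Mflop/s, assuming a fixed number of flops per cycle."""
    return clock_mhz * flops_per_cycle


n = 1000
measured = 310.0                             # Mflop/s reported for DGEMM above
peak = theoretical_mflops(450.0)             # PII 450, one flop per cycle (assumed)
efficiency = measured / peak                 # fraction of theoretical performance
seconds = gemm_flops(n) / (measured * 1e6)   # implied wall time per multiply

print(f"efficiency: {efficiency:.1%}")       # efficiency: 68.9%
print(f"implied time: {seconds:.2f} s")      # implied time: 6.45 s
```

So "approx. 70%" corresponds to about 68.9% of the one-flop-per-cycle rate, with each 1000-by-1000 multiply taking roughly 6.5 seconds at the measured speed.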
Best regards,
Rasmus
-----------------------------------------------------
Dr. Rasmus Munk Larsen
Postdoctoral Fellow
SCCM & SOI/MDI, HEPL Annex A206
Stanford University, Stanford, CA 94305-4085
E-mail: rmunk@solen.stanford.edu
Phone : (650) 725-0449, FAX : (650) 725-2333
-----------------------------------------------------