[Fwd: Benchmarking Beowulf]

R. Munk Larsen rmunk@quake.Stanford.EDU
Thu, 24 Jun 1999 15:18:08 -0400


Dear Beowulfers,

  I think that the term "peak performance" has come to mean
"theoretical performance". For example, this would be 400 Mflop/s for
a 400 MHz PII because, in _theory_, it can do one floating point
operation per clock cycle. However, I suppose most of us know that
this has nothing to do with the performance that can be obtained by a
real program. In short: the term "peak performance" has been abused to
the extent that it has become meaningless.

  I think it is generally agreed that a more relevant measure of
performance is the flop/s rate obtained by an optimized
matrix-matrix multiply routine (such as _GEMM in the BLAS) on large
matrices. This number gives an indication of how well the combined
memory<->bus<->cache<->CPU subsystems perform, and moreover gives a
realistic upper limit on the performance that can be obtained by a
real program. The reason is that _GEMM is the fundamental building
block in modern software for numerical linear algebra, such as LAPACK
and ScaLAPACK, which are the workhorses of many scientific codes.
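
  To make the measurement concrete, here is a minimal sketch of how a
flop rate is derived from a timed matrix multiply. The names
gemm_flop_count, naive_matmul and measure_mflops are hypothetical
helpers of mine, and the triple loop is only a stand-in for _GEMM: a
real benchmark would call an optimized BLAS, so the number this prints
says nothing about what the hardware can do. What carries over is the
bookkeeping, i.e. the conventional count of 2*m*n*k operations divided
by the elapsed time.

```python
import time


def gemm_flop_count(m, n, k):
    """Conventional flop count for C = A*B with A m-by-k and B k-by-n.

    Each of the m*n entries of C takes k multiplies and k adds.
    """
    return 2 * m * n * k


def naive_matmul(a, b):
    """Unoptimized triple-loop multiply (stand-in for _GEMM, assumption)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for l in range(k):
            a_il = a[i][l]
            for j in range(n):
                c[i][j] += a_il * b[l][j]
    return c


def measure_mflops(n):
    """Time one n-by-n multiply and return the achieved rate in Mflop/s."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    start = time.perf_counter()
    naive_matmul(a, b)
    elapsed = time.perf_counter() - start
    return gemm_flop_count(n, n, n) / elapsed / 1e6


if __name__ == "__main__":
    # Small n keeps the pure-Python loop quick; use large matrices with a real BLAS.
    print(f"{measure_mflops(100):.1f} Mflop/s (naive Python, not a BLAS)")
```

For the 1000-by-1000 case this count comes to 2*10^9 operations, so the
rate is simply 2000 divided by the run time in seconds, in Mflop/s.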

  Just as an example, I measure approximately 310 Mflop/s when
computing a 1000-by-1000 matrix multiply in double precision on our
PII 450 nodes, using Greg Henry's ASCI Red BLAS routines
(http://www.cs.utk.edu/~ghenry/distrib). In other words, the actual
performance is approx. 70% of the theoretical performance, which is
not bad when you compare with other platforms.
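
  The arithmetic behind that percentage can be sketched as follows.
The function names are hypothetical, and the one floating point
operation per clock cycle is the same assumption used above for the
theoretical figure.

```python
def theoretical_peak_mflops(clock_mhz, flops_per_cycle=1):
    """Theoretical peak in Mflop/s: clock rate times flops issued per cycle."""
    return clock_mhz * flops_per_cycle


def efficiency(measured_mflops, clock_mhz, flops_per_cycle=1):
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_mflops / theoretical_peak_mflops(clock_mhz, flops_per_cycle)


if __name__ == "__main__":
    # 310 Mflop/s measured on a 450 MHz PII, assuming 1 flop per cycle.
    print(f"{efficiency(310, 450):.1%}")  # prints 68.9%
```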

Best regards,
Rasmus

-----------------------------------------------------
Dr. Rasmus Munk Larsen      
Postdoctoral Fellow 
SCCM & SOI/MDI, HEPL Annex A206
Stanford University,  Stanford, CA 94305-4085
E-mail: rmunk@solen.stanford.edu 
Phone : (650) 725-0449,  FAX   : (650) 725-2333 
-----------------------------------------------------