Benchmarking L2 cache on the Alpha 21264

Christoph Best c.best at fz-juelich.de
Mon Jun 5 06:07:16 PDT 2000


Hi everybody,

I am having a problem benchmarking L2 cache performance on some of the
Alpha 21264 systems in our clusters and wonder if anybody else has
seen this. We use a benchmark that models the kernel of our main
application (computational physics/lattice gauge theory). When running
entirely in L1 cache or entirely beyond L2 cache, it gives perfectly
consistent readings, with run-to-run deviations of 1% or less. But in
L2 cache, the numbers from different runs can differ by as much as
20%, and I cannot find a good explanation for this. If I plot
performance vs. memory footprint, there is a clear shoulder at the L1
cache size (64 KB), but beyond that the falloff is roughly
logarithmic: each doubling of the footprint costs about 30 MFlops.
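
For concreteness, here is the kind of sweep I mean, as a stripped-down
sketch. This is not our actual lattice kernel; the saxpy-style loop
body, the size range, and the repetition counts are just placeholders:

    /* Sketch only: a stand-in for the real benchmark, not the lattice
     * kernel.  Sweeps the working set from 16 KB to 16 MB, doubling at
     * each step, and reports MFlops for a strictly sequential loop
     * doing 2 flops per element.                                      */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double run(double *a, double *b, long n, long reps)
    {
        long r, i;
        clock_t t0;
        double secs;

        t0 = clock();
        for (r = 0; r < reps; r++)
            for (i = 0; i < n; i++)        /* strictly sequential access */
                a[i] = 1.5 * a[i] + b[i];  /* 1 mul + 1 add = 2 flops */
        secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        return 2.0 * n * reps / secs / 1e6;
    }

    int main(void)
    {
        long bytes, n, reps, i;
        double *a, *b;

        for (bytes = 16L << 10; bytes <= 16L << 20; bytes *= 2) {
            n    = bytes / (2 * sizeof(double)); /* two arrays share the footprint */
            reps = (256L << 20) / bytes;         /* keep total work roughly constant */
            a = malloc(n * sizeof(double));
            b = malloc(n * sizeof(double));
            for (i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }
            printf("%6ld KB  %7.1f MFlops\n", bytes >> 10, run(a, b, n, reps));
            free(a);
            free(b);
        }
        return 0;
    }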

The benchmark consists of a completely deterministic set of
floating-point operations, and I use a version that accesses memory in
a strictly sequential pattern. The systems are a Compaq DS10 (466 MHz,
single proc.), an ES40 (666 MHz, 4-proc.), and an API UP2000 (666 MHz,
dual proc.), all under Linux. I did not see this effect under Tru64 on
an XP1000 (666 MHz, single proc.).

The question is: is there anything in either Linux or the 21264 itself
that could account for such behavior? Could other processes really
pollute the cache that effectively? (The machines were essentially
idle during the benchmarks.)
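
One thing I may try in order to rule that out: repeat each measurement
several times within a single process and keep only the fastest run,
on the theory that interference from other processes can only ever
slow the kernel down. A sketch, reusing run() from above (best_of and
ntries are made-up names, not our code):

    /* Sketch: repeat one measurement ntries times and keep the fastest,
     * since cache pollution by other processes should only ever make a
     * run slower.  run() is the timing loop from the earlier sketch.   */
    static double best_of(double *a, double *b, long n, long reps, int ntries)
    {
        int t;
        double mf, best = 0.0;

        for (t = 0; t < ntries; t++) {
            mf = run(a, b, n, reps);
            if (mf > best)
                best = mf;
        }
        return best;
    }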

In particular, it seems that code running just inside the L2 cache (4
MB on the UP2000 and ES40) does not perform much better than code
running from main memory, which would be a pity. We expect cache
performance to be a major determinant of total performance for our
application: in L1 cache we get about 600 MFlops; outside L2 cache
this drops to about 200 MFlops; inside L2 it varies between 300 and
450 MFlops.
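
For scale, if I have my numbers right: the 21264 can issue one FP add
and one FP multiply per cycle, so at 666 MHz the peak is about 1330
MFlops. Our 600 MFlops in L1 is thus roughly 45% of peak, while the L2
numbers correspond to only about 20-35%.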

Thanks
-Chris
-- 
Christoph Best                                        c.best at computer.org
John von Neumann Institute for Computing/DESY   http://www.oche.de/~cbest



