[Beowulf] Barcelona numbers

Bill Broadley bill at cse.ucdavis.edu
Mon Sep 10 12:45:23 PDT 2007


The NDAs have finally expired, and there's tons of technical info available on
many hardware websites, so I figured I'd post some more numbers.  In general
I'm impressed with the bandwidth (considering it uses the same DIMMs), and
with the parallelism in the memory system.

Dual-socket quad-core Opteron 2350s (2.0 GHz) running the current version of
McCalpin's STREAM, compiled with pathscale-3.0 -mp -O4:
Total memory required = 228.9 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       15355.3139       0.0104       0.0104       0.0105
Scale:      15249.5885       0.0105       0.0105       0.0105
Add:        14954.2883       0.0161       0.0160       0.0162
Triad:      15061.2389       0.0160       0.0159       0.0160
-------------------------------------------------------------
Solution Validates


(no source changes except for the N= line).
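
As a rough cross-check of the table above, here is a minimal sketch of how
STREAM arrives at its figures: bytes moved per pass divided by the minimum
time, in units of 1e6 bytes, with the footprint reported in units of 2^20
bytes.  The function names are made up for illustration, and N = 10,000,000
is only inferred from the 228.9 MB footprint; the post does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved per array element by each STREAM kernel with 8-byte doubles:
 * Copy and Scale touch two arrays, Add and Triad touch three. */
enum { COPY_BYTES = 16, SCALE_BYTES = 16, ADD_BYTES = 24, TRIAD_BYTES = 24 };

/* Rate in MB/s (1 MB = 1e6 bytes): bytes moved in one pass over the
 * arrays divided by the best (minimum) time for that kernel. */
double stream_rate_mbs(size_t n, int bytes_per_elem, double min_time_s)
{
    assert(min_time_s > 0.0);
    return 1.0e-6 * (double)bytes_per_elem * (double)n / min_time_s;
}

/* Footprint of the three arrays of n doubles, in units of 2^20 bytes. */
double stream_footprint_mb(size_t n)
{
    return 3.0 * 8.0 * (double)n / (1024.0 * 1024.0);
}
```

With the assumed N, stream_footprint_mb(10000000) comes to about 228.9, and
stream_rate_mbs(10000000, COPY_BYTES, 0.0104) comes to roughly 15385 MB/s,
close to the 15355 reported (the printed min time is rounded to four places).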

Next, a latency benchmark I wrote, in which each thread accesses
32 MB randomly via:

while (p != 0)
{
      p = a[p];
}
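
The loop above only measures latency if a[] is set up so that every load
depends on the previous one and the walk ends back at index 0.  The post does
not show that setup, so the following is just one way it could be done, not
the actual benchmark: a random single-cycle permutation built with Sattolo's
algorithm.  The names build_chain and chase, the uint32_t element type, and
the use of rand() are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill a[0..n-1] with a random permutation that forms one single cycle
 * (Sattolo's algorithm), so following p = a[p] from index 0 visits every
 * element exactly once before returning to 0.
 * rand() is used for brevity; where RAND_MAX is small, a better generator
 * is needed for the accesses to be spread across a large array. */
void build_chain(uint32_t *a, size_t n, unsigned seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        a[i] = (uint32_t)i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i, never j == i */
        uint32_t tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Walk the chain with the loop from the post and return the number of
 * dependent loads performed (equal to n for a single-cycle chain). */
size_t chase(const uint32_t *a)
{
    size_t loads = 1;
    uint32_t p = a[0];
    while (p != 0) {
        p = a[p];
        loads++;
    }
    return loads;
}
```

With 4-byte elements, a 32 MB array would hold 8M entries; timing chase()
and dividing the elapsed time by its return value gives a per-load latency.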

thread=0, 7.293 seconds, latency=108.48 ns
With 1 thread(s), max latency was 7.293 seconds, effective latency=108.48 ns.

thread=0, 7.300 seconds, latency=108.59 ns
thread=1, 7.290 seconds, latency=108.44 ns
With 2 thread(s), max latency was 7.300 seconds, effective latency=53.84 ns.

thread=0, 7.418 seconds, latency=110.35 ns
thread=1, 7.363 seconds, latency=109.53 ns
thread=2, 7.422 seconds, latency=110.41 ns
thread=3, 7.359 seconds, latency=109.47 ns
With 4 thread(s), max latency was 7.422 seconds, effective latency=26.91 ns.

thread=0, 7.417 seconds, latency=110.33 ns
thread=1, 7.394 seconds, latency=109.98 ns
thread=2, 7.433 seconds, latency=110.57 ns
thread=3, 7.382 seconds, latency=109.81 ns
thread=4, 7.448 seconds, latency=110.79 ns
thread=5, 7.379 seconds, latency=109.76 ns
thread=6, 7.443 seconds, latency=110.71 ns
thread=7, 7.411 seconds, latency=110.24 ns
With 8 thread(s), max latency was 7.448 seconds, effective latency=13.03 ns.
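
The "effective latency" lines are roughly the slowest thread's per-load
latency divided by the number of threads running at once.  The sketch below
uses that approximation; the reported figures come out slightly lower, so the
benchmark evidently computes them somewhat differently, and the function
names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Average time per dependent load, in ns, for one thread. */
double latency_ns(double elapsed_s, double loads)
{
    assert(loads > 0.0);
    return elapsed_s * 1.0e9 / loads;
}

/* Approximate effective latency: the slowest thread's per-load latency
 * divided by the number of threads chasing pointers concurrently. */
double effective_latency_ns(const double *per_thread_ns, size_t nthreads)
{
    assert(nthreads > 0);
    double worst = per_thread_ns[0];
    for (size_t i = 1; i < nthreads; i++)
        if (per_thread_ns[i] > worst)
            worst = per_thread_ns[i];
    return worst / (double)nthreads;
}
```

For the 8-thread run this gives 110.79 / 8, or about 13.85 ns, against the
13.03 ns reported; for the 2-thread run, about 54.3 ns against 53.84 ns.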

For comparison, an Opteron 270 (also 2.0 GHz):

thread=0, 6.301 seconds, latency=93.72 ns
With 1 thread(s), max latency was 6.301 seconds, effective latency=93.72 ns.

thread=0, 6.302 seconds, latency=93.73 ns
thread=1, 6.276 seconds, latency=93.35 ns
With 2 thread(s), max latency was 6.302 seconds, effective latency=46.47 ns.

thread=0, 12.071 seconds, latency=179.55 ns
thread=1, 12.084 seconds, latency=179.75 ns
thread=2, 11.952 seconds, latency=177.79 ns
thread=3, 12.055 seconds, latency=179.31 ns
With 4 thread(s), max latency was 12.488 seconds, effective latency=45.27 ns.





