[Beowulf] Barcelona numbers

Bill Broadley bill at cse.ucdavis.edu
Mon Sep 10 17:34:42 PDT 2007


Vincent Diepeveen wrote:
> that simple C program that measures latency,
> can you try it with a more realistic working set size also
> to measure RAM latency, so with like 2GB in total or so?

I think it measures RAM latency quite well, but it doesn't exercise the TLB as
hard as a 2GB dataset would.  8 threads randomly accessing 2GB is a TLB
nightmare.  I do not believe the kernel I'm using makes the 1GB pages
available on the Barcelona chips.

In any case, sure, I'll run 2GB numbers.

Opteron 2350 (2.0 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       15328.3395       0.0921       0.0919       0.0922
Scale:      15297.8845       0.0921       0.0920       0.0922
Add:        14787.7337       0.1432       0.1428       0.1437
Triad:      15067.3052       0.1403       0.1402       0.1404
-------------------------------------------------------------
Solution Validates

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 9.174 seconds, effective latency=136.70 ns.
With 2 thread(s), max latency was 9.186 seconds, effective latency=68.44 ns.
With 4 thread(s), max latency was 9.763 seconds, effective latency=36.37 ns.
With 8 thread(s), max latency was 10.589 seconds, effective latency=19.72 ns.

Opteron 275 (2.2 GHz):
pathcc -O4 -mp stream.c -o stream
Total memory required = 2014.2 MB.
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        8607.2317       0.0189       0.0186       0.0215
Scale:       8637.8088       0.0186       0.0185       0.0186
Add:         8249.3994       0.0291       0.0291       0.0292
Triad:       8244.0621       0.0301       0.0291       0.0372

gcc -c -O4 -Wall -pedantic plat.c
gcc -o plat -O4 -Wall -pedantic plat.o -lpthread -lm -lnuma
Each thread accesses 67108864 INTs in a 256 MB array.
With 1 thread(s), max latency was 7.737 seconds, effective latency=115.29 ns.
With 2 thread(s), max latency was 7.722 seconds, effective latency=57.53 ns.
With 4 thread(s), max latency was 16.174 seconds, effective latency=60.25 ns.

Previously, when the Opteron DDR-2 systems were newish, a fair number of people
posted stream numbers for the Opterons and Intels of the time.  My vague
memory is that Intel was in the 7-9GB/sec range and the DDR-2 Opterons were in
the 12.5-13.0GB/sec range.





