[Beowulf] Opinions of Hyper-threading?

Bill Broadley bill at cse.ucdavis.edu
Thu Feb 28 12:22:58 PST 2008


Mattijs Janssens wrote:
> How do your Rate numbers correlate to the max bandwidth of 32GB/s 
> (http://en.wikipedia.org/wiki/GeForce_8_Series)?
> 
> Or do these threads all operate on the same data?

My first guess was some kind of caching; after all, 2M single-precision floats
is only 8MB. But I couldn't reproduce it on my 8600GT, so I'm guessing it's a
timing issue.
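If it is a timing issue, one likely cause is that CUDA kernel launches return to the host immediately, so a host-side timer stopped before the GPU has finished will report too short a time and an inflated rate. The sketch below is not the code from stream.cu. It assumes a hypothetical `copy_kernel` and `copy_mbps` helper, and it times the kernel with CUDA events, reading the elapsed time only after `cudaEventSynchronize` confirms the work is done.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical STREAM-style Copy kernel: c[i] = a[i], one element per thread.
__global__ void copy_kernel(float *c, const float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i];
}

// STREAM-style rate in MB/s (1 MB = 1e6 bytes) for Copy, which reads one
// float and writes one float per element.
double copy_mbps(int n, double seconds) {
    assert(seconds > 0.0);
    return 2.0 * sizeof(float) * n / seconds / 1.0e6;
}

int main() {
    const int n = 2000000;  // 2M floats = 8 MB per array
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    float *d_a = nullptr, *d_c = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&d_c, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_a, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copy_kernel<<<blocks, threads>>>(d_c, d_a, n);
    cudaEventRecord(stop);
    // The launch is asynchronous: wait for the stop event (and therefore the
    // kernel recorded before it) to complete before reading the elapsed time.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Copy: %.1f MB/s in %.6f s\n", copy_mbps(n, ms / 1000.0), ms / 1000.0);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFree(d_c);
    return 0;
}
```

A real STREAM run repeats each kernel several times and reports the best time; a single cold launch like this one will usually look somewhat slower.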

I downloaded the source and compiled it:
/usr/local/cuda/bin/nvcc -O3 -o stream stream.cu

Ran it:
./stream
  STREAM Benchmark implementation in CUDA
  Array size (single precision)=2000000
  using 128 threads per block, 15625 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16596.1294       0.0010       0.0010       0.0010
Scale:      16581.7649       0.0010       0.0010       0.0010
Add:        18750.8822       0.0013       0.0013       0.0013
Triad:      18736.6081       0.0013       0.0013       0.0013
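STREAM does not measure bytes directly; it credits each kernel with the bytes it nominally has to move and divides by the measured time. A minimal sketch of that accounting, assuming the usual STREAM definitions (Copy `c = a`, Scale `b = q*c`, Add `c = a + b`, Triad `a = b + q*c`) rather than the exact code in stream.cu, and using the rates from the run above to back out the implied times. The kernel and helper names are hypothetical.

```cuda
#include <cassert>
#include <cstdio>

// Usual STREAM kernels in single precision, one element per thread.
// Copy and Scale touch 2 arrays per element; Add and Triad touch 3.
__global__ void copy_k(float *c, const float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i];
}
__global__ void scale_k(float *b, const float *c, float q, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = q * c[i];
}
__global__ void add_k(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
__global__ void triad_k(float *a, const float *b, const float *c, float q, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + q * c[i];
}

// Bytes STREAM credits to one pass over n floats: arrays touched * 4 * n.
long long stream_bytes(int arrays_touched, long long n) {
    assert(arrays_touched == 2 || arrays_touched == 3);
    return arrays_touched * (long long)sizeof(float) * n;
}

int main() {
    const long long n = 2000000;
    // Rates copied from the 2M-element run (MB/s, 1 MB = 1e6 bytes).
    const char *name[] = {"Copy", "Scale", "Add", "Triad"};
    const int arrays[] = {2, 2, 3, 3};
    const double rate_mbs[] = {16596.1294, 16581.7649, 18750.8822, 18736.6081};
    for (int k = 0; k < 4; ++k) {
        double seconds = stream_bytes(arrays[k], n) / (rate_mbs[k] * 1.0e6);
        printf("%-6s implied time %.5f s\n", name[k], seconds);
    }
    return 0;
}
```

Copy works out to about 0.00096 s and Add to about 0.00128 s, which round to the 0.0010 and 0.0013 shown in the table, so the rates and times are at least self-consistent.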

I made the array 4 times bigger:
  STREAM Benchmark implementation in CUDA
  Array size (single precision)=8000000
  using 128 threads per block, 62500 blocks
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16706.3212       0.0039       0.0038       0.0044
Scale:      16666.2770       0.0046       0.0038       0.0100
Add:        18408.0866       0.0053       0.0052       0.0056
Triad:      18738.6603       0.0052       0.0051       0.0055
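The block counts in the two runs follow from dividing the array length by the block size: 2,000,000 / 128 = 15,625 and 8,000,000 / 128 = 62,500. A minimal sketch of that calculation with rounding up, using a hypothetical `grid_blocks` helper. One caveat: on compute capability 1.x GPUs such as the 8600GT, `gridDim.x` is limited to 65,535, so another 4x increase (250,000 blocks of 128 threads) would no longer fit in a one-dimensional grid. The grid-stride kernel below is a hypothetical alternative that decouples grid size from array length; it is not what stream.cu does.

```cuda
#include <cassert>
#include <cstdio>

// Blocks needed so that blocks * threads >= n (integer ceiling division).
int grid_blocks(long long n, int threads) {
    assert(n > 0 && threads > 0);
    return (int)((n + threads - 1) / threads);
}

// Hypothetical grid-stride Copy: any grid size covers all n elements, so the
// launch can be capped at a fixed number of blocks.
__global__ void copy_strided(float *c, const float *a, long long n) {
    for (long long i = blockIdx.x * (long long)blockDim.x + threadIdx.x; i < n;
         i += (long long)blockDim.x * gridDim.x)
        c[i] = a[i];
}

int main() {
    const int threads = 128;
    const int max_grid_x = 65535;  // gridDim.x limit on compute capability 1.x
    const long long sizes[] = {2000000, 8000000, 32000000};
    for (long long n : sizes) {
        int blocks = grid_blocks(n, threads);
        printf("n=%lld: %d blocks%s\n", n, blocks,
               blocks > max_grid_x ? " (exceeds 1-D grid limit; use grid-stride)" : "");
    }
    return 0;
}
```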

STREAM results that reach only about 50% of the marketing (theoretical peak)
bandwidth seem relatively common.
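To put a number on that comparison: theoretical peak bandwidth is the memory bus width in bytes times the effective transfer rate, and STREAM efficiency is the measured rate divided by that peak. A sketch, where the 32 GB/s figure is the one from the quoted message (which may not describe this 8600GT) and the 128-bit, 1400 MT/s configuration is a hypothetical example rather than a claim about this particular card. `peak_gbs` and `fraction_of_peak` are illustrative names.

```cuda
#include <cassert>
#include <cstdio>

// Theoretical peak in GB/s (1 GB = 1e9 bytes): bus width in bytes times
// effective transfers per second (MT/s already includes the DDR doubling).
double peak_gbs(int bus_bits, double mega_transfers_per_s) {
    assert(bus_bits > 0 && mega_transfers_per_s > 0.0);
    return (bus_bits / 8.0) * mega_transfers_per_s * 1.0e6 / 1.0e9;
}

// Fraction of peak achieved by a STREAM rate reported in MB/s (1e6 bytes).
double fraction_of_peak(double measured_mbs, double peak_gb_per_s) {
    return (measured_mbs * 1.0e6) / (peak_gb_per_s * 1.0e9);
}

int main() {
    const double quoted_peak = 32.0;                   // GB/s, from the quoted message
    const double example_peak = peak_gbs(128, 1400.0); // hypothetical bus and clock
    const double copy_mbs = 16706.3212, triad_mbs = 18738.6603;  // 8M-element run
    printf("example peak: %.1f GB/s\n", example_peak);
    printf("vs %.0f GB/s: Copy %.0f%%, Triad %.0f%%\n", quoted_peak,
           100.0 * fraction_of_peak(copy_mbs, quoted_peak),
           100.0 * fraction_of_peak(triad_mbs, quoted_peak));
    printf("vs %.1f GB/s: Copy %.0f%%, Triad %.0f%%\n", example_peak,
           100.0 * fraction_of_peak(copy_mbs, example_peak),
           100.0 * fraction_of_peak(triad_mbs, example_peak));
    return 0;
}
```

Against the quoted 32 GB/s, the Copy and Triad rates above come to roughly 52% and 59%; the efficiency against any other peak depends entirely on which bus width and memory clock the card actually has.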

I'm not that familiar with CUDA. This ran on a video card that also happens to
be driving my 1920x1200 display, so I might get better numbers if I turned off
compiz, or better yet X11 entirely.

Kudos to Nvidia for a Linux-friendly toolchain that I could find, download,
install, and use to compile code with minimal hassle.
