[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters
hbugge at platform.com
Fri Nov 21 00:55:05 PST 2008
Guess you're too humble ;-)
At 17:23 20.11.2008, Mark Hahn wrote:
>I'm happy for you, but to me, you're stacking
>the deck by comparing to a quite old CPU. you
>could break out the prices directly, but comparing 3x
>GPU (modern? sounds like pci-express at least)
>to a current entry-level cluster node (8
>core2/shanghai cores at 2.4-3.4 GHz) would be more appropriate.
>at the VERY least, honesty requires comparing one GPU against all the cores
>in a current CPU chip. with your numbers, I
>expect that would change the speedup from 117 to
>around 15. still very respectable.
I compiled the serial hmm version using the
default makefile (gcc -O2 -g) and ran it on an
Opteron 2220 (2.8 GHz). Then I compiled the MPI
version using Intel compiler 10.1 (icc -axS -O3)
and ran it on a not-yet-released two-socket
machine using 16 MPI processes. The latter ran
145x faster. So soon, the 15x is below 1x...
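
As a rough sanity check on the arithmetic, here is a minimal sketch that puts the two quoted speedups side by side. It assumes the 117x (GPU) and 145x (16 MPI processes) figures were measured against comparable serial baselines, which this thread does not confirm. The function names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope comparison of the speedups quoted in this thread.
# ASSUMPTION: the 117x (GPU) and 145x (16 MPI processes) figures are measured
# against comparable serial baselines; the thread does not confirm this.


def relative_speedup(gpu_vs_serial: float, cpu_vs_serial: float) -> float:
    """Re-express the GPU speedup against the parallel CPU run."""
    return gpu_vs_serial / cpu_vs_serial


def per_process_gain(total_speedup: float, n_processes: int) -> float:
    """Speedup that remains after dividing out the process count."""
    return total_speedup / n_processes


if __name__ == "__main__":
    print(f"GPU vs. 16-process MPI run: {relative_speedup(117, 145):.2f}x")
    print(f"Gain per MPI process: {per_process_gain(145, 16):.2f}x")
```

Under that assumption the GPU figure lands at roughly 0.8x of the MPI run. The second number, about 9x per process, is the part of the 145x that cannot come from parallelism if scaling is at most linear, so it reflects the compiler, flags, and hardware differences between the two runs.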