[Beowulf] What class of PDEs/numerical schemes suitable for GPU clusters


Mark Hahn hahn at mcmaster.ca
Thu Nov 20 08:23:31 PST 2008


> [shameless plug]
>
> A project I have spent some time with is showing 117x on a 3-GPU machine over 
> a single core of a host machine (3.0 GHz Opteron 2222).  The code is 
> mpihmmer, and the GPU version of it.  See http://www.mpihmmer.org for more 
> details.  Ping me offline if you need more info.
>
> [/shameless plug]

I'm happy for you, but to me, you're stacking the deck by comparing to a 
quite old CPU.  you could break out the prices directly, but comparing 3x
GPU (modern?  sounds like pci-express at least) to a current entry-level 
cluster node (8 core2/shanghai cores at 2.4-3.4 GHz) would be more appropriate.

at the VERY least, honesty requires comparing one GPU against all the cores
in a current CPU chip.  with your numbers, I expect that would change the 
speedup from 117 to around 15.  still very respectable.
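
The post does not show how the "around 15" figure was reached. A minimal sketch of one way to renormalize the quoted number, assuming perfectly linear scaling across GPUs and across CPU cores (optimistic for the CPU side), is below. The function name, the `per_core_gain` parameter, and the 2-core and 4-core chip sizes are illustrative assumptions, not values from the thread.

```python
def normalized_speedup(measured, gpus, baseline_cores, cores_per_chip,
                       per_core_gain=1.0):
    """Rescale an 'N GPUs vs. baseline_cores' speedup to
    'one GPU vs. every core of one CPU chip'.

    Assumes linear scaling over GPUs and over CPU cores.
    per_core_gain is a hypothetical factor for a newer core being
    faster than the baseline core (1.0 means no difference).
    """
    # Speedup of a single GPU over a single baseline core.
    per_gpu_vs_one_core = measured * baseline_cores / gpus
    # Divide by the throughput of the whole chip, in baseline-core units.
    return per_gpu_vs_one_core / (cores_per_chip * per_core_gain)


if __name__ == "__main__":
    # 117x was quoted for 3 GPUs against 1 core.
    for cores in (2, 4):  # assumed chip sizes
        print(cores, "cores/chip:", normalized_speedup(117.0, 3, 1, cores))
```

Under these assumptions one GPU comes out at 19.5x over a 2-core chip and 9.75x over a 4-core chip, so an estimate of roughly 15 sits between the two cases.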

I apologize for not RTFcode, but does the host version of hmmer you're 
comparing with vectorize using SSE?

>> or more generally: fairly small data, accessed data-parallel or with very 
>> regular and limited sharing, with high work-per-data.
>
> ... not small data.  You can stream data.

can you sustain your 117x speedup if your data is in host memory?
by small, I meant the on-gpu-card memory, mainly, which is fast but 
often more limited than host memory.
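
Whether a speedup survives streaming from host memory depends on how the transfer time compares with the GPU compute time. The following is a rough model only, assuming the transfer either overlaps fully with compute or not at all; the function name and all the numbers in the usage lines (timings, data size, link bandwidth) are hypothetical and not measurements from mpihmmer.

```python
def streamed_speedup(cpu_time_s, gpu_compute_s, bytes_moved,
                     link_bytes_per_s, overlap=True):
    """Estimate speedup when input must cross the host-to-GPU link.

    overlap=True  : transfers are hidden behind compute (ideal streaming),
                    so the GPU side takes max(compute, transfer).
    overlap=False : transfer and compute run back to back.
    """
    transfer_s = bytes_moved / link_bytes_per_s
    if overlap:
        gpu_total_s = max(gpu_compute_s, transfer_s)
    else:
        gpu_total_s = gpu_compute_s + transfer_s
    return cpu_time_s / gpu_total_s


if __name__ == "__main__":
    # Hypothetical workload: 117 s on the CPU, 1 s of GPU compute.
    # Small input: the transfer hides behind compute, speedup is kept.
    print(streamed_speedup(117.0, 1.0, 2e9, 4e9))
    # Larger input: the link becomes the bottleneck.
    print(streamed_speedup(117.0, 1.0, 8e9, 4e9))
    # Same larger input without overlap.
    print(streamed_speedup(117.0, 1.0, 8e9, 4e9, overlap=False))
```

In this model the full speedup is sustained only while the transfer time stays below the compute time, which is the "high work-per-data" condition mentioned earlier in the thread.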

sidebar: it's interesting that ram is incredibly cheap these days,
and we typically spec a middle-of-the-road machine at 2GB/core.
even 4GB/core is not much more expensive, though to be honest,
the number of users who need that much is fairly small.

>> GP-GPU tools are currently immature, and IMO the hardware probably needs a 
>> generation of generalization before it becomes really widely used.
>
> Hrmm...  Cuda is pretty good.  Still needs some polish, but people can use 
> it, and are generating real apps from it.  We are seeing pretty wide use ... 
> I guess the issue is what one defines as "wide".

Cuda is NV-only, and forces the programmer to face a lot of limits and 
weaknesses.  at least I'm told so by our Cuda users - things like having 
to re-jigger code to avoid running out of registers.  from my perspective,
a random science prof is going to be fairly put off by that sort of thing
unless the workload is really impossible to do otherwise.  (compared to 
the traditional cluster+MPI approach, which is portable, scalable and 
at least short-term future-proof.)

thanks, mark.


