[Beowulf] GPU question
gus at ldeo.columbia.edu
Mon Aug 31 09:28:43 PDT 2009
1. Beware of hardware requirements, especially on your existing
computers, which may or may not accommodate a CUDA-ready GPU.
Otherwise you may end up with a useless lemon.
A) Not all NVidia graphics cards are CUDA-ready.
NVidia publishes lists telling which GPUs are CUDA-ready
and which are not.
B) Check all the GPU hardware requirements in detail: motherboard,
PCIe version and slot, power supply capacity and connectors, etc.
See the various GPU models on the NVidia site, and
the product specs from the specific vendor you choose.
C) You need a free PCIe slot, most likely x16, IIRC.
D) Most GPU card models are quite thick, and take up their
own PCIe slot plus the neighboring slot, which cannot be used.
Hence, if your motherboard is already crowded, make sure
everything will fit.
For a rackmount chassis you may need at least 2U of height.
On a tower PC chassis this shouldn't be a problem.
You may need some type of riser card if you plan to mount the GPU
parallel to the motherboard.
E) If I remember right, you need PCIe version 1.1 (?)
or version 2 on your motherboard.
F) You also need a power supply with enough spare capacity to feed
the GPU beast.
The GPU model specs should tell you how much power you need.
Most likely a 600W PSU or larger, especially if you have a dual-socket
server motherboard with lots of memory, disks, etc. to feed.
G) Depending on the CUDA-ready GPU card,
the low-end ones require 6-pin PCIe power connectors
from the power supply.
The higher-end models require 8-pin PCIe power connectors.
You can also find and buy Molex-to-PCIe adapters,
so that you can use the Molex connectors (i.e. the ATA disk power
connectors) if your PSU doesn't have PCIe connectors.
However, you need to have enough power to feed the GPU and the system,
no matter what.
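Once the card is physically in, a quick programmatic sanity check is the
device-query call in the CUDA runtime. A minimal sketch (assumes the CUDA
toolkit is installed; compile with nvcc):

```cuda
/* Sketch: enumerate CUDA devices and report compute capability.
   Build: nvcc devquery.cu -o devquery */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("No CUDA-ready GPU found.\n");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: %s, compute capability %d.%d, %lu MB\n",
               i, p.name, p.major, p.minor,
               (unsigned long)(p.totalGlobalMem >> 20));
    }
    return 0;
}
```

If this reports no device even though the card is seated, suspect the
driver install or one of the hardware items above.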
2. Before buying a lot of hardware, I would experiment first with a
single GPU on a standalone PC or server (that fits the HW requirements),
to check how much programming it takes,
and what performance boost you can extract from CUDA/GPU.
CUDA requires quite a bit of logistics for
shipping data between host memory, CPU, and GPU.
It is perhaps more challenging to program than, say,
parallelizing a serial program with MPI.
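To give a feel for that data-shipping logistics, here is a hedged sketch
of the usual allocate / copy-in / launch / copy-back cycle (names are
illustrative; requires the CUDA toolkit):

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Trivial kernel: scale each element on the GPU. */
__global__ void scale(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);               /* host buffer */
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                           /* device buffer */
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice); /* ship data in */
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);     /* compute */
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost); /* ship results out */

    printf("h[0] = %g\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```

Every kernel needs this surrounding traffic, and the two cudaMemcpy calls
are often where the hoped-for speedup evaporates.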
Codes that are heavy in FFTs or linear algebra operations are probably
good candidates, as there are CUDA libraries for both.
At some point only 32-bit floating point arrays could take advantage of
CUDA/GPU, but not 64-bit arrays.
The latter would
require additional programming to convert between 64-bit and 32-bit
when going to and coming back from the GPU.
Not sure if this still holds true,
as newer GPU models may have efficient 64-bit capability,
but it is worth checking this out, including whether performance for
64-bit is as good as for 32-bit.
3. PGI compilers version 9 came out with "GPU directives/pragmas"
that are akin to the OpenMP directives/pragmas,
and may simplify the use of CUDA/GPU.
At least until the promised OpenCL comes out.
Check the PGI web site.
Note that this will give you intra-node parallelism exploiting the GPU,
just like OpenMP does using threads on the CPU cores.
4. CUDA + MPI may be quite a challenge to program.
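A common starting pattern, if you do go down that road, is one MPI rank
per GPU, with each rank binding to a device by rank number. A hedged
sketch, assuming the usual mpicc/nvcc toolchain and that ranks on the
same node get consecutive numbers:

```cuda
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, ndev = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);

    /* Bind each MPI rank to a GPU on its node (round-robin).
       With one GPU per node this is simply device 0 everywhere. */
    if (ndev > 0)
        cudaSetDevice(rank % ndev);

    /* ... each rank now does its own cudaMalloc/cudaMemcpy and kernel
       launches on its device, and uses MPI through host buffers to
       exchange boundary data with the other ranks ... */

    printf("rank %d bound to device %d of %d\n",
           rank, ndev > 0 ? rank % ndev : -1, ndev);

    MPI_Finalize();
    return 0;
}
```

The challenge is that every halo exchange then involves GPU-to-host,
host-to-host (MPI), and host-to-GPU copies, which must be overlapped
with computation to get decent scaling.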
I hope this helps,
amjad ali wrote:
> Hello all, specially Gil Brandao
> Actually I want to start CUDA programming for my |C. I have 2 options:
> 1) Buy a new PC that will have 1 or 2 CPUs and 2 or 4 GPUs.
> 2) Add 1 GPU to each of the four nodes of my PC-cluster.
> Which one is the more "natural" and "practical" way?
> Will a program written for either of the above work fine on the
> other, or do we have to re-program for the other?
> On Sat, Aug 29, 2009 at 5:48 PM, <madskaddie at gmail.com
> <mailto:madskaddie at gmail.com>> wrote:
> On Sat, Aug 29, 2009 at 8:42 AM, amjad ali<amjad11 at gmail.com
> <mailto:amjad11 at gmail.com>> wrote:
> > Hello All,
> > I perceive following computing setups for GP-GPUs,
> > 1) ONE PC with ONE CPU and ONE GPU,
> > 2) ONE PC with more than one CPUs and ONE GPU
> > 3) ONE PC with one CPU and more than ONE GPUs
> > 4) ONE PC with TWO CPUs (e.g. Xeon Nehalems) and more than
> ONE GPUs
> > (e.g. Nvidia C1060)
> > 5) Cluster of PCs with each node having ONE CPU and ONE GPU
> > 6) Cluster of PCs with each node having more than one CPUs
> and ONE GPU
> > 7) Cluster of PCs with each node having ONE CPU and more
> than ONE GPUs
> > 8) Cluster of PCs with each node having more than one CPUs
> and more
> > than ONE GPUs.
> > Which of these are good/realistic/practical; which are not? Which
> are quite
> > “natural” to use for CUDA based programs?
> CUDA is a rather new technology, so I don't think there is a "natural
> use" yet, though I read that there are people doing CUDA+MPI and there
> are papers on CPU+GPU algorithms.
> > IMPORTANT QUESTION: Will a cuda based program will be equally
> good for
> > some/all of these setups or we need to write different CUDA based
> > for each of these setups to get good efficiency?
> There is no "one size fits all" answer to your question. If you never
> developed with CUDA, buy one GPU and try it. If it fits your problems,
> scale it with the approach that makes you more comfortable (but
> remember that scaling means: making bigger problems or having more
> users). If you want a rule of thumb: your code must be
> _truly_parallel_. If you are buying for someone else, remember that
> this is a niche. The whole thing is just starting; I don't think there
> are many people who need much more than 1 or 2 GPUs.
> > Comments are welcome also for AMD/ATI FireStream.
> Put it on hold until OpenCL takes off (in the real sense, not in the
> "standards papers" sense), otherwise you will have to learn another
> technology that even fewer people know.
> Gil Brandao
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing