[Beowulf] Re: GPU boards and cluster servers

Thu Sep 11 08:23:10 PDT 2008

Hi Jon and list

I am trying to work on a similar project here.
With tight funding, I had to convince the director to buy one NVidia
GeForce 9800 GTX (512MB memory) for testing.
The card cost about $200 (before taxes).
You may find better prices on Newegg and other places.
If your budget is tighter than this, you can buy a GeForce 8800 GT for
less and still do useful work.
Beware of the number of processors and the "compute capability" (see
below) of the card you buy.
They vary even within the same card series (the 8800 has a lot of
different models).

I am hoping to get a bit of spare time to write a few programs to make
the case for it.
Or not, as these things seem to suck a lot of power, and the utility
bill may drive the sponsor of my project mad if we outfit all
workstations here with these video cards.  :)
The CUDA API doesn't look as friendly as, say, OpenMP or MPI, and the
time invested in programming may only be worthwhile for a pilot project.
Or things may get better if a holy-grail type of grand unification API,
such as the OpenCL you mentioned, comes true.
Anyway, if I can, say, accelerate Matlab, that alone is a big deal for
the many people here who use Matlab for small programming projects and
data analysis.
Like it or not, Matlab is the de facto programming language for the
vast majority of scientists and science students.
For code that calls FFTs or BLAS in a regular pattern / loop, porting it
to CUDA/GPU shouldn't be very hard, as CUDA comes with libraries for
both (CUFFT and CUBLAS).
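The "regular pattern / loop" case ports most easily when the independent transforms are handed to the library in one batched call; CUFFT's plan functions (e.g. cufftPlan1d) take a batch count for exactly this. The sketch below is a CPU-only illustration using NumPy as a stand-in for Matlab or CUFFT (an assumption, not GPU code): it checks that one batched FFT over the rows of an array matches a loop of single FFTs, which is the restructuring you would do before swapping in a GPU library. The function names are hypothetical.

```python
import numpy as np

def fft_loop(signals):
    """One FFT per row, the way a typical Matlab or C loop is written."""
    out = np.empty(signals.shape, dtype=complex)
    for i, row in enumerate(signals):
        out[i] = np.fft.fft(row)
    return out

def fft_batched(signals):
    """All rows transformed in a single call (batch = number of rows).

    This is the shape of call a batched GPU FFT plan expects; here it
    still runs on the CPU through NumPy.
    """
    return np.fft.fft(signals, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((64, 1024))  # 64 independent signals of length 1024
    assert np.allclose(fft_loop(data), fft_batched(data))
    print("batched FFT matches the loop")
```

Once the loop has been collapsed into a single batched call like this, replacing that one call is a much smaller porting job than rewriting the loop body.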

You need to make sure your computer supports the card you buy (unless
you want to buy a new computer).
Requirements vary from card to card, and most vendors / manufacturers
post the specs and requirements on their web sites.
For the card I bought, the minimum was a 500W power supply with two PCIe
6-pin connectors.
However, this card came with two Molex-to-PCIe-6-pin adapters
(which I didn't use; my PS had the needed PCIe connectors).
Higher-end cards (9800 GX2, GTX 260 and 280) seem to require 8-pin PCIe
connectors, and probably a beefed-up power supply.
I don't know if there are Molex adapters for PCIe 8-pin connectors.
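To see why the connector count matters, a rough upper bound on board power can be added up from the nominal limits in the PCI Express electromechanical specs: 75 W from a x16 graphics slot, 75 W per 6-pin connector and 150 W per 8-pin connector. The Python sketch below only does that arithmetic; the function name is hypothetical, and the result is a ceiling on what the card may draw, not a measurement (the vendor's 500W figure is a recommendation for the whole system's power supply).

```python
# Nominal maximum power per source, in watts, per the PCIe CEM specs.
SLOT_X16_W = 75     # a x16 graphics slot
SIX_PIN_W = 75      # each 6-pin PCIe auxiliary power connector
EIGHT_PIN_W = 150   # each 8-pin PCIe auxiliary power connector

def max_board_power(six_pin=0, eight_pin=0):
    """Upper bound (W) a card can pull through the slot plus its connectors.

    Hypothetical helper: real cards usually draw less than this ceiling.
    """
    return SLOT_X16_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W

if __name__ == "__main__":
    print(max_board_power(six_pin=2))               # two 6-pin sockets
    print(max_board_power(six_pin=1, eight_pin=1))  # one 6-pin + one 8-pin
```

A card with two 6-pin sockets can therefore pull at most 225 W, and one with a 6-pin plus an 8-pin socket up to 300 W, which is why the higher-end cards push you toward a bigger supply.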
My card also required one free PCIe x16 slot. I'd guess this is
what most cards require.
However, the card is thick and blocks the adjacent PCI(e) slot
(on my mobo, a PCIe x8).
Make sure you have this much room to spare in your chassis.
Mine is a workstation tower, but a rackmount chassis may need a
riser card, etc.
I would guess these cards won't fit in a 1U chassis, but I may be wrong.
The motherboard can have either a PCIe 1.1 or 2.0 bus; the card will
work with either, at the lower of the two data rates.
Check the FAQ on the PCI-SIG site about this:

http://www.pcisig.com/news_room/faqs/pcie2.0_faq/

My mobo has PCIe 1.1; only very recent ones seem to be 2.0.
So performance may not be stellar, but hopefully it will be OK.

You may want to check the capabilities of the "CUDA-enabled" cards on
the NVidia site.
Appendix A of the CUDA Programming Guide has details such as the number
of processors and the "compute capability" (in a nutshell, 1.0 is the
basic 32-bit capability, 1.1 and 1.2 add a bit of functionality, and
1.3 adds double-precision support).
See:

http://www.nvidia.com/object/cuda_develop.html
under "Documentation".
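On a machine that already has a card, the CUDA runtime call cudaGetDeviceProperties fills a cudaDeviceProp struct whose major, minor and multiProcessorCount fields give the compute capability and processor count, and the SDK's deviceQuery sample prints them. The Python sketch below only encodes the rule of thumb above (double precision needs 1.3 or later) and picks a card from such numbers; the tuple layout and function names are hypothetical, and you would fill in the values by hand from the Programming Guide table or deviceQuery.

```python
def supports_double(major, minor):
    """Double precision arrived with compute capability 1.3."""
    return (major, minor) >= (1, 3)

def pick_device(devices, need_double=False):
    """Index of the qualifying device with the most multiprocessors, or None.

    `devices` is a list of (name, major, minor, multiprocessors) tuples,
    a hypothetical format copied by hand from deviceQuery output or the
    Programming Guide appendix.
    """
    best = None
    for i, (_name, major, minor, sms) in enumerate(devices):
        if need_double and not supports_double(major, minor):
            continue  # skip single-precision-only cards
        if best is None or sms > devices[best][3]:
            best = i
    return best
```

Comparing (major, minor) as a tuple keeps the check correct for later capability numbers too, e.g. a hypothetical 2.0 device also counts as double-capable.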

Also, it is worth taking a look at the NVidia "CUDA on Linux" forum:

http://forums.nvidia.com/index.php?showforum=68

I posted questions about hardware requirements and software
compatibility there (my card is running under Fedora Core 8),
and got very helpful answers:

http://forums.nvidia.com/index.php?showtopic=72798

I hope this helps.
Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Jon Forrest wrote:

> Greg Lindahl wrote:
>
>>
>> Well, then, why don't you run it on a low-end card that you already
>> have (finite/free = infinity)? If you aren't going to bother to
>> constrain the problem, you're going to get bogus answers.
>
>
> Easy. Because I don't already have a low-end card.
>
> What I'm going to try to do is to be able to show
> the faculty and grad students around here how
> easy it is to get a significant performance improvement
> by using CUDA as compared to using their normal
> i386 or x86_64 processors. The actual performance
> improvement isn't that important because even if it's
> just a 2X improvement it will be easy to justify.
> I'm expecting it to be a lot more because much
> of what goes on around here has already been ported
> and summarized on the CUDA web site with >=10X improvements.
>
> Then, once I've hooked the faculty I'll get them to buy
> a high-end card to get maximum performance.
>