[Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists"
diep at xs4all.nl
Mon Jun 23 14:57:06 PDT 2008
The architectures of AMD and NVIDIA are quite different. I would
encourage treating each manufacturer's design as its own system.
So to speak, AMD is a low-clocked 64-core Core 2 supercomputer,
versus NVIDIA, a 240-processor MIPS supercomputer.
I feel the real limitation is that the achievements of the GPUs only
exist on marketing paper.
I can also claim my car's engine is capable of driving 20,000 miles
per hour. Sure it is, in space!
A PC processor is only as good as its caches and memory controller.
There is too little technical data about GPUs with respect to
bottlenecks, whereas bottlenecks will of course dominate such hardware.
If you KNOW what the bottleneck is, then in a theoretical model you
can work around it.
The few reports from individuals who work full-time with GPUs and
have tried writing some number-crunching code for them,
yes, even 32-bit number-crunching code, show that the practical odds
of success for an individual programmer are currently too small on GPUs.
It is a fact that to program those things well, you first need to be
a hell of a programmer. Those hell-of-a-programmers know very well
that you need full technical information, even if that means bad news
for NVIDIA and AMD because the GPUs suddenly look a lot weaker then.
If there were technical specifications that also showed the
bottlenecks, then the algorithmically strong among us (trying not to
look too much out of the window)
would find some clever solutions to get a specific thing done.
This is all paper work.
If there is an algorithmic solution on paper, or even just a method
for how to get something done, then there will be programmers
implementing it, as they see no risks; they just see the solution
that is going to give them more crunching power.
It is all about risk assessment from the programmer's viewpoint.
Right now the only thing he knows is big bragging stories. He of
course realizes that if you do something within the register files of
the GPU this can be fast; other than that, he knows that in the first
place it is a GPU meant for displaying graphics, and not especially
for number crunching.
Going in at the deep end on a platform without information is
something you just don't do.
At the time, if you went for SSE/SSE2 assembler code, you knew its
full specs: every instruction, every latency of every instruction,
and so on.
To take the step to even CONSIDER writing something on a GPU means
that the programmer in question is already a total hardcore addict;
you really want to get the ultimate achievement out of the hardware
for your number crunching. The same is true for SSE/SSE2.
I would argue that writing SSE2 code is tougher than writing for a
GPU, seen from an implementation viewpoint,
under the condition that you DO have a parallel model for how to get
things done on a GPU.
The number of people who know how to write a parallel model on paper
that theoretically works and gets the maximum out of crunching
hardware that is non-trivial to parallelize is just really small. If
within a specific specialism that is more than a dozen, that's a lot.
The number of good programmers who can write you that code, in
whatever language, is small compared to the total number of
programmers, but really huge compared to the number of algorithmic
designers who are experts in this.
It will not take long until such solutions are simply posted on the
net. That might increase the number of people who toy with GPUs.
In itself Seymour Cray's statement, "If you were plowing a field,
which would you rather use? Two strong oxen or 1024
chickens?" is very true from a practical viewpoint; it is simpler
to work with 4 cores than with 64, let alone 240.
But objectively, of course, such a majority should be able to beat 2
strong oxen. That doesn't mean it is simple to do.
So the number of persons who start writing solutions there you can
really count on a few hands. Most of them currently really are
a few students who tried it on a card that delivers, even in
marketing value, such tiny amounts of single-precision gflops
that it really must be seen as the hobby project of a student who is
just learning advanced programming a tad better,
while their quad-core with existing, highly optimized free software
already serves them better.
In itself that is very weird, as there is not really anyone who
doubts that in the long run many tiny processors are going to win at
number crunching.
On Jun 23, 2008, at 6:44 PM, Bogdan Costescu wrote:
> On Wed, 18 Jun 2008, Prentice Bisbal wrote:
>> The biggest hindrance to doing "real" work with GPUs is the lack
>> of dual-precision capabilities.
> I think that the biggest hindrance is the lack of a unified API or
> language for all these accelerators (taking into account not only
> the GPUs!).
> Many developers are probably scared that their code depends on the
> whim of the accelerator producer in terms of long-term
> compatibility of the source code with the API or language or of the
> binary code with the available hardware; sure, you can't prevent
> the hardware being obsoleted or the company from going out of
> business, but if you're only one recompilation away it's manageable.
> At the last week's ISC'08, after the ATI/AMD and NVidia talks,
> someone asked a NVidia guy about any plans of unification with at
> least ATI/AMD on this front and the answer was "we're not there
> yet"... while the ATI/AMD presentation went on to say "we learnt
> from mistakes with our past implementations and we present you now
> with OpenCL" - yet another way of programming their GPU...
> I see this situation as very similar to SSE vs. 3DNow! of some
> years ago, or the one before MPI came to replace all the proprietary
> communication libraries. Does anybody else share this view?
> Bogdan Costescu
> IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
> Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
> E-mail: bogdan.costescu at iwr.uni-heidelberg.de
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit