[Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
Prentice Bisbal prentice at ias.edu
Mon Jun 16 08:38:44 PDT 2008
Vincent Diepeveen wrote:
>
> That has to change in order to get GPU calculations more into mainstream.
>
> When I calculate on paper for some applications, a GPU can potentially be
> a factor of 4-8 faster than a standard quad-core 2.4 GHz CPU is right now.
>
> Getting that performance out of the GPU is more than a full-time task,
> however, without having in-depth technical hardware data on the GPU.
Completely untrue. One of my colleagues, who does a lot of work with GPUs
for astrophysics calculations, was able to increase the performance of
the MD5 algorithm by ~100x with about 1.5 days of work. He described the
code he wrote as "totally unoptimized, a straight CUDA C implementation
of Rivest's algorithm". He tinkered some more, adding some optimizations,
and I believe he ended up with a 350x performance improvement.
Here is the e-mail he sent me about his first round of coding:
<quote>
The other day in NYC, at the HPC-UG meeting, someone mentioned that GPUs
would be perfect for password cracking, with which I wholeheartedly
agreed (on theoretical grounds). But theory is nothing without
experiment :) , so I spent last night and this morning writing a
GPU MD5 hash routine (totally unoptimized, a straight CUDA C
implementation of Rivest's algorithm).
The results?
* GPU (single GeForce 8800 Ultra on cylon):
57,640,967.264473 hash/second
* The same algorithm on the CPU (Intel(R) Core(TM)2 Quad CPU Q6700 @
2.66GHz on cylon):
543,839.652381 hash/second
A factor of ~100 difference. Sweet.
Another point of comparison: the fastest, assembly-level optimized x86
MD5 code, running on a _dual_ 3.2 GHz Xeon (see
http://c3rb3r.openwall.net/mdcrack/) can do 42e6 hash/sec. And remember,
I wrote the CUDA code in a day and a half, with _no_ optimization. Nice.
In other words, one GPU card with amateurishly written MD5 code can
brute-force crack an 8-character MD5-hashed password consisting of
[0-9A-Za-z] in about 6 weeks. Now imagine if someone who knew what they
were doing optimized the code, and got a cluster of Teslas instead of the
single gaming card that I used....
Cool :-) .
</quote>
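For anyone who wants to check the arithmetic in the quoted e-mail, here is a rough sketch in Python. The hash rates are the ones quoted above; the function and variable names are mine, made up for illustration, and the estimate assumes the full keyspace is searched at a constant rate with no overhead.

```python
# Hash rates as quoted in the e-mail above (hashes per second).
GPU_RATE = 57_640_967.264473   # single GeForce 8800 Ultra
CPU_RATE = 543_839.652381      # Core 2 Quad Q6700, same algorithm
XEON_RATE = 42e6               # assembly-optimized x86 code, dual 3.2 GHz Xeon

ALPHABET_SIZE = 10 + 26 + 26   # characters in [0-9A-Za-z]
PASSWORD_LENGTH = 8
SECONDS_PER_DAY = 86_400


def keyspace(alphabet_size, length):
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def exhaustive_search_days(alphabet_size, length, rate):
    """Days needed to hash every candidate once at `rate` hashes/second."""
    return keyspace(alphabet_size, length) / rate / SECONDS_PER_DAY


gpu_vs_cpu = GPU_RATE / CPU_RATE
gpu_vs_xeon = GPU_RATE / XEON_RATE
days = exhaustive_search_days(ALPHABET_SIZE, PASSWORD_LENGTH, GPU_RATE)

print(f"GPU vs. CPU speedup:        {gpu_vs_cpu:.1f}x")
print(f"GPU vs. optimized dual Xeon: {gpu_vs_xeon:.2f}x")
print(f"Candidates to try:           {keyspace(ALPHABET_SIZE, PASSWORD_LENGTH):.3e}")
print(f"Full search on one GPU:      {days:.1f} days ({days / 7:.1f} weeks)")
```

This comes out to a speedup of roughly 106x over the CPU and about 44 days for the full search, which agrees with the "factor of ~100" and "about 6 weeks" figures. Note that this is the time to try every candidate; on average a randomly chosen password would be found in about half that.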
--
Prentice
