[Beowulf] crunch per kilowatt: GPU vs. CPU

Craig Tierney Craig.Tierney at noaa.gov
Mon May 18 12:51:02 PDT 2009


Bill Broadley wrote:
> Joe Landman wrote:
>> Hi David
>>
>> David Mathog wrote:
>>> Although the folks now using CUDA are likely most interested in crunch
>>> per unit time (time efficiency), perhaps some of you have measurements
>>> and can comment on the energy efficiency of GPU vs. CPU computing?  That
>>> is, which uses the fewest kilowatts per unit of computation.  My guess
>> Using theoretical rather than "actual" performance, unless you get the
>> same code doing the same computation on both units:
>>
>> 1 GPU ~ 960 GFLOP single precision, ~100 GFLOP double precision @ 160W
> 
> That sounds like the Nvidia flavor GPU, granted nvidia does seem to have a
> larger lead over ATI for such use... at least till OpenCL gains more
> popularity.  Nvidia's double precision rate is approximately 1/12th their single
> precision rate.  ATI's is around 1/5th, which results in around 240 GFlops.
> 

Where did you get the 1/12th number for NVIDIA?  Each streaming multiprocessor (SM)
has 8 single precision FPUs (one per scalar core), but only 1 double precision FPU
per SM.  So that ratio would be 1/8.  I have demonstrated this ratio with a simple
code that required little to no memory transfers.

ATI still provides more DP flops.
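For what it's worth, both numbers can be made to come out of the GT200 specs, depending on what you count as the single precision peak.  A rough sketch (the SM counts and shader clock below are assumptions taken from the published GTX 280 specs, not measurements): the 1/8 ratio counts one MAD per SP core per clock, while NVIDIA's marketing peak also counts the extra MUL the SFU can dual-issue, which pushes SP to 3 flops/clock and the ratio to roughly 1/12.

```python
# Back-of-the-envelope peak FLOPS for an NVIDIA GT200-class GPU (GTX 280).
# Assumed specs: 30 SMs, 8 SP cores/SM, 1 DP unit/SM, ~1.296 GHz shader clock.

sms = 30
sp_cores_per_sm = 8
dp_units_per_sm = 1
shader_ghz = 1.296

# DP unit: one fused multiply-add per clock = 2 flops/clock.
dp_gflops = sms * dp_units_per_sm * 2 * shader_ghz

# SP core: one MAD per clock = 2 flops/clock; the SFU can dual-issue an
# extra MUL alongside it, giving 3 flops/clock in the marketing peak.
sp_mad_gflops = sms * sp_cores_per_sm * 2 * shader_ghz
sp_peak_gflops = sms * sp_cores_per_sm * 3 * shader_ghz

print(f"DP peak:            {dp_gflops:6.1f} GFLOPS")
print(f"SP peak (MAD only): {sp_mad_gflops:6.1f} GFLOPS -> DP/SP = 1/{sp_mad_gflops / dp_gflops:.0f}")
print(f"SP peak (MAD+MUL):  {sp_peak_gflops:6.1f} GFLOPS -> DP/SP = 1/{sp_peak_gflops / dp_gflops:.0f}")
```

A microbenchmark that issues only MADs (no dual-issued MULs and no memory traffic) would see the 1/8 ratio, which would be consistent with what I measured.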

Craig


> So in both cases you get a pretty hefty jump if your application is single
> precision friendly.
> 
> Of course such performance numbers are extremely application specific.  I've
> seen performance increases published that are a good bit better (and worse)
> than the GFlop numbers would indicate.  If you go to http://arxiv.org and type
> CUDA in as a search word there are 10 ish papers that talk about various uses.
> 
> So basically it depends; either AMD, Intel, Nvidia, or ATI wins depending on
> your application.  Of course there's other power efficient competition as
> well: Atom, VIA Nano[1], SiCortex (MIPS), BlueGene, and the latest
> implementation, the PowerXCell 8i, which is available in the QS22.
> 
> Assuming you have source code and parallel friendly applications, there are
> quite a few options available.  Ideally future benchmarks would include power;
> maybe add it as a requirement for future SPEC benchmark submissions.
> 
> [1] http://www.theinquirer.net/inquirer/news/1137366/dell-via-nano-servers
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 


-- 
Craig Tierney (craig.tierney at noaa.gov)
