[Beowulf] crunch per kilowatt: GPU vs. CPU

Lux, James P james.p.lux at jpl.nasa.gov
Mon May 18 12:15:22 PDT 2009



> -----Original Message-----
> From: beowulf-bounces at beowulf.org 
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman
> Sent: Monday, May 18, 2009 10:36 AM
> To: David Mathog
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] crunch per kilowatt: GPU vs. CPU
> 
> Hi David
> 
> David Mathog wrote:
> > Although the folks now using CUDA are likely most interested in crunch per unit time (time efficiency), perhaps some of you have measurements and can comment on the energy efficiency of GPU vs. CPU computing?  
> > That is, which uses the fewest kilowatts per unit of computation.  My guess
> 
> Using theoretical rather than "actual" performance, unless 
> you get the same code doing the same computation on both units:
> 
> 1 GPU ~ 960 GFLOP single precision, ~100 GFLOP double precision @ 160W
> 
> 1 CPU ~ 4x (3 GHz x 4 DP flops/cycle) = 48 GFLOP double 
> precision @ 75W
> 

This is an exceedingly complex issue (and has been discussed in the past, so there's grist to be gleaned from the archives).

One can develop metrics such as nanojoules per operation.

Just like any other benchmark, though, the result depends on a lot more than the CPU itself; in particular, it depends on the instruction stream.
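
As a rough sketch of such a metric, the theoretical figures quoted above can be converted to energy per operation. These are peak numbers rather than measurements, and the function name is only illustrative:

```python
def nanojoules_per_flop(watts, gflops):
    """Energy per floating-point operation in nJ.

    watts  : power draw in W (the quoted nominal figure, not a measurement)
    gflops : throughput in GFLOP/s (theoretical peak)
    """
    # W / (GFLOP/s) = (J/s) / (1e9 flop/s) = 1e-9 J/flop = nJ/flop
    return watts / gflops


# (power in W, peak GFLOP/s) taken from the quoted message
figures = {
    "GPU single precision": (160.0, 960.0),
    "GPU double precision": (160.0, 100.0),
    "CPU double precision": (75.0, 48.0),
}

for name, (watts, gflops) in figures.items():
    print(f"{name}: {nanojoules_per_flop(watts, gflops):.2f} nJ/flop")
```

On those peak figures the two parts are close to each other in double precision, so the comparison would hinge on how much of peak a real code achieves.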

Some things that will have an effect on CPU (and even more on system) power dissipation:

Going "off chip" (e.g. for a memory access) will increase energy consumption because you have to charge and discharge the capacitance of the PCB traces and drive the input impedance of the memory.  This can be surprisingly large.

Example: a typical load capacitance on a SINGLE input pin is 10 pF, and you swing 3.3 V, so each charge or discharge of that capacitor involves (1/2)CV^2 = 54.45E-12 joules. Do that 66 million times a second on a bus 64 bits wide and it's about a quarter of a watt, and that is before counting the address bus. Since there are typically series termination resistors or similar involved, the power dissipated in those terminations is comparable.  Now multiply that by several memory banks, long PCB traces, etc.
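
The arithmetic in that example can be checked with a few lines. This is a minimal sketch that assumes, as the estimate does, that every data line switches on every cycle (a worst case); the function names are made up for illustration:

```python
def switching_energy_joules(capacitance_f, swing_v):
    """Energy stored in (or released from) a capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * swing_v ** 2


def bus_power_watts(capacitance_f, swing_v, rate_hz, width_bits):
    """Power to drive width_bits lines, each switching rate_hz times per second.

    Assumes every line toggles on every cycle, which is pessimistic
    for real data patterns.
    """
    return switching_energy_joules(capacitance_f, swing_v) * rate_hz * width_bits


energy = switching_energy_joules(10e-12, 3.3)     # 10 pF load, 3.3 V swing
power = bus_power_watts(10e-12, 3.3, 66e6, 64)    # 66 MHz, 64-bit data bus

print(f"energy per transition: {energy * 1e12:.2f} pJ")
print(f"data bus power: {power:.3f} W")
```

Address lines, terminations, and additional memory banks would be added on top of this figure.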

Then, there's the power dissipation in the pin drivers themselves, not to mention the power dissipation of the memory.

It all adds up.





