[Beowulf] Is there really a need for Exascale?
ljdursi at scinet.utoronto.ca
Fri Nov 30 12:41:52 PST 2012
On 11/30/2012 03:25 PM, Eugen Leitl wrote:
> Absolutely. CUDA is a lot like assembler that way, and assembler
> has been almost completely displaced by low-level but hardware-independent
> languages like C.
> You can't tune as much in OpenCL, but on the other hand, you
> don't have to. The achievable performance is lower, but more
> uniform across diverse platforms. The JIT knows the hardware,
> so that you don't have to.
I wish that were true. How to decompose the work between threads/blocks
(items/groups in OpenCL), how to trade the speed of shared (local)
memory against the resulting loss of occupancy, etc. are all
hardware-dependent decisions that the JIT can't and doesn't hide from you.
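To make the shared-memory/occupancy trade-off concrete, here is a rough
sketch of a toy occupancy model. The two "devices" and all of their limits
are made-up illustration values, not the specs of any real GPU, and a real
occupancy calculation also has to account for registers and allocation
granularity, which this ignores.

```python
# Toy occupancy model: how many blocks can be resident on one
# multiprocessor at once, and what fraction of its thread slots they fill.
# All hardware limits below are hypothetical, for illustration only.

def resident_blocks(threads_per_block, shared_bytes_per_block,
                    max_threads, max_blocks, shared_bytes_total):
    """Blocks that fit at once, limited by threads, shared memory, and a block cap."""
    by_threads = max_threads // threads_per_block
    if shared_bytes_per_block > 0:
        by_shared = shared_bytes_total // shared_bytes_per_block
    else:
        by_shared = max_blocks  # shared memory is not a constraint
    return min(by_threads, by_shared, max_blocks)


def occupancy(threads_per_block, shared_bytes_per_block,
              max_threads, max_blocks, shared_bytes_total):
    """Fraction of the multiprocessor's thread slots that are in use."""
    blocks = resident_blocks(threads_per_block, shared_bytes_per_block,
                             max_threads, max_blocks, shared_bytes_total)
    return blocks * threads_per_block / max_threads


# Two hypothetical devices with different per-multiprocessor limits.
DEVICE_A = dict(max_threads=1536, max_blocks=8, shared_bytes_total=48 * 1024)
DEVICE_B = dict(max_threads=2048, max_blocks=16, shared_bytes_total=16 * 1024)

for name, dev in (("A", DEVICE_A), ("B", DEVICE_B)):
    for shared in (4 * 1024, 16 * 1024):
        occ = occupancy(256, shared, **dev)
        print(f"device {name}: 256 threads/block, {shared // 1024} KB shared "
              f"-> occupancy {occ:.3f}")
```

Even in this crude model, the same kernel configuration (256 threads per
block, 4 KB of shared memory) fills hypothetical device A completely but
only half of device B, so the "right" tile size differs per device.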
To extend your assembly analogy, OpenCL tried to be a perfectly general
assembly language that one could write in to target AMD GPUs *and* NVIDIA
GPUs *and* Intel/AMD multicore processors, *and* IBM Cells, etc. That
was always going to end badly. Not only do you not get performance
portability - a multicore processor is not very much like a GPU, JIT or
no JIT - but the generality means that "hello world" in OpenCL is 100+
lines longer than in CUDA. And that's why almost no one bothers
teaching OpenCL (check out the count difference between "Intro to CUDA"
and "Intro to OpenCL"). I'm all for open standards, but they have to
standardize something that makes sense.
The cycle of programming for performance on new hardware is always that
the enthusiastic early adopters have to program in the hardware-specific
low-level stuff for a while, and eventually compilers and/or new
programming models catch up. I'm hoping that OpenACC is the start of
that second stage. For *really* good performance on tricky problems,
users will still have to fall back to CUDA, or to something else for AMD
GPUs. But then, a number of users here and even community codes (e.g.,
GROMACS) still have hand-coded assembly for a few architectures to make
sure the right bits of their kernels get vectorized properly, etc.
Jonathan Dursi <ljdursi at scinet.utoronto.ca> SciNet;Compute/Calcul Canada