[Beowulf] difference between accelerators and co-processors
hahn at mcmaster.ca
Tue Mar 12 09:17:09 PDT 2013
>> 1) Comparing a single low power APU to a single high power discrete GPU
>> doesn't make sense for HPC. Rather we should compare a rack of equipment
>> that can operate in the same power envelope.
> [Joshua] I was comparing, or the paper compares a system (APU) vs a system
and I was comparing a nominal 1200W 1U chassis. current APU chips are,
of course, mainly optimized for mobile and low-end desktops, so we have
to abstract a bit from them to find anything interesting for HPC:
- tighter integration with the CPU
- potential improvement with 2.5d integration of ram
> If you add the network to it, then you would need to add that too for both,
network interfaces, even full-on IB, are pretty minor power-wise (<10W).
>> 2) You can bolt GDDR5 onto an APU, eliminating the local bandwidth
>> advantage (AMD is doing exactly this for the PS4). Also, we should really
>> be comparing the bandwidth available to each GPU "core".
> [Joshua] I believe there are power constraints on what you can do with APUs in
> terms of high speed memory.
not really. high-speed memory is fairly cool (I recall the Phi diagram
listing 1.2-1.3W per gddr5 chip, and that's running at 5 GT/s!)
what's expensive is long and/or complicated signals. if you're only running
a signal point-to-point, with no sockets or stubs, and a total length of
millimeters, power is much less of a concern. quite a difference between
a GPU/APU with soldered-on gddr5 and trying to drive multiple banks of
socketed ddr3 across inches of board...
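to put the chip-power figure above in perspective, here's a back-of-envelope sketch. the per-chip wattage is from the Phi diagram cited above; the chip count and card budget are my own assumptions for a typical discrete card, not numbers from this thread:

```python
# Back-of-envelope: soldered GDDR5 power vs. a discrete card's total budget.
watts_per_chip = 1.25   # ~1.2-1.3 W/chip, as cited above
chips = 12              # assumption: e.g. a 384-bit bus built from 32-bit chips
card_budget_w = 250.0   # typical discrete-GPU envelope mentioned in the thread

mem_power_w = watts_per_chip * chips
share = 100 * mem_power_w / card_budget_w
print(f"memory: {mem_power_w:.1f} W, about {share:.0f}% of a {card_budget_w:.0f} W card")
# memory: 15.0 W, about 6% of a 250 W card
```

even a full complement of gddr5 chips is a small slice of the card's power; the bulk goes to the compute cores, as noted below.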
> That is why you get discrete GPUs burning ~250W
> but capable of feeding the streaming cores at an aggregated ~150GB/s from
> global memory.
actually, current leading add-in cards are >= 300 GB/s. but most of the 250W
is from the compute cores. though with a power envelope like that, they can
certainly afford to drive more gddr channels (512b wide, usually).
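the >= 300 GB/s figure follows directly from the bus width and transfer rate. a minimal sketch, assuming a 512-bit interface at a 5 GT/s effective data rate (the rate cited earlier for gddr5):

```python
# Peak bandwidth of a 512-bit GDDR5 interface at 5 GT/s effective.
bus_width_bits = 512          # 512b-wide bus, as mentioned above
transfers_per_sec = 5e9       # assumption: 5 GT/s effective data rate

bandwidth_gb_s = bus_width_bits / 8 * transfers_per_sec / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s")
# 320 GB/s -- consistent with the ">= 300 GB/s" figure above
```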