[Beowulf] Clusters just got more important - AMD's roadmap

Peter Kjellström cap at nsc.liu.se
Wed Feb 8 11:27:49 PST 2012


On Wednesday, February 08, 2012 06:15:01 PM Mark Hahn wrote:
> > The APU concept has a few interesting points but certainly also a few
> > major problems (when comparing it to a cpu + stand alone gpu setup):
> > 
> > * Memory bandwidth to all those FPUs
> 
> well, sorta.  my experience with GP-GPU programming today is that your
> first goal is to avoid touching anything offchip anyway (spilling, etc),
> so I'm not sure this is a big problem.  obviously, the integrated GPU
> is a small slice of a "real" add-in GPU, so needs proportionately
> less bandwidth.

Well, yes, you want to avoid touching memory on a GPU (just as you do on a CPU).
But just as you can't completely avoid it on a CPU, you can't on a GPU either.
On a current socket (CPU) you see maybe 20 GB/s and 50 GFLOP/s, and the GPU,
much faster flop-wise, is also a lot faster in memory access (>200 GB/s).

Now, I admit I'm not a GPU programmer, but are you saying those 200 GB/s aren't
needed? My assumption was that what holds for CPU codes (they depend on cache
for performance but still need good memory bandwidth) also holds for GPUs.
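
A minimal roofline-style sketch of why the bandwidth still matters. It uses the
socket figures above (~50 GFLOP/s, ~20 GB/s) and the >200 GB/s GPU figure; the
GPU peak flop rate is not given in the post, so GPU_PEAK_HYPOTHETICAL is a
placeholder. The kernel is a STREAM-triad-like loop counted at 2 flops per 24
bytes, ignoring write-allocate traffic.

```python
# Roofline bound: a kernel's attainable rate is the lower of peak compute and
# (arithmetic intensity x memory bandwidth).

def attainable_gflops(peak_gflops: float, bw_gbs: float, flops_per_byte: float) -> float:
    """Attainable GFLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bw_gbs * flops_per_byte)


def ridge_point(peak_gflops: float, bw_gbs: float) -> float:
    """Arithmetic intensity (flop/byte) above which compute, not memory, limits."""
    return peak_gflops / bw_gbs


# CPU socket figures from the post.
CPU_PEAK, CPU_BW = 50.0, 20.0
# GPU bandwidth from the post; the peak flop rate is a HYPOTHETICAL placeholder.
GPU_PEAK_HYPOTHETICAL, GPU_BW = 500.0, 200.0

# a[i] = b[i] + s*c[i] in double precision: 2 flops per 24 bytes
# (two 8-byte loads, one 8-byte store; write-allocate traffic ignored).
TRIAD_AI = 2 / 24

if __name__ == "__main__":
    print(f"CPU ridge point: {ridge_point(CPU_PEAK, CPU_BW):.2f} flop/byte")
    print(f"triad on CPU: {attainable_gflops(CPU_PEAK, CPU_BW, TRIAD_AI):.2f} GFLOP/s")
    print(f"triad on GPU: {attainable_gflops(GPU_PEAK_HYPOTHETICAL, GPU_BW, TRIAD_AI):.2f} GFLOP/s")
```

For a low-intensity kernel like this, both devices sit far below their peak
flop rate, and the ratio between them is simply the bandwidth ratio.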

Anyway, I guess my point was mostly that it's a lot easier to deliver hundreds
of GB/s of memory bandwidth on a device with RAM directly on the PCB than on a
server socket.
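
A rough way to see why: theoretical peak bandwidth is bus width (in bytes)
times transfer rate times channel count, and a graphics board can route a much
wider bus to its on-board RAM than a socket can route to DIMMs. The two
configurations below are illustrative assumptions, not figures from the post.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float, channels: int = 1) -> float:
    """Theoretical peak in GB/s (decimal): channels x bytes per transfer x MT/s / 1000."""
    return channels * (bus_width_bits / 8) * transfer_rate_mts / 1000.0


# Assumed socket: two 64-bit DDR3-1333 channels (about 21.3 GB/s).
socket = peak_bandwidth_gbs(64, 1333, channels=2)
# Assumed graphics board: one 384-bit GDDR5 interface at 5500 MT/s (264 GB/s).
card = peak_bandwidth_gbs(384, 5500)

if __name__ == "__main__":
    print(f"socket: {socket:.1f} GB/s, card: {card:.1f} GB/s, ratio: {card / socket:.1f}x")
```

The per-pin rate differs too, but most of the gap comes from the 6x wider bus.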

Also, if the APU is a "small slice of a real GPU", then I question the point
(not much GPU power per classic core or per total system footprint).

...
> I think the real question is whether someone will produce a minimalist
> APU node.  since Llano has on-die PCIE, it seems like you'd need only
> APU, 2-4 dimms and a network chip or two.  that's going to add up to
> very little beyond the APU's 65 or 100W TDP...  (I figure 150/node
> including PSU overhead.)
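
The per-node figure quoted above can be sketched as component DC load divided
by PSU efficiency. Only the 65/100 W APU TDPs come from the thread; the DIMM,
NIC and misc wattages and the PSU efficiency are hypothetical placeholders.

```python
def node_wall_power(apu_tdp_w: float, n_dimms: int, dimm_w: float,
                    nic_w: float, misc_w: float, psu_efficiency: float) -> float:
    """Wall power: sum of component DC loads divided by PSU efficiency."""
    dc_load = apu_tdp_w + n_dimms * dimm_w + nic_w + misc_w
    return dc_load / psu_efficiency


# HYPOTHETICAL component figures, for illustration only.
DIMMS, DIMM_W, NIC_W, MISC_W, PSU_EFF = 4, 4.0, 8.0, 6.0, 0.85

if __name__ == "__main__":
    for tdp in (65.0, 100.0):  # APU TDPs mentioned in the thread
        watts = node_wall_power(tdp, DIMMS, DIMM_W, NIC_W, MISC_W, PSU_EFF)
        print(f"{tdp:.0f} W APU -> about {watts:.0f} W at the wall")
```

With these assumed inputs the 100 W part lands in the same ballpark as the
150 W/node guess; different component figures would shift it accordingly.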

I think anything beyond early testing is a fair bit into the future. For the
APU to become interesting, I think we need a few (or all) of the following:

 * Memory shared with the CPU in some usable way (I did not say the c-word...)
 * A proper number-crunching version (ECC...)
 * A fairly high-TDP part on a socket with good memory bandwidth
 * Noticeably better "host to device" bandwidth and, even more so, latency
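
Why latency can matter more than bandwidth for host-to-device copies: with a
simple model t = latency + bytes / bandwidth, small transfers see only a
fraction of the link rate. LATENCY_S and LINK_BW are hypothetical values, not
measurements of any particular bus.

```python
def transfer_time_s(nbytes: float, latency_s: float, bw_bytes_per_s: float) -> float:
    """Simple linear model: fixed startup latency plus size over bandwidth."""
    return latency_s + nbytes / bw_bytes_per_s


def effective_bw(nbytes: float, latency_s: float, bw_bytes_per_s: float) -> float:
    """Bytes actually moved per second once the latency is included."""
    return nbytes / transfer_time_s(nbytes, latency_s, bw_bytes_per_s)


def half_bandwidth_size(latency_s: float, bw_bytes_per_s: float) -> float:
    """Transfer size at which effective bandwidth reaches half the link rate."""
    return latency_s * bw_bytes_per_s


LATENCY_S = 10e-6  # HYPOTHETICAL per-transfer latency
LINK_BW = 6e9      # HYPOTHETICAL sustained link bandwidth, bytes/s

if __name__ == "__main__":
    print(f"half-bandwidth size: {half_bandwidth_size(LATENCY_S, LINK_BW):.0f} bytes")
    for size in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        gbs = effective_bw(size, LATENCY_S, LINK_BW) / 1e9
        print(f"{size:>10d} bytes -> {gbs:.2f} GB/s effective")
```

Halving the latency halves the half-bandwidth size, which helps fine-grained
offload far more than raising the peak link rate does.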

And don't get me wrong, I'm not saying the above is particularly unlikely...

/Peter