[Beowulf] difference between accelerators and co-processors

Sun Mar 10 14:08:50 PDT 2013

On Mar 10, 2013, at 9:03 PM, Mark Hahn wrote:

>> Is there any line/point to make distinction between accelerators and
>> co-processors (that are used in conjunction with the primary CPU  
>> to boost
>> up the performance)? or these terms can be used interchangeably?
>
> IMO, a coprocessor executes the same instruction stream as the
> "primary" processor.  this was the case with the x87, for instance,
> though the distinction became less significant once the x87 came  
> onchip.
> (though you certainly notice that FPU on any of these chips is mostly
> separate - not sharing functional units or register files,  
> sometimes even
> with separate micro-op schedulers.)
>
>> Specifically, the word "accelerator" is used commonly with GPU. On  
>> the
>> other hand  the word "co-processors" is used commonly with Xeon Phi.
>
> I don't think it is a useful distinction: both are basiclly  
> independent
> computers.  obviously, the programming model of Phi is dramatically  
> more
> like a conventional processor than Nvidia.
>

Mark, that's the marketing talk about Xeon Phi.

It's surprisingly the same of course except for the cache coherency;
big vector processors.

> there is a meaningful distinction between offload and coprocessor  
> approaches.
> that is, offload means you use the device to accelerate a set of  
> libraries
> (offload matrix multiply, eig, fft, etc).  to use a coprocessor, I  
> think the
> expectation is that the main code will be very much aware of the  
> state of the
> PCIe-attached hardware.
>
> I suppose one might suggest that "accelerator" to some extent implies
> offload usage: you're accelerating a library.
>
> another interesting example is AMD's upcoming HSA concept: since  
> nearly all
> GPUs are now on-chip, AMD wants to integrate the CPU and GPU  
> programming
> models (at least to some extent).  as far as I understand it, HSA  
> is based
> on introducing a quite general intermediate ISA that can be  
> executed using
> all available hardware resources: CPU and/or GPU.  although Nvidia  
> does have
> its own intermediate ISA, they don't seem to be trying to make it  
> general,
> *and* they don't seem interested in making it work on both C/GPU.   
> (well,
> so far at least - I wouldn't be surprised if they _did_ have a PTX  
> JIT for
> their ARM-based C/GPU chips...)
>
> I think HSA is potentially interesting for HPC, too.
>   I really expect
> AMD and/or Intel to ship products this year that have a C/GPU chip  
> mounted on
> the same interposer as some high-bandwidth ram.

How can an integrated gpu outperform a gpgpu card?

Something like what is it 25 watt versus 250 watt, what will be faster?

I assume you will not build 10 nodes with 10 cpu's with integrated  
gpu in order to rival a
single card.

>   a fixed amount of very high
> performance memory sounds very tasty to me.  a surprising amount of  
> power
> in current systems is spend getting high-speed signals off-socket.
>
> imagine a package dissipating say 40W containing a, say, 4 CPU cores,
> 256 GPU ALUs and 2GB of gddr5.  the point would be to tile 32 of them
> in a 1U box.  (dropping socketed, off-package dram would probably make
> it uninteresting for memcached and some space-intensive HPC.
>
> then again, if you think carefully about the numbers, any code today
> that has a big working set is almost as anachronistic as codes that  
> use
> disk-based algorithms.  (same conceptual thing happening: capacity is
> growing much faster than the pipe.)
>
> regards, mark hahn.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf