On Thu, Sep 04, 2008 at 09:56:13AM -0600, Craig Tierney wrote:
> This is not correct.  The NVIDIA GT200 series supports IEEE DP FP in
> hardware.  NVIDIA only has 1 DP FP unit per streaming processor (24
> on the GTX280) which is 1/8 the number of units of single-precision
> floating point (each thread has its own unit).  So the max DP FP
> rate on a GTX280 is about 90 Gflops.

So has anyone taken those 8 single-precision floating point units and
tried using them to get double-precision or better accuracy?  Perhaps
using the "native-pair" and "speculative precision" approaches
discussed here:


The 2006 paper there talks about doing so on an Nvidia GeForce 6800
Ultra, on which a native-pair calculation (roughly 64-bit) took about
10x the clock cycles of a single 32-bit flop (the ratio was better for
sqrt).
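
For anyone who hasn't seen the trick, here is a rough sketch of what I
understand "native-pair" arithmetic to mean: a value is carried as an
unevaluated sum hi + lo of two single-precision numbers, and each
operation uses error-free transformations (Knuth's two-sum, Dekker's
split and product) to recover the rounding error that a plain
single-precision op throws away. This is my assumption about the
approach, not code from the paper. All function names are made up for
illustration, and f32() only emulates single-precision rounding on a
CPU by round-tripping through struct; on a GPU these would just be
ordinary float operations.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth two-sum: s is the rounded sum, e the exact rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def quick_two_sum(a, b):
    """Same as two_sum but cheaper; assumes |a| >= |b|."""
    s = f32(a + b)
    e = f32(b - f32(s - a))
    return s, e

def split(a):
    """Dekker split of a binary32 value into two 12-bit halves."""
    c = f32(4097.0 * a)            # 4097 = 2**12 + 1
    hi = f32(c - f32(c - a))
    lo = f32(a - hi)
    return hi, lo

def two_prod(a, b):
    """Dekker product: p is the rounded product, e the exact error."""
    p = f32(a * b)
    ah, al = split(a)
    bh, bl = split(b)
    e = f32(f32(ah * bh) - p)
    e = f32(e + f32(ah * bl))
    e = f32(e + f32(al * bh))
    e = f32(e + f32(al * bl))
    return p, e

def to_pair(x):
    """Split a binary64 value into a (hi, lo) pair of binary32 values."""
    hi = f32(x)
    return hi, f32(x - hi)

def pair_add(x, y):
    """Add two (hi, lo) pairs; 11 single-precision ops in this sketch."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return quick_two_sum(s, e)

def pair_mul(x, y):
    """Multiply two (hi, lo) pairs, dropping the tiny lo*lo term."""
    p, e = two_prod(x[0], y[0])
    e = f32(e + f32(f32(x[0] * y[1]) + f32(x[1] * y[0])))
    return quick_two_sum(p, e)

def pair_value(x):
    """Exact rational value of a pair, for checking results."""
    return Fraction(x[0]) + Fraction(x[1])

if __name__ == "__main__":
    single = f32(f32(0.1) * f32(0.1))
    paired = pair_mul(to_pair(0.1), to_pair(0.1))
    print("single-precision error:", abs(single - 0.01))
    print("native-pair error:     ", abs(float(pair_value(paired)) - 0.01))
```

Counting operations in pair_add gives 11 single-precision flops for one
paired add, which is at least in the same ballpark as the roughly 10x
cost quoted above. The multiply is more expensive here because of the
Dekker split; hardware with a fused multiply-add could get the product
error much more cheaply.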

