using graphics cards as generic FLOP crunchers

'Bryce' bryce at
Fri Mar 16 07:37:57 PST 2001

John Duff wrote:

> Hello,
> There are groups at Stanford (WireGL) and Princeton who have done work on
> parallel graphics on PC clusters.  They put a high-end PC graphics card (such
> as an NVidia card) in each slave node of a cluster, and then parallelize the
> rendering of 3D scenes across the cluster, taking advantage of the video
> hardware acceleration, and then combine the image either on a big tiled
> projector or on a single computer's monitor.  This is all well and good,
> but it struck me that when other groups at these universities who have no
> interest in graphics use the same cluster, all that computing horsepower
> in the GPUs on the graphics cards just sits idle.  Would it be possible
> to write some sort of thin wrapper API over OpenGL that heavy-duty
> number-crunching parallel apps could use to offload some of the FLOPs from
> the main cpu(s) on each slave node to the gpu(s) on the graphics card?
> It would seem pretty obvious that the main cpu(s) would always be faster
> for generic FLOP computations, so I would think only specific apps might
> benefit from the extra cycles of the gpu(s).  Of course, the synchronization
> issues might be too much of a pain to deal with in the end as well.  Has
> anyone heard of someone trying this, or know of any showstopper issues?
> Thanks,
> John

Just a tidied-up bit of IRC log that you can mull over:


* Bx notes the beowulf geeks are getting seriously freaky, they're searching for a way to use the GPUs on high end video cards to contribute to the processing power of the main FPU
* Bx backs away from these guys
<kx> bx, they're blowing smoke.  you can't do that
<rx> Bx: again ?
<wx> bx: that's not too insane
<rx> kx: I guess it depends on what you're trying to do
<kx> wx, er, it is quite insane.  they can do the calculations but there's no way to get the results out.
<rx> kx: for video rendering it would make sense, I guess ;)
<rx> kx: read back from that frame buffer you've got memory mapped ?
<sx> kx: Generate two textures the size of the screen.  Map them to the display using a multi-texture operation
with an alpha-blend operator between the two of them.
<kx> rx, that would be unworkable;  you'd have to scan the entire framebuffer for the pixel that is the result of your calculation
<sx> kx: Suddenly you get something that looks suspiciously like a vector multiply.
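The trick sx is describing can be sketched numerically. Assuming a GL-style blend equation, result = alpha*src + (1-alpha)*dst, with the destination cleared to zero, the blending hardware degenerates into a per-pixel multiply. The snippet below is a NumPy stand-in for that arithmetic only, not actual OpenGL; the texture names and screen size are invented for illustration.

```python
import numpy as np

# Two "textures" the size of the screen, holding the operand vectors.
# On the card these would be fixed-point per-channel textures; we model
# a single channel in floating point.
w, h = 640, 480
tex_a = np.random.rand(h, w).astype(np.float32)  # first operand
tex_b = np.random.rand(h, w).astype(np.float32)  # per-pixel alpha weights

# A GL-style alpha blend, result = alpha*src + (1-alpha)*dst, with the
# destination cleared to zero, reduces to an elementwise multiply:
dst = np.zeros((h, w), dtype=np.float32)
result = tex_b * tex_a + (1.0 - tex_b) * dst

# Every framebuffer pixel now holds one component of the product, i.e. a
# screen-sized elementwise vector multiply done entirely by the blender.
assert np.allclose(result, tex_a * tex_b)
```

Reading the answer back would then mean a glReadPixels-style framebuffer readback, which is exactly the bottleneck kx is worried about.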
<kx> sx, hm, possibly
<wx> kx: most fbs let you readl/writel to arbitrary locations
<kx> I still think it's impractical
<sx> kx: No, you can randomly read pixels
<kx> hm, you could do colourspace conversion quickly too
<kx> it'd be a neat hack but I suspect you'd be better off buying a faster CPU
<sx> kx: Not on all cards
<sx> kx: Some of the cards do YUV on demand through overlay.
<sx> kx: The Voodoo3 will do either overlay or texture; with texture conversion you can get the data back.  With
overlay conversion, you can't.
<kx> sx, either way I suspect you'd have to make your code highly dependent on gfx chipset;  and at the rate they are iterating right now it'd be a wasted effort
<sx> kx: A lot of the common multitexture blending modes are pretty standardised, and arithmetically useful.  But only 32 bit fixed point.
<sx> kx: You could even do things like fast array additions by repeatedly mapping a texture down to half its size with bilinear filtering.
<kx> sx, still, I suspect that using a faster CPU will be easier and cheaper
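The halving trick sx mentions can be modelled in the same spirit: each bilinear minification pass averages 2x2 pixel blocks, so repeated halving converges on the array mean, and mean times element count recovers the sum. This is a NumPy model of the arithmetic, not driver code; the helper names are made up, and it assumes a square power-of-two texture.

```python
import numpy as np

def halve_bilinear(tex):
    """Average each 2x2 block, as a bilinear minification pass would."""
    return 0.25 * (tex[0::2, 0::2] + tex[0::2, 1::2] +
                   tex[1::2, 0::2] + tex[1::2, 1::2])

def gpu_style_sum(tex):
    """Sum an array by repeated 2x halving (power-of-two sides assumed)."""
    n = tex.size
    while tex.size > 1:
        tex = halve_bilinear(tex)
    return float(tex[0, 0]) * n  # mean * element count = sum

data = np.arange(16, dtype=np.float64).reshape(4, 4)
total = gpu_style_sum(data)  # sum of 0..15, i.e. 120
```

A log2(n)-pass reduction like this is where the idea starts to look genuinely useful, though kx's point stands: in 32-bit fixed point, and with the readback cost, a faster CPU may still win.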
