using graphics cards as generic FLOP crunchers

John Duff jfduff at
Fri Mar 16 06:21:48 PST 2001


There are groups at Stanford (WireGL) and Princeton who have done work on 
parallel graphics on PC clusters.  They put a high-end PC graphics card (such 
as an NVidia card) in each slave node of a cluster, parallelize the 
rendering of 3D scenes across the cluster to take advantage of the video 
hardware acceleration, and then combine the image either on a big tiled 
projector or on a single computer's monitor.  This is all well and good,
but it struck me that when other groups at these universities who have no
interest in graphics use the same cluster, all that computing horsepower
in the GPUs on the graphics cards just sits idle.  Would it be possible
to write some sort of thin wrapper API over OpenGL that heavy-duty
number-crunching parallel apps could use to offload some of the FLOPs from 
the main CPU(s) on each slave node to the GPU(s) on the graphics card?
It would seem pretty obvious that the main CPU(s) would always be faster
for generic FLOP computations, so I would think only specific apps might
benefit from the extra cycles of the GPU(s).  Of course, the synchronization
issues might also turn out to be too much of a pain to deal with in the end.
Has anyone heard of someone trying this, or know of any showstopper issues?


More information about the Beowulf mailing list