[Beowulf] Off-the-wall: WRF Model and PS3's (or Cell)

Brian Dobbins bdobbins at gmail.com
Fri May 16 20:08:36 PDT 2008


Hi Gerry,

  I'm by no means an expert on WRF, so take the following with a grain of
salt, but I'm inclined to think that WRF wouldn't really run very well on a
cluster of PS3s.  The problem is that with < 200 MB of memory available in
total, giving ~ 25MB per SPU, you're limited to a pretty small number of
grid points per SPU, which means they'll fly through the computations on
those very few grid points... and then very, very slowly communicate over
the gigabit network.  Even if you could get 2GB in each PS3, that's still
only 256MB per SPU, right?
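
  To put rough numbers on that, here's a quick back-of-envelope sketch
(plain C).  The bytes-per-point figure is just extrapolated from the 12GB,
300x300x200 run I mention below, so treat it as a loose assumption:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Assumption: bytes per grid point estimated from my run below:
       ~12 GB of RAM for a 300x300x200 domain. */
    double domain_pts   = 300.0 * 300.0 * 200.0;   /* 18 million points */
    double bytes_per_pt = 12.0e9 / domain_pts;     /* ~667 bytes/point  */

    double spu_mem = 25.0e6;                       /* ~25 MB per SPU    */
    double pts     = spu_mem / bytes_per_pt;       /* ~37,500 points    */

    printf("bytes per grid point: ~%.0f\n", bytes_per_pt);
    printf("points per SPU:       ~%.0f (roughly a %.0f^3 cube)\n",
           pts, cbrt(pts));
    return 0;
}

A subdomain that small has a huge surface-to-volume ratio, so nearly every
point sits on a halo boundary that has to cross the gigabit network every
step.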

  Again, my WRF experience is admittedly quite limited, but a recent 3D run
I did with a 300x300x200 domain required a little over 12GB of RAM, I
believe.  The code had a few custom modifications, but I doubt they changed
the run-time characteristics drastically, and the resulting run took
something like 12.8 seconds on 8 processors... and 11.8 seconds on 16
processors.  (Two nodes and four nodes, in this case.)  Speeding up the
calculations with small grids and very fast SPUs just means the
communication takes an even larger share of the total runtime.
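
  Out of curiosity, fitting those two timings to a crude t(p) = ts + tc/p
model - a big simplification, since WRF's communication certainly doesn't
scale as 1/p, so take this as illustrative only:

#include <stdio.h>

int main(void)
{
    double t8 = 12.8, t16 = 11.8;    /* measured times from the run above */

    /* t(p) = ts + tc/p  =>  t8 - t16 = tc/16 */
    double tc = 16.0 * (t8 - t16);   /* perfectly-scaling part:  16.0 s */
    double ts = t8 - tc / 8.0;       /* non-scaling part:        10.8 s */

    printf("non-scaling share of the 8-proc runtime: ~%.0f%%\n",
           100.0 * ts / t8);
    return 0;
}

If something like 85% of the 8-processor runtime doesn't shrink as you add
processors, making the computation faster (as the SPUs would) attacks the
wrong term - and a gigabit interconnect only makes the big term bigger.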

  Since we do have some people who need to run some pretty large WRF models,
I'd be happy if this *did* work, but if you're interested in novel
architectures for WRF, I would think that a GPU (or an FPGA with many FP
units) on a PCI-Express bus, with InfiniBand links between nodes, would be
nicer.  The IB would hopefully keep the communication roughly in step with
the extremely fast computation.  If I can, once the double-precision GPUs
are out, I'll be picking one up for experimentation, but mostly for
home-grown codes - WRF may take a bit more effort.  The guys at NCAR do
seem to have done some work in this area, though, running one of WRF's
physics modules on an NVIDIA 8800 card - you can read about it here:
http://www.mmm.ucar.edu/wrf/WG2/michalakes_lspp.pdf

  My two cents.  :-)

(PS.  Ooh, now, if one could have a 'host system' with a large amount of RAM
to feed the GPU, running very large models, I could see that potentially
working well as an *accelerator*.  Say, 32-64GB of RAM, handled as 2 x 128MB
'tiles' at a time - one tile being staged to or from the card while the GPU
computes on the other - and once all the accelerated work is done, use the
host to synchronize quickly over IB with the other large nodes.  But that's
probably a fair amount of work!)
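
  For anyone who wants to play with that idea, the skeleton might look
something like this in CUDA.  Everything here is hypothetical - compute_tile
is a stand-in for the real per-tile work, and the host buffer would need to
be pinned (cudaMallocHost) for the copies to actually overlap the kernels:

#include <cuda_runtime.h>

#define TILE_BYTES ((size_t)128 << 20)            /* the 128MB 'tiles' */

/* Stand-in for the real per-tile computation (grid-stride loop). */
__global__ void compute_tile(float *tile, size_t n)
{
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += (size_t)gridDim.x * blockDim.x)
        tile[i] *= 1.0f;                          /* placeholder work  */
}

void process_tiles(float *host_data, size_t ntiles, size_t tile_elems)
{
    float *dev[2];
    cudaStream_t stream[2];

    for (int i = 0; i < 2; i++) {
        cudaMalloc((void **)&dev[i], TILE_BYTES);
        cudaStreamCreate(&stream[i]);
    }

    for (size_t t = 0; t < ntiles; t++) {
        int b = t & 1;                    /* ping-pong between buffers */

        /* Copies and kernel are ordered within stream b, but stream
           b's copies overlap the other stream's kernel for the
           previous tile - that's the double buffering. */
        cudaMemcpyAsync(dev[b], host_data + t * tile_elems, TILE_BYTES,
                        cudaMemcpyHostToDevice, stream[b]);
        compute_tile<<<1024, 256, 0, stream[b]>>>(dev[b], tile_elems);
        cudaMemcpyAsync(host_data + t * tile_elems, dev[b], TILE_BYTES,
                        cudaMemcpyDeviceToHost, stream[b]);
    }

    for (int i = 0; i < 2; i++) {
        cudaStreamSynchronize(stream[i]);
        cudaStreamDestroy(stream[i]);
        cudaFree(dev[i]);
    }
}

The catch is the same one as with the Cell: 32-64GB of data still has to
cross PCI-Express on every pass, so the tiles need enough arithmetic per
byte to hide the transfers - and the IB synchronization between hosts is a
whole separate layer on top.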


  - Brian

Brian Dobbins
Yale Engineering HPC