[Beowulf] Multidimensional FFTs

Tue Feb 28 18:34:50 PST 2006

On Tue, Feb 28, 2006 at 01:26:51PM -0500, Bill Rankin wrote:

> There is a research group here at Duke doing some application  
> development and they are looking at implementing their codes in a  
> cluster environment.  The main problem is that 95% of their  
> processing time is taken up by medium to large sized 3D FFTs (minimum  
> 64 elements on an edge, 256k total elements).

That's a fairly small FFT on a parallel cluster. How many cpus do they
imagine using? Perhaps the easiest thing to do is to whip up some code
and invite people to benchmark it. The G-PTRANS and G-FFTE elements of
HPC Challenge are relevant but not many folks have submitted numbers.

Let's see: for 64**3 and 64 cpus with a 1D decomposition, there are
64**2 words per cpu, and a naive Alltoall will send one message of 64
words to each of the 63 other nodes. The message length is then 1024
bytes (64 double precision complex words). I would disagree with Stu's
recommendations at this size due to the short message length, but I
don't know if 2D would be a better decomposition at this size. FFTW
version 2's MPI routines only do 1D decomposition.
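
The arithmetic above can be sketched in a few lines of Python. This is
only a back-of-the-envelope calculation, not a benchmark: the function
names are made up for illustration, 16 bytes per word assumes double
precision complex, and the 2D case assumes a square p x p process grid
where each transpose is an Alltoall among the p cpus in one row or
column of the grid.

```python
def slab_message_size(n, ncpus, bytes_per_word=16):
    """1D (slab) decomposition of an n**3 FFT: one Alltoall over all cpus.

    Returns (words per cpu, words per message, bytes per message).
    """
    words_per_cpu = n**3 // ncpus
    # Each cpu splits its local data evenly across all cpus.
    words_per_message = words_per_cpu // ncpus
    return words_per_cpu, words_per_message, words_per_message * bytes_per_word


def pencil_message_size(n, p, bytes_per_word=16):
    """2D (pencil) decomposition on an assumed square p x p grid.

    Each of the two transposes is an Alltoall among only p cpus.
    Returns (words per cpu, words per message, bytes per message).
    """
    words_per_cpu = n**3 // (p * p)
    # Each cpu splits its local data across the p cpus in its row/column.
    words_per_message = words_per_cpu // p
    return words_per_cpu, words_per_message, words_per_message * bytes_per_word


if __name__ == "__main__":
    print("1D, 64 cpus:", slab_message_size(64, 64))
    print("2D, 8x8 cpus:", pencil_message_size(64, 8))
```

For 64**3 on 64 cpus the 1D case gives 64-word (1024-byte) messages to
63 partners. Under the assumptions above, an 8x8 grid gives 512-word
(8192-byte) messages to 7 partners, but needs two transposes instead
of one. Which of those wins on a given interconnect is exactly what
would have to be benchmarked.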

-- greg