[Beowulf] Multidimensional FFTs

Konstantin Kudin konstantin_kudin at yahoo.com
Wed Mar 1 10:23:26 PST 2006


> So I was wondering what the current "state of the art" is in
> clustered 3D FFTs?  I've googled around a bit, but most of the
> results seem a little dated.  If someone could point me to any recent
> papers or studies, I would be grateful.

 You can find some reasonably recent FFT-related material at this
link (read the preprint on the FFT strategies):

 http://pages.unibas.ch/comphys/comphys/SOFTWARE/

 Anyway, the "alltoall" can be a real killer. If you want to use lots
of cpus with really small packets, go for something like Parastation
MPI ( http://www.parastation.com/ ). This MPI package is MPICH-based
and cuts down latencies for small packets by about 30% (really!). And
the best part is that it is free for academics.
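
 To see why the packets get small when lots of cpus are used, here is a
minimal sketch in plain Python (no MPI involved). It assumes a slab
decomposition of an n x n x n complex grid over p ranks, with p dividing
n and 16-byte double-complex elements; none of that is stated above, and
the names alltoall and packet_bytes are made up for illustration. The
first function only models which block ends up where in an all-to-all
exchange; the second gives the size of each pairwise message under
those assumptions.

```python
def alltoall(send):
    """Model of all-to-all semantics: send[i][j] is the block that
    rank i sends to rank j. Returns recv, where recv[j][i] is the
    block that rank j received from rank i (a block transpose)."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]


def packet_bytes(n, p, bytes_per_element=16):
    """Bytes in each pairwise message when an n^3 grid, split into
    slabs over p ranks, is transposed (assumes p divides n)."""
    if n % p:
        raise ValueError("this sketch assumes p divides n")
    # Each rank holds n/p planes and sends an (n/p) x (n/p) x n
    # sub-block to every other rank.
    return (n // p) * (n // p) * n * bytes_per_element


ranks = 4
send = [[f"{src}->{dst}" for dst in range(ranks)] for src in range(ranks)]
recv = alltoall(send)
print(recv[2])  # the blocks that arrive at rank 2, one from each rank

for p in (4, 16, 64):
    print(p, packet_bytes(128, p))
```

 Under these assumptions, a 128^3 grid goes from 2 MiB per message on 4
ranks to 8 KiB per message on 64 ranks, so per-message latency becomes
an increasingly large share of the cost.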

 For large packets, things get trickier. For example, on a dual Opteron
cluster around here there is a significant "choking" effect, for unknown
reasons. Using skampi 4.1, one gets the results shown below for 64kB
packets (this is with the bleeding-edge version of Open-MPI, 1.1
pre-alpha). The Open-MPI developers promise to pay specific attention to
the "alltoall" function, so things might become quite good at some
point.


[first three columns: ncpu, time in ms, std]

(choking at 15 cpus)
#/*@insyncol_MPI_Alltoall-nodes-long-SM.ski*/
       2     275.1      1.6      8     275.1      1.6      8
       3    1890.2     31.3      8    1890.2     31.3      8
       4    3467.1     85.0      8    3467.1     85.0      8
       5    5843.9     66.3      8    5843.9     66.3      8
       6    8720.9    110.6      8    8720.9    110.6      8
       7    9598.8     99.6      7    9598.8     99.6      7
       8   11757.9    256.4      6   11757.9    256.4      6
       9   13428.2    166.4      8   13428.2    166.4      8
      10   14623.4    176.2      8   14623.4    176.2      8
      11   16689.4    171.9      4   16689.4    171.9      4
      12   18941.4    502.9      5   18941.4    502.9      5
      13   20105.2     99.0      8   20105.2     99.0      8
      14   22731.1    155.0      2   22731.1    155.0      2
      15  123939.7  49248.4      8  123939.7  49248.4      8
      16  142048.0  43888.8      8  142048.0  43888.8      8
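
 The onset of choking in the table above can also be located
programmatically. The sketch below is only a heuristic: it flags the
first cpu count whose time increase over the previous row exceeds ten
times the median increase. The factor of ten and the name choke_point
are arbitrary choices for illustration, not anything skampi provides.
The (ncpu, time) pairs are copied from the first two columns of the
table.

```python
from statistics import median

# (ncpu, time) pairs copied from the MPI_Alltoall table above
ALLTOALL = [
    (2, 275.1), (3, 1890.2), (4, 3467.1), (5, 5843.9), (6, 8720.9),
    (7, 9598.8), (8, 11757.9), (9, 13428.2), (10, 14623.4),
    (11, 16689.4), (12, 18941.4), (13, 20105.2), (14, 22731.1),
    (15, 123939.7), (16, 142048.0),
]


def choke_point(rows, factor=10.0):
    """Return the first ncpu whose time increase over the previous
    row exceeds `factor` times the median increase, or None."""
    # Time added by each extra cpu, tagged with the larger cpu count
    steps = [(n1, t1 - t0) for (_, t0), (n1, t1) in zip(rows, rows[1:])]
    typical = median(step for _, step in steps)
    for ncpu, step in steps:
        if step > factor * typical:
            return ncpu
    return None


print(choke_point(ALLTOALL))  # 15
```

 Comparing increments rather than ratios keeps the jump from 2 to 3
cpus, which is large in relative terms but ordinary in absolute terms,
from being flagged.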

 If a bunch of isend+irecv calls is used instead of "alltoall", the
choking effect shows up much earlier:

(choking at 6 cpus)
#/*@insyncol_MPI_Alltoall_Isend_Irecv-nodes-long-SM.ski*/
       2     247.4      0.8      8     247.4      0.8      8
       3    1861.8     10.1      8    1861.8     10.1      8
       4    3158.4     24.5      8    3158.4     24.5      8
       5    4270.0     75.0      2    4270.0     75.0      2
       6  225351.5  12504.5      2  225351.5  12504.5      2
       7  228399.5  14770.5      2  228399.5  14770.5      2
       8  247087.5  14448.4      2  247087.5  14448.4      2
       9  243806.7   3878.9      8  243806.7   3878.9      8
      10  248353.0   6640.9      2  248353.0   6640.9      2
      11  267541.5   5210.1      8  267541.5   5210.1      8
      12  286600.1   1665.1      2  286600.1   1665.1      2
      13  277546.5   4208.1      8  277546.5   4208.1      8
      14  364208.9  98276.9      2  364208.9  98276.9      2
      15  392139.0 101163.9      2  392139.0 101163.9      2
      16  367182.1  97711.0      2  367182.1  97711.0      2
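
 To put a number on how much earlier, the sketch below divides the
isend+irecv time by the "alltoall" time for each cpu count. The values
are copied from the first two columns of the two tables above; the name
slowdown is just an illustrative helper.

```python
# ncpu: (MPI_Alltoall time, Isend+Irecv time), copied from the tables above
TIMES = {
    2: (275.1, 247.4),
    3: (1890.2, 1861.8),
    4: (3467.1, 3158.4),
    5: (5843.9, 4270.0),
    6: (8720.9, 225351.5),
    7: (9598.8, 228399.5),
    8: (11757.9, 247087.5),
    9: (13428.2, 243806.7),
    10: (14623.4, 248353.0),
    11: (16689.4, 267541.5),
    12: (18941.4, 286600.1),
    13: (20105.2, 277546.5),
    14: (22731.1, 364208.9),
    15: (123939.7, 392139.0),
    16: (142048.0, 367182.1),
}


def slowdown(times):
    """Ratio of Isend+Irecv time to MPI_Alltoall time per cpu count."""
    return {n: isend / alltoall for n, (alltoall, isend) in times.items()}


ratios = slowdown(TIMES)
for ncpu, ratio in ratios.items():
    print(f"{ncpu:3d}  {ratio:6.2f}")
```

 The ratio stays below 1 up to 5 cpus (isend+irecv is slightly
faster there), exceeds 10 from 6 to 14 cpus, and drops to about 3 at 15
and 16 cpus, where "alltoall" itself has started choking.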

  Konstantin




