Beowulf & FFT
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduMon Jul 24 14:05:54 PDT 2000
- Previous message: Beowulf & FFT
- Next message: Are we there yet?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, 24 Jul 2000, Martin Siegert wrote: > The problem is that you have to do a matrix transpose between the two > loops. FFTW does this using MPI_Alltoall. As Tony pointed out this may not > be the best choice. So I've been busy writing a matrix transpose routine > myself - something that is trivial for serial programs is totally nontrivial > when the matrix is distributed over several processors. If somebody can > give me a hint or reference how this is done for best performance, I'll > be thankful forever :-) [right now I'm trying MPI_Type_vector to define I don't know if it will be of use to you, but on the off chance of having somebody thankful forever (which is a very long time, after all;-) check out: http://www-unix.mcs.anl.gov/dbpp/text/node43.html#SECTION02540000000000000000 This is the "Convolution" section of "Designing and Building Parallel Programs", by Ian Foster, an online book. It covers 2d FFT's in fair detail, including a comparison between transposition and pipelining algorithms. It suggests that pipelining might be better when N is "small", as it appears to be for you. http://www-unix.mcs.anl.gov/dbpp/text/node126.html#SECTION04330000000000000000 contains a discussion of tranposition algorithms appropriate for hypercubical architectures, which also might be better for small N, if you can rearrange your systems into a suitable hypercube. You'll likely want the paper copy of the book if you find this useful. I suspect that you already have found a lot of what it tells you though. A very nice book and useful reference to bookmark if you are a beowulfer, in any event. > Actually, I can: I have to average over random initial conditions. At least > in principle - I find that usually 4 samples are already enough, even a single > sample already gives you the growth exponent quite accurately. That's why > I hoped that I could run a single run in parallel - then I could go to > larger system sizes and later times. If it can't be done, I'll do exactly > as you said. Good luck -- your problem sounds like you can probably find a region and parallel algorithm where you get >>some<< parallel speedup for a relatively small number of systems, although you may not get linear scaling. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
- Previous message: Beowulf & FFT
- Next message: Are we there yet?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
