Beowulf & FFT
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduSun Jul 23 21:17:36 PDT 2000
- Previous message: Beowulf & FFT
- Next message: Beowulf & FFT
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 18 Jul 2000, Martin Siegert wrote: > > I am not familliar with the FFT library you have, but it may be that it was > > not designed for a machine like a Beowulf, but rather for a finer grain > > machine like some of the MPPs. Thus the best approach may be a different > > bit of software, or reworking that software. All-to-All is probably NOT > > the best implementation on an MPI running over TCP/IP. > > The authors of FFTW claim that this is the fastest FFT you can get. I can only > say that from all I know this statement is correct, i.e., all other FFTs > that I tried were slower. Actually I wasted a month by writing my own FFT > only to find out that is was slower as well :-( > Furthermore, I can't really see how the necessary matrix transpose can be > done more efficiently without the Alltoall. Sorry about the very late response, but I've been out of town. a) Have you looked into the various books on parallel algorithms that are on the market (and in a few cases on the web)? I'd wager that this problem has been studied as FFT's seem like they'd be fairly important. These books also go over things like parallel matrix operations where the "best" algorithm can suddenly change to something quite nonintuitive depending on matrix scale and the various speeds. Even optimizing only for the relative speed differential of L1/L2/Main memory, ATLAS has to work quite hard and changes algorithms and blocksizes altogether several times. b) Is there a way of reformulating the problem so that it is embarrassingly parallel? That is, since you are scale limited anyway to smallish "runs" that might well fit on a single CPU UP system, is there any benefit to just running a bunch of these calculations independently in parallel? I have similar tradeoffs in my Monte Carlo code -- I could probably do really large lattices by splitting up the lattices between systems and paying a surface-to-volume network IPC penalty, but critical slowing down and the brutal quadratic scaling of statistics (need four times as many samples to halve the error) kills me long before that becomes a viable alternative. Instead I run lattices that fit easily onto single CPU's (which are also small enough that I have a chance of accumulating enough independent samples to get high precision results in my lifetime) and concentrate on accumulating decent statistics at these scales with independent parallel runs.
- Previous message: Beowulf & FFT
- Next message: Beowulf & FFT
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
