Distributed Algorithm.
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Robert G. Brown rgb at phy.duke.eduThu Dec 14 04:04:31 PST 2000
- Previous message: Distributed Algorithm.
- Next message: Distributed Algorithm.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, 14 Dec 2000, Mofeed Shahin wrote: > G'day all, > > I am currently working on a program which is simulating the movements of 'n' > number of nodes. > > The problem I am having is that it doesn't take long to calculate the next > position of Node n1, n2, n3, etc...... I need all the locations for time t, > before I can start calculation the locations for t+1. > > And therefore the bottleneck becomes the IO bandwidth between the cluster > processing units, and not CPU. > > Is there a way to get rid of the bottleneck? (ie change of algorithm?) When you say calculate the positions of N nodes, do you mean in a sort of molecular dynamics sense? That is (to be picky) are you solving either N coupled ODEs or N forward difference equations? If so, then an important question becomes what the range of interaction is. That is, does the differential step for the ith node (out of N) depend on the state of ALL the other N-1 nodes or just some "local" subset (often in real MD, it might be a really local subset in that they are spatially contiguous). If the ODE's or iterated difference equations are "local" to some degree, the proper treatment is usually to partition the problem, although if the math for a step is truly small compared to the IPC time (even for only the IPC's required to transmit the "boundary" node information, which usually scales like a surface relative to a volume) it may still not parallelize very well until N becomes very large. If the ODE's are long range, and the step of the ith node depends on the state of all N-1 other nodes, well, you may have a problem that just doesn't parallelize very well, at least on the less expensive beowulf architectures. In that case, your alternatives are to spend a lot more money on communications than on computation -- buy a truly high speed network like e.g. Myrinet and use it with relatively cheap and slow processors, and you might be able to scale up to some decent limit. Greg Lindahl has presented studies of alpha/myrinet clusters that scaled well for e.g. gravitation problems (another long range interaction) that might be relevant at Linux Expo (the one in Raleigh a year and a half ago) that are probably in the proceedings. The other possibility is to make your problem embarrassingly parallel. OK, so you cannot sanely partition your problem. If you CAN run your problem with all N nodes on a single system, and need to run it many independent times (M times, in fact) for N node strings, you can run the M calculations in parallel on M computation nodes and still complete the N*M work in N time. In fact, this will scale better than just about any alternative even with cheapo ethernet, provided only that it doesn't take a huge amount of time to start a calculation and process/tabulate the results relative to the internal compute time. The final note is that there are circumstances where it is worth it to parallelize a problem even when it scales poorly (even when it scales "negatively"). If the largest computer you can sensibly afford contains e.g. 2 GB of RAM, but you need to do calculations with 10 GB worth of nodes, well, running in parallel may be slower per node than running on a single system but running >>10 GB<< worth of nodes in parallel (with any IPC penalty) is infinitely faster than not being able to run them at all. Or, more practically, than trying to run the calculation with a disk-based virtual memory space -- memory to memory transactions across the network are generally faster than a disk, which motivates e.g. the Trapeze project at Duke (efforts to create a virtual memory space across a network of systems for truly big memory projects). Again, a really fast network helps here, and you are likely better off building a carefully balanced system with a more money tied up in the network than in the processors. rgb > > Mof. > > _______________________________________________ > Beowulf mailing list > Beowulf at beowulf.org > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
- Previous message: Distributed Algorithm.
- Next message: Distributed Algorithm.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
