Network Characteristics and Applications
Greg Lindahl lindahl at conservativecomputer.com
Fri Jan 4 14:01:53 PST 2002
Yea! beowulf.org is back!

> The Question... Which specific parallel applications/algorithms/problem
> classes benefit significantly from bandwidth increases, decreased network
> latency or a combination of both?

Here are some gross generalizations that might help:

1) With most algorithms, the less data per cpu, the more bandwidth and latency count. So runs with lots of cpus, or smaller datasets, are harder.

Example: Climate modeling generally involves running a relatively coarse grid for a large number of timesteps. It's hard to get a good speedup unless you have a really great machine, and so there was some brouhaha recently about how the US needed to buy (Japanese) vector machines for this problem. (However, I don't think this is the case; the climate people simply need to use best practices with MPI.)

Example: QCD, quantum chromodynamics. QCD computes on a 4 dimensional grid. Sometimes people want to compute large grids, sometimes small. Less data on a node means relatively more communications and lower required latencies. Steve Gottlieb has a theoretical slide demonstrating this:

http://physics.indiana.edu/~sg/utah/performance_model.html

If you want to build a QCD machine that sustains 10 TFlop/s over a wide range of grid sizes, this is a hard problem. For example, if I have a 200 MF/s sustained processor, I can get to a local grid size of 4^4 using Myrinet and 12^4 using fast ethernet. 12^4 is so large a grid that it isn't so useful for fast computations.

Example: Weather forecasting. Similar to climate, but there are multiple kinds of forecasts: regional, national, global, each with more data. The regional forecast is *hardest* to speed up because it has the least data. You can get a speedup of say 10x today with fast ethernet before you hit a wall. But if you're doing global forecasts, you can get much bigger speedups. The 10x number comes from an experiment that the Utah people did for their upcoming Olympic forecasts.
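To make the "less data per cpu" generalization concrete, here is a rough sketch (not taken from Gottlieb's performance model) that counts how many boundary sites a node must exchange per local site on an L^d block. It assumes a one-site-deep halo on every face and ignores corners, message counts, and overlap of communication with computation; the function names are made up for illustration.

```python
def halo_sites(local_extent: int, dims: int = 4) -> int:
    """Sites on the 2*dims faces of a local_extent**dims block (one-site-deep halo)."""
    return 2 * dims * local_extent ** (dims - 1)


def local_sites(local_extent: int, dims: int = 4) -> int:
    """Sites owned by one node, i.e. the local grid volume."""
    return local_extent ** dims


def halo_to_volume_ratio(local_extent: int, dims: int = 4) -> float:
    """Boundary sites exchanged per locally computed site; equals 2*dims/local_extent."""
    return halo_sites(local_extent, dims) / local_sites(local_extent, dims)


if __name__ == "__main__":
    # Compare the 4^4 and 12^4 local grids mentioned for QCD, plus one in between.
    for extent in (4, 8, 12):
        ratio = halo_to_volume_ratio(extent)
        print(f"{extent}^4 local grid: {ratio:.3f} halo sites per local site")
```

Under these assumptions a 4^4 local grid exchanges three times as many halo sites per unit of local work as a 12^4 grid, which is one way to see why the smaller grid needs the faster interconnect.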
Meanwhile, while doing the FSL bid, I computed that an extra 100 usec of latency wouldn't hurt their 40km national forecast at all, and the average bandwidth needed was 1/3 gigabit/sec, at 40-odd cpus.

2) With other algorithms, the range of data sizes people want to use is in a fairly linear area of performance on some hardware. One example of this is CHARMM on the Cray T3E, which has a great interconnect (and a slow processor) by today's standards.

I actually built a little tool using the MPI profiling interface which does some gross computations of compute/comm ratios. I'd like to turn it into a tool usable by the community; would anyone like to volunteer to help? With such a tool you could take existing MPI codes and find out how they behave in practice.

greg
