Super-scaling
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Jakob Oestergaard jakob at unthought.netThu Feb 20 14:56:14 PST 2003
- Previous message: Super-scaling
- Next message: Super-scaling
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, Feb 19, 2003 at 01:30:22PM +0000, Simon Hogg wrote: > Is there a rough rule of thumb which dictates when a program (if ever) > shows superscaling with number of nodes. Of course, I would not expect > this to carry on ad infinitum, but does anyone see superscalar behaviour up > to, a certain number of nodes. > > What would be the conditions for this to occur? When splitting up a sequential problem to run on multiple processors, various conditions will determine whether the scalability is sub-linear, linear or super-linear: This works against scalability: *) Required communication between the sub-problem solvers (your nodes) *) Non-uniform breakdown of the problem (some nodes will wait for other nodes that take longer to complete their sub-problem) This works for scalability: *) By adding nodes, sub-problems are solved in parallel *) sub-problems are smaller than the full problem, and can thus be solved faster It is the last condition that can result in super-linear speedup. Since modern computers are *not* Von Neumann machines, solving two half problems may be significantly faster than solving one full problem. More specifically; executing an instruction takes some amount of time, but getting the data input required for executing the instruction is often what limits your performance. Modern machines have a hierarchial memory configuration - you have registers, L1 cache, L2 cache, sometimes L3 cache, RAM, disk cache, and disk. By splitting up your problem, you may be able to use less disk access and more RAM access. Or more L2 cache and less RAM access. L2 is many times slower than L1. RAM is many times slower than L2. Disk is many orders of magnitude slower than RAM. If, by splitting up your problem, your machines can keep the entire working set in RAM instead of 50% in RAM and 50% on disk (swap, scratch files, etc.) - you will see a *massive* speedup. Thus, solving two sub-problems may be much faster than solving the full problem in one go. Of course, splitting up a problem into sub-problems, may require more work (counted in numbers of instructions). It may require communication, it may result in non-uniform balancing of load. Many things work against you. I have parallelized a handfull of existing simulation programs, and only seen super-linear speedup once. It's not terribly uncommon I think, but you cannot assume that any problem will exhibit this behaviour. And for many problems, there is just no way that you can ever achieve this. The disk-bound problems are the easiest ones, I think. Because getting rid of the disk access will buy you several orders of magnitude in speed increase (thus, the overall problem will be forgiving, even introducing five times as much computational work in order to split up the problem, may still buy you super linear speedup). Getting things to fit in L2 is a lot harder, and I doubt that you will be able to parallelize many real-world problems to the extent where each sub-problem working set will fit in L2. L2 usage can be optimized using packages such as Atlas, for blocked linear algebra routines, for example. Still, going from 5% of the working-set in L2, to perhaps 20% of the working set in L2, may still provide you with a generous speedup. I don't think there are rules of thumb. The factors and conditions I've summarized here are cude at best, but should give you an idea about what the scalability game is all about :) Move the bulk of your working-set data up the hierarchy (from disk to RAM, from RAM to L2, ...), avoid communication where possible, adding more computational work to the overall problem may still aid the speedup, if it helps avoiding communication or memory consumption. Measure, think, experiment... Repeat as necessary. Everyone feel free to supplement, comment and correct ;) -- ................................................................ : jakob at unthought.net : And I see the elder races, : :.........................: putrid forms of man : : Jakob Østergaard : See him rise and claim the earth, : : OZ9ABN : his downfall is at hand. : :.........................:............{Konkhra}...............:
- Previous message: Super-scaling
- Next message: Super-scaling
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
