Question about clusters
Robert G. Brown rgb at phy.duke.edu
Fri Feb 7 10:39:32 PST 2003
On Fri, 7 Feb 2003, Joe Griffin wrote:

> What kind of units do you want?
>
> MFLOPS-Bytes?
>
> KNT wrote:
>
> > Greetz!
> > I wanted to ask if there's a way of theoretically calculating a cluster
> > power by a mathematical formula, basing on the nodes processor type, ram,
> > etc.? Assuming also that the components of each node can be different.
> >
> > -Thanks from the above
> > KNT

There are LOTS of theoretical ways, the simplest one being a simple aggregate of the individual node "power" by whatever measure you like. This is even a reasonable one for an embarrassingly parallel task with long runtime, small system footprint, and small I/O and overhead requirements, e.g. SETI or RC5. For other tasks, this is completely meaningless.

To get a USEFUL measure of the power of a cluster on YOUR PROBLEM, one can proceed theoretically, but the answer will depend on the details of the work being done and how it is parallelized. To BEGIN to understand at least the most important components of a parallelized task, and how their individual timings affect the parallel scaling of the work when it is split up among many nodes, you might look at the first few chapters of my online beowulf book:

  http://www.phy.duke.edu/brahma/beowulf_online_book/

Especially focus on Amdahl's law and its generalizations.

However, many tasks are sufficiently complex that estimating parallel speedup theoretically for a given node and network design is very difficult and prone to error; it just isn't worth it. The best way to proceed is to empirically measure all sorts of things -- ideally the parallel speedup itself, but sometimes that leaves one with a chicken and egg problem if you're trying to design a cluster that will work effectively for some particular problem -- and then make your estimates from an understanding of the basic ideas in the book and the explicit measurements of task times on your possible hardware.
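As a rough illustration of the two estimates mentioned above (the simple aggregate of node "power" and Amdahl's law), here is a minimal Python sketch. It is not taken from the book: the function names, the sample node rates, and in particular the linear per-node communication overhead term are assumptions chosen only to show the shape of the calculation. The times would have to come from measurements of your own task on your own hardware.

```python
def aggregate_rate(node_rates):
    """Sum of per-node rates (any unit, e.g. MFLOPS).

    Only meaningful for embarrassingly parallel work; nodes may differ.
    """
    return sum(node_rates)


def estimated_speedup(t_serial, t_parallel, nodes, t_comm_per_node=0.0):
    """Amdahl-style speedup estimate on `nodes` identical nodes.

    t_serial        -- time spent in work that cannot be parallelized
    t_parallel      -- single-node time of the work that can be split up
    t_comm_per_node -- ASSUMPTION: extra overhead added for each additional
                       node (a crude linear model, not a measured law)
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    t_one = t_serial + t_parallel
    t_many = t_serial + t_parallel / nodes + t_comm_per_node * (nodes - 1)
    return t_one / t_many


if __name__ == "__main__":
    # Hypothetical heterogeneous nodes, rates in MFLOPS.
    print("aggregate:", aggregate_rate([800.0, 800.0, 1200.0, 1500.0]))

    # Hypothetical task: 10 s serial, 90 s parallelizable.
    for p in (1, 2, 4, 8, 16, 32):
        ideal = estimated_speedup(10.0, 90.0, p)
        with_comm = estimated_speedup(10.0, 90.0, p, t_comm_per_node=1.0)
        print(f"{p:3d} nodes: Amdahl {ideal:5.2f}, with overhead {with_comm:5.2f}")
```

With zero overhead the estimate can never exceed (t_serial + t_parallel) / t_serial, however many nodes are added. With the assumed overhead term, the speedup peaks and then gets worse as nodes are added, which is the qualitative behavior that makes the simple aggregate meaningless for anything but embarrassingly parallel tasks.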
Estimating from measurements is still a bit risky -- there are lots of nonlinearities and superlinearities in computer performance as the size, stride, and communications pattern of a program are varied across the memory, cache, and bus subsystems, and scaling up to "production" can sometimes lead to pleasant or unpleasant surprises.

A last thing to note is that there are often many ways to parallelize a given task, and some may be better than others. Some may be MUCH better than others, as in the code will scale "well" for one algorithm and "terribly" for another. One is thus advised that even the best theoretical estimate of power for a particular problem is based on the SOFTWARE implementation of that problem as well as the cluster design, and if one's problem is indeed complex one may have to study parallel programming extensively to learn enough to get things to work optimally.

In other words, sure, there is lots of theory, but it isn't "simple" and YMMV significantly from task to task, network to network, node to node.

Nobody ever said parallel/cluster computing was "easy"...;-)

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
More information about the Beowulf mailing list
