A beowulf for parallel instruction.
Robert G. Brown rgb at phy.duke.edu
Thu Nov 2 08:32:28 PST 2000
On Wed, 1 Nov 2000, J, A. Llewellyn wrote:

> discussed. The experiments you suggest are enticing but
> remember our primary objective is the software training
> end ( even if I didn't make it very clear in my post).
> The multiple small cluster suggestion is something that
> has also cropped up. I wonder what size we need in
> order to make the difficulty in parallelization
> organization to be apparent without it overwhelming the
> entire process. If we can bring this off with a half
> decent lab it will be marvelous. It looks as if we need
> to think in terms of under 32 nodes total, (all
> flavors) and assembling a list of candidate NICS,
> switches etc. Any priorities to suggest?

Nothing too concrete beyond what was in my first response. If you're
focusing on parallel programming instruction (semispecialized to
beowul[v,f]ies) then it would be useful to teach Amdahl's Law and its
generalizations (parallel scaling) by illustration. Paradoxically, with
a small cluster you DON'T want bleeding edge communications, because you
want to see nice (i.e. "bad") parallel scaling curves as you distribute
a task among more and more processors.

That is, I'd guess that you will want to teach them to take a task that
DOESN'T scale too well, at least in a naive parallelization, and "solve
the problem" of how to make it run efficiently on the available
hardware. You will also want them to solve the matching problem:
recognizing when it will NEVER run efficiently on the available
hardware, and what hardware modifications are required to make it run
efficiently.

This is the motivation for selecting hardware that supports sequential
steps to improve the network and/or the swapping of the underlying
hardware, to illustrate how faster and slower CPUs, memory, and so forth
can affect parallel program design. You thus might want switches/NICs
you can force into 10BT mode for one part and then reset/reboot into
100BT mode.
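
A minimal sketch of the scaling argument above, for readers who want
numbers to play with. It is not taken from the post: it assumes a toy
generalization of Amdahl's Law in which each additional node adds a fixed
amount of network traffic that is paid for serially. The function names
and every default value (run time, serial fraction, bytes exchanged) are
hypothetical and chosen only for illustration.

```python
def run_time(nprocs, t1=100.0, serial_frac=0.02,
             comm_bytes=12.5e6, link_mbps=100.0):
    """Estimated wall-clock time in seconds on nprocs nodes.

    All defaults are hypothetical illustration values:
      t1          time of the task on one node, in seconds
      serial_frac fraction of t1 that cannot be parallelized
      comm_bytes  bytes exchanged per additional node per run
      link_mbps   network bandwidth in megabits per second
    """
    t_serial = serial_frac * t1
    t_parallel = (1.0 - serial_frac) * t1 / nprocs
    # Assumed IPC model: traffic grows linearly with the number of extra
    # nodes and is not overlapped with computation.
    t_comm = (nprocs - 1) * comm_bytes * 8.0 / (link_mbps * 1.0e6)
    return t_serial + t_parallel + t_comm


def speedup(nprocs, **model):
    """Speedup relative to one node under the same model parameters."""
    return run_time(1, **model) / run_time(nprocs, **model)


if __name__ == "__main__":
    node_counts = (1, 2, 4, 8, 16, 32)
    print("Mbps  " + "".join(f"{n:>8d}" for n in node_counts))
    for mbps in (10.0, 100.0, 1000.0):
        row = "".join(f"{speedup(n, link_mbps=mbps):8.2f}"
                      for n in node_counts)
        print(f"{mbps:<6.0f}{row}")
```

Setting comm_bytes=0 recovers plain Amdahl's Law. Changing only
link_mbps shows how the same task can slow down as nodes are added on a
slow network yet gain from extra nodes on a faster one.
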
A 10:1 multiplier in the serial IPC fraction ought to let you come up
with a set of problems that scale terribly at 10 Mbps but that scale
reasonably well at 100 Mbps, which one can then at least mentally
extrapolate to 1000 Mbps even without the much more expensive hardware.
This suggests manageable switches, where the link speed can be forced.

My bias in all of this is pretty clear. I tend to think in terms of
beowulf engineering (matching the hardware to the problem) more than
software engineering (matching the problem to the hardware) because the
traffic on the list involves the question "How do I design a beowulf
that will do well on my problem" much more often than it asks "How do I
write my problem so that it runs well on my beowulf." Both are
important, of course, as the answer to either one depends at least in
part on the other.

A moderately heterogeneous laboratory would let you explore both sides
independently and as a coupled problem. On the one hand, your students
could learn that increasing the processor speed on a problem that is IPC
bound may not yield tremendous benefits in terms of speedup (and may be
very costly) by running the same problem on "identical" networks with
differing node speeds. They can also see when e.g. memory bandwidth
matters by running on nodes with different memory bandwidths. Alpha
nodes, for example, have exemplary memory speed, and for certain kinds
of problems this is a big win. For others, their tremendously inflated
cost is a big loss.

In conclusion, I personally think that it is hard to teach a "pure"
beowulf parallel programming course because of the tremendous range of
hardware and software that can be connected together into a "beowulf".
There are damn few assumptions you can make about specific rates in any
given beowulf, and all sorts of "critical" performance features in the
software design, from the speedup curve for a given algorithm to the
optimum algorithm itself, can depend nonlinearly or even discontinuously
on those performance and design features.

Of course there are applications you can program that are relatively
insensitive to design, but they are the "easy" ones -- relatively coarse
grained or embarrassingly parallel. I assume you want to teach your
students to do well with the harder ones.

   rgb

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
