Cluster benchmark(s)?
Martin Siegert siegert at sfu.ca
Wed Jan 17 15:29:23 PST 2001
On Wed, 17 Jan 2001 Randy_Howard at Dell.com wrote:
>> Well, my intent was not to establish specific numbers but rather to
>> get an idea of "bang for the buck" factors with various hardware
>> configurations. For example, I wonder if there is a way of
>> predicting up front for a given application whether or not 10/100
>> ethernet would be sufficient and not become the primary bottleneck.
>> I understand it is a very complex problem and this may not even
>> be possible.

On Wed, Jan 17, 2001 at 04:19:01PM -0500, Robert G. Brown wrote:
> Oh, it's possible all right but it isn't easy. As you say, it's
> (fundamentally!) a complex problem, so you have to learn to understand
> and manage the complexity. A general methodology might be outlined
> something like:
...
<snip very valid, but necessarily lengthy procedure>

Hmm. This is all very valid and correct, but unfortunately quite
overwhelming, particularly if you want to build your first cluster.

I'm wondering whether it wouldn't be possible to establish a database
of cluster benchmarks that could provide hints (these won't be more
than hints, but nevertheless these could be helpful).

Here is the idea:
There should be benchmarks and speedup data for different types of
cluster applications:

1. Embarrassingly parallel (e.g., Monte-Carlo simulations).
   In this case the benchmark will be dominated by the CPUs, the
   interconnect is unimportant, and the speedup curve will show linear
   scaling for an (almost) unlimited number of processors.

2. Applications with "nearest neighbour" communications (e.g.,
   finite-difference methods for PDEs).
   In this case there is significant communication between processors;
   however, since the communication is local (i.e., processor n only
   talks with n+1 and n-1), the scaling of the communication time with
   the # of processors is not so bad (constant + probably a small
   linear piece). In this case you should see a maximum in the speedup
   curve, the location of which depends on your interconnect.

3. Applications with pairwise (all-to-all) communications (e.g.,
   parallel FFT).
   In this case the time for communication scales in proportion to the
   square of the # of processors. The benchmark will be dominated by
   the speed of the interconnect, i.e., the speedup curve will show
   minimal speedups (or even speedups < 1) for fast ethernet.

There may be a few more cases (but probably not many more).
A real application will be a mixture of these three scenarios.
But if you know how, e.g., a PIII/800MHz cluster with fast ethernet
scales in these cases, you at least have some hints as to how your own
application may scale on certain architectures.

Sure, there are complications: The results depend on the MPI
distribution used: e.g., lam works best when small latencies are
required, mpipro is good when high throughput is required, etc.
But nevertheless, I'm sure something like this would have helped me
when I set up my first cluster.

Comments?

Cheers,
Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================
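
The three communication patterns described in the message can be turned into a toy speedup model, which may help show why the curves differ. This is only a sketch under stated assumptions: run time on p processors is taken as T(p) = T_comp/p + T_comm(p), a single processor is assumed to pay no communication cost, and the functional forms follow the message (zero, constant plus a small linear piece, and quadratic). Every numeric parameter below is hypothetical and chosen for illustration; none of it is measured data.

```python
# Toy speedup model: T(p) = t_comp / p + t_comm(p), speedup = T(1) / T(p).
# All cost parameters are hypothetical, in arbitrary time units.

T_COMP = 100.0  # assumed serial compute time


def embarrassingly_parallel(p):
    """Case 1: no communication at all."""
    return 0.0


def nearest_neighbour(p):
    """Case 2: constant cost plus a small linear piece (assumed values)."""
    return 0.5 + 0.01 * p


def all_to_all(p):
    """Case 3: cost grows with the square of the processor count (assumed value)."""
    return 0.05 * p ** 2


def speedup(p, t_comp, t_comm):
    """Return T(1)/T(p); a single processor is assumed to need no communication."""
    if p == 1:
        return 1.0
    return t_comp / (t_comp / p + t_comm(p))


def best_processor_count(t_comm, t_comp=T_COMP, max_p=256):
    """Return the processor count in 1..max_p that maximises the modelled speedup."""
    return max(range(1, max_p + 1), key=lambda p: speedup(p, t_comp, t_comm))


MODELS = {
    "embarrassingly parallel": embarrassingly_parallel,
    "nearest neighbour": nearest_neighbour,
    "all-to-all": all_to_all,
}

if __name__ == "__main__":
    for name, model in MODELS.items():
        curve = ", ".join(
            f"p={p}: {speedup(p, T_COMP, model):.2f}" for p in (1, 4, 16, 64)
        )
        print(f"{name:24s} {curve}  (best p <= 256: {best_processor_count(model)})")
```

With these made-up parameters the first case scales linearly, the second peaks at a finite processor count, and the third drops below a speedup of 1 at larger counts. Real benchmark data of the kind proposed above would replace the assumed cost functions.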
More information about the Beowulf mailing list
