(no subject)
Robert G. Brown rgb at phy.duke.edu
Thu Apr 11 15:58:18 PDT 2002
On Thu, 11 Apr 2002, Erik Paulson wrote:

> > >If you've got N nodes, submit N copies of SETI at home to your queuing system,
> > >and your cluster will get an N times speedup over a single node. I don't see
> > >how you can hope to do better than that.
> >
> > I was aware of this possibility, but do not have the skills to implement it.
>
> Yes you do. Download Condor, or PBS, or Sun Grid Engine, or buy Platform LSF,
> and:
> A. Install it on N nodes
> B. Submit N copies
>
> or, install Scyld or MOSIX. Type:
> my_program &
>
> N times.

And not even for SETI will you get an Nx speedup on N nodes. There is ALWAYS a serial fraction even for embarrassingly parallel applications, and the time required to send the jobs out to the nodes (relative to just looping N times on the node) is part of it. In Amdahl's Law N-fold speedup is the upper bound, not the general, practical limit.

This is the basis of Erik's observation about embarrassingly parallel jobs being ideal for clusters -- they're the ones that often get very close to N-fold speedup on N nodes for nearly arbitrary N. "Real" parallel jobs (ones with nontrivial communications built on MPI or PVM or raw sockets or even shared memory or some sort of specialized communications channel) almost never do this well, and more often than not will only speed up at all up to some maximum number of nodes and then actually run more slowly if further partitioned.

It's also interesting that master-slave jobs were cited as being "real" parallel applications, as in many cases the master is nothing more than an intelligent front end for an embarrassingly parallel application core. What's the difference between using a script or Mosix or even a bunch of rsh's as the "master" that distributes the jobs and collects the results and using PVM to do exactly the same thing? Not much, really, but perhaps a small edge in network efficiency for that part of things.
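To put rough numbers on the Amdahl's Law argument above, here is a minimal Python sketch. The first function is the standard Amdahl bound, 1 / (s + (1 - s) / N) for serial fraction s. The second adds a cost that grows linearly with the number of nodes; that linear model, the name `per_node_overhead`, and all the values used are illustrative assumptions, not measurements of any real cluster or application.

```python
def amdahl_speedup(n_nodes, serial_fraction):
    """Amdahl's Law speedup on n_nodes: 1 / (s + (1 - s) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)


def speedup_with_overhead(n_nodes, serial_fraction, per_node_overhead):
    """Amdahl speedup plus an ASSUMED cost that grows linearly with N.

    per_node_overhead is a hypothetical parameter: the time spent
    dispatching to (or communicating with) each additional node,
    expressed as a fraction of the single-node run time.
    """
    parallel_time = (1.0 - serial_fraction) / n_nodes
    overhead_time = per_node_overhead * (n_nodes - 1)
    return 1.0 / (serial_fraction + parallel_time + overhead_time)


if __name__ == "__main__":
    # Illustrative values only: a tiny serial fraction stands in for an
    # embarrassingly parallel job, a larger one plus per-node overhead
    # stands in for a job with nontrivial communications.
    for n in (1, 4, 16, 64, 256):
        ep = amdahl_speedup(n, serial_fraction=0.001)
        real = speedup_with_overhead(n, serial_fraction=0.05,
                                     per_node_overhead=0.001)
        print(f"N={n:4d}  small serial fraction: {ep:7.2f}"
              f"  with per-node overhead: {real:6.2f}")
```

With these assumed values the first column stays below N but close to it for small N, while the second rises, peaks at a few tens of nodes, and then falls as more nodes are added, which is the "runs more slowly if further partitioned" behavior described above.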
That edge in network efficiency may matter -- if the jobs run a short time and communicate with the master a long time it will matter -- but in cases where this paradigm makes sense at all (where the ratio of run to communication is the other way around -- lots of computation, a little communication) it won't matter much.

Most of this is in any decent book on parallel computing, including at least one that is freely available on the web. Then there is my online book (which I make no claim for being "decent", but it is free:-). Lots of these resources are on or linked to various cluster sites, including:

http://www.phy.duke.edu/brahma

   rgb

-- 
Robert G. Brown    http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email:rgb at phy.duke.edu
More information about the Beowulf mailing list
