question about the operation of beowulf
Robert G. Brown rgb at phy.duke.edu
Fri Dec 8 09:28:30 PST 2000
On Fri, 8 Dec 2000, Geneiatakis Dimitrios wrote:

> If i have to do a task in a beowulf how the computers communicate
> to do this work ?

Generally speaking, they use some sort of socket connection. The subtask on node 1 uses a socket to send messages/data to node 2, for example. I use the word "socket" here very generically, as it may or may not be a typical TCP or UDP socket.

Having written raw socket code for a few applications at this point I can safely and authoritatively say that while it isn't terribly difficult if one has good books and documentation and a few examples to work from, it is undeniably a pain in the butt to write your own raw socket code for a parallel application. There are many things to figure out and many "wheels" to reinvent. It is also very easy to make something that almost works but breaks, leaving you with a fairly significant debugging problem. Finally, there are scaling issues with raw socket code -- it isn't too difficult to construct a pair of sockets, but a matrix of pairs of sockets between all pairs of nodes that scales transparently with the number of nodes and provides one with adequate information and controls over the node state is a horse of a different color (and a LOT more work).

For this reason, most folks that write beowulfish parallel applications use a library that encapsulates some sort of raw sockets and provides a relatively high end interface to a network of interprocessor communications (IPC) links. The two most common ones are PVM and MPI.

PVM stands for "parallel virtual machine" and was originally written strictly to wrap raw IP sockets and provide a daemon-level interface between parallel program components, which could be running on literally any old heterogeneous combination of hosts. It was written (IIRC) at Oak Ridge by interested volunteers and provided via "netlib" -- which might be viewed as an early realization of the web that didn't quite catch on.

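To make the raw socket remarks above a little more concrete, here is a minimal sketch in Python (my choice for illustration; the post itself names no language). It runs "node 1" and "node 2" as two endpoints on the loopback interface of one machine, which is an assumption made so that it runs anywhere. The helper names (send_msg, recv_msg, recv_exact, full_mesh_links, run_socket_demo) are hypothetical. Length-prefixed framing is one of the "wheels" one ends up reinventing, because a TCP socket delivers a byte stream rather than discrete messages.

```python
import socket
import struct
import threading


def send_msg(sock, payload: bytes) -> None:
    # Prefix each message with its length as a 4-byte big-endian integer,
    # since TCP itself does not preserve message boundaries.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until we have n.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def node2(listener) -> None:
    # "Node 2": accept one connection, read a list of integers, reply with the sum.
    conn, _ = listener.accept()
    with conn:
        numbers = [int(x) for x in recv_msg(conn).decode().split()]
        send_msg(conn, str(sum(numbers)).encode())


def run_socket_demo() -> int:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    t = threading.Thread(target=node2, args=(listener,))
    t.start()
    # "Node 1": send the subtask data and wait for the answer.
    with socket.create_connection(("127.0.0.1", port)) as node1:
        send_msg(node1, b"1 2 3 4")
        result = int(recv_msg(node1).decode())
    t.join()
    listener.close()
    return result


def full_mesh_links(n_nodes: int) -> int:
    # Number of socket pairs needed if every node talks directly to every other.
    return n_nodes * (n_nodes - 1) // 2
```

Even this toy omits error recovery, timeouts, and any view of node state. The pair count from full_mesh_links grows quadratically with the number of nodes, which is roughly where the scaling pain described above comes from.
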
MPI, on the other hand, was born out of users' frustration with the programming interfaces of early parallel/SMP supercomputers. In the relatively early days of parallel supercomputers, every vendor had their own proprietary communications layer and parallel API, and the supercomputers themselves became obsolete every 2-4 years (or their users moved to a different site where a different API reigned). A lot of users found themselves investing two years parallelizing their code for a given API only to have to turn around and do it all over for a different hardware API. Parallel code (which is still very "expensive" to develop) was just not portable.

The government and key industries finally decided enough was enough and insisted that although supercomputer vendors could do what they liked with their proprietary interfaces, they were only going to buy systems with a common API that permitted at least moderate movement of code from system to system. Faced with this, the key supercomputer vendors (possibly regretfully, as I'm sure each hoped that their own proprietary API would somehow get branded and trump that of its competitors and give them a corporate/government monopoly stabilized by the cost of porting code, not unlike the stabilization of the Microsoft monopoly today) formed a consortium and wrote a common "message passing interface" to serve as a common wrapper for their proprietary communications system calls.

Eventually open source versions of MPI appeared and a network interface was added that allowed it, too, to form the library foundation for IPCs over a public IP network. MPI and PVM thus became functionally equivalent (for the most part, at least) libraries with very different roots and with somewhat different superstructure support for network-distributed parallel computations.

In my experience, folks coming from "big iron" supercomputers to beowulfery tend to favor MPI as it facilitates the movement of their code and is what they are used to; folks coming from distributed parallel computing in network environments probably started with PVM and tend to prefer that. Were I to endorse one (or even acknowledge the one I started with and hence prefer:-) I would stimulate a religious jihad by the faithful of the other -- which may happen anyway. You should likely look them both over carefully to see which one you are likely to find codophilosophically compatible before choosing one or the other.

A few other high end tools exist to permit certain classes of "IPCs" to exist between collective parallel processes, e.g. MOSIX, but these tools provide even more insulation between the programmer and the parallel communications process, wrapping ordinary file and socket I/O calls in kernel-based network layer insertions.

Or, of course, one can create a file based I/O layer on top of e.g. NFS or any other synchronous shared file system. These tend to be relatively inefficient and slow, but for some purposes this doesn't matter and I've certainly used NFS (with rsh, rcp, and expect) as an "IPC" mechanism for embarrassingly parallel master/slave task management and communications for very large clusters on different networks where PVM might have been rude. Because of resource limitations in e.g. the number of open socket connections in the kernel, sometimes this sort of trick can come in handy for other reasons as well.

Let's see, to finish up: PVM and MPI are both available in most of the major Linux distributions as well as from their home sites. Use a search engine or visit www.beowulf.org, the beowulf underground (linked to the main beowulf page) or nearly any other major beowulf's website, e.g. www.phy.duke.edu/brahma, to find links to PVM and MPI sites.

Hope this helps:

   rgb

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
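
As a footnote to the file-based "IPC" idea in the message above, here is a rough Python sketch of a master posting tasks as files in a shared directory and a worker picking them up and writing results back. It is only an illustration under assumptions, not the rsh/rcp/expect setup the post describes: a temporary local directory stands in for the NFS mount, the task_N.json / result_N.json naming and all function names are hypothetical, and the worker runs in the same process rather than on another host.

```python
import json
import os
import tempfile


def write_atomic(path: str, data) -> None:
    # Write under a temporary name, then rename, so a reader never opens a
    # half-written file. How well this holds on a real NFS mount depends on
    # the client/server configuration (an assumption to verify locally).
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)


def master_post_tasks(shared_dir: str, tasks) -> None:
    for i, task in enumerate(tasks):
        write_atomic(os.path.join(shared_dir, f"task_{i}.json"), task)


def worker_run_once(shared_dir: str) -> int:
    # Process every task file currently visible; return how many were done.
    done = 0
    for name in sorted(os.listdir(shared_dir)):
        if not (name.startswith("task_") and name.endswith(".json")):
            continue
        task_path = os.path.join(shared_dir, name)
        with open(task_path) as f:
            task = json.load(f)
        result_name = name.replace("task_", "result_", 1)
        write_atomic(os.path.join(shared_dir, result_name),
                     {"sum": sum(task["numbers"])})
        os.remove(task_path)
        done += 1
    return done


def master_collect(shared_dir: str) -> dict:
    results = {}
    for name in sorted(os.listdir(shared_dir)):
        if name.startswith("result_") and name.endswith(".json"):
            with open(os.path.join(shared_dir, name)) as f:
                results[name] = json.load(f)["sum"]
    return results


def run_file_demo() -> dict:
    # TemporaryDirectory stands in for a directory on a shared file system.
    with tempfile.TemporaryDirectory() as shared_dir:
        master_post_tasks(shared_dir, [{"numbers": [1, 2, 3]},
                                       {"numbers": [10, 20]}])
        worker_run_once(shared_dir)
        return master_collect(shared_dir)
```

With several workers one would also need some way to claim a task (for example, renaming it before processing) so that two workers do not take the same file; that is left out here to keep the sketch short.
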
