FW: MPI cases and PVM

Robert G. Brown rgb at phy.duke.edu
Sat Mar 16 11:02:19 PST 2002


On Thu, 14 Mar 2002, Anand Singh Bisen wrote:

>  
> Hello
>  
> I don't want to start off with a flame; I just want to ask: if I have
> a homogeneous cluster, i.e. all the nodes are exactly the same with a
> high speed interconnect, then which API should I use, PVM or MPI?  I
> know there might be some areas where PVM is better and some where MPI
> is, so I just wanted to know in which cases PVM should be used and in
> which MPI.  I can't discuss my computational problem because I am not
> supposed to.
>  
> Anand Singh Bisen (abisen at iupui.edu)
> Graduate Student @ Purdue School of Science (CS)

Either one or both.  There used to be a lovely white paper on PVM vs MPI
here:

  http://www.epm.ornl.gov/pvm/PVMvsMPI.ps

that compares their features and talks about when one might be preferred
over the other.  Frankly I don't think it much matters -- most people I
know use one or the other depending as much on their personal history in
parallel computing as on a rational decision process.

A VERY brief recapitulation of the history and features that are apropos
to a rational decision is something like this:

 PVM: Developed at ORNL to facilitate parallel supercomputing on
commodity workstations.  I'd even say that the development of PVM was
"the" critical enabling technology for beowulfery, and there were lots
of people, myself included, who did massively parallel computations on
large clusters of e.g. Sun or Digital workstations using PVM back before
Linux.  Note that even using a bunch of relatively expensive Suns or
SGIs or DECstations as nodes still beat the hell out of buying a Cray or
CM5, especially if a lot of the Suns were already in place on desktops.

PVM was designed from the beginning to run with the (TCP and UDP IP)
network as "the" IPC channel in a heterogeneous environment.  It uses a
nifty "AI" expert system to determine the architecture of the hosts it
is installed on and creates a tree of binary directories so that a task
can be compiled on several architectures and run across them.  It
doesn't do automated load balancing, but it isn't horribly difficult to
balance load across systems with different speeds either, depending on
the type of task.
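
For example, a master can ask the virtual machine what it is running on
with pvm_config() and weight its hand-scheduled work by the relative
host speeds.  The little sketch below is mine, not something out of the
PVM distribution; the proportional split is just one illustrative way
to do it, and hi_speed is whatever relative speed was configured for
each host:

  /* Sketch only: query the virtual machine and split work in proportion
   * to each host's configured relative speed (hi_speed).  The split is
   * an illustrative choice of mine, not a PVM feature. */
  #include <stdio.h>
  #include <pvm3.h>

  int main(void)
  {
      struct pvmhostinfo *hosts;
      int nhost, narch, i, total_speed = 0;
      int total_work = 1000;              /* arbitrary number of work units */

      pvm_mytid();                        /* enroll in the virtual machine */
      if (pvm_config(&nhost, &narch, &hosts) < 0) {
          fprintf(stderr, "pvm_config failed (is pvmd running?)\n");
          return 1;
      }
      for (i = 0; i < nhost; i++)
          total_speed += hosts[i].hi_speed;
      for (i = 0; i < nhost; i++)
          printf("%s (%s): %d work units\n", hosts[i].hi_name,
                 hosts[i].hi_arch,
                 total_work * hosts[i].hi_speed / total_speed);
      pvm_exit();
      return 0;
  }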

The PVM library basically passes packed messages and provides various
routines and tools to manage e.g. task spawning and parallel process
control.
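
To give a flavor of the API, a master might look something like the
sketch below.  The executable name "worker" and the message tags 42/43
are placeholders I made up, and the worker is assumed to unpack the
data and send a reply:

  /* Sketch of the PVM style: spawn tasks, pack a message, send it, and
   * wait for a reply.  "worker" and the tags 42/43 are made-up
   * placeholders. */
  #include <stdio.h>
  #include <pvm3.h>

  int main(void)
  {
      int mytid = pvm_mytid();            /* enroll in the virtual machine */
      int tids[4], n, data[2] = { 10, 20 };

      /* let PVM place the tasks (PvmTaskDefault); start 4 copies of "worker" */
      n = pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 4, tids);

      pvm_initsend(PvmDataDefault);       /* new send buffer, default encoding */
      pvm_pkint(data, 2, 1);              /* pack two ints, stride 1 */
      pvm_send(tids[0], 42);              /* send to the first worker, tag 42 */

      pvm_recv(tids[0], 43);              /* block on the assumed reply, tag 43 */
      pvm_upkint(data, 2, 1);
      printf("task %d spawned %d workers, reply: %d %d\n",
             mytid, n, data[0], data[1]);

      pvm_exit();
      return 0;
  }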

  MPI:  Back when people were spending large amounts of government money
on massively parallel supercomputers, vendors generally provided their
own proprietary APIs to use the parallel features of their big systems.
That meant that you first bought a supercomputer, then you spent months
to years porting your application to use its proprietary language
interface, then it became obsolete (often before you finished porting:-)
and then you bought another and started over.  Sometimes you even made
it into production for a while in between;-).

After a few multimillion dollar passes through this cycle, the
government finally decided that it had had enough and told the
supercomputer manufacturers that either they came up with a portable API
or no more government-funded supercomputers would be purchased.  Faced
with that (and the fact that damn near nobody BUT the government could
afford them) a consortium was formed that wrote the MPI spec, and
vendors (all hoping for a Microsoftish monopoly fueled by the high cost
of porting out of their proprietary APIs) reluctantly participated and
complied, although a lot of them still offered and touted their
proprietary interfaces as well. (Forgive me if this isn't perfectly
accurate -- I'm doing this from memory).

MPI was almost immediately turned into a PVM-like library that would
support the creation of virtual parallel supercomputers out of Unix
workstations, but I >>think<< that it came in well after PVM in this
arena, with the network serving more or less as just another IPC device
in open source MPI implementations for MP Unix systems.

MPI is obviously a message passing interface and also provides job
creation and management tools.  I personally started with PVM and am
therefore far from an MPI expert, but my impression is that it hides
more of the details of the parallel supercomputer from the user and is
thereby arguably more scalable for straightforward applications,
although perhaps not so well suited to custom applications or load
balancing.
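
For comparison, a minimal MPI program along the same lines might look
like the sketch below (the tag 42 is an arbitrary choice of mine).
Note that the processes are created up front by the launcher, e.g.
mpirun -np 2, rather than spawned from inside the program as in the PVM
sketch above, which fits with my impression that MPI hides more of the
machine from the user:

  /* Minimal MPI sketch: rank 0 sends two ints to rank 1.  The tag 42 is
   * an arbitrary illustrative choice. */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, data[2] = { 10, 20 };
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (size > 1) {
          if (rank == 0)                  /* send two ints to rank 1 */
              MPI_Send(data, 2, MPI_INT, 1, 42, MPI_COMM_WORLD);
          else if (rank == 1) {
              MPI_Recv(data, 2, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
              printf("rank 1 received %d %d\n", data[0], data[1]);
          }
      }
      MPI_Finalize();
      return 0;
  }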

As I said before, which of them one uses is largely determined by one's
history and, to a lesser extent, by where you plan to run your code.  If
you came from big iron to beowulfs, you will almost certainly have
MPI-based code and will want to use MPI.  If you need to run code on
BOTH a beowulf AND big iron, you will likely want to use MPI, as MPI is
likely to be the parallel API on the big iron.  If you ran on a NOW, or
did odd computations that used a Cray for vector code and a possibly
heterogeneous NOW to do parallel blocks of computations, or if you just
wanted something to facilitate coarse-grained to embarrassingly parallel
job distribution in a master-slave paradigm, you probably started with
PVM and use PVM to this day on beowulfs (which tend to be at least speed
heterogeneous after the first year as newer nodes get mixed in with the
old).  MPI is arguably a tiny bit better in its basic design, although
given the quality of the authors of PVM (whom I think very highly of) it
is a tiny bit indeed.

One area where MPI might hold a small advantage is in low-level network
device support, specifically Myrinet.  I think MPI has native Myrinet
drivers and can avoid TCP altogether.  I don't really know if PVM does
also at this point (although somebody on the list who uses Myrinet
probably knows:-).  OTOH, MPICH at least has been plagued with TCP
problems over the years and may still be, for all I know (again, I
expect that somebody currently expert will say a word one way or
another:-).  And then there is also LAM-MPI -- with MPI you have a bit
of a choice of implementations while with PVM there is really just one.

Hope this helps.  Although I'm a fairly satisfied PVM user because of MY
history, I've tried to be balanced in my treatment of the two.  As I
said, I don't think it matters terribly from the point of view of
performance (except where one or the other might have weaknesses in a
specific communication stack) or ease of programming, but it does matter
in terms of portability and maybe support of heterogeneous operation.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
