[Beowulf] MorphMPI based on fortran itf (was: MPI ABI)

Robert G. Brown rgb at phy.duke.edu
Wed Oct 12 05:53:41 PDT 2005


Ashley Pittman writes:

> Personally I think an MPI ABI would be a good thing; however, this is not
> the way to do it.

And this is exactly right.  Furthermore, we all know the right way to do
it.  It is for a new governing body or consortium to be established (or
more likely the old MPI Forum body promoted) to which all/most of the
MPI makers subscribe.  Let's call this imaginary body the "MPIETF" (MPI
Engineering Task Force) in homage of sorts, since MPIABITF is a bit
long...;-)

This board will need to have a certain amount of armtwisting capability
that approaches the level of a "mandate" so that participants pretty
much have to play even when it isn't in their interest to do so.  To be
blunt, the commercial MPIs have little to gain from an ABI and much to
lose -- they generally prefer for customers to have a significant energy
barrier to changing state lest those customers discover that a competing
MPI works just as well and costs less money and hence is ultimately a
lower energy state (sorry, physicsese for a generic description of
state change).  

As Greg has pointed out, there are a few agents that have the capability
of providing such a mandate.  The government is the obvious one and is
sufficient all by itself.  However, a number of other agents have a
stake in seeing an MPI ABI develop, including the HPC consumer, and
wait!  We're major MPI consumers right here on this list!  There also
exist bully pulpits (e.g. Linux Magazine, Cluster Monkey) where we can
lean on commercial MPIs to play, if only by publishing reviews of their
products pointing out their non-compliance with an ABI adopted by the
bulk of the other MPIs.  The government may therefore not be NECESSARY
if enough non-government agents signed on by announcing that their
groups would participate and that, post-release, they would directly
support ONLY MPIs in compliance with the ABI.

One group that almost by itself constitutes a sufficient group is the
open source MPIs.  Presumably the open source MPIs would be charter
members of the MPIETF as they do NOT have a commercial stake or any deep
interest in locking users into (or conversely out of) their products.
If anything, an ABI would make them more easily browseable as an
alternative in the open source bazaar.  They already have a record of
cooperation and differentiate their products and seek user participation
at a much higher level than by the crude mechanism of making it
difficult for users to port code out of their MPI into somebody else's.
An ABI might well make their jobs easier, as it would make it simpler to
share code fragments and (especially) device drivers between the open
source MPIs and hence spur the genetic optimization and development
process.

If ONLY the open source MPIs established a common ABI and published it
as an open standard, it would provide considerable armtwisting power all
by itself for the commercial MPIs to convert, as companies that release
products that run on top of MPI could then easily cover all the open source
MPIs with a single build and binary release, where they'd have to do
separate versions for each of the non-compliant MPIs at considerable
additional hassle and expense.  An ABI would also, I think, make it
easier (would it not?) for the manufacturers of the high-end networking
cards used in advanced clusters to release "universal drivers" that
would work for any of the ABI-compliant MPIs, and collectively they'd
very nearly constitute a "sufficient" group all by themselves, once
there were enough MPIs available that were ABI-compliant to let them
announce that they were going to no longer support non-compliant MPIs
after such and such a date without risking a loss of business as a
consequence.

Once constituted and mandated/motivated by adequate outside
participation, it SOUNDS like it is fairly straightforward to identify
the primary issues associated with defining the ABI.  The only thing
that I can see being a real problem to overcome is the issue of
backwards compatibility for each old MPI/vendor.  This could be dealt
with in two ways.

One is the "usual" one of just announcing that version x.y.z will be the
last version of the old My_MPI ABI and that x+1.0.0 will comply with the
new ABI standard.  Users can choose to freeze at x.y.z with the
understanding that it will no longer be supported after such and such a
date, or they can recompile now.  ALL THAT SHOULD BE NEEDED in most
cases is a straight recompile AFAICT, presuming that the MPIs are
already in compliance with the MPI-2 standards at the API level and that
the programmers didn't bypass this interface to try to call low level or
internal functions directly.
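
To illustrate why a plain recompile should be enough for code that stays
inside the standard API, here is a toy model in Python.  Everything in it is
an assumption made up for illustration: the two "ABIs" are just tables of
invented values, and build() and runs_against() are hypothetical stand-ins
for compiling and loading, not anything a real MPI provides.

```python
# Hypothetical handle/constant values for two MPI builds. The NAMES (API)
# match; the VALUES behind them (ABI) do not. None of these numbers come
# from a real MPI implementation.
OLD_ABI = {"MPI_COMM_WORLD": 1, "MPI_INT": 2, "MPI_SUCCESS": 0}
NEW_ABI = {"MPI_COMM_WORLD": 100, "MPI_INT": 200, "MPI_SUCCESS": 0}

# "Source code" that sticks to the standard API only ever mentions names.
SOURCE = ["MPI_COMM_WORLD", "MPI_INT", "MPI_SUCCESS"]


def build(source, abi):
    """Mimic compilation: each name is replaced by the value the headers give it."""
    return [abi[name] for name in source]


def runs_against(binary, abi):
    """Mimic loading: the library only understands values it defined itself."""
    known = set(abi.values())
    return all(value in known for value in binary)


old_binary = build(SOURCE, OLD_ABI)
new_binary = build(SOURCE, NEW_ABI)  # the "straight recompile"

print(runs_against(old_binary, OLD_ABI))  # True
print(runs_against(old_binary, NEW_ABI))  # False: stale values are baked in
print(runs_against(new_binary, NEW_ABI))  # True
```

The source list never changes between the two builds; only the table used to
resolve it does.  Code that reached past the API and hard-coded an internal
value would have no such table to fall back on, which is the case where a
recompile alone would not help.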

Alternatively a vendor can choose to support both their old and the new
standard ABI and save their 'clients' the recompile, indefinitely.  I
think that it has been pointed out in this discussion that this is at
least in principle possible (or that it may be possible in some/most
cases) at the expense of some ugly hackery that makes one of the two
interfaces relatively inefficient.  However, I'd expect market forces to
force the elimination of the non-compliant interface fairly rapidly, and
this WOULD be a significant maintenance burden (and source of bugs) to
the MPI vendor.

Either way, this approach in the end requires no end-user
hackery or ugly #ifdef'd code or complex Makefiles or difficult ports.
It forces a very useful discussion amongst the MPI developers as to what
matters (and what does not) at the back end -- most of which sounds like
it is a matter of agreeing on the NAMES of low-level functions and
perhaps on the definitions of certain data types and constants -- all of
which have either rational "best" answers or answers that don't really
matter so pick one and forget it (MPI_ vs mpi_ vs Mpi_, TRUE = -1 vs
TRUE = 1, sheesh).  Most of the actual work of conversion sounds like it
could be dealt with, per MPI, by simply running a fairly complex sed
script on the MPI sources and include files, at least for the C
interface.  As in not, actually, horribly difficult or likely to take
too long.
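
As a rough sketch of what such a scripted conversion might look like, here
it is in Python rather than sed.  The legacy header text, the choice of MPI_
as the agreed prefix, and TRUE = 1 as the agreed value are all assumptions
for the sake of the example; no real MPI's sources are being described.

```python
import re

# Match an "mpi_" or "Mpi_" prefix only at the start of an identifier, so
# that names like "my_mpi_helper" are left alone.
PREFIX = re.compile(r"\b(?:mpi|Mpi)_")

# Match a "#define TRUE -1" line, capturing everything before the value.
TRUE_DEF = re.compile(r"^(#define\s+TRUE\s+)-1\b", re.MULTILINE)


def normalize(source):
    """Rewrite prefixes to MPI_ and TRUE to 1 (both hypothetical conventions)."""
    source = PREFIX.sub("MPI_", source)
    source = TRUE_DEF.sub(r"\g<1>1", source)
    return source


# Invented legacy header, for illustration only.
legacy = (
    "#define TRUE -1\n"
    "int mpi_send(void *buf);\n"
    "int Mpi_Recv(void *buf);\n"
    "int my_mpi_helper(void);\n"
)

print(normalize(legacy))
```

Note that this only touches the prefix, so mpi_send becomes MPI_send rather
than MPI_Send; a real conversion would presumably need an explicit table of
old and new names.  Changing the value of TRUE is also more than a textual
edit, since any code that depended on the old value would need checking.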

   rgb