[Beowulf] MorphMPI based on fortran itf (was: MPI ABI)

Wed Oct 12 11:06:54 PDT 2005

On Wed, 2005-10-12 at 10:42 -0700, Greg Lindahl wrote:
> On Wed, Oct 12, 2005 at 12:05:13PM +0100, Ashley Pittman wrote:
> 
> > Thirdly is the performance issue, any MPI vendor worth his salt tries
> > very hard to reduce the number of function calls and libraries between
> > the application and the network, adding another one is a step in the
> > wrong direction. This may not matter so much for ethernet clusters but
> > certainly for some people the software stack accounts for a surprising
> > percentage of "network" latency.
> 
> OK, so that's a new item for the Technical List Of Things To Do:
> measure the overhead. I suspect it'll turn out to be small, even for
> interconnects that care about 50 nanoseconds of additional overhead.

As it turns out, I'm in a position to measure this fairly easily. Our MPI
sits on top of a library called libelan, which does all the tag matching
at a very low level. All MPI does is convert the communicator into a bit
pattern, calculate the length from the type and count, convert from
lrank to grank, and pass the call on. Using tping to measure libelan
directly I get 1.24 usec for a zero-byte message; using mping I get 1.34
usec for the same message. The only difference is the MPI layer of the
stack. 8% is worth caring about.
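
For anyone who wants to check the arithmetic, here is a minimal sketch in
C. The function names are mine and purely illustrative. With the two
figures above (1.24 and 1.34 usec) the extra layer comes to about 100
nanoseconds, or roughly 8% of the libelan baseline.

```c
#include <assert.h>

/* Absolute cost of the extra layer, in nanoseconds. */
double added_latency_ns(double base_usec, double layered_usec)
{
    return (layered_usec - base_usec) * 1000.0;
}

/* Relative cost of the extra layer, as a percentage of the baseline. */
double overhead_percent(double base_usec, double layered_usec)
{
    assert(base_usec > 0.0);  /* a zero baseline would make the ratio meaningless */
    return (layered_usec - base_usec) / base_usec * 100.0;
}
```

Calling added_latency_ns(1.24, 1.34) and overhead_percent(1.24, 1.34)
reproduces the two numbers quoted above.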

Of course you could argue that our MPI library is actually doing a fair
amount of work, but I'm not sure that it is: a couple of array lookups,
some bit shifting and a few multiplies. Even with today's processors
code isn't free, and it seems to me that this is amplified when you have
to jump between libraries to get at that code.
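
To give a feel for how little work that is, here is a rough sketch in C
of the kind of translation a thin MPI layer does before handing off to
the layer below. Everything in it is hypothetical: the types, the field
names, the layout of the match bits (including packing the tag next to
the communicator id) and the three-entry type table are assumptions for
illustration, not libelan's or any MPI's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical datatype codes and their sizes in bytes. */
enum { T_BYTE, T_INT, T_DOUBLE };
static const size_t type_size[] = { 1, 4, 8 };

/* Hypothetical communicator: an id plus a local-to-global rank table. */
typedef struct {
    uint32_t context_id;  /* identifies the communicator */
    int size;             /* number of ranks in the communicator */
    const int *l2g;       /* local rank -> global rank */
} comm_t;

/* What the (hypothetical) lower layer needs in order to send. */
typedef struct {
    uint64_t match_bits;
    int grank;
    size_t nbytes;
} low_level_send_t;

low_level_send_t prepare_send(const comm_t *comm, int lrank, int tag,
                              int type, size_t count)
{
    low_level_send_t req;

    assert(lrank >= 0 && lrank < comm->size);

    /* Bit shifting: communicator id in the high word, tag in the low word. */
    req.match_bits = ((uint64_t)comm->context_id << 32) | (uint32_t)tag;
    /* Array lookup plus a multiply: message length from type and count. */
    req.nbytes = type_size[type] * count;
    /* Array lookup: local rank to global rank. */
    req.grank = comm->l2g[lrank];
    return req;
}
```

The result would then be passed straight to the lower layer's send call,
so the cost being measured is these few operations plus the jump between
libraries.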

Regardless of the numbers, this is a *high performance* industry and
adding another layer would be a step in the wrong direction.

If you want figures for shared vs static vs function pointer lookup
tables, there was a paper about it at the MPI conference earlier this
month. IIRC the conclusion was (the paper is at my house so not to hand)
that it made little difference, but as I said before, apples and
oranges: that was for function redirection.
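
For reference, this is roughly what redirection through a function
pointer lookup table looks like in C. It is only a sketch of the general
technique, not what the paper measured: the backend names, the table and
the shim_* functions are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*send_fn)(const void *buf, size_t nbytes, int dest);

/* Two stand-in backends; each returns a code identifying itself. */
static int backend_a_send(const void *buf, size_t nbytes, int dest)
{
    (void)buf; (void)nbytes;
    return 100 + dest;
}

static int backend_b_send(const void *buf, size_t nbytes, int dest)
{
    (void)buf; (void)nbytes;
    return 200 + dest;
}

typedef struct {
    const char *name;
    send_fn send;
} backend_ops;

static const backend_ops backends[] = {
    { "backend_a", backend_a_send },
    { "backend_b", backend_b_send },
};

static const backend_ops *active = &backends[0];

/* Pick a backend at run time; returns 0 on success, -1 on a bad index. */
int shim_select(size_t index)
{
    if (index >= sizeof backends / sizeof backends[0])
        return -1;
    active = &backends[index];
    return 0;
}

/* Every call pays one extra indirect jump through the table. */
int shim_send(const void *buf, size_t nbytes, int dest)
{
    assert(active != NULL && active->send != NULL);
    return active->send(buf, nbytes, dest);
}
```

The redirect itself is a single indirect call. It does no translation of
arguments, which is why its cost says little about a layer that does.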

Ashley