[Beowulf] compilers vs mpi?

Gus Correa gus at ldeo.columbia.edu
Tue Jul 20 13:43:47 PDT 2010


Hi Mark

Mark Hahn wrote:
>>> between the application and MPI.  that is, I would like to be able
>>> to compile MPI (say, OpenMPI) with gcc, and expect it to work 
>>> correctly with apps compiled with other compilers.  I guess I'm 
>>> reasoning by analogy to normal distro libs.
>>>
>>
>> I haven't built OpenMPI this way,
>> but you may try to link statically with commercial compiler libraries
>> (say -static-intel, -Bstatic_pgi),
> 
> I'd rather build with gcc if possible.  I guess I'd be surprised if 
> there were compute-intensive-enough parts of MPI to justify using some
> other compiler.  

You are probably right: gcc is so central to Linux, why bother
with other C compilers, unless they are significantly faster?
Still, I keep builds with icc and pgcc, to avoid trouble and too
much digging into Makefiles and such to fix things,
for when the code or the user prefers the commercial compiler.

> (please, if anyone has any quantitative observations on 
> the quality of current compilers, let me/list know!)
> 

I guess this candid question may spark yet another compiler war.

I don't have direct comparisons.
I remember a discussion some time ago, I don't remember where,
about how memcpy is implemented in gcc vs. icc, and how efficient
each one is.
This presumably matters for MPI, as memcpy is likely to be at the
base of all "intra-node" (shared-memory) MPI communication.

>> Yes, they do recommend compiler homogeneity.
>> However, I have built hybrids gcc+ifort
>> and gcc+pgf90 and both work fine.
>> (I have the homogeneous versions also.)
> 
> oh.  so the idea here is that the C part of OpenMPI has an ABI
> which is compatible with basically all the other C compilers,
> such as would be used to compile app-side code.  but that the fortran 
> side has to be matched, library and app sides?  if that's the case,
> then would it make sense to factor out the fortran interface?

I don't know the guts of OpenMPI, but I believe the Fortran 77 and 90
interfaces are built on top of the C interface.
I don't really know whether the OpenMPI C ABI is compatible across
all C compilers.

Since most of the code here is Fortran, with a few tidbits of C,
I try to provide a variety of MPI builds for the commercial
compilers around (plus gfortran, and now openf90, which I have yet to test).
Some programs just refuse to compile with one commercial compiler,
but may compile with another.
Short of modifying the code (which we often have to do anyway),
this creates the need for several sets of MPI compiler wrappers,
built with different Fortran compilers.
I haven't seen this happen with C programs, but as I said,
there is not much C code in our area.

I didn't mean to factor Fortran out, although your interpretation
of it is interesting.
The gcc+some_fortran hybrids I built were mostly because:
1) we didn't have icc for a while (we only had funds to buy ifort),
although more recently we bought the whole compiler suite;
2) pgcc for a while had trouble building OpenMPI, although that
problem is now gone.
Very mundane reasons, but the hybrids work.
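(For what it's worth, a hybrid build is just a matter of pointing
OpenMPI's configure at different compilers, along the lines of
"./configure CC=gcc CXX=g++ F77=ifort FC=ifort --prefix=...".
Those variable names are the standard OpenMPI configure ones;
the prefix is of course site-specific.)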

> 
>> Fortran77 never had these features anyway, and I guess
>> mpif77 doesn't check if you are passing an integer
>> where it should be a real, or if your argument list is shorter
>> than the function requires.
> 
> so if I have f90 code that uses an mpi header (not .mod interface),
> does that mean there's no function signature checking at all?
> as far as I know, my organization has never done .mod-based MPI,
> so maybe this is why we're facing the issue now, after 10 years and 4k 
> users ;)
> 

There is quite a bit of code here written with Fortran 90 constructs
but with "include 'mpif.h'" instead of "use mpi",
and some with "use mpi".
I think this is for historical reasons, because the MPI F90 interface
may not have been very good in the past.
The mpif90 wrapper compiles both cases.
If I remember right, mpif77
(as long as it is built with F77=FC=[ifort,pgf90,gfortran])
also compiles the first case (because the underlying compiler is
actually an F90 compiler), but not the second.
Of course, you need to build the MPI Fortran 90 interface to do
"use mpi", and you must use mpif90 in that case.

If I remember right (somebody please correct me if I am wrong),
the MPI subroutine/function calls are the same in F77 and F90;
the main difference is that the MPI F90 bindings add compile-time
type and interface checking, something that is absent in F77.
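To make that concrete, here is a minimal sketch (mine, just an
illustration): the same wrong call sails through with the F77-style
header but is caught at compile time with the F90 module.

    program checkdemo
       use mpi            ! F90 bindings: explicit interfaces, so the
                          ! compiler checks argument types and counts
       implicit none
    !  include 'mpif.h'   ! the F77-style alternative: constants only,
    !                     ! no interfaces, hence no argument checking
       integer :: ierr
       real    :: x       ! wrong type, should be an integer
       call MPI_Init(ierr)
       ! With "use mpi" the next line is a compile-time error; with
       ! include 'mpif.h' it compiles and misbehaves at run time.
       call MPI_Comm_rank(MPI_COMM_WORLD, x, ierr)
       call MPI_Finalize(ierr)
    end program checkdemo

Compile with mpif90; swap the "use mpi" line for the include and
the bug goes unnoticed.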

>>> PS: we have a large and diverse user base, so tend to have to support 
>>> gcc, intel, pathscale and pgi. 
>>
>> ... and don't forget Open64!  :)
> 
> well, that's an interesting point.  I haven't quite figured out who is 
> doing
> the canonical release for Open64 nowadays (highest ver number seems to 
> be from AMD).  have you done any comparisons?
> 

Well, I built OpenMPI 1.4.2 with Open64 and tested the basic
functionality, but I have yet to find time to compile and run one or
two of the atmosphere/climate/ocean codes here with the Open64 OpenMPI
wrappers, to see if it outperforms the builds with the other compilers.
I am curious about this one because we have quad-core Opterons,
and I was wondering if the AMD-sponsored compiler would do better
than the Intel compiler (which, if I remember right, doesn't let me
use anything beyond SSE2, i.e. -xW, on the Opterons).

Unfortunately, this type of comparison can take quite some time,
if you try to tweak the optimization, check that the results are OK
(on IA64 I had some bad surprises with hidden/bundled
optimization flags), test with MVAPICH2 as well, and so on.
I can't possibly test everything; I have production runs to do.
I am also one of my own users!

>>> we even have people who want to use
>>> intel's damned synthetic 128b FP over MPI :(
>>
>> It's hard to keep the customer satisfied.
>> You give them the sky, they want the universe.
> 
> for me, the real problem is knowing whether the user understands that 
> synthetic 128b FP is drastically slower than 64b hardware FP.  
> has anyone
> tried to do a comparison?
>
> thanks, mark.


From what I observe here, the primary level of astonishment and
satisfaction for most users is: "It works!"
"It runs faster than on my laptop" comes later, if ever.
Only a few users try to make comparisons.
If you gave them functional synthetic 128-bit FP, you may
already have accomplished a lot.

In the specific case of 128-bit arithmetic, I wonder if you can
make it run fast on 64-bit machines.
There was a discussion about this a few weeks ago, maybe here,
or perhaps on one of the MPI lists.
It was about why one would need 128-bit arithmetic at all,
and whether needing it is more a symptom of a
poor or noisy algorithm/numerics.
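If anyone wants a quick number for Mark's question, a toy timing
along these lines (my sketch, using software-emulated quad precision
via real(kind=16); kind values and support vary by compiler, e.g.
ifort has it) already shows the gap between hardware 64-bit and
synthetic 128-bit FP:

    program quadtest
       implicit none
       integer, parameter :: n = 10000000
       integer :: i
       real(kind=8)  :: s8
       real(kind=16) :: s16   ! 128-bit: software-emulated on x86
       real :: t0, t1
       ! 64-bit hardware FP
       s8 = 0.0_8
       call cpu_time(t0)
       do i = 1, n
          s8 = s8 + 1.0_8 / real(i, 8)
       end do
       call cpu_time(t1)
       print *, '64-bit time: ', t1 - t0, ' sum:', s8
       ! 128-bit synthetic FP, same loop
       s16 = 0.0_16
       call cpu_time(t0)
       do i = 1, n
          s16 = s16 + 1.0_16 / real(i, 16)
       end do
       call cpu_time(t1)
       print *, '128-bit time:', t1 - t0, ' sum:', s16
    end program quadtest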

Long ago I banned Matlab from our old cluster, because of its abusive
behavior on the head node, etc.
Recently I set it up again to run in batch mode on the compute nodes.
I thought that would be a reasonable compromise, and would drive some
people to run their heavy Matlab calculations on the cluster,
instead of on their desktops.
(A lot of post-processing of climate data is done in Matlab.)
Well, nobody got interested.
Trying to please users beyond their strict requests, especially when
it requires changing their habits, does not necessarily work.

My $0.02
Gus Correa


