[Beowulf] problem of mpich-1.2.7p1

David Mathog mathog at caltech.edu
Thu Feb 4 10:04:34 PST 2010


Gus Correa <gus at ldeo.columbia.edu> wrote
> If you already set up passwordless ssh across the nodes
> OpenMPI will probably get you up and running faster than MPICH2.
> 
> OpenMPI is very easy to install, say, with gcc, g++, and
> gfortran (make sure you have them installed on your main machine,
> use Ubuntu apt-get, if you don't have them).

Well, on Linux maybe, but OpenMPI has been soundly kicking my butt for
the last day while I try to get it installed and working on a Solaris 8
(SunOS 5.8) SPARC system, so I can't let that slide as a general statement.

OpenMPI 1.4.1 needed a few minor code mods to build at all using gcc on
this system (the sunfreeware gcc versions). It expects some defines that
aren't present. The mods only touched the code that counts CPUs, which
wasn't an issue in this case because it is a single CPU system. The same
problems were also reported by another fellow for 1.3.1 on a Solaris 8
system:

 http://www.open-mpi.org/community/lists/users/2009/02/7994.php
 
The gcc build works so long as mpirun only launches jobs on the local
machine. Sadly, try to send ANYTHING to a remote machine (Linux on Intel,
in case that matters) and it treats one to:

 mca_oob_tcp_msg_send_handler:  writev failed:  Bad file descriptor

This is on a build with no warnings or errors.  It is definitely a problem
on the Solaris side, since any of the Linux machines can initiate an mpirun
to another node, or to all other nodes, and that works with the example
programs.  So with gcc, OpenMPI is not too useful for the front end of an
MPI cluster.
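
For anyone who wants to reproduce the local versus remote difference, a
minimal sketch follows. It launches a plain non-MPI program (hostname), so
it should exercise only the launcher and its out-of-band channel. The node
name node01 and the helper names RUN and launch_check are placeholders of
mine, not anything from OpenMPI. By default it only prints the commands.

```shell
#!/bin/sh
# REMOTE is a placeholder node name (assumption); set it to a real host.
REMOTE=${REMOTE:-node01}
# RUN defaults to echo, so the commands are printed, not executed.
# Run with RUN= (set but empty) to actually invoke mpirun.
RUN=${RUN-echo}

launch_check() {
    # Start one copy of hostname on the host named in $1.
    $RUN mpirun -np 1 -host "$1" hostname
}

launch_check localhost    # local launch: works with the gcc build
launch_check "$REMOTE"    # remote launch: the case that fails here
```

If the second command is the only one that fails, that matches what I see
from the Solaris front end.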

Today I'm trying again using Sun's Forte 7 tools, which requires a
fairly complex configure line:

./configure --with-sge --prefix=/opt/ompi141 \
  CFLAGS="-xarch=v8plusa" CXXFLAGS="-xarch=v8plusa" \
  FFLAGS="-xarch=v8plusa" FCFLAGS="-xarch=v8plusa" \
  CC=/opt/SUNWspro/bin/cc CXX=/opt/SUNWspro/bin/CC \
  F77=/opt/SUNWspro/bin/f77 FC=/opt/SUNWspro/bin/f95 \
  CCAS=/opt/SUNWspro/bin/cc CCASFLAGS="-xarch=v8plusa" \
  >configure_4.log 2>&1 &

Not sure yet if that is sufficient, as none of the preceding configure
variants resulted in a set of Makefiles which would actually run to
completion, and this one is still building.
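
The configure line above repeats the same arch flag and compiler directory
many times, which makes it easy to get one of them wrong when trying
variants. A sketch of one way to state each only once is below. The names
PREFIX, ARCH, SPRO and configure_cmd are my own illustrative choices, and
the function only prints the command so it can be checked before use.

```shell
#!/bin/sh
# Values copied from the configure line above; the variable names are
# illustrative only.
PREFIX=/opt/ompi141
ARCH="-xarch=v8plusa"
SPRO=/opt/SUNWspro/bin

configure_cmd() {
    # Print the full configure invocation on one line for review.
    # None of the values contain spaces, so the printed line can be
    # pasted back into a shell unchanged.
    echo ./configure --with-sge --prefix="$PREFIX" \
        CFLAGS="$ARCH" CXXFLAGS="$ARCH" FFLAGS="$ARCH" FCFLAGS="$ARCH" \
        CC="$SPRO/cc" CXX="$SPRO/CC" F77="$SPRO/f77" FC="$SPRO/f95" \
        CCAS="$SPRO/cc" CCASFLAGS="$ARCH"
}

configure_cmd
```

Changing ARCH or SPRO in one place then changes every flag that depends
on it. Whether this particular set of flags is sufficient is still the
open question.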

Regards,


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


