[Beowulf] perl with OpenMPI gotcha?
mathog at caltech.edu
Fri Nov 20 13:43:08 PST 2020
I'm hoping one of you has been to the end of this road already and can
point out what is going wrong.
I have some perl scripts which have been carried along for a couple of
decades now which use PVM to start simple jobs on the compute nodes, wait
for them to finish (listing jobs as they close out), and then clean
up. Since this is the only thing PVM is used for, it seemed like it
might be (way past) time to migrate that to MPI, specifically OpenMPI
4.0.1, which is what is on the cluster.
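For concreteness, the pattern those scripts implement is roughly the following. This is only a sketch in shell, not the actual Perl/PVM code: local background jobs stand in for the jobs started on compute nodes, and run_jobs, the node names, and the placeholder command are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the start / wait / report / clean-up pattern.
# Local background jobs stand in for jobs on compute nodes; run_jobs, the
# node names, and the placeholder command are hypothetical.
run_jobs() {
    local workdir name
    workdir=$(mktemp -d) || return 1
    for name in "$@"; do
        (
            # Placeholder for the real per-node work.
            sh -c 'exit 0' > "$workdir/$name.log" 2>&1
            # Report each job as it closes out, with its exit status.
            echo "finished $name status=$?"
        ) &
    done
    wait                # block until every background job has exited
    rm -rf "$workdir"   # clean up scratch files
    echo "all jobs done"
}

run_jobs node01 node02 node03
```

The "finished" lines can appear in any order, since each job reports for itself when it exits; only the final "all jobs done" line is guaranteed to come last.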
There are apparently tricks required, either that, or the test script
does not run on a single standalone machine, or perhaps OpenMPI is not
There are already modules for OpenMPI and bioperl, and I decided to
install Parallel::MPI::Simple into the latter, since it holds all the perl
modules which were not installed with dnf on this CentOS 8 system. Like so:
module load bioperl
module load OpenMPI
cpanm -l $ROOT_BIOPERL Parallel::MPI::Simple 2>&1 \
| tee install_perl_parallel_mpi_simple_2020_11_20.log
(no errors or warnings).
There is a little test program, "ic.pl", which comes with Parallel::MPI::Simple,
but simply invoking it fails because it cannot find Simple.so. I have
been down this road before with Perl and MPI with the "Maker" program:
some libraries must be preloaded or they just will not be found by Perl.
Once that is done, all the missing library and symbol errors go away, but
the script still does not run:
[poweredge:04423] *** An error occurred in MPI_Send
[poweredge:04423] *** reported by process [603979777,0]
[poweredge:04423] *** on communicator MPI_COMM_WORLD
[poweredge:04423] *** MPI_ERR_RANK: invalid rank
[poweredge:04423] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[poweredge:04423] *** and potentially your MPI job)
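One possibility I have not ruled out, and this is a guess rather than a diagnosis: MPI_ERR_RANK from MPI_Send means the destination rank is outside 0 .. size-1 for the communicator, which is what would happen if the script sends to rank 1 while MPI_COMM_WORLD holds only one process. The sketch below checks that condition from the shell. The helper names mpi_world_size and check_dest_rank are made up, and it assumes that Open MPI's mpirun exports OMPI_COMM_WORLD_SIZE to the processes it starts and that the variable is unset for a direct launch.

```shell
#!/usr/bin/env bash
# mpi_world_size and check_dest_rank are hypothetical helper names.
# Open MPI's mpirun exports OMPI_COMM_WORLD_SIZE to each process it starts;
# the fallback of 1 assumes the variable is unset for a direct launch.
mpi_world_size() {
    echo "${OMPI_COMM_WORLD_SIZE:-1}"
}

# Valid destination ranks in MPI_COMM_WORLD are 0 .. size-1.
check_dest_rank() {
    local dest=$1 size
    size=$(mpi_world_size)
    if [ "$dest" -ge 0 ] && [ "$dest" -lt "$size" ]; then
        echo "rank $dest ok (size $size)"
    else
        echo "rank $dest invalid (size $size)"
        return 1
    fi
}

# Example: would a send to rank 1 be valid in this environment?
check_dest_rank 1 || true
```

If that is the cause, starting the test under the launcher with two processes, for example "mpirun -np 2 perl ic.pl", should give it a valid rank 1 to send to.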
Any idea what might be wrong here?
Also, searching turned up very little information on using MPI with perl.
(Lots on using MPI with other languages of course.)
The Parallel::MPI::Simple module is itself almost a decade old.
We have a batch manager but I would prefer not to use it in this case.
Is there some library/method other than MPI which people typically use
these days for this sort of compute cluster process control with Perl
from the head node?