[Beowulf] slow mpi init/finalize

Peter Kjellström cap at nsc.liu.se
Tue Oct 17 07:12:22 PDT 2017


On Tue, 17 Oct 2017 09:51:43 -0400
Michael Di Domenico <mdidomenico4 at gmail.com> wrote:

> On Tue, Oct 17, 2017 at 8:54 AM, Peter Kjellström <cap at nsc.liu.se>
> wrote:
> >> however, your test above fails on my machines
> >>
> >> user at n1# ib_acme -d n3
> >> service: localhost
> >> destination: n3
> >> ib_acm_resolve_ip failed: cannot assign requested address
> >> return status 0x0  
> >
> > Did this fail instantly or with the typical ~1m timeout?  
> 
> it fails instantly.

Then this is probably not the problem.

Also, I noticed that you provided some contradictory data in a post to
the openfabrics users list. The output there included references to
qib0 (TrueScale InfiniBand), while in this thread you started by saying
FDR10 (which is only available on Mellanox InfiniBand).

The two situations are quite different with respect to the MPI protocol
stack, and therefore with respect to debugging.

On TrueScale, Intel MPI may run on tmi, which in turn runs on psm
(IntelMPI->tmi->psm, as opposed to IntelMPI->dapl->daploucm).
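
To make the distinction concrete, here is a rough sketch of checking
which kind of HCA a node has and which Intel MPI fabric setting would
plausibly match it. The helper name pick_fabric is hypothetical, and the
mapping from device-name prefix to an I_MPI_FABRICS value is an
assumption based on the two stacks described above, not a verified
recipe for any particular Intel MPI release.

```shell
# Sketch only: map an HCA device name to a plausible I_MPI_FABRICS value.
# pick_fabric is a hypothetical helper; the prefixes (qib = TrueScale,
# mlx = Mellanox) and the fabric strings are assumptions.
pick_fabric() {
    case "$1" in
        qib*) echo "shm:tmi" ;;   # TrueScale: tmi on top of psm
        mlx*) echo "shm:dapl" ;;  # Mellanox: dapl (e.g. the ucm provider)
        *)    echo "unknown" ;;   # anything else: do not guess
    esac
}

# List the HCAs visible on this node and print the suggested setting.
for dev in /sys/class/infiniband/*; do
    [ -e "$dev" ] || continue     # no InfiniBand devices present
    name=$(basename "$dev")
    echo "$name -> I_MPI_FABRICS=$(pick_fabric "$name")"
done
```

If the suggested value differs from what the job actually uses, that
mismatch is worth ruling out first. Running the job with I_MPI_DEBUG set
to a small value such as 2 should, as far as I know, make Intel MPI
report the data transfer modes it selected at startup.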

/Peter K

