[Beowulf] slow mpi init/finalize

Michael Di Domenico mdidomenico4 at gmail.com
Tue Oct 17 09:17:51 PDT 2017


On Tue, Oct 17, 2017 at 12:01 PM, Peter Kjellström <cap at nsc.liu.se> wrote:
>
> That is still very slow. For reference I timed 1024 rank startup on one
> of our systems with IntelMPI and dapl on ucm and it's a bit below 0.5s
> depending on how you time it (some amount of lazy init is happening).

i didn't specifically time it, so my "few seconds" might be in line
with your 0.5 seconds

> Either way, with 60s time scales and ibacm so broken it fails instantly
> I suspect you have some hostname/dns/tcp-ip-on-eth or other fundamental
> problem somewhere.

it's certainly possible.  unfortunately the documentation is lacking,
no one on the ofa list wants to help, and i don't have time to
dig through source code to figure out what's going on.  at some
point i'll figure it out.

but clearly something is wonky.  at least i can set aside the hardware
aspect for now.
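
one cheap way to test the hostname/dns theory would be to time name
resolution on each node.  a minimal sketch, assuming python is
available on the nodes; time_lookup is a made-up helper name, not
part of any mpi or ofed tooling:

```python
import socket
import time


def time_lookup(host):
    """Return (seconds, addresses) for one forward lookup of host.

    addresses is None when the name does not resolve at all.
    time_lookup is a hypothetical helper, named here for illustration.
    """
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, None)
        # each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6
        addrs = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addrs = None
    elapsed = time.perf_counter() - start
    return elapsed, addrs
```

calling time_lookup(socket.gethostname()) on every node and comparing
the results would show whether a lookup is slow, fails, or returns an
address on the wrong interface.  it says nothing about ibacm itself.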

thanks

