[Beowulf] slow mpi init/finalize
Michael Di Domenico
mdidomenico4 at gmail.com
Tue Oct 17 09:17:51 PDT 2017
On Tue, Oct 17, 2017 at 12:01 PM, Peter Kjellström <cap at nsc.liu.se> wrote:
> That is still very slow. For reference I timed 1024 rank startup on one
> of our systems with IntelMPI and dapl on ucm and it's a bit below 0.5s
> depending on how you time it (some amount of lazy init is happening).
I didn't specifically time it, so my "few seconds" might be in line
with your 0.5 seconds.
> Either way, with 60s time scales and ibacm so broken it fails instantly
> I suspect you have some hostname/dns/tcp-ip-on-eth or other fundamental
> problem somewhere.
It's certainly possible. Unfortunately the documentation is lacking,
no one on the OFA list wants to help, and I don't have time to dig
through the source code to figure out what's going on. At some
point I'll figure it out.
But clearly something is wonky; at least I can set aside the hardware
aspect for now.