[Beowulf] mpi slow pairs

Michael Di Domenico mdidomenico4 at gmail.com
Fri Aug 29 08:30:09 PDT 2014


On Fri, Aug 29, 2014 at 9:32 AM, John Hearns <John.Hearns at viglen.co.uk> wrote:
> I would say the usual tool for that pair-wise comparison is Intel IMB
> https://software.intel.com/en-us/articles/intel-mpi-benchmarks
> I hope I have got your requirement correct!

John,

Close, but not exact.  IMB will test ranks, but will not tell me if a
specific pair of ranks is slower than others, only the collective of
the ranks under test.  what i'm looking for is an mpi version of this:

for x in node{1..100}; do
  for y in node{1..100}; do
    [ "$x" = "$y" ] && continue   # skip self-pairs
    mpirun -n 2 -npernode 1 -host "$x,$y" bwtest > "$x$y.log"
  done
done

unfortunately, the mpirun task takes about 3 secs per iteration, and
with 10k iterations, it's going to take a long time and i'm being
impatient.  i've been trying to write the mpi code myself, but my mpi
is a little rusty so it's slow going...
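
one way to cut the wall-clock time without writing any mpi is to run
disjoint pairs at the same time.  below is a minimal sketch, not a tested
tool: it uses the round-robin "circle method" to split 100 nodes into 99
rounds of 50 non-overlapping pairs, so each round could be launched
concurrently.  assumptions: round_robin_rounds and pair_command are
hypothetical helper names; each unordered pair is tested once (the loop
above tests both orders); and running 50 pairs at once does not itself
skew the numbers, which may not hold if pairs share switch links.

```python
def round_robin_rounds(hosts):
    """Yield rounds of disjoint (a, b) pairs; every unordered pair appears once."""
    ring = list(hosts)
    if len(ring) % 2:
        ring.append(None)  # "bye" slot so odd host counts still work
    n = len(ring)
    for _ in range(n - 1):
        # pair the i-th entry from the front with the i-th from the back
        pairs = [(ring[i], ring[n - 1 - i]) for i in range(n // 2)]
        yield [(a, b) for a, b in pairs if a is not None and b is not None]
        # circle method: keep ring[0] fixed, rotate the rest by one position
        ring = [ring[0], ring[-1]] + ring[1:-1]


def pair_command(x, y):
    """Build the same mpirun line as the nested loop above (bwtest is the
    benchmark binary named in this thread)."""
    return f"mpirun -n 2 -npernode 1 -host {x},{y} bwtest > {x}{y}.log"


if __name__ == "__main__":
    hosts = [f"node{i}" for i in range(1, 101)]
    for round_no, pairs in enumerate(round_robin_rounds(hosts), start=1):
        # every command in one round touches different nodes, so a driver
        # script could start them all in the background and wait
        for x, y in pairs:
            print(round_no, pair_command(x, y))
```

at roughly 3 secs per launch that would be 99 rounds instead of ~10k
sequential runs, if the concurrency assumption holds.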

> Also have you run  ibdiagnet to see if anything is flagged up?

i've run a multitude of ib diags on the machines, but nothing is
popping out as wrong.  what's weird is that it's only certain pairings
of machines, not any one machine in general.

