[Beowulf] MPI + IB question

Bogdan Costescu bcostescu at gmail.com
Thu Nov 15 03:02:18 PST 2012


On Thu, Nov 15, 2012 at 11:26 AM, Jörg Saßmannshausen
<j.sassmannshausen at ucl.ac.uk> wrote:
> One of the older clusters has
> Mellanox MT23108 cards and a Voltaire sLB-24 switch, the newer cluster has
> Mellanox MT26428 with a QLogic 12300 switch.

You are comparing different IB cards as well, not only different switches.

> All clusters are running Debian
> Squeeze, all of them are 64 bit machines and all of them have the required
> packages for the IB network installed.

Are the IB drivers at the same version?

> That crashes immediately, and I have included the verbose output of that in
> the attached file.

This is not really a crash... it actually tells you politely that it
couldn't reach other ranks and terminates. The following lines:

  Process 1 ([[5187,1],1]) is on host: node24
  Process 2 ([[5187,1],0]) is on host: node32
  BTLs attempted: self sm

mean that the only BTLs that qualified to continue were self and sm,
neither of which allows inter-node communication. Very likely tcp (which
you disabled) was the only inter-node BTL available. So now it's up to you
to find out why the openib BTL could not be selected...
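
That check can be automated when comparing the output from several runs. A
minimal sketch in Python, assuming the error text is available as a string;
the function names and the classification of self and sm as node-local are
chosen here for illustration and are not part of Open MPI:

```python
import re

# Assumption: only the BTLs named in this thread are classified here.
LOCAL_ONLY_BTLS = {"self", "sm"}  # loopback and shared memory


def attempted_btls(log_text):
    """Return the names listed on the 'BTLs attempted:' line, or []."""
    match = re.search(r"^[ \t]*BTLs attempted:[ \t]*(.*)$", log_text, re.MULTILINE)
    return match.group(1).split() if match else []


def can_leave_node(btls):
    """True if at least one attempted BTL is not a node-local one."""
    return any(name not in LOCAL_ONLY_BTLS for name in btls)


# The three lines quoted from the error output above.
sample = """\
  Process 1 ([[5187,1],1]) is on host: node24
  Process 2 ([[5187,1],0]) is on host: node32
  BTLs attempted: self sm
"""

btls = attempted_btls(sample)
print(btls, can_leave_node(btls))
```

For the output quoted above, the list contains only self and sm, so the
check reports that no inter-node transport was available.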

> However, if I am not using the cluster with the Voltaire
> switch (described above) but the one with the more recent QLogic switch and
> _copy_ the binary just over, it is working.

You are copying the binary. Are you also copying the IB drivers/libs?
Is IB configured the same way? Is the OpenMPI lib compiled to
dynamically look for components? If so, does it find the IB libs in
the right places?

> However, from the above observation (and I got a very similar case with NWChem)
> it appears to me that the program GAMESS-US has problems with the Voltaire
> network but no problems with the QLogic network. That is something I find a
> bit puzzling.

You can make it even simpler: are you able to run a simple MPI
hello/pi calculation/etc. program across several nodes when forcing
OpenMPI to use the openib BTL?
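
A sketch of how such a test run could be launched, written as a small Python
helper that only builds the mpirun command line. The program name
./hello_mpi is hypothetical, and the host names are the ones from the error
output above. The --mca btl value lists openib plus the node-local self and
sm, so tcp cannot be used as a fallback:

```python
import shlex


def openib_only_command(hosts, program):
    """Build an mpirun argv that allows only the openib, self and sm BTLs."""
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to exercise inter-node traffic")
    return [
        "mpirun",
        "--mca", "btl", "openib,self,sm",  # tcp is deliberately left out
        "-np", str(len(hosts)),            # one rank per listed host
        "--host", ",".join(hosts),
        program,
    ]


# ./hello_mpi is a placeholder for any trivial MPI test program.
argv = openib_only_command(["node24", "node32"], "./hello_mpi")
print(" ".join(shlex.quote(arg) for arg in argv))
```

If even this fails with the same "BTLs attempted: self sm" message, that
would point to the OpenMPI/IB setup on those nodes rather than to GAMESS-US.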

Cheers,
Bogdan


