[Beowulf] Good IB network performance when using 7 cores, poor performance on all 8?

Greg Lindahl lindahl at pbm.com
Sat Apr 26 18:24:06 PDT 2014


On Thu, Apr 24, 2014 at 11:31:56AM -0400, Brian Dobbins wrote:
> Hi everyone,
> 
>   We're having a problem with one of our clusters after it was upgraded to
> RH6.2 (from CentOS5.5) - the performance of our Infiniband network degrades
> randomly and severely when using all 8 cores in our nodes for MPI,... but
> not when using only 7 cores per node.

Sounds to me like you aren't using libpsm, the special library needed
for good MPI performance with your IB cards. Open MPI supports it and
ought to use it by default if present... maybe it isn't installed?

If you've got everything installed the right way, there should be a
program called ipath_checkout that examines your hardware and software
and tells you if everything is OK. We never thought our customers
should have to try to debug things by writing code!
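
For anyone who wants a quick look before running the vendor tool, the two
checks above (is libpsm installed, is ipath_checkout present) can be sketched
roughly as below. This is only a presence check, not a replacement for
ipath_checkout. The library name `psm_infinipath` is an assumption about how
the PSM shared library is named on the node; adjust it if your distribution
packages it differently.

```python
import ctypes.util
import shutil

# Assumption: the PSM shared library is installed as libpsm_infinipath.so.
# Change this if your distribution uses a different name.
PSM_LIBRARY_NAME = "psm_infinipath"

# Tool named in the message above; assumed to be on PATH when installed.
CHECKOUT_TOOL = "ipath_checkout"


def find_psm_library(name=PSM_LIBRARY_NAME):
    """Return the library file name located by the system search, or None."""
    return ctypes.util.find_library(name)


def find_checkout_tool(tool=CHECKOUT_TOOL):
    """Return the full path of the checkout tool if it is on PATH, else None."""
    return shutil.which(tool)


def report():
    """Build one human-readable status line per check."""
    lib = find_psm_library()
    tool = find_checkout_tool()
    return [
        "PSM library: " + (lib if lib else "not found"),
        CHECKOUT_TOOL + ": " + (tool if tool else "not found on PATH"),
    ]


if __name__ == "__main__":
    for line in report():
        print(line)
```

If the library does turn up, `ompi_info` should list a psm component in an
Open MPI build that was compiled with PSM support. If both checks come back
"not found", the missing-install explanation looks more likely.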

-- greg



