[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

John Hearns hearnsj at googlemail.com
Tue Apr 30 09:24:30 PDT 2019


Hello Faraz. Please start by running this command: ompi_info
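
ompi_info lists the MCA components your Open MPI installation was built with, and the output is long. A minimal sketch of one way to narrow it down to the lines relevant to the error is below. The keyword list and the helper names (filter_lines, run_ompi_info) are my own assumptions for illustration, not part of Open MPI.

```python
import shutil
import subprocess

# Assumption: these keywords target the openib BTL named in the error message.
KEYWORDS = ("btl", "openib")


def filter_lines(text, keywords=KEYWORDS):
    """Return the lines of text that mention any keyword (case-insensitive)."""
    return [
        line
        for line in text.splitlines()
        if any(k in line.lower() for k in keywords)
    ]


def run_ompi_info():
    """Run ompi_info with no arguments; return its stdout, or None if not on PATH."""
    exe = shutil.which("ompi_info")
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    output = run_ompi_info()
    if output is None:
        print("ompi_info not found on PATH; check that the Open MPI bin directory is loaded")
    else:
        for line in filter_lines(output):
            print(line)
```

If no line mentioning openib shows up, that would suggest (though not prove) that this build was compiled without OpenFabrics support, which would point at the Open MPI installation before Red Hat or the Mellanox stack.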

On Tue, 30 Apr 2019 at 15:15, Faraz Hussain <info at feacluster.com> wrote:

> I installed Red Hat 7.5 on two machines with the following Mellanox cards:
>
> 87:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> I followed the steps outlined here to verify RDMA is working:
>
>
> https://community.mellanox.com/s/article/howto-enable-perftest-package-for-upstream-kernel
>
> However, I cannot seem to get Open MPI 3.0.2 to work. When I run it, I
> get this error:
>
> --------------------------------------------------------------------------
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port. As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:      lustwzb34
>   Local device:    mlx4_0
>   Local port:      1
>   CPCs attempted:  rdmacm, udcm
> --------------------------------------------------------------------------
>
> Then it just hangs until I press Ctrl-C.
>
> I understand this may be an issue with Red Hat, Open MPI, or Mellanox.
> Any ideas on how to determine which of them is at fault?
>
> Thanks!