[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Christopher Samuel chris at csamuel.org
Wed May 1 08:29:21 PDT 2019

On 5/1/19 7:05 AM, Faraz Hussain wrote:

> [hussaif1 at lustwzb34 ~]$ sminfo
> ibwarn: [10407] mad_rpc_open_port: can't open UMAD port ((null):0)
> sminfo: iberror: failed: Failed to open '(null)' port '0'

Sorry I'm late to this.

What does this say?

systemctl status rdma

You should see something along the lines of:

$ systemctl status rdma
● rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
    Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
    Active: active (exited) since Wed 2019-05-01 03:55:02 AEST; 21h ago
      Docs: file:/etc/rdma/rdma.conf
   Process: 10355 ExecStart=/usr/libexec/rdma-init-kernel (code=exited, 
  Main PID: 10355 (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/rdma.service

From memory, that should take care of loading the umad and mad kernel
modules, and without that set up you'll see that sort of error.
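
If you want to check the module side directly, something like the following is a rough sketch. It assumes the umad module shows up under the name ib_umad in /proc/modules, which is what I'd expect on a stock kernel, but module names can differ with vendor OFED stacks. has_module is just a hypothetical helper name for illustration, not part of any package.

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in a /proc/modules-style listing.
# has_module is a hypothetical helper; its second argument defaults to /proc/modules.
has_module() {
    mod="$1"
    listing="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    # -q: no output, -s: stay quiet if the file is missing or unreadable.
    grep -qs "^${mod} " "$listing"
}

# Assumption: the umad module is named ib_umad on this system.
if has_module ib_umad; then
    echo "ib_umad is loaded"
else
    echo "ib_umad is not loaded; check 'systemctl status rdma'"
fi
```

If it reports the module as missing, starting the rdma service (or, as a quick test, running modprobe ib_umad as root) and then retrying sminfo should tell you whether that was the problem.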

All the best,
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
