[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?

Christopher Samuel chris at csamuel.org
Wed May 1 08:29:21 PDT 2019


On 5/1/19 7:05 AM, Faraz Hussain wrote:

> [hussaif1 at lustwzb34 ~]$ sminfo
> ibwarn: [10407] mad_rpc_open_port: can't open UMAD port ((null):0)
> sminfo: iberror: failed: Failed to open '(null)' port '0'

Sorry I'm late to this.

What does this say?

systemctl status rdma

You should see something along the lines of:

$ systemctl status rdma
● rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
    Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
    Active: active (exited) since Wed 2019-05-01 03:55:02 AEST; 21h ago
      Docs: file:/etc/rdma/rdma.conf
   Process: 10355 ExecStart=/usr/libexec/rdma-init-kernel (code=exited, status=0/SUCCESS)
  Main PID: 10355 (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/rdma.service


From memory, that service should take care of loading the umad and mad 
kernel modules, and without that set up you'll see that sort of error.
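
If you want to check the module side directly, something like the rough sketch below might help. It assumes the userspace MAD module shows up in lsmod as ib_umad and that its device nodes live under /dev/infiniband/, which is what I'd expect on a Red Hat box but is worth confirming on yours. The has_module and check_umad names are just helpers made up for this example, not part of any RDMA package.

```shell
#!/bin/sh
# Sketch only: has_module and check_umad are hypothetical helper names.

# Succeeds if the module named in $1 appears in the first column of
# lsmod-format text read from stdin.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Reports whether ib_umad (assumed module name) is loaded and whether the
# umad device nodes that tools like sminfo open are present.
check_umad() {
    if lsmod | has_module ib_umad; then
        echo "ib_umad loaded"
    else
        echo "ib_umad not loaded; check: systemctl status rdma"
    fi
    ls -l /dev/infiniband/umad* 2>/dev/null ||
        echo "no /dev/infiniband/umad* device nodes found"
}
```

Source that on the affected node and run check_umad. If the module is missing, starting the rdma service (or fixing whatever stopped it from starting) would be the next thing to look at.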

All the best,
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

