[Beowulf] How to debug error with Open MPI 3 / Mellanox / Red Hat?
chris at csamuel.org
Wed May 1 08:29:21 PDT 2019
On 5/1/19 7:05 AM, Faraz Hussain wrote:
> [hussaif1 at lustwzb34 ~]$ sminfo
> ibwarn:  mad_rpc_open_port: can't open UMAD port ((null):0)
> sminfo: iberror: failed: Failed to open '(null)' port '0'
Sorry I'm late to this.
What does this say?
systemctl status rdma
You should see something along the lines of:
$ systemctl status rdma
● rdma.service - Initialize the iWARP/InfiniBand/RDMA stack in the kernel
Loaded: loaded (/usr/lib/systemd/system/rdma.service; disabled; vendor preset: disabled)
Active: active (exited) since Wed 2019-05-01 03:55:02 AEST; 21h ago
Process: 10355 ExecStart=/usr/libexec/rdma-init-kernel (code=exited,
Main PID: 10355 (code=exited, status=0/SUCCESS)
That should take care of loading the umad and mad kernel modules (I'm
going from memory here), and without that set up you'll see that sort
of error.
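
If you want to check the modules directly, something along these lines might help. This is only a rough sketch: it assumes the userspace MAD module is named ib_umad and that its device nodes show up under /dev/infiniband, and module_loaded is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in a /proc/modules-style listing.
# module_loaded is a hypothetical helper; the optional second argument
# (a file to read instead of /proc/modules) exists only to make it easy to test.
module_loaded() {
    mod="$1"
    list="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^${mod} " "$list" 2>/dev/null
}

# Assumption: ib_umad is the module providing the umad interface that sminfo needs.
if module_loaded ib_umad; then
    echo "ib_umad is loaded"
else
    echo "ib_umad is not loaded; check 'systemctl status rdma'"
fi

# Assumption: umad device nodes normally live under /dev/infiniband.
ls /dev/infiniband/umad* 2>/dev/null || echo "no umad device nodes found"
```

If ib_umad is missing, or there are no umad device nodes, that would be consistent with the "can't open UMAD port" message from sminfo above.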
All the best,
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA