[Beowulf] IB problem/using IB diagnostics

Prentice Bisbal prentice at ias.edu
Thu Jun 18 08:20:47 PDT 2009


One of my nodes has an IB problem:

--------------------------------------------------------------------------
WARNING: There is at least on IB HCA found on host 'node36.aurora', but
there is
no active ports detected. This is most certainly not what you wanted.
Check your cables and SM configuration.
--------------------------------------------------------------------------


I've been trying to use some of the IB diagnostics tools to diagnose the
problem, but I can't seem to figure out the correct syntax of the
commands. On a node that I know is working correctly (the master node):

# ibaddr
GID 0xfe800000000000000005ad00001ffb29 LID start 0x1 end 0x1

# ibping 1
ibwarn: [14756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)
ibwarn: [14756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)

# ibping 0x1
ibwarn: [14758] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)
ibwarn: [14758] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)

ibping -G 0xfe800000000000000005ad00001ffb29
ibwarn: [14763] ib_path_query_via: sa call path_query failed
ibping: iberror: failed: can't resolve destination port
0xfe800000000000000005ad00001ffb29

What's am I doing wrong? How can I verify my IB configuration/SM
configuration before I start pulling IB cables?

-- 
Prentice



More information about the Beowulf mailing list