[Beowulf] How to know if infiniband network works?

Jeff Johnson jeff.johnson at aeoncomputing.com
Thu Aug 3 09:16:14 PDT 2017


Faraz,

I didn't see any tests in your results that exercised the IP layer. You
should run some iperf tests between nodes to make sure IPoIB works:
InfiniBand/RDMA can be working fine while IPoIB is broken. As in any IP
environment, the IPoIB settings (network/subnet, netmask, MTU, etc.) must
be consistent across all nodes, and every node must use the same mode
(connected vs. datagram). If iperf can't run between nodes over the IPoIB
addresses, something is broken in the IPoIB configuration.
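
A minimal sketch of that consistency check in Python, assuming the
interface is named ib0 and that the kernel exposes its IPoIB mode and MTU
as the files mode and mtu under /sys/class/net/ib0/. The function names
are made up for illustration, and how you collect each node's settings
(ssh, pdsh, etc.) is left out.

```python
from pathlib import Path


def read_ipoib_settings(iface="ib0", sysfs_root="/sys/class/net"):
    """Return the IPoIB mode and MTU of one interface on the local node.

    Assumes the interface exists and exposes 'mode' and 'mtu' in sysfs.
    """
    base = Path(sysfs_root) / iface
    return {
        "mode": (base / "mode").read_text().strip(),  # "datagram" or "connected"
        "mtu": int((base / "mtu").read_text().strip()),
    }


def find_mismatches(settings_by_node):
    """Map each setting that differs across nodes to its per-node values."""
    mismatches = {}
    # Collect every setting name reported by at least one node.
    keys = sorted({key for settings in settings_by_node.values() for key in settings})
    for key in keys:
        values = {node: settings.get(key) for node, settings in settings_by_node.items()}
        # More than one distinct value means the nodes disagree on this setting.
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches
```

Run read_ipoib_settings() on each node, collect the results into a dict
keyed by hostname, and pass that to find_mismatches(). With hypothetical
nodes node01 (connected, MTU 65520) and node02 (datagram, MTU 2044), both
"mode" and "mtu" would be reported as mismatched. An empty result only
means the settings you collected agree; you still need the iperf run to
confirm that traffic actually flows.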

--Jeff

On Thu, Aug 3, 2017 at 8:41 AM, Faraz Hussain <info at feacluster.com> wrote:

> Thanks for everyone's help. Using the Ohio State tests, qperf and
> perfquery, I am convinced the IB network is working. The only thing that
> still bothers me is that I cannot get mpirun to use the tcp network. I
> tried all combinations of --mca btl to no avail. It is not important, just
> curiosity.
>
>
>
> Quoting Michael Di Domenico <mdidomenico4 at gmail.com>:
>
>> On Thu, Aug 3, 2017 at 10:10 AM, Faraz Hussain <info at feacluster.com> wrote:
>>
>>> Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and got
>>> the results below. What is confusing is that I get the same result
>>> whether I use tcp or openib (by doing --mca btl openib|tcp,self with my
>>> mpirun command). I also tried changing the environment variable:
>>> export OMPI_MCA_btl=tcp,self,sm. Results are the same regardless of tcp
>>> or openib.
>>>
>>> And when I do ifconfig -a I still see zero traffic reported for the ib0
>>> and ib1 networks.
>>>
>>
>> If Open MPI uses RDMA for the traffic, ib0/ib1 will not show it; you
>> have to use perfquery to read the port counters instead.
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>



-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage