[Beowulf] How to know if infiniband network works?

Jon Tegner tegner at renget.se
Thu Aug 3 10:54:07 PDT 2017


Isn't the latency over RDMA a bit high? When I've tested QDR and FDR I tend 
to see around 1 us (using mpitests-osu_latency) between two nodes.
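
For reference, a minimal two-node launch looks roughly like this (the node 
names are placeholders; mpitests-osu_latency is the benchmark binary from the 
mpitests package, and with Open MPI the transport can be pinned explicitly):

# one rank on each node, so the message actually crosses the fabric
$ mpirun -np 2 -host node01,node02 mpitests-osu_latency
$ mpirun -np 2 -host node01,node02 --mca btl openib,self mpitests-osu_latency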

/jon

On 08/03/2017 06:50 PM, Faraz Hussain wrote:
> Here are the results from the TCP and RDMA tests. I take it to mean that 
> the IB network is performing at the expected speed.
>
> [hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 tcp_lat tcp_bw
> tcp_lat:
>     latency  =  24.2 us
> tcp_bw:
>     bw  =  1.19 GB/sec
> [hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 rc_lat rc_bw
> rc_lat:
>     latency  =  7.76 us
> rc_bw:
>     bw  =  4.56 GB/sec
> [hussaif1 at lustwzb5 ~]$
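>
> For anyone repeating this: qperf runs as a client/server pair, so it is 
> started with no arguments on the remote node first and then pointed at that 
> host from the client side. The rc_lat/rc_bw tests exercise the RDMA 
> reliable-connection transport, while tcp_lat/tcp_bw go over whichever IP 
> interface the hostname resolves to (possibly IPoIB). Roughly:
>
> [hussaif1 at lustwzb4 ~]$ qperf
> [hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 tcp_lat tcp_bw rc_lat rc_bw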
>
>
> Quoting Jeff Johnson <jeff.johnson at aeoncomputing.com>:
>
>> Faraz,
>>
>> I didn't notice any tests where you actually tested the IP layer. You
>> should run some iperf tests between nodes to make sure IPoIB functions.
>> Your InfiniBand/RDMA can be working fine while IPoIB is dysfunctional.
>> You need to ensure the IPoIB configuration, like any IP environment, is
>> the same on all nodes (network/subnet, netmask, MTU, etc.) and that all
>> of the nodes are configured for the same mode (connected vs datagram).
>> If you can't run iperf then there is something broken in the IPoIB
>> configuration.
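>>
>> Something along these lines, for example (the IPoIB address and interface
>> name are placeholders):
>>
>> $ iperf -s -B 192.168.10.4        # server, bound to this node's IPoIB address
>> $ iperf -c 192.168.10.4 -t 30     # client on another node, same address
>> $ cat /sys/class/net/ib0/mode     # connected vs datagram, should match everywhere
>> $ ip addr show ib0                # netmask and MTU should be consistent across nodes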
>>
>> --Jeff
>>
>> On Thu, Aug 3, 2017 at 8:41 AM, Faraz Hussain <info at feacluster.com> 
>> wrote:
>>
>>> Thanks for everyone's help. Using the Ohio State tests, qperf, and
>>> perfquery, I am convinced the IB network is working. The only thing that
>>> still bothers me is that I cannot get mpirun to use the TCP network. I
>>> tried all combinations of --mca btl to no avail. It is not important,
>>> more just curiosity.
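>>>
>>> For reference, the sort of invocation in question looks roughly like this
>>> (the Ethernet interface name is a placeholder; Open MPI's
>>> btl_tcp_if_include parameter restricts the TCP BTL to a given interface):
>>>
>>> $ mpirun -np 2 -host lustwzb4,lustwzb5 --mca btl tcp,self \
>>>     --mca btl_tcp_if_include eth0 ./osu_bw
>>> $ mpirun -np 2 -host lustwzb4,lustwzb5 --mca btl openib,self ./osu_bw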
>>>
>>>
>>>
>>> Quoting Michael Di Domenico <mdidomenico4 at gmail.com>:
>>>
>>> On Thu, Aug 3, 2017 at 10:10 AM, Faraz Hussain <info at feacluster.com>
>>>> wrote:
>>>>
>>>>> Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and got
>>>>> the results below. What is confusing is that I get the same result
>>>>> whether I use tcp or openib (by passing --mca btl openib|tcp,self to my
>>>>> mpirun command). I also tried setting the environment variable export
>>>>> OMPI_MCA_btl=tcp,self,sm. The results are the same regardless of tcp or
>>>>> openib.
>>>>>
>>>>> And when I do ifconfig -a I still see zero traffic reported for the ib0
>>>>> and ib1 interfaces.
>>>>>
>>>>
>>>> If Open MPI uses RDMA for the traffic, ib0/ib1 will not show any traffic;
>>>> you have to use perfquery.
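>>>>
>>>> e.g., with the stock infiniband-diags tools, run on the node whose HCA
>>>> port you want to watch:
>>>>
>>>> $ perfquery          # counters of the local HCA port
>>>> $ perfquery -x       # extended 64-bit counters (PortXmitData, PortRcvData)
>>>> $ perfquery -R       # read and then reset the counters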
>>
>>
>>
>> -- 
>> ------------------------------
>> Jeff Johnson
>> Co-Founder
>> Aeon Computing
>>
>> jeff.johnson at aeoncomputing.com
>> www.aeoncomputing.com
>> t: 858-412-3810 x1001   f: 858-412-3845
>> m: 619-204-9061
>>
>> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>>
>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf


