[Beowulf] Poor bandwidth from one compute node

Joe Landman joe.landman at gmail.com
Thu Aug 17 11:10:23 PDT 2017



On 08/17/2017 02:02 PM, Scott Atchley wrote:
> I would agree that the bandwidth points at 1 GigE in this case.
>
> For IB/OPA cards running slower than expected, I would recommend 
> ensuring that they are using the correct number of PCIe lanes.

Turns out, there is a really nice open source tool that does this for 
you ...

https://github.com/joelandman/pcilist

:D
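
For anyone who wants to see the kind of check involved, here is a minimal sketch. It is not taken from pcilist and makes no claim about how that tool works. It assumes a Linux node where sysfs exposes the current_link_width and max_link_width attributes under /sys/bus/pci/devices; the function names are made up for illustration. A device that negotiated fewer lanes than it supports is listed, which is a hint to look closer and not proof of a fault.

```python
import os


def read_attr(dev_dir, name):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(os.path.join(dev_dir, name)) as fh:
            return fh.read().strip()
    except OSError:
        return None


def degraded_links(root="/sys/bus/pci/devices"):
    """List (address, current_width, max_width) for devices linked below max width."""
    found = []
    if not os.path.isdir(root):
        return found
    for addr in sorted(os.listdir(root)):
        dev = os.path.join(root, addr)
        cur = read_attr(dev, "current_link_width")
        top = read_attr(dev, "max_link_width")
        if cur is None or top is None:
            continue  # not a PCIe device, or the attribute is not exposed
        try:
            cur_n, top_n = int(cur), int(top)
        except ValueError:
            continue  # unexpected attribute format, skip it
        if 0 < cur_n < top_n:
            found.append((addr, cur_n, top_n))
    return found


if __name__ == "__main__":
    for addr, cur, top in degraded_links():
        print(f"{addr}: running x{cur}, capable of x{top}")
```

Run it on the slow node and on a known-good node and compare the lines for the IB/OPA card.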

>
> On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman <joe.landman at gmail.com> wrote:
>
>
>
>     On 08/17/2017 12:00 PM, Faraz Hussain wrote:
>
>         I noticed an MPI job was taking 5X longer to run whenever it
>         got the compute node lusytp104. So I ran qperf and found the
>         bandwidth between it and any other node was ~100MB/sec. This
>         is much lower than the ~1GB/sec between all the other nodes. Any
>         tips on how to debug further? I haven't tried rebooting since
>         it is currently running a single-node job.
>
>         [hussaif1 at lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
>         tcp_lat:
>             latency  =  17.4 us
>         tcp_bw:
>             bw  =  118 MB/sec
>         [hussaif1 at lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
>         tcp_lat:
>             latency  =  20.4 us
>         tcp_bw:
>             bw  =  1.07 GB/sec
>
>         This is a separate issue from my previous post about a slow
>         compute node. I am still investigating that per the helpful
>         replies. Will post an update about that once I find the root
>         cause!
>
>
>     Sounds very much like it is running over Gigabit Ethernet rather
>     than InfiniBand.  Check to make sure it is using the right network ...
>
>
>
>
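
The arithmetic behind the gigabit guess quoted above can be sketched as follows. Gigabit Ethernet carries at most 1000/8 = 125 MB/sec before protocol overhead, so the 118 MB/sec reported for lusytp104 is about 94% of that ceiling, while the 1.07 GB/sec to lusytp113 is several times more than a 1 GigE link could deliver. The function name is made up for illustration, and the sketch assumes qperf's MB and GB are decimal units.

```python
def fraction_of_line_rate(measured_mb_per_s, link_gbit_per_s):
    """Measured bandwidth as a fraction of the raw link rate (decimal units)."""
    line_rate_mb_per_s = link_gbit_per_s * 1000 / 8  # Gbit/s -> MB/s
    return measured_mb_per_s / line_rate_mb_per_s


# Figures from the qperf output quoted above, both compared against 1 GigE.
slow_node = fraction_of_line_rate(118, 1)    # lusytp104
fast_node = fraction_of_line_rate(1070, 1)   # lusytp113, 1.07 GB/sec

print(f"lusytp104: {slow_node:.1%} of 1 GigE line rate")
print(f"lusytp113: {fast_node:.2f}x 1 GigE line rate")
```

A result just under 100% of a link's line rate usually means the traffic is saturating that link, which is why the slow node looks like it is on the gigabit network.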

-- 
Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman


