[Beowulf] IPoIB arp's disappearing
Michael Di Domenico
mdidomenico4 at gmail.com
Thu Jul 10 03:36:14 PDT 2008
I'm having a bit of a weird problem that I cannot figure out. If anyone in
the community can help, it would be appreciated.
Here's the packet flow
cn = compute node
io = io node
pan = panasas storage network
We have 12 shelves of Panasas network storage on a separate network, which
is fronted by bridge servers that route IPoIB traffic to 10G
Ethernet. We're using Mellanox ConnectX Ethernet/IB adapters
everywhere. We're running OFED 1.3.1 and the latest firmware for IB/Eth.
Here's the problem. I can mount the storage on the compute nodes, but if I
try to send anything more than 50MB of data via dd, I seem to lose the ARP
entries for the compute nodes on the IO servers. This seems to happen
whether I use the filesystem or a netperf run from the compute node to the
I can run netperf between the compute node and io node and get full IPoIB
line rate with no issues.
I can run netperf between the io node and the Panasas storage and get full
10G Ethernet line rate with no issues.
When looking at the TCP traces, I can clearly see that a big chunk of data
is sent between the end-points and then it stalls. Immediately after the
stall is an ARP request and then another chunk of data, and this scenario
repeats over and over.
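To pin down exactly when the entries vanish on the io node, one option is to poll the kernel ARP table while the transfer runs and log any compute-node address that drops out. The following is only a minimal sketch, assuming the standard Linux /proc/net/arp column layout (IP address, HW type, Flags, HW address, Mask, Device); the function names parse_arp_table, lost_entries and watch are hypothetical helpers, not part of OFED or any existing tool.

```python
import time

ATF_COM = 0x2  # "completed entry" flag from the Linux ARP flags field


def parse_arp_table(text):
    """Return {ip: (hw_address, device)} for completed entries only."""
    entries = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        ip, _hw_type, flags, hw_addr, _mask, device = fields[:6]
        if int(flags, 16) & ATF_COM:
            entries[ip] = (hw_addr, device)
    return entries


def lost_entries(before, after):
    """Return the sorted IPs that were resolved in `before` but not in `after`."""
    return sorted(ip for ip in before if ip not in after)


def watch(path="/proc/net/arp", interval=1.0):
    """Poll the ARP table forever and print a timestamp when entries drop out."""
    with open(path) as f:
        previous = parse_arp_table(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_arp_table(f.read())
        for ip in lost_entries(previous, current):
            print(f"{time.strftime('%H:%M:%S')} lost ARP entry for {ip}")
        previous = current
```

Calling watch() on the io node while the dd runs on a compute node would give timestamps that can be lined up against the stalls in the TCP trace. A one-second interval is an arbitrary choice and may be too coarse to catch short gaps.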
Any thoughts or questions?