Ethernet channel bonding questions

kevin james flasch kflasch at csd.uwm.edu
Mon Dec 11 13:37:29 PST 2000


Hello. 

I've been attempting to get channel bonding working on two Linux boxes
for possible use in a large Beowulf cluster. However, I'm not seeing any
increase in network performance or bandwidth at all. My primary sources of
guidance have been these two pages:
http://www.beowulf-underground.org/doc_project/BIAA-HOWTO/Beowulf-Installation-and-Administration-HOWTO-12.html 
http://www.beowulf.org/software/bonding.html

Both boxes are running RedHat 6.2 with kernel 2.2.14-5.0. I've tried both
compiling bonding into the kernel and loading it as a module, with the same
results. Each box has two 100Mbps LinkSys (tulip) cards, which are
recognized by the kernel; eth1 is ifenslave'd to eth0 on each
machine. The boxes are connected via two switches (one for each
channel). Both channels seem to be carrying packets (observable from the
flickering lights on the switches, and with tcpdump). I compared performance
using locally written TCP testing software that sends TCP packets of
varying lengths, and by timing rcp. The problem is that there is no
improvement in performance with channel bonding compared to just using
a single ethernet channel between the machines.
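
For reference, a test of this kind can be sketched roughly as follows. This
is not the actual local tool, only a minimal illustration in Python. The
names measure_throughput and _sink are hypothetical, and sender and receiver
run in one process over loopback purely so the sketch is self-contained. For
a real comparison the receiver would run on the other box, across the bonded
link.

```python
import socket
import threading
import time


def _sink(server_sock, result):
    """Accept one connection and count the bytes received until EOF."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["received"] = total


def measure_throughput(packet_size, count, host="127.0.0.1"):
    """Send `count` writes of `packet_size` bytes over TCP and time them."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"x" * packet_size
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        for _ in range(count):
            client.sendall(payload)
    receiver.join()  # wait until the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()

    sent = packet_size * count
    return {
        "sent": sent,
        "received": result["received"],
        "seconds": elapsed,
        "mbit_per_s": sent * 8 / elapsed / 1e6,
    }


if __name__ == "__main__":
    for size in (64, 512, 1024, 8192):
        print(size, measure_throughput(size, 2000))
```

Running the same sizes with and without bonding enabled gives directly
comparable numbers. Note that TCP may coalesce writes, so the write size is
not necessarily the size of the packets on the wire.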

I used tcpdump to analyze what was happening on each channel. Each machine
seems to be transmitting packets down both ethernet channels (hence
the lights), but one channel seems to be carrying half as many packets
as the other. Closer analysis of the tcpdump output seems to
show that the packets on one channel do not include the data that
needs to be transferred. [Shown below are a few representative lines
from tcpdump.]

Could someone please point out what I'm doing wrong, or point me in the
direction of some useful documentation?

Thanks!

Kevin Flasch


[Examples of tcpdump output:]

channel 0 (eth0):
-------------------
15:20:50.957476 > test2.subnet.1043 > test1.subnet.2000: P 2087937:2088961(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.957516 > test2.subnet.1043 > test1.subnet.2000: P 2088961:2089985(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958033 > test2.subnet.1043 > test1.subnet.2000: P 2089985:2091009(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958062 > test2.subnet.1043 > test1.subnet.2000: P 2091009:2092033(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958099 > test2.subnet.1043 > test1.subnet.2000: P 2092033:2093057(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958395 > test2.subnet.1043 > test1.subnet.2000: P 2093057:2094081(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958412 > test2.subnet.1043 > test1.subnet.2000: P 2094081:2095105(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958427 > test2.subnet.1043 > test1.subnet.2000: P 2095105:2096129(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.958885 > test2.subnet.1043 > test1.subnet.2000: . 2096129:2096129(0) ack 2 win 32120 <nop,nop,timestamp 130800 134815> (DF)
15:20:50.971195 > test2.subnet.1043 > test1.subnet.2000: F 2096129:2096129(0) ack 2 win 32120 <nop,nop,timestamp 130801 134815> (DF)

channel 1 (eth1):
-------------------
15:20:50.957233 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2080770 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.957491 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2082818 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.957726 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2084866 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.957904 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2086914 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.958083 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2088962 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.958278 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2091010 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.958459 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2093058 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.958720 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2095106 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.958855 < test1.subnet.2000 > test2.subnet.1043: F 1:1(0) ack 2096130 win 31856 <nop,nop,timestamp 134815 130800> (DF)
15:20:50.971308 < test1.subnet.2000 > test2.subnet.1043: . 2:2(0) ack 2096131 win 31856 <nop,nop,timestamp 134816 130801> (DF)
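
To compare the channels over a whole capture rather than by eye, the lines
can be tallied with a small script. The following is only a sketch in Python.
It assumes the capture lines look like the ones quoted above, and the names
LINE_RE and summarize are hypothetical. It counts how many packets carry
payload, using the length in parentheses, and how many are zero-length.

```python
import re

# Matches lines such as:
#   15:20:50.957476 > a.1043 > b.2000: P 2087937:2088961(1024) ack 1 ...
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<dir>[<>]) (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\)"
)


def summarize(lines):
    """Count packets, data-carrying packets and payload bytes."""
    stats = {"packets": 0, "data_packets": 0, "payload_bytes": 0}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a TCP segment line
        length = int(m.group("length"))
        stats["packets"] += 1
        if length > 0:
            stats["data_packets"] += 1
            stats["payload_bytes"] += length
    return stats


# A few of the lines quoted above, one list per interface.
ETH0_SAMPLE = [
    "15:20:50.957476 > test2.subnet.1043 > test1.subnet.2000: P 2087937:2088961(1024) ack 1 win 32120 <nop,nop,timestamp 130800 134815> (DF)",
    "15:20:50.958885 > test2.subnet.1043 > test1.subnet.2000: . 2096129:2096129(0) ack 2 win 32120 <nop,nop,timestamp 130800 134815> (DF)",
]
ETH1_SAMPLE = [
    "15:20:50.957233 < test1.subnet.2000 > test2.subnet.1043: . 1:1(0) ack 2080770 win 31856 <nop,nop,timestamp 134815 130800> (DF)",
]

if __name__ == "__main__":
    print("eth0:", summarize(ETH0_SAMPLE))
    print("eth1:", summarize(ETH1_SAMPLE))
```

The sample lists hold only a few of the quoted lines. Pasting in a full
capture from each interface shows how the data segments and the bare ACKs
are split between the two channels.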


--
http://www.uwm.edu/~kflasch/kflaschpubkey.gpg








