ethernet channel bonding questions..

Martin Siegert siegert at sfu.ca
Mon Dec 11 17:43:41 PST 2000


On Mon, 11 Dec 2000, kevin james flasch wrote:

> I've been attempting to get channel bonding working on two linux boxes
> for possible use in a large beowulf cluster. However, I'm not seeing any
> increase in network performance/bandwidth at all. My primary source of
> guidance has been these two pages:
> http://www.beowulf-underground.org/doc_project/BIAA-HOWTO/Beowulf-Installation-and-Administration-HOWTO-12.html 
> http://www.beowulf.org/software/bonding.html
> 
> The boxes are both running RedHat 6.2, kernel 2.2.14-5.0. I've tried
> compiling bonding into the kernel and using it as a module with the same
> results. They each have two 100Mbps LinkSys (tulip) cards which
> are recognized by the kernel - eth1 is ifenslave'd to eth0 on each
> machine. The boxes are connected via two switches (one for each
> channel). They both seem to be transmitting packets (observable due to
> flickering on the switches, and using tcpdump). I compared performance
> using locally written tcp testing software that sends TCP packets of
> varying lengths and by timing rcp.  The problem is that there is no
> improvement in performance using channel bonding, compared to just using
> a single ethernet channel between the machines.
> 
> I used tcpdump to analyze what was going on on each channel.  Each machine
> seems to be transmitting packets down both ethernet channels (hence
> the lights) but one channel seems to be sending half the amount of packets
> that are sent by the other channel. Actual analysis of tcpdump seems to
> show the packets on one channel do not include the data that
> needs to be transferred.  [Shown below are a few representative lines
> from tcpdump.]
<snip>

My tests of ethernet cards and drivers, including some pointers
for channel bonding, can now be found on the web:
http://www.sfu.ca/~siegert/nic-test.html

Here are a few additional hints:
1. I had a lot of problems with Linksys cards (without channel bonding):
   Autonegotiation simply did not work.
   Even if two cards are both labeled LNE100Tx, they do not necessarily use
   the same chipset. You may want to try just two cards (without channel
   bonding) first and make sure that you get full-duplex 100BaseT
   performance in the first place. If the tulip-diag program
   (http://www.scyld.com/diag) gives you messages like "autonegotiation not
   completed", you probably have a similar problem. You may want to try
   putting something like "options tulip full_duplex=1" into your
   /etc/conf.modules file.
2. Upgrade your kernel to at least version 2.2.16
   (honestly: it amazes me how anybody can use a 2.2 kernel with version
   < 2.2.16; all of these have working root exploits that can be downloaded
   from the internet). Use the bonding.o from the 2.2.17 kernel.
   The bonding.o from the 2.2.16 kernel leads to kernel oopses.
3. Make sure that /sbin/ifconfig reports that the Ethernet HWaddr of bond0,
   eth0, and eth1 are all the same.
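
The check in hint 3 can be scripted. What follows is only a rough sketch in
Python, assuming the classic net-tools ifconfig layout in which each
interface stanza starts with "Link encap:Ethernet  HWaddr ..."; newer
ifconfig versions print the address differently and would need another
pattern. The function names and the sample addresses are made up for
illustration.

```python
import re

# Matches the first line of an interface stanza in old net-tools ifconfig
# output, e.g. "eth0      Link encap:Ethernet  HWaddr 00:A0:CC:12:34:56".
HWADDR_RE = re.compile(
    r"^(\S+)\s+Link encap:Ethernet\s+HWaddr\s+([0-9A-Fa-f:]{17})",
    re.MULTILINE,
)


def parse_hwaddrs(ifconfig_output):
    """Return {interface: lower-cased MAC} for each Ethernet stanza found."""
    return {name: mac.lower() for name, mac in HWADDR_RE.findall(ifconfig_output)}


def bond_macs_match(hwaddrs, names=("bond0", "eth0", "eth1")):
    """True only if every named interface is present and all MACs are equal."""
    if any(n not in hwaddrs for n in names):
        return False
    return len({hwaddrs[n] for n in names}) == 1


# Hypothetical sample output; the addresses are not from any real machine.
# Here eth1 deliberately differs, so the check should fail.
SAMPLE = """\
bond0     Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0

eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:11:22:33:44:66
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
"""

if __name__ == "__main__":
    macs = parse_hwaddrs(SAMPLE)
    for name, mac in sorted(macs.items()):
        print(name, mac)
    print("all HWaddr identical:", bond_macs_match(macs))
```

On a real machine one would pass in the text printed by /sbin/ifconfig
instead of SAMPLE. If the result is false, the slave interface has probably
not taken over the HWaddr of the bond.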

Anyway, since we are discussing channel bonding with tulip cards:
Has anybody tried channel bonding using the D-Link DFE-570TX 4-port ethernet
card?

Cheers,
Martin

========================================================================
Martin Siegert
Academic Computing Services                        phone: (604) 291-4691
Simon Fraser University                            fax:   (604) 291-4242
Burnaby, British Columbia                          email: siegert at sfu.ca
Canada  V5A 1S6
========================================================================



