Channel bonding: working combinations?

Pfenniger Daniel daniel.pfenniger at obs.unige.ch
Sun Jan 21 23:58:55 PST 2001


Hi!

I am trying to set up channel bonding on our cluster, but I have run into a
few problems that may interest people on the list.

Linux kernel: 2.2.18 or 2.4.0, compiled with gcc 2.95.2, (RedHat 6.2)
Motherboard: ASUS P2B-D (BX chipset)
Procs: Pentium II 400 dual
Ethernet cards: cards with the tulip chips DS21140 and DS21143; they work well
   when not bonded.
Switches: 2 Foundry FastIron II 
Drivers: tulip.o, or old_tulip.o as modules supplied with the official kernel
Documentation: in /usr/src/linux-2.2.18/Documentation/networking/bonding.txt 
               (BTW this file is not provided in kernel 2.4.0)

I have followed the instructions in bonding.txt to the letter.
Every card has a distinct IRQ.
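
For reference, the procedure in bonding.txt boils down to roughly the
following (shown here with the interfaces and address from the output
further down; the module alias goes in /etc/conf.modules on RedHat):

   alias bond0 bonding                                  # /etc/conf.modules

   modprobe bonding
   ifconfig bond0 192.168.2.64 netmask 255.255.255.0 up
   ifenslave bond0 eth1
   ifenslave bond0 eth2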

The first problem is that ifconfig bond0 shows no hardware or IP address,
either at boot or when brought up interactively (both are zero).
I can force a hardware address by setting it manually:

   ifconfig bond0 192.168.2.64 hw ether 00:40:05:A1:D9:09 up

I don't know how to force the hardware address automatically from the
ifcfg-bond0 file.
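
In the meantime a possible workaround (untested here) is to re-issue the
same command from a boot script such as /etc/rc.d/rc.local, after the
network scripts have run:

   # force bond0's address and MAC (eth1's MAC, as in the output below)
   ifconfig bond0 192.168.2.64 hw ether 00:40:05:A1:D9:09 up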

Incidentally, there are a few different versions of ifenslave.c on the net
bearing the same version number (v0.07, 9/9/97, Donald Becker
(becker at cesdis.gsfc.nasa.gov)).
I used the version included in the bonding-0.2.tar.gz tarball.
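
For anyone else building it, ifenslave.c compiles against the kernel
headers with something like the line below (adjust the include path to
your kernel tree):

   gcc -Wall -O -I/usr/src/linux-2.2.18/include ifenslave.c -o ifenslave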

Starting channel bonding manually, I get the following (eth0 is assigned to
another network):

bond0     Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:108 errors:38 dropped:0 overruns:0 frame:0
          TX packets:6 errors:5 dropped:0 overruns:0 carrier:15
          collisions:0 txqueuelen:0 

eth1      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:108 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          Interrupt:18 Base address:0xb800 

eth2      Link encap:Ethernet  HWaddr 00:40:05:A1:D9:09  
          inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:38 dropped:0 overruns:0 frame:0
          TX packets:0 errors:5 dropped:0 overruns:0 carrier:15
          collisions:0 txqueuelen:100 
          Interrupt:17 Base address:0xb400 

Then pinging another node bonded in the same way produces one of several
outcomes:
- a complete freeze, requiring a reset;
- ping hangs until stopped with ctrl-c;
- ping works, at almost double speed.

When ping works, netperf -H node is sometimes almost twice as fast (175 Mb/s)
as single-channel communication (94 Mb/s), and sometimes much slower
(10-25 Mb/s), even though ping reports improved communication times.
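
For reference, these figures come from simple runs of the form below,
where node02 stands for another bonded node (the name is a placeholder):

   ping node02              # round-trip check over the bonded link
   netperf -H node02        # default TCP stream throughput test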

In conclusion, channel bonding with this configuration appears unreliable.

Since several messages reporting problems with the present channel bonding
capability of the Linux kernel have been posted on this list, as well as on
the tulip list regarding the tulip drivers, it would be useful if people with
working combinations of kernel (is 2.2.17 better?), NIC/driver (which tulip
version?), etc., could share their detailed working specs.
I am sure this would be much appreciated by those wanting to bond their Beowulf.

        Daniel Pfenniger
