3x100Mbps bonding question

Laurent Itti itti at cco.caltech.edu
Wed Oct 4 00:31:06 PDT 2000


Hi -

about a month ago I asked for some design tips on specifying a small
16-node cluster with 3x100Mbps network.  It's here!  Thanks again to all
of you who helped!

So far, installation is going well except for some trouble with the
Ethernet channel bonding.

- 3 Ethernet cards per box, all RTL8139

- all eth0 interfaces go to one switch, all eth1 to a second, and all eth2
to a third; there is no connection between the switches.

- somehow, even with serial, parallel, sound, etc. disabled in the BIOS, it
insists on sharing IRQs among several devices while leaving plenty of low
IRQs free (e.g., it puts two Ethernet cards on IRQ 11 and IDE+Ethernet+VGA
on IRQ 15, but nothing on 3, 5, 7, etc.).  It works fine, but naively I
would think that assigning one IRQ per NIC would give better performance.
Is that true?  Any tip on how to achieve it?  (Abit SE6 i815 motherboards
with Award BIOS.)

- configuring the 3 NICs on separate subnets works great (all eth0 on
192.168.0.x, all eth1 on 192.168.1.x, all eth2 on 192.168.2.x).  However,
the machines get extremely sluggish during massive simultaneous transfers
on all 3 NICs.  I guess that is related to the socket buffer (window) size
being too small, and to being flooded with too many IRQs?  Any tip on how
to change that would be greatly appreciated!
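If the socket buffers really are the bottleneck, one place to raise their
limits is sysctl; a hedged sketch (the values are purely illustrative, and
the net.core keys assume a 2.2/2.4 kernel):

```
# /etc/sysctl.conf -- raise the socket buffer limits (illustrative values)
net.core.rmem_max = 262144
net.core.wmem_max = 262144
# default sizes handed to newly created sockets
net.core.rmem_default = 131072
net.core.wmem_default = 131072
```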

- bonding so far is not working ;-( I create an ifcfg-bond0 script with
the IP and related info; then (I am using Mandrake 7.2beta3) I configure
each ifcfg-ethX with SLAVE=yes and MASTER=bond0 (plus other settings as
per the documentation in the kernel tree), add aliases in
/etc/modules.conf, and there we go.  Everything seems to enslave and bond
fine, and ifconfig gives the expected results (all interfaces share the
same IP and MAC addresses, etc.).
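For concreteness, here is a sketch of the setup described above, in Red
Hat/Mandrake-style ifcfg files (the addresses are illustrative, and the
alias lines assume the stock bonding and 8139too modules):

```
# /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative address)
DEVICE=bond0
IPADDR=192.168.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0  (likewise for eth1, eth2)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/modules.conf
alias bond0 bonding
alias eth0 8139too
alias eth1 8139too
alias eth2 8139too
```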

The only thing that seems strange is that I have four routes for my local
subnet, with devices bond0, eth0, eth1, and eth2 (in that order).  "route
del -net 192.168.0.0 dev eth0" did not work, so I assumed the duplicate
routes were OK?  I have no default route and no gateway in any of the
routes (so far).
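For reference, the routing table can also be read straight from the
kernel, without net-tools (the destination and mask columns are
little-endian hex):

```shell
# Each local-subnet route appears once per device that carries it
# (columns: Iface, Destination, Gateway, Flags, ..., Mask)
cat /proc/net/route
```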

Pinging another bonded machine, I can see the LEDs on my switches flashing
in sequence (one packet on eth0, then one on eth1, etc.).  That looks like
the round-robin distribution of packets that I read about.  The only
problem is that replies come back only via eth0 (running tcpdump on bond0
while watching the switches: I get a reply only when the switch connected
to eth0 blinks).  The other traffic I see is ARP requests, all for the
eth0 MAC addresses, never for the other ones.

Did I forget anything obvious?  Maybe put all the MAC addresses in
/etc/ethers?  Or is there some feature compiled into the stock kernel that
could cause this?  Do I need to do any configuration on the switches?  In
my naive view (officially, I am a neuroscientist, so please forgive me),
if I send out on eth1 a packet for a bond0 MAC address (equal to the
corresponding eth0 MAC address), I would not expect switch 1 to know that
the packet should in fact go to the eth1 MAC address associated with the
bond0 device I am sending to.  (Yet the LEDs do blink in pairs, source and
destination ports, so I must just be very confused on that issue.)

any suggestion/comment appreciated, and I'll keep experimenting!

thanks!

  -- laurent






More information about the Beowulf mailing list