2.2.18 with updated bonding patch acting weird
chris at ambigc.com
Tue Mar 13 09:07:27 PST 2001
I have two computers, each with three 3Com 3c905 cards on an Abit KA7 (Athlon)
motherboard. One of the three cards on each node, eth0, is the interoffice
connection for direct access to the node (to use the node as a regular Linux
system). The other two are bonded on a cheap VLAN'ed switch for cluster
communication.
The network cards do not share interrupts with each other, though they do
share them with other components. I have also tried it with the bonded cards
sharing an IRQ and got the same results.
The 2.2.18 kernel is patched with devfs, MOSIX, the tcp-patch-for-2.2.17-14
(a tcp_nodelay-type patch), and the latest 2.2.18 patch for bonding.
When I build the bonding and Ethernet drivers into the kernel, I get a network
result similar to http://www.beowulf.org/pipermail/beowulf/2000-October/010325.html
in that the connections only seem to see each other by accident. The switch
shows a large amount of activity.
Now what is really interesting is that I then put all 4 bonded Ethernets
(2 on each of the 2 computers) onto the same VLAN. Presto, I have connections
with 125 Mbps TCP and 175 Mbps UDP according to netperf.
Now my questions:
Why are they even working on the same LAN? Are they falling into mode 1,
i.e., backup and not round robin?
Why won't they work when separated onto different VLANs?
Why is my TCP so crappy? The interoffice connections running through the same
switch give 95 Mbps.
Why do the if* tools hang and the network fail to connect (though ifconfig
successfully set up the network earlier) when I build bonding and 3c59x as
modules rather than monolithic?
Thank you for any insights,