problems with etherchannel and NatSemi DP83815 cards


Anders Lennartsson anders.lennartsson at foi.se
Tue Mar 6 07:53:49 PST 2001


Hi

BACKGROUND:

I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes,
each a PPro 200 :( but there may be more/other stuff coming :).
Considering the costs, we settled on Netgear 311 ethernet cards, for
which there is support in 2.4.x kernels. Patches are available for
2.2.x kernels, but since 2.4 is here...
I have checked, and the driver is a slightly modified version of the
natsemi.c available on www.scyld.com. The latter has some additions
that are not included in the one provided in the kernel source, though.

Initially I put one card in each machine and verified that everything
worked. I tested with NTtcp (netperf derivative?) and the throughput
asymptotically went up to about 90 Mbit/s when two cards were
connected through a 100 Mbps switch (where are the last 10?).
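
Part of the answer to "where are the last 10?" is probably plain framing
overhead. A rough sketch of the arithmetic, assuming standard Ethernet
framing, a 1500-byte MTU and IPv4/TCP headers (none of these values were
measured on the cluster, and the function name is only illustrative):

```python
# Upper bound on TCP payload throughput over 100 Mbit/s Ethernet.
# All byte counts are assumptions about standard framing, not measurements.
LINK_MBPS = 100.0
MTU = 1500                       # IP packet size carried in one frame
IP_HEADER = 20                   # IPv4 header without options
TCP_HEADER = 20                  # TCP header without options
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap


def tcp_goodput_mbps(tcp_options=0):
    """Return the payload rate in Mbit/s when every frame is full size."""
    payload = MTU - IP_HEADER - TCP_HEADER - tcp_options
    wire_bytes = MTU + ETH_OVERHEAD
    return LINK_MBPS * payload / wire_bytes


if __name__ == "__main__":
    print(f"no TCP options:  {tcp_goodput_mbps():.1f} Mbit/s")
    print(f"with timestamps: {tcp_goodput_mbps(12):.1f} Mbit/s")
```

Under these assumptions the ceiling for TCP payload is roughly 94 to 95
Mbit/s, so about half of the missing 10 would be protocol overhead. The
rest presumably comes from the hosts, the driver or the benchmark itself.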

Then I set out for etherchannel bonding.
It was a bit tricky to find a working ifenslave.c; the one on
www.beowulf.org seemed old, and I found a newer one at
pdsf.nersc.gov/linux/
Then it seemed to work after doing:

ifconfig bond0 192.168.1.x netmask 255.255.255.0 up
./ifenslave bond0 eth0
(bond0 gets the MAC address from eth0)
./ifenslave bond0 eth1 

When testing the setup by ftping a large file between two nodes,
messages of the following type were output repeatedly on the console:

ethX ... Something wicked happened! 0YYY
where X was 0 or 1 and YYY was one of 500, 700, 740, 749, see below.

The same thing happened when running NPtcp once the packet size went
above a few kbytes, at speeds of approx. 50 Mbit/s.

QUESTIONS:

Anyone got ideas as to the nature/solution of this problem?
I suppose the PCI interface on these particular motherboards may play a
significant role. Maybe the driver itself? Or is the processor just too
slow?

Does anyone have experience of this with, for instance, the 3c905?
Otherwise a very stable card IMHO.
It is about three times more expensive, which isn't that much for
one or two cards, although I could imagine substantial savings
for a large cluster. But if my hours are included ...

Regards,
Anders

SOME DETAILED INFO:

From syslog, the kernel identifying the network cards (eth2 is for
access from outside the dedicated networks):

Mar  1 21:30:53 beo101 kernel:   http://www.scyld.com/network/natsemi.html
Mar  1 21:30:53 beo101 kernel:   (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)
Mar  1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12.
Mar  1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1.
Mar  1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10.
Mar  1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1.
Mar  1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11.
Mar  1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1.

Some lines with the wicked message (above them are the two lines where
eth0 and eth1 are reported when ifenslave is run):

Mar  1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok)
Mar  1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability.
Mar  1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s
Mar  1 21:35:32 beo101 ntpd[182]: kernel pll status change 41
Mar  1 21:35:32 beo101 ntpd[182]: synchronisation lost
Mar  1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability.
Mar  1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD (  if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi)
Mar  1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:12 beo101 last message repeated 2 times
Mar  1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:13 beo101 last message repeated 2 times
Mar  1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 last message repeated 3 times
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
Mar  1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500.
Mar  1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500.
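
One way to read the hex numbers in these messages is as a bit mask of
the chip's interrupt status. A minimal sketch, assuming the bit layout
of the interrupt status enum in natsemi.c (the names and values below
are recalled from that driver source, not taken from this log, and
should be checked against the driver version actually in use):

```python
# ASSUMPTION: bit names and values as in the interrupt status enum of
# natsemi.c; verify against the driver source before relying on them.
INTR_BITS = {
    0x0001: "IntrRxDone",
    0x0002: "IntrRxIntr",
    0x0004: "IntrRxErr",
    0x0008: "IntrRxEarly",
    0x0010: "IntrRxIdle",
    0x0020: "IntrRxOverrun",
    0x0040: "IntrTxDone",
    0x0080: "IntrTxIntr",
    0x0100: "IntrTxErr",
    0x0200: "IntrTxIdle",
    0x0400: "IntrTxUnderrun",
    0x0800: "StatsMax",
}


def decode_status(status):
    """Return the names of all known bits set in a status word, low bit first."""
    return [name for bit, name in sorted(INTR_BITS.items()) if status & bit]


if __name__ == "__main__":
    # The three codes that appear in the syslog excerpt above.
    for code in (0x0500, 0x0700, 0x0740):
        print(f"{code:04x}: {' '.join(decode_status(code))}")
```

If that layout is right, all three codes in the log carry the transmit
error and transmit underrun bits, which would be consistent with the TX
errors and overruns counted for eth0 and eth1 in the ifconfig output.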

The result of ifconfig:

bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:986886789 (941.1 Mb)

eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907798 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
          collisions:0 txqueuelen:100
          RX bytes:435552233 (415.3 Mb)  TX bytes:491795214 (469.0 Mb)
          Interrupt:12

eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:907768 errors:0 dropped:0 overruns:0 frame:0
          TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
          collisions:0 txqueuelen:100
          RX bytes:434992308 (414.8 Mb)  TX bytes:489766183 (467.0 Mb)
          Interrupt:10 Base address:0x2000

eth2      Link encap:Ethernet  HWaddr 00:02:E3:03:DC:2C
          inet addr:150.227.64.210  Bcast:150.227.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:13122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1032660 (1008.4 Kb)  TX bytes:943713 (921.5 Kb)
          Interrupt:11 Base address:0x4000

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3904  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:552 (552.0 b)  TX bytes:552 (552.0 b)
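
The TX counters above can be put side by side with a few lines of
script. This is only a sketch: the sample text is a trimmed copy of the
bond0, eth0 and eth1 counter lines from the output above, and the
function name is made up for illustration.

```python
import re

# Trimmed copy of the relevant lines from the ifconfig output above.
IFCONFIG = """\
bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
          TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
"""


def tx_counters(text):
    """Map each interface name to a (TX packets, TX errors) tuple."""
    counters = {}
    iface = None
    for line in text.splitlines():
        # An interface block starts with an unindented line.
        if line and not line[0].isspace():
            iface = line.split()[0]
        match = re.search(r"TX packets:(\d+) errors:(\d+)", line)
        if match and iface:
            counters[iface] = (int(match.group(1)), int(match.group(2)))
    return counters


if __name__ == "__main__":
    stats = tx_counters(IFCONFIG)
    for name in ("eth0", "eth1"):
        packets, errors = stats[name]
        rate = 100 * errors / packets
        print(f"{name}: {errors} errors in {packets} packets ({rate:.2f}%)")
    slaves = sum(sum(stats[name]) for name in ("eth0", "eth1"))
    print("bond0 TX packets:", stats["bond0"][0])
    print("slave TX packets + errors:", slaves)
```

With these numbers the bond0 TX packet count equals the slaves' TX
packets plus their TX errors, and the error rate is roughly 0.2 % on
each slave. That suggests the packets are split evenly over the two
cards and a small fraction of transmissions fail on each.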




