problems with etherchannel and NatSemi DP83815 cards
Anders Lennartsson anders.lennartsson at foi.se
Wed Mar 7 03:56:03 PST 2001
- Previous message: Scyld/random reboots
- Next message: Power-managment of slave nodes
Hi,

BACKGROUND:

I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes, each a PPro 200 :( but there may be more/other stuff coming :). Considering the costs, we settled for Netgear 311 ethernet cards, which are supported in 2.4.x kernels. Patches are available for 2.2.x kernels, but since 2.4 is here... I have checked, and the driver is a slightly modified version of natsemi.c, available on www.scyld.com. There are some additions in the latter that are not included in the version provided in the kernel source, though.

Initially I put one card in each machine and verified that everything worked. I tested with NPtcp (netperf derivative?), and the throughput asymptotically approached about 90 Mbit/s when two cards were connected through a 100 Mbps switch (where are the last 10?).

Then I set out to try etherchannel bonding. It was a bit tricky to find a working ifenslave.c; the one on www.beowulf.org seemed old, and I found a newer one at pdsf.nersc.gov/linux/. Then it seemed to work after doing:

    ifconfig bond0 192.168.1.x netmask 255.255.255.0 up
    ./ifenslave bond0 eth0      (bond0 gets the MAC address from eth0)
    ./ifenslave bond0 eth1

When testing the setup by ftping a large file between two nodes, messages of the following type were output repeatedly on the console:

    ethX ... Something wicked happened! 0YYY

where X was 0 or 1 and YYY was one of 500, 700, 740, or 749; see below. The same thing happened when running NPtcp once the packet size exceeded a few kbytes, at speeds of approximately 50 Mbit/s.

QUESTIONS:

Does anyone have ideas about the nature of, or a solution to, this problem? I suppose the PCI interface on these particular motherboards may play a significant role. Maybe the driver itself? Or is the processor just too slow?

Does anyone have experience of this with, for instance, the 3c905? It is otherwise a very stable card IMHO. It is about three times more expensive, which isn't that much for one or two, although I could imagine substantial savings for a large cluster.
But if my hours are included ...

Regards,
Anders

SOME DETAILED INFO:

From syslog, the kernel identifying the network cards (eth2 is for access from outside the dedicated networks):

    Mar 1 21:30:53 beo101 kernel: http://www.scyld.com/network/natsemi.html
    Mar 1 21:30:53 beo101 kernel: (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)
    Mar 1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12.
    Mar 1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1.
    Mar 1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10.
    Mar 1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1.
    Mar 1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11.
    Mar 1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1.

Some lines with the wicked message (above them are the two lines reporting eth0 and eth1 when ifenslave is run):

    Mar 1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok)
    Mar 1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability.
    Mar 1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s
    Mar 1 21:35:32 beo101 ntpd[182]: kernel pll status change 41
    Mar 1 21:35:32 beo101 ntpd[182]: synchronisation lost
    Mar 1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability.
    Mar 1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD ( if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi)
    Mar 1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:12 beo101 last message repeated 2 times
    Mar 1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:13 beo101 last message repeated 2 times
    Mar 1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 last message repeated 3 times
    Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.
    Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
    Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
    Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
    Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500.
    Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500.
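To make the status codes in these messages easier to read, here is a small sketch that tallies the "Something Wicked happened!" lines per interface and splits each hex code into bit names. It rests on an assumption not confirmed in this post: that the printed value is the DP83815 interrupt status (ISR) word, with the low eleven bits named as in National's DP83815 datasheet (RxOK, RxDesc, ..., TxUnderrun). The bit table, the regex, and the helper names `decode_status` and `summarize` are illustrative and should be checked against the datasheet and natsemi.c before relying on them.

```python
import re
from collections import Counter
from typing import List

# ASSUMPTION: the hex value after "Something Wicked happened!" is the DP83815
# interrupt status (ISR) word, and its low bits are named as below (from the
# DP83815 datasheet as I recall it; verify against the datasheet and natsemi.c).
ISR_BITS = {
    0x0001: "RxOK",
    0x0002: "RxDesc",
    0x0004: "RxErr",
    0x0008: "RxEarly",
    0x0010: "RxIdle",
    0x0020: "RxOverrun",
    0x0040: "TxOK",
    0x0080: "TxDesc",
    0x0100: "TxErr",
    0x0200: "TxIdle",
    0x0400: "TxUnderrun",
}

# Matches e.g. "eth0: Something Wicked happened! 0700." and captures (iface, code).
WICKED = re.compile(r"(eth\d+): Something Wicked happened! ([0-9a-fA-F]+)")


def decode_status(code: str) -> List[str]:
    """Return the names of the assumed ISR bits set in a hex status code."""
    value = int(code, 16)
    names = [name for bit, name in sorted(ISR_BITS.items()) if value & bit]
    unknown = value & ~sum(ISR_BITS)  # any bits outside the table above
    if unknown:
        names.append(f"unknown(0x{unknown:04x})")
    return names


def summarize(log_text: str) -> Counter:
    """Count (interface, code) pairs; 'last message repeated' lines are ignored."""
    return Counter(WICKED.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.\n"
        "Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.\n"
        "Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.\n"
    )
    for (iface, code), n in sorted(summarize(sample).items()):
        print(iface, code, n, decode_status(code))
```

Under that bit assignment, every code seen here (0500, 0700, 0740, 0749) has TxErr and TxUnderrun set, which would point at transmit underruns rather than receive problems; that is a reading of the assumed bit layout, not a diagnosis.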
The result of ifconfig:

    bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
              inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 b)  TX bytes:986886789 (941.1 Mb)

    eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
              inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:907798 errors:0 dropped:0 overruns:0 frame:0
              TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
              collisions:0 txqueuelen:100
              RX bytes:435552233 (415.3 Mb)  TX bytes:491795214 (469.0 Mb)
              Interrupt:12

    eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
              inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:907768 errors:0 dropped:0 overruns:0 frame:0
              TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
              collisions:0 txqueuelen:100
              RX bytes:434992308 (414.8 Mb)  TX bytes:489766183 (467.0 Mb)
              Interrupt:10 Base address:0x2000

    eth2      Link encap:Ethernet  HWaddr 00:02:E3:03:DC:2C
              inet addr:150.227.64.210  Bcast:150.227.64.255  Mask:255.255.255.0
              UP BROADCAST RUNNING  MTU:1500  Metric:1
              RX packets:13122 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              RX bytes:1032660 (1008.4 Kb)  TX bytes:943713 (921.5 Kb)
              Interrupt:11 Base address:0x4000

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              UP LOOPBACK RUNNING  MTU:3904  Metric:1
              RX packets:8 errors:0 dropped:0 overruns:0 frame:0
              TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:552 (552.0 b)  TX bytes:552 (552.0 b)
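As a rough check on the TX counters above, this sketch copies the bond0, eth0 and eth1 numbers by hand and computes the per-slave error rate (about 0.19% on each). It assumes, without confirmation from the driver source, that a failed transmit is counted under "errors" and not under "packets", so attempts = packets + errors. Under that assumption eth0 and eth1 account for 917215 and 917214 attempts, which together equal bond0's 1834429 exactly, an almost perfectly even split consistent with round-robin bonding. The names `TX_COUNTERS`, `tx_attempts` and `tx_error_rate` are hypothetical.

```python
# TX counters copied by hand from the ifconfig output above.
TX_COUNTERS = {
    "bond0": {"packets": 1834429, "errors": 0},
    "eth0": {"packets": 915439, "errors": 1776},
    "eth1": {"packets": 915466, "errors": 1748},
}
SLAVES = ("eth0", "eth1")


def tx_attempts(iface: str) -> int:
    """Packets handed to iface for transmission.

    ASSUMPTION: a failed packet is counted in "errors" but not in "packets",
    so every attempt shows up in exactly one of the two counters.
    """
    c = TX_COUNTERS[iface]
    return c["packets"] + c["errors"]


def tx_error_rate(iface: str) -> float:
    """Fraction of transmit attempts on iface that ended in an error."""
    return TX_COUNTERS[iface]["errors"] / tx_attempts(iface)


if __name__ == "__main__":
    for s in SLAVES:
        print(f"{s}: {tx_attempts(s)} attempts, {tx_error_rate(s):.3%} failed")
    total = sum(tx_attempts(s) for s in SLAVES)
    print("slave attempts:", total, "bond0 TX packets:", TX_COUNTERS["bond0"]["packets"])
```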
More information about the Beowulf mailing list
