From Paul_Walker@dell.com Fri, 30 Jun 2000 21:32:14 GMT Date: Fri, 30 Jun 2000 21:32:14 GMT From: Paul Walker Paul_Walker@dell.com Subject: [eepro100] Transmitter Timeout

I have just finished reading the archives re: what appears to be a rather frustrating issue (Transmitter Timeout). The fact that I was reading the archive should be clue enough that it has raised its head here as well. I wanted to pass on some info to the list and see if it helps any of you working on the issue.

We have two 4-node Linux clusters built from Dell PowerEdge 2300/2400s: dual 500 MHz CPUs, PERCII/SC RAID, 1 GB RAM, and two 82557s in every box. The first cluster, in the US, has NEVER seen the timeout problem and has been operational for over a year now. However, our most recent deployment of an identical cluster in Asia is seeing it on a regular basis. All systems currently run an in-house compiled 2.2.14 SMP kernel and use eepro100.c v1.06 as a loadable module. These have been diff'ed many times to verify that everything is the same everywhere.

The two main differences I have identified are:

First, the working cluster talks to Cisco gear while the other talks to 3Com gear. To get everything working properly in the States (Cisco gear) we are disabling auto-negotiation and forcing 100 Mbit full duplex (options=0x30,0x30 in conf.modules). We are doing the same in Asia, but this does not appear to be helping.

Second, the hardware in the States is slightly older, Dell 2300s (32-bit PCI backplane), while the hardware in Asia is the newer Dell 2400s, which have both 32-bit and 64-bit PCI slots. The Intel 82557s are in the 32-bit slots.

The interesting thing is that eth0 (which goes to a 3Com switch and then into the core) has never had the problem in Asia, while eth1, which goes directly to the core and is configured as a private VLAN for inter-box communication, is seeing the problem. (Note: I am not completely familiar with the details of this configuration; I am repeating what the networking guys have said.)

My biggest problem is that I have not been able to find a sufficient workaround. ifup/ifdown does basically nothing: the TX error counters continue to show the same error count after the interface is re-enabled. Also, I can't very easily rmmod, since that would require me to down both interfaces under script control. That makes me slightly nervous, since the console is about 7000 miles away from here.

If anyone has any suggestions as to what I should try, what additional information might be helpful, etc., it would be most appreciated. I am supposed to turn this on live in a week. Considering that the private VLAN (eth1) is the core of the inter-box communication (see http://www.linuxvirtualserver.org ) and NFS mounting, I am pretty much screwed if this cannot be made to work like things here in the US.



Thanks in advance, and I apologize for the excessive length, but I wanted to cover as much as possible in one place.

Thanks again.

Paul Walker





From j_h_wang@yahoo.com Fri, 30 Jun 2000 17:07:06 -0700 (PDT) Date: Fri, 30 Jun 2000 17:07:06 -0700 (PDT) From: Jiahua Wang j_h_wang@yahoo.com Subject: crash or hang linux 2.2.5-15 Re: [eepro100] Home made 82559ER NIC --- Donald Becker wrote: > The only tricky operation is changing the station address while the interface is UP and the chip is operating. Doing that creates many problems, and is so rare that few drivers support it. (I claim that it's never needed.) > > What driver version? What kind of crash? We found a hardware problem with the PCI bridge ground pins. After adding a decoupling capacitor, I can no longer reproduce the problems. I would like to know if I can use the low-level packet interface (PF_PACKET) to send raw packets with a different source MAC address with the eepro100 driver. Thanks, Jiahua

From conan@linuxsecurity.co.kr Sun, 2 Jul 2000 23:40:08 +0900 Date: Sun, 2 Jul 2000 23:40:08 +0900 From: ??? conan@linuxsecurity.co.kr Subject: [eepro100] Re: urgent question about linux driver I'm very sorry for the somewhat wrong information in my previous mail. I've tested more options and got some different results. 2.2.14 gave results similar to 2.0.36 when I configured some options, and the bi-directional test also gave results similar to the uni-directional test. (Below is with the 2.2.14 kernel:) The default eepro100 module reported "Too much work at interrupt..." and "card reports no resources" and could not send or receive any packets. After I reloaded the module with "max_interrupt_work=800 options=0x30,0x30 rx_copybreak=10", the driver worked well. But it still reported "card reports no resources" and sometimes stopped working. Now it seems to me that all the problems would be solved if the driver could recover from the "card reports no resources" condition. I still want to hear your opinion.
Have you ever heard about the results of tests with a packet generator like SmartBit2000? ----- Original Message ----- From: "???" To: "Donald Becker" Sent: Saturday, July 01, 2000 2:03 PM Subject: Re: urgent question about linux driver > Thank you for your quick reply. > I had already configured the max_interrupt_work you pointed out, to see what happens. > I increased it as you said, and even removed the max_interrupt_work check from the code, > but the result changed little. > > If you don't mind, I want to hear your opinion. > For now, I'm not sure where the problem lies... the driver, the NIC hardware, the Linux networking code, or possibly the motherboard? > The test was basically testing the performance of a Linux machine as a router (the Linux machine has 2 NICs). > The phenomenon is that when the packet generator increases the number of packets (up to 100 Mbps with raw packets of various sizes from 64 bytes to 1024 bytes), the number of packets routed by the Linux machine decreases almost to zero. > > One notable point is that when packets are sent in one direction > (smartbit slot 1 --> linux eth0 --> linux eth1 --> smartbit slot 2) > the problem does not occur and most packets are routed well. > But when smartbit generates packets bi-directionally, > (smartbit slot 1 <--> linux eth0 <--> linux eth1 <--> smartbit slot 2) > the problem begins to occur and almost all packets are dropped. > > Another point is that "Too much work at interrupt ..." only appears when the kernel is 2.0.36 > and does not appear when the kernel is 2.2.14. > An interesting result is that when I tested uni-directionally, 2.0.36 performed almost twice as well as 2.2.14. > When tested bi-directionally, 2.0.36 says "Too much work at interrupt ..." and routes 0 packets, > while 2.2.14 says nothing but routes almost 0 packets. > > Do you have any ideas?
> > ----- Original Message ----- > From: "Donald Becker" > To: "@LA>1G" > Sent: Saturday, July 01, 2000 12:45 AM > Subject: Re: urgent question about linux driver > > > > On Fri, 30 Jun 2000, @LA>1G wrote: > > > > > I want to get some information about the linux driver of eepro100 or 3c59x, > > > but the web pages at cesdis.gsfc.nasa.gov do not seem to be available. > > > > http://www.scyld.com/network/eepro100.html > > http://www.scyld.com/network/vortex.html > > > > > By the way, I tested that driver (or a linux machine with that driver and > > > NIC) under heavy stress using a packet generator named SmartBit2000. The > > > driver said, "Too much work at interrupt...." and could hardly receive or > > > send packets. Could you explain the meaning of the error message? > > > > The driver limits the work it does at interrupt time in order to preserve > > real-time response. > > If you are using the driver with a configuration/machine where this causes a > > problem, you can reduce the impact or turn off this feature with the > > 'max_interrupt_work' driver parameter. > > > > insmod 3c59x max_interrupt_work=200 > > > > or put the following into /etc/conf.modules > > options 3c59x max_interrupt_work=200 > > > > Donald Becker becker@scyld.com > > Scyld Computing Corporation http://www.scyld.com > > 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations > > Annapolis MD 21403

From d.mueller@elsoft.ch Mon, 03 Jul 2000 16:59:44 +0200 Date: Mon, 03 Jul 2000 16:59:44 +0200 From: David Müller (ELSOFT AG) d.mueller@elsoft.ch Subject: [eepro100] New problem with eepro100 and StrongArm board Hi again. After the upgrade to Linux 2.4.0-test2, the self test of the Intel 82559 based NIC (on a StrongArm eval board) seems to work quite well, but now I have a new problem. eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html eepro100.c: $Revision: 1.33 $ 2000/05/24 Modified by Andrey V.
Savochkin and others Found Intel i82557 PCI Speedo, MMIO at 0x4080000, IRQ 22. eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:D0:B7:4C:36:56, IRQ 22. Receiver lock-up bug exists -- enabling work-around. Board assembly 721383-008, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x04f4518b). Sending BOOTP requests....<7>eth0: Link status change. .<7>eth0: Media control tick, status e050. .<7>eth0: Media control tick, status e050. .<7>eth0: Media control tick, status e050. .<7>eth0: Media control tick, status e050. .<7>eth0: Media control tick, status e050. .<7>eth0: Media control tick, status f048. .<7>eth0: Media control tick, status f048. .<7>eth0: Media control tick, status f048. .<7>eth0: Media control tick, status f048. timed out! IP-Config: Auto-configuration of network failed. My BOOTP server gets the request and responds to it, but the reply seems to get lost. As the FR bit in the status register is set, I think that the reply packet was received, but not processed further. TIA Dave

From conan@linuxsecurity.co.kr Mon, 3 Jul 2000 22:35:41 +0900 Date: Mon, 3 Jul 2000 22:35:41 +0900 From: ??? conan@linuxsecurity.co.kr Subject: [eepro100] Re: urgent question about linux driver The driver is the default one included in Red Hat 6.2. In my case, the error message appears when too many packets arrive in a burst. It is all right that the driver cannot handle traffic above a certain threshold, but the problem is that the driver stalls and cannot handle any traffic at all once the threshold is reached. Besides that, can I get more information about the meaning of the options I tried? "max_interrupt_work=800 rx_copybreak=10" What happens when 'max_interrupt_work' increases? What's the meaning of 'rx_copybreak'? Anyway, I'm glad to hear that you are working on the solution.
Best regards conan ----- Original Message ----- From: "Andrey Savochkin" To: "???" Cc: Sent: Monday, July 03, 2000 8:58 PM Subject: Re: urgent question about linux driver > On Sun, Jul 02, 2000 at 11:40:08PM +0900, ??? wrote: > > I'm very sorry for the somewhat wrong information in my previous mail. > > I've tested more options and got some different results. > > 2.2.14 gave results similar to 2.0.36 when I configured some options. > > The bi-directional test also gave results similar to the uni-directional test. > > > > (Below is with the 2.2.14 kernel:) > > The default eepro100 module reported "Too much work at interrupt..." and > > "card reports no resources" and could not send or receive any packets. > > After I reloaded the module with "max_interrupt_work=800 options=0x30,0x30 > > rx_copybreak=10", the driver worked well. > > But it still reported "card reports no resources" and sometimes stopped working. > > Judging from the message "card reports no resources", you are using a driver newer > than the one in 2.2.14. In any case, "no resources" is a real problem, and I'm > working on the solution. > > Best regards > Andrey V. > Savochkin >

From saw@saw.sw.com.sg Mon, 3 Jul 2000 19:58:18 +0800 Date: Mon, 3 Jul 2000 19:58:18 +0800 From: Andrey Savochkin saw@saw.sw.com.sg Subject: [eepro100] Re: urgent question about linux driver On Sun, Jul 02, 2000 at 11:40:08PM +0900, ??? wrote: > I'm very sorry for the somewhat wrong information in my previous mail. > I've tested more options and got some different results. > 2.2.14 gave results similar to 2.0.36 when I configured some options. > The bi-directional test also gave results similar to the uni-directional test. > > (Below is with the 2.2.14 kernel:) > The default eepro100 module reported "Too much work at interrupt..." and > "card reports no resources" and could not send or receive any packets. > After I reloaded the module with "max_interrupt_work=800 options=0x30,0x30 > rx_copybreak=10", the driver worked well.
> But it still reported "card reports no resources" and sometimes stopped working. Judging from the message "card reports no resources", you are using a driver newer than the one in 2.2.14. In any case, "no resources" is a real problem, and I'm working on the solution. Best regards Andrey V. Savochkin

From saw@saw.sw.com.sg Tue, 4 Jul 2000 10:06:11 +0800 Date: Tue, 4 Jul 2000 10:06:11 +0800 From: Andrey Savochkin saw@saw.sw.com.sg Subject: [eepro100] Re: New problem with eepro100 and StrongArm board Hello, On Mon, Jul 03, 2000 at 04:59:44PM +0200, David Müller wrote: > After the upgrade to Linux 2.4.0-test2, the self test of the Intel 82559 > based NIC (on a StrongArm eval board) seems to work quite well, but now I > have a new problem. > > eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > eepro100.c: $Revision: 1.33 $ 2000/05/24 Modified by Andrey V. Savochkin and others > Found Intel i82557 PCI Speedo, MMIO at 0x4080000, IRQ 22. > eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:D0:B7:4C:36:56, IRQ 22. > Receiver lock-up bug exists -- enabling work-around. > Board assembly 721383-008, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). > Sending BOOTP requests....<7>eth0: Link status change. > .<7>eth0: Media control tick, status e050. > .<7>eth0: Media control tick, status e050. > .<7>eth0: Media control tick, status e050. > .<7>eth0: Media control tick, status e050. > .<7>eth0: Media control tick, status e050. > .<7>eth0: Media control tick, status f048. > .<7>eth0: Media control tick, status f048. > .<7>eth0: Media control tick, status f048. > .<7>eth0: Media control tick, status f048. > timed out! > IP-Config: Auto-configuration of network failed. > > My BOOTP server gets the request and responds to it, but the > reply seems to get lost.
> As the FR bit in the status register is set, I think that the > reply packet was received, but not processed further. Looks like it. So, either the card didn't generate an interrupt, or the driver didn't see the RX buffer status that the card updated directly in memory. Considering your architecture, I suspect the latter. Best regards Andrey V. Savochkin

From dwolsten@extremezone.com Tue, 04 Jul 2000 22:56:46 -0700 Date: Tue, 04 Jul 2000 22:56:46 -0700 From: Daniel Wolstenholme dwolsten@extremezone.com Subject: [eepro100] Problems with Intel PRO/100 CardBus II Hi, I have an Intel PRO/100 CardBus II (32-bit) on my laptop which I've been trying to get to work with Red Hat 6.2 on an IBM ThinkPad 600X. First I tried using the eepro100 kernel module included in RH6.2, and later I tried the newest eepro100 driver version from Scyld. Both had the same results. The driver, upon trying to load, gives this error message: eth0: Invalid EEPROM checksum 0x0000, check settings before activating this device! Self test failed, status ffffffff: Failure to initialize the i82557. Verify that the card is a bus-master capable slot. PCI: Increasing latency timer of device 02:00 to 64 Am I doing something wrong, or is this card not supported by this driver? Thanks, Dan -- _____________________________________________________________________ Daniel Wolstenholme email: daniel@wolstenholme.net Acura Integra Modification Page http://dwolsten.tripod.com/ features: Honda articles, rear wiper controller, and many unique mods _____________________________________________________________________

From frmb2@ukc.ac.uk Wed, 05 Jul 2000 11:06:50 +0100 Date: Wed, 05 Jul 2000 11:06:50 +0100 From: Frederick Barnes frmb2@ukc.ac.uk Subject: [eepro100] Re: 82559 and receiver lock-up bug. Hi, > Has anyone ever hit the receiver lock-up bug? Yup, experiencing it quite a lot at the moment :( > A multicast command to the adapter is a workaround to this > bug.
If you are doing an rcp of a very big file (> 1 GB file) > does this workaround help? My driver locked up many times > during a large file transfer. > > Any input on what causes this bug to appear is greatly > appreciated. I've got a user-space driver talking to the eepro100 board (82557b, 82558 flavoured boards), with flow-control on (any combination of PHY-based or transmit-based flow-control). If the board runs out of receive descriptors, the flow-control packets tell the sender to stop sending, but when the board gets some more receive descriptors back, it doesn't start receiving again as it should -- it just stops dead. :-( Definitely undesirable. Is there any way to detect that it's about to lock up, before it actually does? In the usual fashion, the 82558 datasheets aren't very helpful. I'd rather not go for a time-out option, as this would make the performance (throughput) suffer somewhat. Cheers, Fred --

From becker@scyld.com Wed, 5 Jul 2000 12:02:07 -0400 (EDT) Date: Wed, 5 Jul 2000 12:02:07 -0400 (EDT) From: Donald Becker becker@scyld.com Subject: [eepro100] Problems with Intel PRO/100 CardBus II On Tue, 4 Jul 2000, Daniel Wolstenholme wrote: > I have an Intel PRO/100 CardBus II (32-bit) on my laptop which I've been > trying to get to work with Red Hat 6.2 on an IBM ThinkPad 600X. First I > tried using the eepro100 kernel module included in RH6.2, and > later I tried the newest eepro100 driver version from Scyld. Both had > the same results. The driver, upon trying to > load, gives this error message: > > eth0: Invalid EEPROM checksum 0x0000, check settings before activating > this device! > Self test failed, status ffffffff: > Failure to initialize the i82557. > Verify that the card is a bus-master capable slot. > PCI: Increasing latency timer of device 02:00 to 64 What were the other messages? This is a CardBus card, so there must have been a set of messages about the card detection and mapping.
The problem is the mapping -- the card doesn't exist in I/O space. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations Annapolis MD 21403

From becker@scyld.com Wed, 5 Jul 2000 12:06:45 -0400 (EDT) Date: Wed, 5 Jul 2000 12:06:45 -0400 (EDT) From: Donald Becker becker@scyld.com Subject: [eepro100] Re: 82559 and receiver lock-up bug. On Wed, 5 Jul 2000, Frederick Barnes wrote: > > Has anyone ever hit the receiver lock-up bug? > > yup, experiencing it quite a lot at the moment :( > > > A multicast command to the adapter is a workaround to this > > bug. If you are doing an rcp of a very big file (> 1 GB file) > > does this workaround help? My driver locked up many times > > during a large file transfer. You are confusing two problems: the hardware bug, and the bug in the modified driver that causes the receiver to report "no resources". The hardware bug is the one that is restarted by the set-rx-mode command. It's very rarely triggered. The new driver bug cannot be cleared by CU (Command Unit) operations. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations Annapolis MD 21403

From linuxguy@houston.rr.com Wed, 05 Jul 2000 16:52:58 -0500 Date: Wed, 05 Jul 2000 16:52:58 -0500 From: John Cagle linuxguy@houston.rr.com Subject: [eepro100] mii-diag program - slight inaccuracy I've recently been using Donald's mii-diag program and I've come across something that's not quite accurate. With my NIC & Switch combination, the BMSR register has a "1" in the bit position for "Link Jabber" (bmsr & 0x0002). This causes the mii-diag program to print out "*** Link Jabber! ***". However, according to the National DP83840A specification, the Link Jabber bit only has meaning in 10 Mb/s mode. I'm running at 100 Mb/s, so this bit should be ignored.
It's not a big deal, but I thought I would explain it to the list in case someone else comes across this error message. Regards, John Cagle Compaq ProLiant Linux Team

From becker@scyld.com Wed, 5 Jul 2000 21:35:35 -0400 (EDT) Date: Wed, 5 Jul 2000 21:35:35 -0400 (EDT) From: Donald Becker becker@scyld.com Subject: [eepro100] mii-diag program - slight inaccuracy On Wed, 5 Jul 2000, John Cagle wrote: > I've recently been using Donald's mii-diag program and I've come across > something that's not quite accurate. > > With my NIC & Switch combination, the BMSR register has a "1" in the bit > position for "Link Jabber" (bmsr & 0x0002). This causes the mii-diag > program to print out "*** Link Jabber! ***". > > However, according to the National DP83840A specification, the Link > Jabber bit only has meaning in 10 Mb/s mode. I'm running at 100 Mb/s, > so this bit should be ignored. Hmmm, curious. I've seen a false indication of link jabber at 100baseTx before, but I don't believe that it adheres to the standard. Does anyone have a copy of the standard nearby to check? I happened to have two transceiver datasheets open when this message arrived. Both are from the same company, Lucent. The LU6612 doesn't specify when the bit is valid, implying that it always is. The LU3X31 datasheet says "During 10baseT operation..." The minimal standard does not provide a way to tell what speed the transceiver selects if autonegotiation fails and autosense takes over (1). So it's pretty much unreasonable for the bit to be set in 100baseTx mode unless there is a data jabber (2). (1) Many transceivers do report the speed in register 5, leaving the autonegotiation-complete bit unset. A few report the speed in register 0. (2) Unlike 10baseT, 100baseTx is constantly sending data symbols. But the transceiver does know when it's sending data vs. idle symbols. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave.
Suite 210 Beowulf Clusters / Linux Installations Annapolis MD 21403

From saw@saw.sw.com.sg Thu, 6 Jul 2000 10:19:40 +0800 Date: Thu, 6 Jul 2000 10:19:40 +0800 From: Andrey Savochkin saw@saw.sw.com.sg Subject: [eepro100] Re: 82559 and receiver lock-up bug. On Wed, Jul 05, 2000 at 12:06:45PM -0400, Donald Becker wrote: > You are confusing two problems: the hardware bug, and the bug in the > modified driver that causes the receiver to report "no resources". > > The hardware bug is the one that is restarted by the set-rx-mode command. > It's very rarely triggered. The new driver bug cannot be cleared by > CU (Command Unit) operations. Could you explain what the bug consists of? :-) Best regards Andrey V. Savochkin

From dwolsten@extremezone.com Sun, 09 Jul 2000 00:04:11 -0700 Date: Sun, 09 Jul 2000 00:04:11 -0700 From: Daniel Wolstenholme dwolsten@extremezone.com Subject: [eepro100] Problems with Intel PRO/100 CardBus II Here are the messages I get in /var/log/messages when I insert the card: cardmgr[433]: initializing socket 0 cardmgr[433]: unsupported card in socket 0 kernel: cs: cb_alloc(bus 2): vendor 0x8086, device 0x1229 cardmgr[433]: product info: "INTEL(R)", "PRO/100 CARDBUS II", "MBLA3300", "1.00" cardmgr[433]: manfid: 0x0089, 0x0103 function: 6 (network) Is there anything else to look for? How do I verify the mapping? Thanks, Dan Donald Becker wrote: > > On Tue, 4 Jul 2000, Daniel Wolstenholme wrote: > > > I have an Intel PRO/100 CardBus II (32-bit) on my laptop which I've been > > trying to get to work with Red Hat 6.2 on an IBM ThinkPad 600X. First I > > tried using the eepro100 kernel module included in RH6.2, and > > later I tried the newest eepro100 driver version from Scyld. Both had > > the same results. The driver, upon trying to > > load, gives this error message: > > > > eth0: Invalid EEPROM checksum 0x0000, check settings before activating > > this device! > > Self test failed, status ffffffff: > > Failure to initialize the i82557.
> > Verify that the card is a bus-master capable slot. > > PCI: Increasing latency timer of device 02:00 to 64 > > What were the other messages? This is a CardBus card, so there must have > been a set of messages about the card detection and mapping. > > The problem is the mapping -- the card doesn't exist in I/O space. > > Donald Becker becker@scyld.com > Scyld Computing Corporation http://www.scyld.com > 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations > Annapolis MD 21403 -- _____________________________________________________________________ Daniel Wolstenholme email: daniel@wolstenholme.net Acura Integra Modification Page http://dwolsten.tripod.com/ features: Honda articles, rear wiper controller, and many unique mods _____________________________________________________________________

From becker@scyld.com Sun, 9 Jul 2000 12:10:17 -0400 (EDT) Date: Sun, 9 Jul 2000 12:10:17 -0400 (EDT) From: Donald Becker becker@scyld.com Subject: [eepro100] Problems with Intel PRO/100 CardBus II On Sun, 9 Jul 2000, Daniel Wolstenholme wrote: > Here are the messages I get in /var/log/messages when I insert the card: > > cardmgr[433]: initializing socket 0 > cardmgr[433]: unsupported card in socket 0 > kernel: cs: cb_alloc(bus 2): vendor 0x8086, device 0x1229 > cardmgr[433]: product info: "INTEL(R)", "PRO/100 CARDBUS II", > "MBLA3300", "1.00" > cardmgr[433]: manfid: 0x0089, 0x0103 function: 6 (network) The PCMCIA code didn't recognize the card. Did you use the directions at http://www.scyld.com/network/updates.html and add the following lines to /etc/pcmcia/config.opts?

    # The few Intel eepro100 designs.
    device "eepro100"
      class "network" module "cb_enabler", "pci-scan", "cb_shim", "eepro100"
    card "Intel Pro/100 CardBus II"
      manfid 0x0089, 0x0103
      bind "eepro100"
    card "Intel Pro/100 LAN+Modem56 CardBus II"
      manfid 0x0089, 0x1103
      bind "eepro100"

You will need three kernel modules:

    pci-scan.o   Handles device detection, hot-swap and ACPI power control.
    cb_shim.o    Translates between David Hinds' interface and pci-scan.o.
    eepro100.o   The device driver.

Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations Annapolis MD 21403

From linuxguy@houston.rr.com Fri, 07 Jul 2000 13:31:39 -0500 Date: Fri, 07 Jul 2000 13:31:39 -0500 From: John Cagle linuxguy@houston.rr.com Subject: [eepro100] mii-diag program - slight inaccuracy Donald Becker wrote: > > On Wed, 5 Jul 2000, John Cagle wrote: > > > I've recently been using Donald's mii-diag program and I've come across > > > something that's not quite accurate. > > > > With my NIC & Switch combination, the BMSR register has a "1" in the bit > > position for "Link Jabber" (bmsr & 0x0002). This causes the mii-diag > > program to print out "*** Link Jabber! ***". > > > > However, according to the National DP83840A specification, the Link > > Jabber bit only has meaning in 10 Mb/s mode. I'm running at 100 Mb/s, > > so this bit should be ignored. > > Hmmm, curious. I've seen false indication of link jabber at 100baseTx > before, but I don't believe that it adheres to the standard. Does anyone > have a copy of the standard nearby to check? Well, I just dusted off the old IEEE 802.3u-1995 standard, and I found the following references to Jabber: __________________________ 22.2.4.2.12 Jabber Detect ... PHYs specified for 100 Mb/s operation (100BASE-X and 100BASE-T4) do not incorporate a Jabber Detect function, as this function is defined to be performed in the repeater unit in 100 Mb/s systems. Therefore, 100BASE-X [meaning 100BASE-TX and 100BASE-FX] and 100BASE-T4 PHYs shall always return a value of zero in bit 1.1. __________________________ 30.5.1.1.6 aJabber [consists of JabberFlag and JabberCounter] ...
Note that this counter will not increment for a 100 Mb/s PHY, as there is no defined JABBER state.; __________________________ > I happened to have two transceiver datasheets open when this message > arrived. Both are from the same company, Lucent. The LU6612 doesn't > specify when the bit is valid, implying that it always is. The LU3X31 > datasheet says "During 10baseT operation..." > > The minimal standard does not provide a way to tell what speed the > transceiver selects if autonegotiation fails and autosense takes over (1). So > it's pretty much unreasonable for the bit to be set in 100baseTx mode unless > there is a data jabber (2). So it appears that, while Jabber is meaningless for 100 Mb/s PHYs, the DP83840 is in violation of the standard in that it sometimes (always?) returns 1 for the Jabber bit instead of 0. FYI, for the National PHY, you can read register 25's PMDSpeed bit to determine the current link speed. > > (1) Many transceivers do report the speed in register 5, leaving the > autonegotiation-complete bit unset. A few report the speed in register 0. > > (2) Unlike 10baseT, 100baseTx is constantly sending data symbols. But the > transceiver does know when it's sending data vs. idle symbols. > > Donald Becker becker@scyld.com > Scyld Computing Corporation http://www.scyld.com > 410 Severn Ave. Suite 210 Beowulf Clusters / Linux Installations > Annapolis MD 21403 Regards, John Cagle Compaq Computer Corporation http://www.compaq.com/linux/ aka john.cagle@compaq.com

From nraju@erols.com Wed, 12 Jul 2000 22:24:00 -0400 Date: Wed, 12 Jul 2000 22:24:00 -0400 From: Naga R Narayanaswamy nraju@erols.com Subject: [eepro100] Endian Question Hello Everyone, This is a general question on the 82559 chip. I want to know if one can change the endian operating mode of the 82559 chip. For example, on the 21143 the descriptor byte ordering mode can be changed through CSR0 (Bus Mode Register). When the DBO bit is set, the 21143 operates in big-endian byte ordering mode.
Also, setting BLE affects the byte ordering of the data buffers. Is it possible to do the same on the 82559? I want to be able to set the 82559 to operate in big-endian format. My understanding is that Intel chips are little endian and that the PCI bus operates in little endian.

Thanks in advance!
-Naga
nraju@erols.com

From becker@scyld.com Thu, 13 Jul 2000 01:13:37 -0400 (EDT)
Date: Thu, 13 Jul 2000 01:13:37 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Endian Question

On Wed, 12 Jul 2000, Naga R Narayanaswamy wrote:
> This is a general question on the 82559 chip.
>
> I want to know if one can change the endian operating mode in 82559
> chip.

No.

This shouldn't be an issue, though. The recent versions of the driver have explicit macros that swap the descriptor fields for big-endian machines. The run-time overhead for explicit byte swap exists only for big-endian machines, and it's a single-cycle instruction on most machines. The drivers are written so that byte swapping happens at compile time wherever possible. The greater overhead is in uglier driver code, but it's not too bad, especially compared to, e.g., Solaris drivers.

> For e.g., in 21143, through CSR0 (Bus Mode Register), descriptor byte
> ordering mode can be changed. When the bit DBO is set, 21143 operates
> in big endian byte ordering mode. Also by setting BLE the data buffer
> is affected.

About a year ago I converted the Tulip driver to do explicit byte-swaps as well. The problem is that few work-alike chips support descriptor byte swapping, and the 21040 doesn't either. Converting the driver was far easier than answering the "this will work, that will not" documentation, and the zillion emails that say "yes, the web page is correct".

> Is it possible to do the same in 82559. I want to be able to set the
> 82559
> to operate in big endian format. My understanding is
> Intel chips are little endian and PCI bus operates in little endian.
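Donald's point earlier in this reply, that the driver swaps descriptor fields with explicit macros instead of relying on a chip byte-order mode, can be illustrated with a short sketch. It is not driver code: the helper names `cpu_to_le32_bytes` and `le32_bytes_to_cpu` are hypothetical, loosely modeled on the kernel's `cpu_to_le32()`/`le32_to_cpu()` macros, and a 32-bit descriptor field is represented as four bytes of memory. Python's `struct` format `"<I"` always packs little-endian, so the byte image the NIC reads is the same on any host.

```python
import struct

def cpu_to_le32_bytes(value: int) -> bytes:
    """Hypothetical helper: lay out a 32-bit descriptor field in the
    little-endian order a PCI bus master such as the i82559 reads,
    whatever the host CPU's native byte order is."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def le32_bytes_to_cpu(raw: bytes) -> int:
    """Hypothetical helper: read a 32-bit little-endian descriptor field
    back into a host integer."""
    return struct.unpack("<I", raw)[0]

# A big-endian CPU storing 0x12345678 natively would write 12 34 56 78;
# the explicit conversion always produces 78 56 34 12 for the NIC.
field = cpu_to_le32_bytes(0x12345678)
print(field.hex())                     # 78563412
print(hex(le32_bytes_to_cpu(field)))   # 0x12345678
```

In C, the kernel's conversion macros compile to nothing on little-endian hosts, which is why the cost Donald mentions falls only on big-endian machines.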
All Intel processor chips are little endian, since the beginning of time (Unix time, 1970 ;-> ). Intel helped define the PCI specs, and PCI is unabashedly little endian. Even if it were trivial to put big-endian support in the i82559 (and it's not), Intel would not have done it.

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From becker@scyld.com Thu, 13 Jul 2000 01:18:52 -0400 (EDT)
Date: Thu, 13 Jul 2000 01:18:52 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] mii-diag program - slight inaccuracy

On Fri, 7 Jul 2000, John Cagle wrote:
> Donald Becker wrote:
> > On Wed, 5 Jul 2000, John Cagle wrote:
> > > With my NIC & Switch combination, the BMSR register has a "1" in the bit
> > > position for "Link Jabber" (bmsr & 0x0002). This causes the mii-diag
> > > program to print out "*** Link Jabber! ***".
> > >
> > > However, according to the National DP83840A specification, the Link
> > > Jabber bit only has meaning in 10 Mb/s mode. I'm running at 100 Mb/s,
> > > so this bit should be ignored.
> >
> > Hmm, curious. I've seen false indication of link jabber at 100baseTx
> > before, but I don't believe that it adheres to the standard. Does anyone
> > have a copy of the standard nearby to check?
>
> Well I just dusted off the old IEEE 802.3u-1995 standard, and I found
> the following references to Jabber:
> 22.2.4.2.12 Jabber Detect
> ...
> PHYs specified for 100 Mb/s operation (100BASE-X and 100BASE-T4) do
> not incorporate a Jabber Detect function, as this function is defined to
> be performed in the repeater unit in 100 Mb/s systems. Therefore,
> 100BASE-X [meaning 100BASE-TX and 100BASE-FX] and 100BASE-T4 PHYs shall
> always return a value of zero in bit 1.1.
...
> So it appears that while Jabber is meaningless for 100 Mb/s PHYs, that
> the DP83840 is in violation of the standard in that it sometimes
> (always?)
> returns 1 for the Jabber bit instead of 0.

So much for the 83840 being the gold standard of MII implementations.

> FYI, for the National PHY, you can read register 25's PMDSpeed bit to determine the current link speed.

People should keep in mind that doing this "doesn't count". Reg 25 is a vendor-specific register, and that information doesn't exist on other chips. There is no way, other than having a very large and always obsolete vendor exception list, of using that info in a driver.

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From ljun@eastcom.com Thu, 13 Jul 2000 15:32:28 +0800
Date: Thu, 13 Jul 2000 15:32:28 +0800
From: Liu Jun ljun@eastcom.com
Subject: [eepro100] Help me!

Who can provide me with the 82559 Software User Manual?
I can't find it!

From paulsen@texas.net Mon, 17 Jul 2000 15:49:13 -0500
Date: Mon, 17 Jul 2000 15:49:13 -0500
From: Robert C. Paulsen, Jr. paulsen@texas.net
Subject: [eepro100] Command unit failed to mark command 00000000 as complete -- what does it mean?
My /var/log/messages file has a few hundred of the following messages. This started about 3 days ago.

Jul 17 14:46:21 home kernel: eth0: Command unit failed to mark command 00000000 as complete at 78644.

Does this mean my network card is going bad? Perhaps it is due to excessive activity on the network -- see the numbers in ifconfig.

# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:C9:4B:AD:24
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::a0:c94b:ad24/10 Scope:Link
          inet6 addr: fe80::2a0:c9ff:fe4b:ad24/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9022484 errors:1 dropped:0 overruns:0 frame:107
          TX packets:430402 errors:2 dropped:0 overruns:21319 carrier:556
          collisions:3367701 txqueuelen:100
          Interrupt:10 Base address:0x5000

--
____________________________________________________________________
Robert Paulsen                                     paulsen@texas.net

From ser.wee.kwek@intel.com Tue, 18 Jul 2000 00:22:16 -0700
Date: Tue, 18 Jul 2000 00:22:16 -0700
From: Kwek, Ser Wee ser.wee.kwek@intel.com
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

hi,

Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

thanks,
ah kwek.

From becker@scyld.com Tue, 18 Jul 2000 11:08:56 -0400 (EDT)
Date: Tue, 18 Jul 2000 11:08:56 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Command unit failed to mark command 00000000 as complete -- what does it mean?

On Mon, 17 Jul 2000, Robert C. Paulsen, Jr. wrote:
> Subject: [eepro100] Command unit failed to mark command 00000000 as complete -- what does it mean?
>
> My /var/log/messages file has a few hundred of the following messages.
> This started about 3 days ago.

What driver version are you using?

> Jul 17 14:46:21 home kernel: eth0: Command unit failed to mark command 00000000 as complete at 78644.
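Donald's explanation below ties this message to a chip that skipped marking a command as complete, and his v1.09s change log mentions an "explicit descriptor-skipped check when scavenging the command list". The sketch below shows one way such a check can work; it is not the eepro100.c code. `CommandBlock`, `scavenge`, the free-running `dirty_tx`/`cur_tx` counters and the reclaim-and-continue recovery policy are assumptions for illustration, and `STATUS_COMPLETE = 0x8000` assumes the completion flag is bit 15 of the command block status word. The log text is modeled on the line quoted above.

```python
from dataclasses import dataclass

# Assumption: bit 15 of a command block's status word is the "complete" flag.
STATUS_COMPLETE = 0x8000

@dataclass
class CommandBlock:
    status: int = 0

def scavenge(ring, dirty_tx, cur_tx, log=print):
    """Reclaim finished command blocks between the free-running counters
    dirty_tx (oldest unreclaimed) and cur_tx (next slot to be queued).

    A block that is not marked complete normally means the command unit
    is still working on it, so scavenging stops there. But if a *later*
    block is already complete, the chip must have skipped this one; log
    it and reclaim it anyway rather than stalling the queue forever."""
    size = len(ring)
    while dirty_tx < cur_tx:
        entry = ring[dirty_tx % size]
        if not entry.status & STATUS_COMPLETE:
            later_done = any(ring[i % size].status & STATUS_COMPLETE
                             for i in range(dirty_tx + 1, cur_tx))
            if not later_done:
                break  # genuinely still pending
            log(f"Command unit failed to mark command {entry.status:08x} "
                f"as complete at {dirty_tx}.")
        dirty_tx += 1
    return dirty_tx

ring = [CommandBlock() for _ in range(4)]
ring[0].status = STATUS_COMPLETE
ring[2].status = STATUS_COMPLETE   # ring[1] was "skipped" by the chip
print(scavenge(ring, 0, 3))        # logs the skipped block at index 1, then prints 3
```

The key design point is that an incomplete block is only treated as skipped when a later block is complete; otherwise it is simply still in progress.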
This message indicates that the eepro100 you are using has a bug where it skipped marking a command as complete.

When this occurs it means that the chip has corrupted its internal state. The driver can reset the chip, but the same problem will recur almost immediately. The driver recovers from this problem, but the recovery is slower than normal operation. The only full recovery seems to be a hard reset or powering off the system.

This bug appears on no errata list that I have seen. It seems to affect only a few chip versions, and to be triggered only by some motherboards.

This bug was a nasty problem, and it gave me a bad reputation. It's the kind of bug where it would happen to someone, they would make a random change to the driver, and their updated driver would run reliably for a week. They would submit the change as a "bug fix". When I stated that their change didn't fix any obvious bug, they would stomp off and call me names. After all, they had seen my driver stop repeatedly in the span of a few minutes, and their driver just ran for a whole week without a problem. This very situation happened to Linus, and he never admitted that his changes to eepro100 didn't fix the problem. He just believed that I had some other hidden flaw in the driver.

In v1.09s I added an explicit check for this case. Here is that change log entry -- look at entry #7. At this point I still wasn't certain that descriptor skipping was A Bug:

________________
date: 1999/09/30 00:55:38;  author: becker;  state: Exp;  lines: +283 -222
eepro100.c v1.09s 9/29/99
Updated to track the "kern-2.3" version.

Added TX_QUEUE_UNFULL, the queue length where we once again accept Tx packets.

Shuffled the kernel version compatibility code around and added local version
of the pci-scan routines.

Added a new PCI device ID 0x1029, reported by Russ Nelson.

Changed clear_suspend() to use a byte write rather than an atomic bit op.

Changed the Tx-timeout check to avoid false triggers.
This included adding a last_cmd_time variable.

Changed to struct net_device from struct device.

Always write SCBCmd as byte-wide rather than word-wide.

Added explicit descriptor-skipped check when scavenging the command list.

Reset the chip when shutting down the interface, rather than just stopping it,
to disable flow control packets that might be sent.

Changed the ordering of command queue operations to eliminate the window
where sp->cur_tx points to a not-yet-valid command. We should no longer need
a lock in the interrupt routine, and the locked regions when adding a command
are shorter. (Note: the locks have not been moved to take advantage of this.)
----------------------------

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From becker@scyld.com Tue, 18 Jul 2000 11:34:52 -0400 (EDT)
Date: Tue, 18 Jul 2000 11:34:52 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

On Tue, 18 Jul 2000, Kwek, Ser Wee wrote:
> Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?
> Does eepro100 support Load Balancing Feature for Intel PRO/100
> family under LINUX OS?

Load balancing, AKA channel bonding or channel aggregation, should not be a function of the driver. We commonly channel bond with various adapter types, and I've even bonded 10Mbps and 100Mbps channels just to verify that it would work.

The reason it is packaged with drivers, and tied to specific hardware, in the Microsoft world is that it's a "value add". The adapter vendor doesn't want people using it with low-cost hardware, or with other vendors' hardware.

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave.
Suite 210                       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From vic@ibas-labs.de Tue, 18 Jul 2000 17:48:20 +0200
Date: Tue, 18 Jul 2000 17:48:20 +0200
From: Peter Franck vic@ibas-labs.de
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

> Load balancing, AKA channel bonding or channel aggregation, should not be a
> function of the driver. We commonly channel bond with various adapter
> types, and I've even bonded 10Mbps and 100Mbps channels just to verify
> that it would work.

How do I configure this channel bonding? (just an idiot question)

From egidy@deam.de Wed, 19 Jul 2000 00:17:18 +0200
Date: Wed, 19 Jul 2000 00:17:18 +0200
From: Gerd v. Egidy egidy@deam.de
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

> > Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel
> PRO/100 family under LINUX OS?
> > Does eepro100 support Load Balancing Feature for Intel PRO/100
> > family under LINUX OS?
>
> Load balancing, AKA channel bonding or channel aggregation, should not be a
> function of the driver. We commonly channel bond with various adapter
> types, and I've even bonded 10Mbps and 100Mbps channels just to verify
> that it would work.

I've heard that this currently doesn't work the way you expect it to under Linux: you get two net devices (e.g. eth0 and eth1) and set up a special balance routing through both of them. Then you have to apply a patch which causes the kernel to delete the routing cache after every packet. The result is poor performance because of the patch.

I think I heard of an eepro100 driver from Intel which contains this feature, but bound to the driver and not usable for all adapter types. This driver lacks a bit of performance compared to the standard eepro100 driver for Linux.

Donald, can you confirm this information, or is it just rumors or outdated?
Regards,
Gerd

From linuxguy@houston.rr.com Tue, 18 Jul 2000 17:54:04 -0500
Date: Tue, 18 Jul 2000 17:54:04 -0500
From: John Cagle linuxguy@houston.rr.com
Subject: [eepro100] 2.2.16 eepro100 module?

Does anyone have a pre-built eepro100.o (Donald Becker's version 1.10a) for the 2.2.16 kernel? I keep having the "eth0: no receive resources" problem with version 1.09j-t modified by Andrey.

Thanks,
John

From paulsen@texas.net Tue, 18 Jul 2000 19:50:18 -0500
Date: Tue, 18 Jul 2000 19:50:18 -0500
From: Robert C. Paulsen, Jr. paulsen@texas.net
Subject: [eepro100] Command unit failed to mark command 00000000 as complete -- what does it mean?

Donald,

Thanks for the reply. The version of the driver (from the source) is:

eepro100.c:v1.09r2 10/15/99.

This is from a SuSE 6.4 distribution.

The card itself has the following markings on the chip:

582557
L7233192
SL24Z
(c) 1989 1995

I have swapped out the eepro100 for a RealTek RTL8139 and am now using your driver: rtl8139.c:v1.08 6/25/99. So far, so good! (And your reputation is just fine with me!)

Donald Becker wrote:
>
> On Mon, 17 Jul 2000, Robert C. Paulsen, Jr. wrote:
>
> > Subject: [eepro100] Command unit failed to mark command 00000000 as complete
> -- what does it mean?
> >
> > My /var/log/messages file has a few hundred of the following messages.
> > This started about 3 days ago.
>
> What driver version are you using?
>
> > Jul 17 14:46:21 home kernel: eth0: Command unit failed to mark command 00000000 as complete at 78644.
>
> This message indicates that the eepro100 you are using has a bug where it
> skipped marking a command as complete.
>
> When this occurs it means that the chip has corrupted its internal state.
> The driver can reset the chip, but the same problem will recur almost
> immediately. The driver recovers from this problem, but the recovery is
> slower than normal operation. The only full recovery seems to be a hard
> reset or powering off the system.
>
> This bug appears on no errata list that I have seen.
> It seems to affect
> only a few chip versions, and be triggered by only some motherboards.
>
> This bug was a nasty problem, and it gave me a bad reputation. It's the
> kind of bug where it would happen to someone, they would make a random
> change to the driver, and their updated driver would run reliably for a
> week. They would submit the change as a "bug fix". When I stated that
> their change didn't fix any obvious bug, they would stomp off and call me
> names. After all, they had seen my driver stop repeated in the span of a
> few minutes, and their driver just ran for a whole week without a problem.
> This very situation happened to Linus, and he never admitted that his
> changes to eepro100 didn't fix the problem. He just believed that I had
> some other hidden flaw in the driver.
>
> In v1.09s I added an explicit check for this case. Here is that change
> log entry -- look at entry #7. At this point I still wasn't certain that
> descriptor skipping was A Bug:
>
> ________________
> date: 1999/09/30 00:55:38;  author: becker;  state: Exp;  lines: +283 -222
> eepro100.c v1.09s 9/29/99
> Updated to track the "kern-2.3" version.
>
> Added TX_QUEUE_UNFULL, the queue length where we once again accept Tx packets.
>
> Shuffled the kernel version compatibility code around and added local version
> of the pci-scan routines.
>
> Added a new PCI device ID 0x1029, reported by Russ Nelson.
>
> Changed clear_suspend() to use a byte write rather than an atomic bit op.
>
> Changed the Tx-timeout check to avoid false triggers. This included adding
> a last_cmd_time variable.
>
> Changed to struct net_device from struct device.
>
> Always write SCBCmd as byte-wide rather than word-wide.
>
> Added explicit descriptor-skipped check when scavenging the command list.
>
> Reset the chip when shutting down the interface, rather than just stopping it,
> to disable flow control packets that might be sent.
>
> Changed the ordering of command queue operations to eliminate the window
> where sp->cur_tx points to a net-yet-valid command. We should no longer need
> a lock in the interrupt routine, and the locked regions when adding a command
> are shorter. (Note: the locks have not been moved to take advantage of this.)
> ----------------------------
>

--
____________________________________________________________________
Robert Paulsen                                     paulsen@texas.net

From becker@scyld.com Wed, 19 Jul 2000 01:09:09 -0400 (EDT)
Date: Wed, 19 Jul 2000 01:09:09 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

On Wed, 19 Jul 2000, Gerd v. Egidy wrote:
> > > Does eepro100 support Load Balancing Feature for Intel PRO/100
> > > family under LINUX OS?
> >
> > Load balancing, AKA channel bonding or channel aggregation, should not be a
> > function of the driver. We commonly channel bond with various adapter
> > types, and I've even bonded 10Mbps and 100Mbps channels just to verify
> > that it would work.
>
> I've heard that this currently doesn't work the way you expect it under Linux:
> You get two net devices (eg. eth0 and eth1) and set up a special balance routing
> through both of them. Then you have to apply a patch which causes the kernel to
> delete the routing cache after every packet.

ACCCKKK!! NOOoooo.

> The result is a poor performance because of the patch.

Any patch that does that would result in *really* poor performance.

The way channel bonding works is by copying the master station address to the slaves, and distributing transmit packets. Transmit packets are queued to the interfaces in a round-robin fashion. If one of the driver Tx queues fills, it is dropped out of the round-robin sequence.

This is very similar to the way switch trunking, developed subsequent to channel bonding, works.
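The transmit side of the bonding scheme described above (packets queued round-robin across the slave interfaces, with a slave dropped from the rotation while its Tx queue is full) can be sketched as follows. The `Slave` and `Bond` classes and the queue capacities are hypothetical; the sketch models only the distribution decision, not the copying of the master station address to the slaves, and it is not the Linux bonding driver's code.

```python
class Slave:
    """Hypothetical stand-in for one enslaved interface and its Tx queue."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.queue = []

    def full(self):
        return len(self.queue) >= self.capacity

class Bond:
    def __init__(self, slaves):
        self.slaves = slaves
        self.next = 0          # index of the slave whose turn is next

    def transmit(self, packet):
        """Queue packet on the next slave in round-robin order, skipping any
        slave whose Tx queue is full. Returns the slave's name, or None when
        every queue is full (the caller would have to hold the packet)."""
        for _ in range(len(self.slaves)):
            slave = self.slaves[self.next]
            self.next = (self.next + 1) % len(self.slaves)
            if not slave.full():
                slave.queue.append(packet)
                return slave.name
        return None

bond = Bond([Slave("eth0", capacity=2), Slave("eth1", capacity=1)])
print([bond.transmit(p) for p in range(4)])   # ['eth0', 'eth1', 'eth0', None]
```

Once a full slave drains (for example after its driver reclaims transmitted packets), it rejoins the rotation automatically, because fullness is re-checked on every turn.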
Switch trunking is just bonding to a single switch, rather than parallel isolated networks. The complication with switch trunking is automatically detecting that trunking should be done.

> I think i heard of a eepro100 driver from Intel which contains this feature but bound
> to the driver and not usable for all adapter types. This driver lacks a bit of
> performance compared to the standard eepro100 driver for linux.
>
> Donald, can you confirm this information or is this just rumors or outdated?

We've used channel bonding since 1994, and the scheme has remained almost unchanged since then. There was some prior work (on other OSes) with bonding at the IP level, and even one NFS-specific approach at the UDP level. Ron Minnich probably remembers that one...

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From d.mueller@elsoft.ch Thu, 20 Jul 2000 12:06:30 +0200
Date: Thu, 20 Jul 2000 12:06:30 +0200
From: David Müller (ELSOFT AG) d.mueller@elsoft.ch
Subject: [eepro100] Re: kernel: eth0: card reports no resources

Hello

Andrey Savochkin wrote:
>
> Hello,
>
> On Wed, Jun 28, 2000 at 10:07:40AM -0500, John Cagle wrote:
> > I'm getting the same error periodically on an 800 Mhz system, so in my
> > case, I don't think it's a CPU deficiency.
> >
> > When it starts happening on my system, I get the error for just about
> > every packet the NIC tries to receive on my network. To clear it up, I
> > have to "ifdown eth0; ifup eth0".
> >
> > This is with eepro100.c v1.09j-t 9/29/99, Revision 1.20.2.10 modified by
> > Andrey.
>
> I can confirm that there is such a problem, and it doesn't depend on CPU.
> I'm working on it now.
>
> Best regards
> Andrey V.
> Savochkin
>

Sorry if I have missed something, but what's the status of this problem?
After fixing some IRQ problems on my StrongARM board, the "card reports no resources" bug seems to be the next one. ;-)

TIA
Dave

From jon@advercast.com Thu, 20 Jul 2000 16:44:25 -0700
Date: Thu, 20 Jul 2000 16:44:25 -0700
From: Jon Oringer jon@advercast.com
Subject: [eepro100] (problems!!) EtherExpress PRO/100 Mobile 32 bit cardbus --

I just installed Redhat 6.2 -- I'm having problems getting the eepro100 32-bit PCMCIA CardBus card to work.

Any network activity after ifup eth0 causes the following messages:

"eth0": 21140 transmit timed out, status ...
eth0: Tx hung, 15 vs. 9

Facts:
- link light on PCMCIA dongle is ON, link light on HUB is OFF
- EEPROM is reported missing
- exclude irq 3 is commented out in /etc/pcmcia/config.opts
- my laptop is a Sony PCG-505TX

I found other questions like this on the list, but couldn't find an answer. Does anybody have one, or could someone point me to the answer?

thanks!
-Jon Oringer
SurfSecret Software
http://www.surfsecret.com

From becker@scyld.com Thu, 20 Jul 2000 18:15:08 -0400 (EDT)
Date: Thu, 20 Jul 2000 18:15:08 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] (problems!!) EtherExpress PRO/100 Mobile 32 bit cardbus --

On Thu, 20 Jul 2000, Jon Oringer wrote:
> Subject: [eepro100] (problems!!) EtherExpress PRO/100 Mobile 32 bit cardbus --
>
> I just installed Redhat 6.2 -- Im having problems getting the eepro100 32
> bit
> PCMCIA cardbus to work..
>
> any network activity after ifup eth0 causes the following message
> "eth0": 21140 transmit timed out, status ...
> eth0: Tx hung, 15 vs. 9

This is the Tulip driver, not the eepro100 driver. The Intel "version II" card is based on the i82559 and uses the eepro100 driver. (Yes, it's confusing.)

You should post this to the tulip list. See
http://www.scyld.com/network/tulip.html

You should include the actual status, since the report above is almost useless. The 'tulip-diag -af' output would also be useful, and may just directly tell you the problem.
(IRQ blocked, bus parity error, etc.)

Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Beowulf Clusters / Linux Installations
Annapolis MD 21403

From jasonw@liberate.com Fri, 21 Jul 2000 11:04:35 -0400
Date: Fri, 21 Jul 2000 11:04:35 -0400
From: Jason Williams jasonw@liberate.com
Subject: [eepro100] eepro problem: Transmit timed out...

I'm having some trouble with several Dell 2450 servers. The Ethernet interfaces go down periodically with the following error:

"eth0: Transmit timed out: status 0090 0080 at 31383422/31383450 command 000c0000."

At least 3 of my Dell 2450s are displaying this problem. Machines are:

Dell 2450s
RedHat 6.2
Dual port Intel Ethernet Express Pro 100
Driver version: eepro v1.09j-t 9/29/99

The problem is difficult to reproduce. On some machines the eth interfaces fail regularly; on others it can take a week or more for the problem to show up. Once they go down, ifdown/ifup fails to fix the problem. After a reboot, the eth interfaces come back OK.

I'm looking for a good stress test case to run on the machines to reproduce this problem. Is there a known issue with these cards that causes this error?

--
Jason Williams

From vic@ibas-labs.de Fri, 21 Jul 2000 17:29:12 +0200
Date: Fri, 21 Jul 2000 17:29:12 +0200
From: Peter Franck vic@ibas-labs.de
Subject: [eepro100] Does eepro100 support Load Balancing Feature for Intel PRO/100 family under LINUX OS?

Thank you, Donald, for your explanation. However, my question was much more "user" level: How can I switch on channel bonding in my Linux server?

Peter

Donald Becker wrote:
...
> The way channel bonding works is by copying the master station address to
> the slaves, and distributing transmit packets. Transmit packets are queued
> to the interfaces in a round-robin fashion. If one of the driver Tx queues
> fills, it is dropped out of the round-robin sequence.
...
> We've used channel bonding since 1994, and the scheme has remained almost
> unchanged since then.
> There was some prior work (on other OSes) with
> bonding at the IP level, and even one NFS-specific approach at the UDP
> level. Ron Minnich probably remembers that one...
...

From linux_play@excite.com Sat, 22 Jul 2000 20:30:41 -0700 (PDT)
Date: Sat, 22 Jul 2000 20:30:41 -0700 (PDT)
From: Hans Williams linux_play@excite.com
Subject: [eepro100] Problems with EEPRO100

I have a Slackware 7 box with the newest kernel installed (2.2.16). I downloaded the latest eepro100 driver from scyld.com and compiled it as directed (including pci-lib). After getting eepro100.o, I unloaded the current module and did:

insmod ./eepro100.o

However, I get this:

unresolved symbol acpi_set_pwr_state
unresolved symbol pci_drv_unregister
unresolved symbol pci_drv_register

This does not happen when I insmod the eepro100.o that came with the 2.2.16 source (from make modules; make modules_install).

The reason I'm getting into this is that when I run tcpdump on this box, I get TONS of "truncated-ip missing 22 bytes..." messages. I can access networks, but a lot of these messages come across. The number of bytes varies quite a bit as well.

Any help would be greatly appreciated. Thank you.

Hans
linux_play@excite.com

From Andreas.Fey@t-online.de Mon, 24 Jul 2000 09:28:04 +0200
Date: Mon, 24 Jul 2000 09:28:04 +0200
From: Andreas Fey Andreas.Fey@t-online.de
Subject: [eepro100] card reports no resources / RX buffers

Hi,

I have problems using two Pro/100+ cards (Board assembly 721383-009) on Redhat 6.2 with kernel 2.2.16.

The error in syslog is:

kernel: eth0: card reports no resources.
kernel: eth0: card reports no RX buffers.

After one or more reboots, the cards seem to work until the next reboot.

Is there perhaps a problem with the driver? BTW, should I use the Intel driver instead of Donald Becker's one?

Thanx, Andy.
From Fredrik.P.Persson@era.ericsson.se Mon, 24 Jul 2000 10:03:17 +0200
Date: Mon, 24 Jul 2000 10:03:17 +0200
From: Fredrik Persson P (QRA) Fredrik.P.Persson@era.ericsson.se
Subject: [eepro100] card reports no resources / RX buffers

This is a known bug. As I recall, Andrey S, maintainer of eepro100.c (not Donald, at least I *think* so), stated on this list on the 20th of this month that he is aware of this bug and that he will fix it. You might want to try the driver that Intel published. I've heard about people being successful with that one.

Here is the link (which you might have, but I provide it anyway):

http://support.intel.com/support/network/adapter/pro100/30504.htm

/Fredrik Persson

> -----Original Message-----
> From: Andreas.Fey@t-online.de [SMTP:Andreas.Fey@t-online.de]
> Sent: 24 July 2000 09:28
> To: eepro100@scyld.com
> Subject: [eepro100] card reports no resources / RX buffers
>
> Hi,
>
> I have problems using two Pro / 100 + Cards (Board assembly 721383-009) on Redhat 6.2 with kernel 2.2.16
>
> The error in syslog is:
>
> kernel: eth0: card reports no resources.
> kernel: eth0: card reports no RX buffers.
>
> After one or more reboots, the cards seem to work until the next reboot.
>
> Probably there is a problem with the driver ? BTW, should I use the intel driver instead of Donald Beckers' one ?
>
> Thanx, Andy.
>
> _______________________________________________
> eepro100 mailing list
> eepro100@scyld.com
> http://www.scyld.com/mailman/listinfo/eepro100

From RKrawl@microtest.com Mon, 24 Jul 2000 11:33:12 -0700
Date: Mon, 24 Jul 2000 11:33:12 -0700
From: Krawl, Roeland RKrawl@microtest.com
Subject: [eepro100] card reports no resources / RX buffers
If the "No resources" and "No Rx buffers" messages occur occasionally during driver initialization (even when no network cable is attached), then that is the symptom that I reported to this mailing list and to Andrey in March 2000.

Donald Becker did not respond to any of the emails sent to him.

The following is a summary of the symptom and fix that was posted to this list in March 2000.

With no network cable attached, the "No RX buffers" message obviously cannot be caused by receiving too many packets. In our case, it was the result of the eepro100 ver. 1.09t driver not properly initializing the 82559ER chip.

The "wait_for_cmd_done()" routine (in the eepro100 driver) falsely gives the impression that it waits for command completion. After each call to "wait_for_cmd_done()" we added an additional delay to ensure that the command has been completed before changing the contents of the System Control Block General Pointer in preparation for the next command.

Apparently, the 82559ER was reporting "no Rx bufs" because it had obtained a bogus pointer to the Rx ring: the real Rx ring was known to be properly initialized, and Rx resources were definitely available.

As a result of improper initialization, continuous "flow control paused" interrupts were causing our Linux system to hang. I believe that the eepro100 driver does not acknowledge the flow control paused interrupts due to an incorrect status mask.

The driver is using an SCB status mask of 0xFC00. The Linux driver from Intel (recently released) correctly uses a status mask of 0xF300. The status mask of 0xFC00 prevents the driver from acknowledging the flow control and early receive interrupts in the 82559 chip.
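The difference between the two masks in the message above can be checked bit by bit. Only the two mask values come from the message; the bit names in `SCB_STAT_ACK_BITS` are assumed from the 82559's SCB status-word layout (the interrupt status/acknowledge byte, bits 15 to 8) and should be verified against Intel's manual before relying on them.

```python
# Assumed names for bits 15..8 of the i82559 SCB status word (the
# interrupt status/acknowledge byte); verify against Intel's manual.
SCB_STAT_ACK_BITS = {
    15: "CX (command done)",
    14: "FR (frame received)",
    13: "CNA (command unit not active)",
    12: "RNR (receive unit not ready)",
    11: "MDI (MDI read/write done)",
    10: "SWI (software interrupt)",
    9: "ER (early receive)",
    8: "FCP (flow control pause)",
}

def bits_in(mask):
    """Names of the assumed status bits set in mask, highest bit first."""
    return [name for bit, name in sorted(SCB_STAT_ACK_BITS.items(), reverse=True)
            if mask & (1 << bit)]

EEPRO100_MASK = 0xFC00   # mask the message says eepro100 1.09t uses
INTEL_MASK = 0xF300      # mask the message says Intel's driver uses

print("missed by 0xFC00:", bits_in(INTEL_MASK & ~EEPRO100_MASK))
print("only in 0xFC00:  ", bits_in(EEPRO100_MASK & ~INTEL_MASK))
```

Under these assumed bit names, 0xF300 & ~0xFC00 = 0x0300, which is exactly the early-receive and flow-control-pause pair the message says goes unacknowledged. Conversely, Intel's mask leaves out bits 11 and 10 (0x0C00).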
> -----Original Message-----
> From: Fredrik Persson P (QRA) [SMTP:Fredrik.P.Persson@era.ericsson.se]
> Sent: Monday, July 24, 2000 1:03 AM
> To: 'eepro100@scyld.com'
> Subject: RE: [eepro100] card reports no resources / RX buffers
>
> This is a known bug. As I recall, Andrey S, maintainer of eepro100.c (not
> Donald, at least I *think* so) stated the 20:th this month on this list
> that he is aware of this bug and that he will fix it. You might want to
> try the driver that intel published. I've heard about people beeing
> successful with that one.
>
> Here is the link (which you might have, but I provide it anyway):
>
> http://support.intel.com/support/network/adapter/pro100/30504.htm
>
> /Fredrik Persson
>
> > -----Original Message-----
> > From: Andreas.Fey@t-online.de [SMTP:Andreas.Fey@t-online.de]
> > Sent: 24 July 2000 09:28
> > To: eepro100@scyld.com
> > Subject: [eepro100] card reports no resources / RX buffers
> >
> > Hi,
> >
> > I have problems using two Pro / 100 + Cards (Board assembly 721383-009)
> on Redhat 6.2 with kernel 2.2.16
> >
> > The error in syslog is:
> >
> > kernel: eth0: card reports no resources.
> > kernel: eth0: card reports no RX buffers.
> >
> > After one or more reboots, the cards seem to work until the next reboot.
> >
> > Probably there is a problem with the driver ? BTW, should I use the
> intel driver instead of Donald Beckers' one ?
> >
> > Thanx, Andy.

From becker@scyld.com Mon, 24 Jul 2000 16:12:10 -0400 (EDT)
Date: Mon, 24 Jul 2000 16:12:10 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] card reports no resources / RX buffers

On Mon, 24 Jul 2000, Krawl, Roeland wrote:
> If the "No resources" and "No Rx buffers" messages occur occasionally during
> driver initialization (even when no network cable is attached) then that is
> the symptom that I reported to this mail list and Andrey in March 2000.
> Donald Becker did not respond to any of the emails sent to him.
> ...
> The driver is using a SCB status mask of 0xFC00. The Linux driver from Intel
> (recently released) correctly uses a status mask of 0xF300. The status mask
> of 0xFC00 prevents the driver from acknowledging the flow control and early
> receive interrupts in the 82559 chip.

This has been responded to several times: the documentation I have indicates that 0xF300 is not a correct mask for the way we are using the chip. Perhaps the 'ER' has some special firmware -- I don't have a sample to test with. The v1.09t driver was the first to detect the 'ER' chip, and there was no production implementation at the time.

> The following is a summary of the symptom and fix that was posted to this
> list in March 2000.

The v1.10 driver was current as of March 2000, and the v1.11 driver is currently available. You are referencing the older v1.09t driver, which was from 9/29/99.

There was a problem with flow control, but it wasn't a driver bug. The i82559 chip fails to disable flow control on half-duplex links, and can spew flow control frames when the machine is shut down. The v1.10 driver works around this chip/documentation bug by handling the autonegotiation settings itself instead of relying on the chip's firmware.

Donald Becker                  becker@scyld.com
Scyld Computing Corporation    http://www.scyld.com
410 Severn Ave. Suite 210      Beowulf Clusters / Linux Installations
Annapolis MD 21403

From mark@idrive.com Mon, 24 Jul 2000 17:22:27 -0700
Date: Mon, 24 Jul 2000 17:22:27 -0700
From: Mark Cox mark@idrive.com
Subject: [eepro100] eepro problem: Transmit timed out...

I have the same problem. I have not received positive responses from this list about it. Currently I automate a ping check on the server's default router. When it fails for 5 seconds, I ifconfig the card down, unload the module, reload the module, and re-ifconfig it. Makes for a crappy fix, but I have yet to see a better one.

============================================
Mark Cox | UNIX sysadmin | i-drive.com
T: 415.551.2307 | F: 415.551.7599 | E: mark@idrive.com | I: www.idrive.com/mark
============================================
"Top Five Applications on the Web" CNet.com
Save, Access & Share at www.idrive.com

-----Original Message-----
From: eepro100-admin@scyld.com [mailto:eepro100-admin@scyld.com]On Behalf Of Jason Williams
Sent: Friday, July 21, 2000 8:05 AM
To: eepro100@scyld.com
Subject: [eepro100] eepro problem: Transmit timed out...

I am having some trouble with several Dell 2450 servers.

The Ethernet interfaces go down periodically with the following error:
"eth0: Transmit timed out: status 0090 0080 at 31383422/31383450 command 000c0000."

I have at least 3 Dell 2450s displaying this problem. The machines are:
Dell 2450s
Red Hat 6.2
Dual-port Intel EtherExpress Pro 100
Driver version: eepro100 v1.09j-t 9/29/99

The problem is difficult to reproduce. On some machines the eth interfaces fail regularly; others can take a week or more for the problem to show up. Once they go down, ifdown/ifup fails to fix the problem. After a reboot, the eth interfaces come back OK.

I'm looking for a good stress test case to run on the machines to reproduce this problem. Is there a known issue with these cards that causes this error?
--
Jason Williams

From andrewm@uow.edu.au Tue, 25 Jul 2000 13:41:33 +1000
Date: Tue, 25 Jul 2000 13:41:33 +1000
From: Andrew Morton andrewm@uow.edu.au
Subject: [eepro100] eepro problem: Transmit timed out...

Jason Williams wrote:
>
> I having some trouble with several Dell 2450 servers.
>
> The ethernet interfaces go down periodically with the following error.
> "eth0: Transmit timed out: status 0090 0080 at 31383422/31383450
> command 000c0000."
>

This could be an APIC problem, not a driver problem. Under Linux, APICs sometimes just forget how to generate interrupts.

Does an rmmod/insmod restore operation? If not, suspect the APIC.

If you're running 2.2, try booting with the `noapic' option. If the problem goes away, suspect the APIC.

From saw@saw.sw.com.sg Tue, 25 Jul 2000 10:39:20 +0800
Date: Tue, 25 Jul 2000 10:39:20 +0800
From: Andrey Savochkin saw@saw.sw.com.sg
Subject: [eepro100] Re: card reports no resources / RX buffers

Hello,

On Mon, Jul 24, 2000 at 11:33:12AM -0700, Krawl, Roeland wrote:
> The "wait_for_cmd_done()" routine (in the eepro100 driver) falsely gives the
> impression that the
> routine waits for command completion. After each call to
> "wait_for_cmd_done()"
> we added an additional delay to ensure that the command has been completed
> before
> changing the contents of the System Control Block General Pointer in
> preparation for the next command.

You keep repeating that "wait_for_cmd_done falsely gives the impression that the routine waits for command completion". I don't understand your point. It waits for the chip to decode the command and be ready to accept a new one. The Linux driver does the wait just as the other drivers (Intel, BSD) do.

If you think that something is done wrong, could you point it out? I definitely don't believe that arbitrary delays inserted in different places in the driver fix any problem. Any delay must be properly justified by statements from the documentation and/or examples from other existing drivers.

Best regards
Andrey V. Savochkin

From napier@napiersys.bc.ca Mon, 24 Jul 2000 21:26:30 -0700 (PDT)
Date: Mon, 24 Jul 2000 21:26:30 -0700 (PDT)
From: Duncan Napier napier@napiersys.bc.ca
Subject: [eepro100] Odd IEE Pro/100+ problem on Linux 2.2.12-20

Hello,

I've just joined your mailing list in the hope of finding a solution to my problem. I'm using Red Hat 6.1 with a 2.2.12 kernel recompiled with FreeSWAN IPSec.

I have an odd problem. I have 2 identical boxes, Dell Dimension Pentium 133 MHz machines with 32 MB of RAM. Each has 2 Intel EtherExpress Pro/100+ cards in it (a total of 4 cards, two NICs per machine). Both use the kernel and the eepro100 module sources that came with the Red Hat 6.1 distribution. On each machine, eth0 has a DHCP-assigned IP address, while eth1 has a static, internal (RFC 1918) IP address.

The first machine works flawlessly; the second one freezes during bootup on eth1 (eth0 passes fine). Oddly enough, the second one boots just fine if the network connection to eth1 is unplugged (i.e., you just yank out the RJ45 connector and all is well; I have tested this with a 3Com TP800 100 Mbps hub and a Linksys EFAH05W 10/100 Mbps hub). The problem machine then carries on working just fine when the network is plugged in again after the boot has passed eth1. After that, it too runs flawlessly! The machines are firewalling VPN gateways and, once booted, work just fine.

I set up and tested the machines offsite with 2 static IP addresses, and everything worked fine. Once I shipped the second one to a site with DHCP-assigned IP addresses on eth0, it would lock up when the boot reached eth1.

This almost seems like a hardware problem to me, but can anyone explain it? Troubleshooting the machine is a real pain now, especially when it refuses to boot. It has already been installed and is running some distance away (I have to fly or take a boat to get there :-), and there is no onsite tech help.

I read somewhere on the 'Net that for machines with identical NICs, the ordering of the MAC addresses of the cards assigned to eth0 and eth1 can cause problems: if eth0 has a larger MAC address than eth1, there could be trouble. This sounds like a load of baloney to me, but I have noticed that the machine that works has a lower MAC address on eth0 than on eth1, while the one that doesn't has the reverse... of course, the probability of this being a coincidence is 50%.

Both machines appear completely identical in all other respects, e.g.:

/etc/conf.modules:
alias eth0 eepro100

/etc/sysconfig/network-scripts/ifcfg-eth0:
DEVICE="eth0"
BOOTPROTO="dhcp"
IPADDR=""
NETMASK=""
ONBOOT="yes"
IPXNETNUM_802_2=""
.....

/etc/sysconfig/network-scripts/ifcfg-eth1:
DEVICE="eth1"
BOOTPROTO="none"
IPADDR="192.168.2.154"
NETMASK="255.255.255.0"
ONBOOT="yes"
IPXNETNUM_802_2=""
....

/sbin/insmod eepro100

dmesg:
eth0: Intel EtherExpress Pro 10/100 at 0xff00, 00:D0:B7:73:3E:CA, IRQ 11.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 721383-008, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
eth1: Intel EtherExpress Pro 10/100 at 0xfe80, 00:D0:B7:73:09:51, IRQ 11.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 721383-008, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).

Best Regards,
Duncan Napier.
----------------------------------------------------------------------------
Duncan Napier
Napier Systems Research

From ptlymcc@hotmail.com Tue, 25 Jul 2000 15:15:10 GMT
Date: Tue, 25 Jul 2000 15:15:10 GMT
From: Chester Chee ptlymcc@hotmail.com
Subject: [eepro100] Intel PRO 100 Server NIC Linux driver

Hi,

I am using Red Hat 6.2. My server uses an Intel PRO 100 Server NIC, and Linux does not seem to recognize the card. I tried the driver downloaded from Intel, but then I get a bunch of unresolved symbol errors when I do 'insmod'. I have already tried to use 'eepro100.o', but then I get "Device is busy or no resource" during init_module(). Any help or pointers to resolve this would be greatly appreciated. Thanks in advance.

From alhaz@xmission.com Tue, 25 Jul 2000 09:22:30 -0600 (MDT)
Date: Tue, 25 Jul 2000 09:22:30 -0600 (MDT)
From: Eric Jorgensen alhaz@xmission.com
Subject: [eepro100] Intel PRO 100 Server NIC Linux driver

> I am using RedHat 6.2. My server is using Intel PRO 100 Server NIC, and
> Linux does not seems to recognize the card. I tried the driver downloaded

That's weird. I never figured out what was "different" about the varying 82559 cards.

> from Intel, but then I get a bunch of unresolve symbol error when I do
> 'insmod'. I have already tried to use the 'eepro100.o' but then I get
> "Device is busy or no resource" during init_module(). Any help or pointer to
> resolve this is greatly appreciated. Thanks in advance.

Make sure you edit the Makefile that comes with Intel's source; it enables SMP by default, so if you're on a UP system it won't work. Also make sure that the kernel source in /usr/src/linux matches your currently running kernel.
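Eric's last check, and whether the running kernel is SMP at all, can be scripted before building Intel's driver. The sketch below is hypothetical (the function names are made up, and it is not part of Intel's or Becker's packages). It assumes the top-level kernel Makefile defines VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION on `NAME = value` lines, and that an SMP kernel includes "SMP" in `uname -v`. It does not edit Intel's Makefile for you.

```sh
#!/bin/sh
# Hypothetical helpers: compare the kernel a module would be built
# against with the kernel that is actually running.

# Print the release encoded at the top of a kernel source Makefile,
# e.g. "2.2.16". Only the first definition of each variable is used.
src_version() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "VERSION"      && !sv { v = $2; sv = 1 }
        $1 == "PATCHLEVEL"   && !sp { p = $2; sp = 1 }
        $1 == "SUBLEVEL"     && !ss { s = $2; ss = 1 }
        $1 == "EXTRAVERSION" && !se { e = $2; se = 1 }
        END { print v "." p "." s e }
    ' "${1:-/usr/src/linux/Makefile}"
}

# Report SMP/UP and compare releases; return 1 on a mismatch.
check_kernel_src() {
    running=$(uname -r)
    src=$(src_version "$1")
    # Assumption: an SMP kernel reports "SMP" in its version string.
    case $(uname -v) in
        *SMP*) echo "running kernel is SMP" ;;
        *)     echo "running kernel is UP: build the module without SMP" ;;
    esac
    if [ "$running" = "$src" ]; then
        echo "source tree matches running kernel $running"
        return 0
    fi
    echo "mismatch: running $running, source tree $src" >&2
    return 1
}
```

Running `check_kernel_src` with no argument reads /usr/src/linux/Makefile; pass another Makefile path to check a different tree. A non-zero return means the module would be built against a kernel other than the one it will be loaded into.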
- Eric

From becker@scyld.com Tue, 25 Jul 2000 12:37:36 -0400 (EDT)
Date: Tue, 25 Jul 2000 12:37:36 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Intel PRO 100 Server NIC Linux driver

On Tue, 25 Jul 2000, Chester Chee wrote:
> I am using RedHat 6.2. My server is using Intel PRO 100 Server NIC, and
> Linux does not seems to recognize the card. I tried the driver downloaded

This adapter has been documented as unsupported for years. See
http://www.scyld.com/network/index.html#notsupported

Intel will not release the programming interface for this board.

Donald Becker                  becker@scyld.com

From becker@scyld.com Tue, 25 Jul 2000 12:39:50 -0400 (EDT)
Date: Tue, 25 Jul 2000 12:39:50 -0400 (EDT)
From: Donald Becker becker@scyld.com
Subject: [eepro100] Intel PRO 100 Server NIC Linux driver

On Tue, 25 Jul 2000, Eric Jorgensen wrote:
> > I am using RedHat 6.2. My server is using Intel PRO 100 Server NIC, and
> > Linux does not seems to recognize the card. I tried the driver downloaded
>
> that's weird. I never figured out what was "different" about the
> varying 82559 cards.

This isn't an i82559 chip. It has an on-board processor with firmware download, and is really designed for older (slow) systems.

Donald Becker                  becker@scyld.com

From alhaz@xmission.com Tue, 25 Jul 2000 10:40:09 -0600 (MDT)
Date: Tue, 25 Jul 2000 10:40:09 -0600 (MDT)
From: Eric Jorgensen alhaz@xmission.com
Subject: [eepro100] Intel PRO 100 Server NIC Linux driver

>
> On Tue, 25 Jul 2000, Eric Jorgensen wrote:
>
> > > I am using RedHat 6.2. My server is using Intel PRO 100 Server NIC, and
> > > Linux does not seems to recognize the card. I tried the driver downloaded
> >
> > that's weird. I never figured out what was "different" about the
> > varying 82559 cards.
>
> This isn't a i82559 chip. It has on an on-board processor with firmware
> download, and is really designed for older (slow) systems.

Oh, I've seen those. Large, funky card with an i960?

- Eric

From dschmitz@pp1.usuhs.mil Tue, 25 Jul 2000 12:50:43 -0400 (EDT)
Date: Tue, 25 Jul 2000 12:50:43 -0400 (EDT)
From: dschmitz@pp1.usuhs.mil dschmitz@pp1.usuhs.mil
Subject: [eepro100] eepro100 problems

I am running a PC164 DEC Alpha with an SRM console and kernel 2.4 test5-pre3. When I boot the system with an Intel EtherExpress Pro100 card, the system locks up after about a minute or so. When I boot the system without any network cards, it boots fine and doesn't lock up. The network driver I am using is the one included in the kernel, Revision 1.33 2000/05/24 by Andrey V. Savochkin. I have tried to compile Donald Becker's 1.10a (4/15/00) driver in the hope that it would work, but I get many errors because of undefined functions. I have included the kern_compat.h, pci-scan.c, and pci-scan.h files in the directory, as the page where I got the drivers said to.

I would appreciate any input, advice, or help anyone can give me. Thank you.
David Schmitz

The following errors occur:

eepro100.c: In function 'speedo_open':
eepro100.c:851: structure has no member named 'tbusy'
eepro100.c:852: structure has no member named 'interrupt'
eepro100.c:853: structure has no member named 'start'
eepro100.c:867: warning: unsigned int format, different type arg (arg 3)
eepro100.c: In function 'speedo_tx_timeout':
eepro100.c:1106: warning: unsigned int format, different type arg (arg 4)
eepro100.c: In function 'speedo_start_xmit':
eepro100.c:1160: structure has no member named 'tbusy'
eepro100.c:1209: structure has no member named 'tbusy'
eepro100.c: In function 'speedo_interrupt':
eepro100.c:1245: structure has no member named 'interrupt'
eepro100.c:1331: structure has no member named 'tbusy'
eepro100.c:1333: 'NET_BH' undeclared (first use in this function)
eepro100.c:1333: (Each undeclared identifier is reported only once
eepro100.c:1333: for each function it appears in.)
eepro100.c:1349: warning: unsigned int format, different type arg (arg 3)
eepro100.c:1351: structure has no member named 'interrupt'
eepro100.c: In function 'speedo_close':
eepro100.c:1474: structure has no member named 'start'
eepro100.c:1475: structure has no member named 'tbusy'
eepro100.c:1479: warning: unsigned int format, different type arg (arg 3)
eepro100.c: In function 'speedo_get_stats':
eepro100.c:1556: structure has no member named 'start'

From danisoto@uol.es Tue, 25 Jul 2000 18:59:05 +0200
Date: Tue, 25 Jul 2000 18:59:05 +0200
From: Daniel Soto Alvarez danisoto@uol.es
Subject: [eepro100] eepro problem: Transmit timed out...

Hi Mark!

>I have the same problem. I have not received positive responses from this
>list about it. Currently I automate a ping check on the server's default
>router. When it fails for 5 seconds I ifconfig the card down, unload the
>module, reload the module, and re-ifconfig it. Makes for a crappy fix, but I
>have yet to see a better one.

Could you post your config/program/shell script for this?
Can anyone in this TUX group help me? I have the same problem with an EEPRO100+ and a 3Com 905B in my Linux server, and I would like a simple, robust solution.

Thanks!

From mark@idrive.com Tue, 25 Jul 2000 10:23:34 -0700
Date: Tue, 25 Jul 2000 10:23:34 -0700
From: Mark Cox mark@idrive.com
Subject: [eepro100] eepro problem: Transmit timed out...

Sure. Like I said, it's not pretty, but it keeps the servers up.

------->8 Snip 8<---------------------------
#!/bin/sh

# Grab a few server-specific values.
# If this does not do it for you, eyeball your
# /etc/sysconfig/network file and change the
# section below appropriately.
. /etc/sysconfig/network
IP=$IP_ADDR
MASK=$NETMASK
hostname=`hostname`

# The actual ping test: prints the number of replies received
pinger () {
    (sleep 5) | ping -c 2 ${GATEWAY} 2>&1 | grep transmitted \
        | awk '{print $4}' | sed 's/%//'
}

# Change this to apply to your device
# This forces 100Mb/full-duplex
uplink () {
    ifconfig eth0 down
    rmmod eepro100
    insmod eepro100 options="0x30"
    ifconfig eth0 inet ${IP} netmask ${MASK} up
    route add default gw ${GATEWAY}
    # Add any static routes specific to the machine
    if [ -e /etc/rc.d/rc3.d/S40route ]; then
        /etc/rc.d/rc3.d/S40route
    fi
}

# This is for notification to a newsgroup
post () {
    export NNTPSERVER=news
    TEMPFILE=/tmp/link-check.$$
    DATE=`date +'%m/%d/%y %H:%M'`
    INEWS=/usr/bin/inews
    cat > $TEMPFILE <<EOF
Newsgroups: idrive.site-log
MIME-Version: 1.0
Content-Type: text/html
Subject: joggled link on ${hostname} ${DATE}

$1:$2 joggled link in ${hostname} ${DATE}
EOF
    $INEWS -h $TEMPFILE
    rm $TEMPFILE
    echo ${DATE} >> /var/log/link.log
}

while :; do
    SECS=1
    pinger > /tmp/ping.$$ &
    OPID="$!"
    while :; do
        if [ "${SECS}X" = "X" ]; then
            SECS=1
        fi
        echo Top of ping check loop -SECS=${SECS} >> /dev/stderr
        if [ "${SECS}" -ge 5 ] || [ -s /tmp/ping.$$ ]; then
            sleep 1
            COUNT=`cat /tmp/ping.$$`
            echo Got $COUNT for count >> /dev/stderr
            # Had to start checking for COUNT to be unset
            # If the pinger() is unable to fork due to system
            # resource saturation, we never made it out of
            # this loop.
            if [ "${COUNT}X" = "X" ] || [ "${COUNT}" -lt 1 ]; then
                echo "Joggling ethernet adapter..." >> /dev/stderr
                uplink
                post
                break
            else
                echo Looking good... >> /dev/stderr
                break
            fi
        else
            SECS=`echo $SECS + 1 | bc`
            sleep 1
        fi
    done
    # Kill a pinger that is still running; the final redirection
    # also empties /tmp/ping.$$ for the next round.
    kill -KILL ${OPID} >> /dev/null 2>&1 > /tmp/ping.$$
done
------->8 Snip! 8<----------------------------------

-----Original Message-----
From: Daniel Soto Alvarez [mailto:danisoto@uol.es]
Sent: Tuesday, July 25, 2000 9:59 AM
To: eepro100@scyld.com
Cc: mark@idrive.com
Subject: RE: [eepro100] eepro problem: Transmit timed out...

Hi Mark!

>I have the same problem. I have not received positive responses from this
>list about it. Currently I automate a ping check on the server's default
>router. When it fails for 5 seconds I ifconfig the card down, unload the
>module, reload the module, and re-ifconfig it. Makes for a crappy fix, but I
>have yet to see a better one.

You can put your config/program/shell-script of this?
Anyone can help me in this TUX-Group.

I have the same problem with a EEPRO100+ and a 3COM 905B in my Linux
server, and I like a simple, and robust, solution.

Thanks!
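Andrew Morton's APIC diagnosis earlier in this thread rests on interrupts silently stopping, so when a watchdog like Mark's fires it may be worth recording whether the card's interrupt count was still moving. The helpers below are a hypothetical sketch, not part of any driver package. They assume the usual /proc/interrupts layout (IRQ number, one count column per CPU, controller type, then comma-separated device names, since an IRQ can be shared as eth0 and eth1 are on IRQ 11 in the dmesg output above), and the result only means something while traffic is flowing, for example while a ping to the gateway is running.

```sh
#!/bin/sh
# Hypothetical helpers: is a NIC still raising interrupts?

# Sum the per-CPU counts for one device in an /proc/interrupts-style
# file (second argument, default /proc/interrupts). Shared IRQs list
# several names, e.g. "eth0, eth1", so trailing commas are stripped.
irq_total() {
    awk -v dev="$1" '
        {
            hit = 0
            for (i = 2; i <= NF; i++) {
                f = $i
                sub(/,$/, "", f)
                if (f == dev) hit = 1
            }
            if (hit) {
                found = 1
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^[0-9]+$/) total += $i
            }
        }
        END { printf "%.0f\n", (found ? total : 0) }
    ' "${2:-/proc/interrupts}"
}

# Report whether the count advances over an interval (default 5 s).
irq_check() {
    dev=${1:-eth0}
    interval=${2:-5}
    before=$(irq_total "$dev")
    sleep "$interval"
    after=$(irq_total "$dev")
    if [ "$after" -gt "$before" ]; then
        echo "$dev: $((after - before)) interrupts in ${interval}s"
        return 0
    fi
    echo "$dev: no new interrupts in ${interval}s; suspect the APIC"
    return 1
}
```

For example, `irq_check eth1 5` could be run just before the `uplink` step. A zero delta during a transmit timeout is consistent with Morton's APIC theory; a rising count suggests the interrupts are still arriving and the problem lies elsewhere.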
From RKrawl@microtest.com Tue, 25 Jul 2000 13:13:50 -0700
Date: Tue, 25 Jul 2000 13:13:50 -0700
From: Krawl, Roeland RKrawl@microtest.com
Subject: [eepro100] RE: card reports no resources / RX buffers

Andrey,

Since our solution to this problem has been rock solid, I have not examined the eepro100 driver or read the chip documentation since March. However, I clearly recall that the "wait_for_cmd_done()" routine actually waits for command acceptance, not command completion. Are you absolutely sure that the contents of the System Control Block General Pointer can be harmlessly overwritten after the chip has accepted a command but before it has executed it? I saw evidence to the contrary. The driver does not "wait for command done" before overwriting the System Control Block General Pointer in preparation for issuing the next command.

I have heard that you already have a fix for this problem. That is great. No more dialog is necessary. The Linux community should be grateful to be rid of this difficult and elusive problem.

Roeland Krawl

> -----Original Message-----
> From: Andrey Savochkin [SMTP:saw@saw.sw.com.sg]
> Sent: Monday, July 24, 2000 7:39 PM
> To: Krawl, Roeland; 'eepro100@scyld.com'
> Subject: Re: card reports no resources / RX buffers
>
> Hello,
>
> On Mon, Jul 24, 2000 at 11:33:12AM -0700, Krawl, Roeland wrote:
> > The "wait_for_cmd_done()" routine (in the eepro100 driver) falsely gives the
> > impression that the
> > routine waits for command completion. After each call to
> > "wait_for_cmd_done()"
> > we added an additional delay to ensure that the command has been completed
> > before
> > changing the contents of the System Control Block General Pointer in
> > preparation for the next command.
>
> You permanently repeat that "wait_for_cmd_done falsely gives the impression
> that the routine waits for command completion".
> I don't understand your point.
> It waits for the chip to decode the command and be ready to accept new one.
> Linux driver does the wait as any other driver (Intel, BSD).
>
> If you think that something is done wrong, could you point it out?
> I definitely don't believe that arbitrary delays inserted in different places
> in the driver fix any problem. Any delay must be properly justified by the
> statements from documentation and/or examples from other existing drivers.
>
> Best regards
> Andrey V. Savochkin

From linux_play@excite.com Tue, 25 Jul 2000 15:17:45 -0700 (PDT)
Date: Tue, 25 Jul 2000 15:17:45 -0700 (PDT)
From: Hans Williams linux_play@excite.com
Subject: [eepro100] Truncated IP and Unresolved Symbols

I have a Slackware 7 box with the newest kernel installed (2.2.16). I
downloaded the latest eepro100 driver from scyld.com and compiled it as
directed (including pci-lib). After getting eepro100.o, I unloaded the
current module and did:

    insmod ./eepro100.o

However, I get this:

    unresolved symbol acpi_set_pwr_state
    unresolved symbol pci_drv_unregister
    unresolved symbol pci_drv_register

This does not happen when I insmod the eepro100.o that came with the 2.2.16
source (from make modules; make modules_install).

The reason I'm getting into this is that when I run tcpdump on this box, I
get TONS of "truncated-ip missing 22 bytes..." messages. I can access
networks, but a lot of these messages come across, and the number of missing
bytes varies quite a bit.

Any help would be greatly appreciated. If you have any further comments or
questions, please let me know.

Thank You.
Hans
linux_play@excite.com

From saw@saw.sw.com.sg Wed, 26 Jul 2000 10:09:34 +0800
Date: Wed, 26 Jul 2000 10:09:34 +0800
From: Andrey Savochkin saw@saw.sw.com.sg
Subject: [eepro100] Re: eepro100 problems

Hello,

On Tue, Jul 25, 2000 at 12:50:43PM -0400, dschmitz@pp1.usuhs.mil wrote:
> I am running a PC164 DEC Alpha with an SRM console and kernel 2.4
> test5-pre3. When I boot the system with an Intel EtherExpress Pro100
> card, the system locks after about a minute or so. When I boot the system

That's strange.
Could you elaborate on the lockup?
Does the whole system hang, or just the network?

> without any network cards, it boots fine and doesn't lock. The network
> driver version I am using is included in the kernel, Revision 1.33
> 2000/05/24 by Andrey V. Savochkkin. I have tried to compile the 1.10a

Best regards
Andrey V. Savochkin

From saw@saw.sw.com.sg Wed, 26 Jul 2000 10:18:34 +0800
Date: Wed, 26 Jul 2000 10:18:34 +0800
From: Andrey Savochkin saw@saw.sw.com.sg
Subject: [eepro100] Re: card reports no resources / RX buffers

Hello Roeland,

On Tue, Jul 25, 2000 at 01:13:50PM -0700, Krawl, Roeland wrote:
> Since our solution to this problem has been rock solid, I have not examined
> the eepro100 driver or read the chip documentation since March. However I
> clearly recollect that the "wait_for_cmd_done()" routine actually waits for
> command acceptance, not command completion. Are you absolutely sure that the
> contents of the System Control Block General Pointer can be harmlessly
> overwritten after the chip has accepted the command but not yet executed it?

No, I'm not sure.
There are reports about card/driver misbehavior which may be explained as a
bad consequence of overwriting the general pointer.
So far, all workarounds, including your own, have consisted of an additional
delay somewhere. The problems also go away in PIO instead of MMIO mode, which
slows down the operations a bit.

The accurate wait for command completion depends on the command.
For example, for the CUStart command it means waiting for the CU to leave the
idle state. I'm sure that Intel's driver doesn't do that. Nevertheless, I'll
take another look at all existing drivers to get an idea of what may be done
for proper initialization.

> I saw evidence to the contrary. The driver does not "wait for command done"
> before overwriting the System Control Block General Pointer in preparation
> for issuing the next command.
>
> I have heard that you already have a fix for this problem. That is great. No

No, I have only workarounds.

> more dialog is necessary. The Linux community should be grateful to be rid
> of this difficult and elusive problem.

Best regards
Andrey V. Savochkin

From dschmitz@pp1.usuhs.mil Wed, 26 Jul 2000 08:07:13 -0400 (EDT)
Date: Wed, 26 Jul 2000 08:07:13 -0400 (EDT)
From: dschmitz@pp1.usuhs.mil dschmitz@pp1.usuhs.mil
Subject: [eepro100] Re: eepro100 problems

The whole system just stops responding, keyboard and all. This only
happens with the Intel EtherExpress Pro100 cards in, though; it does not
happen when I have just a 3Com 509b card or no network cards at all.
Also, this does not happen with another PC164 system I'm running on an ARC
console, booting with MILO. This problem occurs with the SRM console,
booting with aboot.

On Wed, 26 Jul 2000, Andrey Savochkin wrote:

> Hello,
>
> On Tue, Jul 25, 2000 at 12:50:43PM -0400, dschmitz@pp1.usuhs.mil wrote:
> > I am running a PC164 DEC Alpha with an SRM console and kernel 2.4
> > test5-pre3. When I boot the system with an Intel EtherExpress Pro100
> > card, the system locks after about a minute or so. When I boot the system
>
> That's strange.
> Could you elaborate about the lockup?
> Does the whole system hangs or just network?
>
> > without any network cards, it boots fine and doesn't lock. The network
> > driver version I am using is included in the kernel, Revision 1.33
> > 2000/05/24 by Andrey V. Savochkkin. I have tried to compile the 1.10a
>
> Best regards
> Andrey V. Savochkin
>
> _______________________________________________
> eepro100 mailing list
> eepro100@scyld.com
> http://www.scyld.com/mailman/listinfo/eepro100
>

From saw@saw.sw.com.sg Thu, 27 Jul 2000 16:50:31 +0800
Date: Thu, 27 Jul 2000 16:50:31 +0800
From: Andrey Savochkin saw@saw.sw.com.sg
Subject: [eepro100] Re: eepro100 problems

On Wed, Jul 26, 2000 at 08:07:13AM -0400, dschmitz@pp1.usuhs.mil wrote:
> The whole system just stops responding, keyboard and all. This only
> happens with the Intel EtherExpress Pro100 cards in though, it does not
> happen when I just have a 3com 509b card or without any network cards.
> Also, this does not happen with another PC164 system I'm running on an ARC
> console, booting with MILO. This problem occurs with the SRM console
> booting with aboot.

Well, I didn't check the driver on Alpha systems myself, and I don't have
many ideas about what may be wrong. It is supposed to work.
Try increasing the driver verbosity level (via debug= if the driver is a
module, or by changing speedo_debug in the source).

Best regards
Andrey V. Savochkin

From skipper@performance.gr Thu, 27 Jul 2000 12:35:19 +0300
Date: Thu, 27 Jul 2000 12:35:19 +0300
From: Skipper skipper@performance.gr
Subject: [eepro100] Device is busy or no resource

Hi,

I am using RedHat 6.2.

My server is using an Intel PRO 100 Server NIC integrated on an Intel Desktop
Motherboard D815eea, and Linux does not seem to recognize the card.
I have tried to use 'eepro100.o', but then I get
"Device is busy or no resource" during init_module().

Any help or pointer to resolve this is greatly appreciated.

Thanks in advance.

Skipper
From chris@soma.978.org Sat, 29 Jul 2000 02:03:37 -0700
Date: Sat, 29 Jul 2000 02:03:37 -0700
From: chris chris@soma.978.org
Subject: [eepro100] Transmitter Timeout

Just as an addendum to Paul's empirical data on the eepro100 transmitter
lock-up, here is what I found:

I have a quad-port eepro (82557 v5) 64-bit PCI card in a PC164 DEC Alpha
motherboard (v1.06 of the driver, Linux 2.2.14). When any of the ports on
the card is connected to a BayNetworks 350T switch (formerly NetICs) or
crossed over to a DEC tulip card, it functions fine. However, when the ports
are crossed over to 3Com 905Bs, the transmitter locks up about every 15
seconds when streaming MPEG video and about every two minutes for MP3s
(rather annoying while listening to In-A-Gadda-Da-Vida :) ).

The thing that confuses me is that if I put a hub between the 905 and the
eepro100, the transmitter lock-up persists; however, if I put a switch (the
350T) between them, it functions fine.

I remember having a LOT of trouble getting any 3Com product to
auto-negotiate properly with Bay products, so, seeing Paul's predicament with
his 3Com switch, I am starting to think that this problem may be another
oddity of 3Com's low-level hardware implementation of Ethernet.

I am an educated but inexperienced programmer and am willing and able to
contribute to a fix for this. I would like to avoid duplicating anyone's
efforts, so before I take a shot at this: has anyone taken a stab at it, and
if so, what can I do to aid them in their efforts?

Thanks,
Chris

From chris@soma.978.org Sun, 30 Jul 2000 06:04:26 -0700
Date: Sun, 30 Jul 2000 06:04:26 -0700
From: chris chris@soma.978.org
Subject: [eepro100] Transmitter Timeout -- addednum

A quick re-cap of my hardware:

  * i82557 quad 64-bit PCI (33 MHz) Ethernet card
  * DEC PC164 motherboard with 21164 EV56 processor
I've been messing with eepro100 drivers for about 32 hours straight now
(with a few hours off for pizza), and as an addendum to my last e-mail, this
is what I have tried and found thus far:

  * The TX timeout is not dependent on what the card is connected to after
    all. Regardless of whether it is connected to a 3c905, a Bay 350T, a UB
    100-TX hub, or a tulip card, the "TX timeout" still happens. The timeout
    just happens a little quicker when connected via crossover to a 905B.
  * All cabling is tried and true on other network cards.
  * The TX timeout occurs with just about all heavy traffic. The initial
    timeout (the first timeout since boot) takes a little while to happen,
    but afterwards the successive timeouts come quicker.

Here is a quick table of when the timeouts occur with the different driver
versions:

  Traffic                    Driver  Kernel        Initial      Successive     Recovery
                             version version       timeout(sec) timeouts(sec)  time(sec)
  heavy NFS read/writes      1.06    2.2.14        25-30        8-10           1-2
  MPEG streaming via SAMBA   1.06    2.2.14        35-40        12-15          1-2
  HEAVY FTP                  1.06    2.2.14        IMMEDIATE    1-2            4-5
  telnet/ssh/http            1.06    2.2.14        NONE         -              -
  heavy NFS read/writes      1.09    2.2.14        30-45        10-12          8-10
  MPEG streaming via SAMBA   1.09    2.2.14        115-140      15-20          8-10
  HEAVY FTP                  1.09    2.2.14        IMMEDIATE    <1             1-2
  telnet/ssh/http            1.09    2.2.14        NONE         -              -
  heavy NFS read/writes      1.09    2.2.16        30-45        10-12          8-10
  MPEG streaming via SAMBA   1.09    2.2.16        115-140      15-20          8-10
  HEAVY FTP                  1.09    2.2.16        IMMEDIATE    <1             1-2
  telnet/ssh/http            1.09    2.2.16        30 minutes   ???            a long time
  ALL                        1.09    2.4.0-test5   N/A*

  * = the OS locks up IMMEDIATELY after reaching the eepro100 code when it is
      compiled into the kernel, or upon insmod when running as a module, with
      NO ERROR MESSAGES.

MESSAGES:

On v1.06 of the driver, this is what /var/log/messages says:

  Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: status 0050 0000 at 322796/322810 command 000c0000.
  Jul 25 09:59:12 fosters kernel: eth0: Trying to restart the transmitter...
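The hex fields in these timeout messages can be decoded by hand. The sketch below rests on several assumptions. It uses the 8255x register layout as we understand it: SCB status bits 7:6 hold the CU state and bits 5:2 the RU state, while in a command block word bits 18:16 hold the command, bit 19 is flexible mode, bits 29/30/31 are I/S/EL and bit 15 is C. It also assumes that the first and last fields of the message are the SCB status word and the status/command word of the oldest pending TX block. The function names are hypothetical, and only_cu_stopped() mirrors what we believe is the check the v1.0x driver makes before printing "Trying to restart the transmitter...".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (8255x family, as we understand it):
 * SCB status word:    bits 7:6 = CU status (CUS), bits 5:2 = RU status (RUS).
 * Command block word: bit 31 EL, bit 30 S, bit 29 I, bit 19 SF (flexible),
 *                     bits 18:16 command (4 = transmit), bit 15 C (complete). */

static const char *cu_state(uint16_t scb_status)
{
    switch ((scb_status & 0x00C0) >> 6) {
    case 0:  return "idle";
    case 1:  return "suspended";
    case 2:  return "active";
    default: return "other";
    }
}

static const char *ru_state(uint16_t scb_status)
{
    switch ((scb_status & 0x003C) >> 2) {
    case 0:  return "idle";
    case 1:  return "suspended";
    case 2:  return "no resources";
    case 4:  return "ready";
    default: return "other";
    }
}

struct cb_word {
    unsigned cmd;      /* bits 18:16 */
    int flexible;      /* bit 19 */
    int interrupt;     /* bit 29 */
    int suspend;       /* bit 30 */
    int end_of_list;   /* bit 31 */
    int complete;      /* bit 15 */
};

static struct cb_word decode_cb(uint32_t w)
{
    struct cb_word d;
    d.cmd         = (w >> 16) & 0x7u;
    d.flexible    = (int)((w >> 19) & 1u);
    d.interrupt   = (int)((w >> 29) & 1u);
    d.suspend     = (int)((w >> 30) & 1u);
    d.end_of_list = (int)((w >> 31) & 1u);
    d.complete    = (int)((w >> 15) & 1u);
    return d;
}

/* "CU not active while RU is ready": our reading of the condition under
 * which the driver reports that only the transmitter has stopped. */
static int only_cu_stopped(uint16_t scb_status)
{
    return (scb_status & 0x00C0) != 0x0080 && (scb_status & 0x003C) == 0x0010;
}
```

For the v1.06 line this gives a suspended command unit with the receiver still ready, and a flexible-mode transmit block whose C (complete) bit is clear. The v1.09 word 200c0000 reported below differs only in the I (interrupt-on-completion) bit. The middle field and the "at A/B" counters are not decoded here.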
On v1.09 of the driver, this is what /var/log/messages says:

  Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: status 0050 0c00 at 107640/107670 command 200c0000.

BOOT MESSAGE:

  Jul 29 22:39:31 fosters kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0x9000, 00:08:C7:91:08:72, IRQ 17.
  Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
  Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
  Jul 29 22:39:31 fosters kernel: General self-test: passed.
  Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
  Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
  Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
  Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
  Jul 29 22:39:31 fosters kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0x9800, 00:08:C7:91:08:73, IRQ 24.
  Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
  Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
  Jul 29 22:39:31 fosters kernel: General self-test: passed.
  Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
  Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
  Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
  Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
  Jul 29 22:39:31 fosters kernel: eth2: OEM i82557/i82558 10/100 Ethernet at 0xa000, 00:08:C7:66:80:F7, IRQ 28.
  Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
  Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
  Jul 29 22:39:31 fosters kernel: General self-test: passed.
  Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
  Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
  Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
  Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
  Jul 29 22:39:31 fosters kernel: eth3: OEM i82557/i82558 10/100 Ethernet at 0xa800, 00:08:C7:66:80:0F, IRQ 32.
  Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
  Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
  Jul 29 22:39:31 fosters kernel: General self-test: passed.
  Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
  Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
  Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
  Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.

PCI:

There don't seem to be any PCI conflicts, and I tried both enabling and
disabling "PCI quirks" in the kernel, to no avail.

Here is a cat of my /proc/pci:

  PCI devices found:
    Bus 0, device 7, function 0:
      PCI bridge: DEC DC21154 (rev 2).
        Medium devsel. Fast back-to-back capable. Master Capable. Latency=32. Min Gnt=4.
    Bus 0, device 8, function 0:
      Non-VGA device: Intel 82378IB (rev 67).
        Medium devsel. Master Capable. No bursts.
    Bus 0, device 9, function 0:
      VGA compatible controller: Matrox Millennium (rev 1).
        Medium devsel. Fast back-to-back capable. IRQ 19.
        Non-prefetchable 32 bit memory at 0x9000000 [0x9000000].
        Non-prefetchable 32 bit memory at 0x9800000 [0x9800000].
    Bus 0, device 11, function 0:
      IDE interface: CMD 646 (rev 1).
        Medium devsel. Fast back-to-back capable. IRQ 21. Master Capable. Latency=64. Min Gnt=2.Max Lat=4.
        I/O at 0x8000 [0x8001].
    Bus 1, device 4, function 0:
      Ethernet controller: Intel 82557 (rev 5).
        Medium devsel. Fast back-to-back capable. IRQ 17. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
        Non-prefetchable 32 bit memory at 0xa000000 [0xa000000].
        I/O at 0x9000 [0x9001].
        Non-prefetchable 32 bit memory at 0xa100000 [0xa100000].
    Bus 1, device 5, function 0:
      Ethernet controller: Intel 82557 (rev 5).
        Medium devsel. Fast back-to-back capable. IRQ 24. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
        Non-prefetchable 32 bit memory at 0xa200000 [0xa200000].
        I/O at 0x9800 [0x9801].
        Non-prefetchable 32 bit memory at 0xa300000 [0xa300000].
    Bus 1, device 6, function 0:
      Ethernet controller: Intel 82557 (rev 5).
        Medium devsel. Fast back-to-back capable. IRQ 28. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
        Non-prefetchable 32 bit memory at 0xa400000 [0xa400000].
        I/O at 0xa000 [0xa001].
        Non-prefetchable 32 bit memory at 0xa500000 [0xa500000].
    Bus 1, device 7, function 0:
      Ethernet controller: Intel 82557 (rev 5).
        Medium devsel. Fast back-to-back capable. IRQ 32. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
        Non-prefetchable 32 bit memory at 0xa600000 [0xa600000].
        I/O at 0xa800 [0xa801].
        Non-prefetchable 32 bit memory at 0xa700000 [0xa700000].

And there don't seem to be any I/O issues; cat of /proc/ioports:

  0060-006f : keyboard
  0070-007f : timer
  0170-0177 : ide1
  01f0-01f7 : ide0
  02f8-02ff : serial(auto)
  0376-0376 : ide1
  03c0-03df : vga+
  03e8-03ef : serial(auto)
  03f6-03f6 : ide0
  03f8-03ff : serial(auto)
  8000-8007 : ide0
  8008-800f : ide1
  a000000-a00001f : Intel Speedo3 Ethernet
  a200000-a20001f : Intel Speedo3 Ethernet
  a400000-a40001f : Intel Speedo3 Ethernet
  a600000-a60001f : Intel Speedo3 Ethernet

TRIAL-AND-ERROR:

Forcing different interface speeds via mii-diag does not fix anything:

  100baseTX-FD -- timeout still occurs
  100baseTX-HD -- timeout still occurs
  10baseT-FD   -- timeout still occurs
  10baseT-HD   -- timeout still occurs

eepro-diag:

  eepro100-diag.c:v2.02 7/19/2000 Donald Becker (becker@scyld.com)
   http://www.scyld.com/diag/index.html
  Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9000.
  A potential i82557 chip has been found, but it appears to be active.
  Either shutdown the network, or use the '-f' flag.
  Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9800.
  A potential i82557 chip has been found, but it appears to be active.
  Either shutdown the network, or use the '-f' flag.
  Index #3: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa000.
  A potential i82557 chip has been found, but it appears to be active.
  Either shutdown the network, or use the '-f' flag.
  Index #4: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa800.

Changing MACROS:

  v1.06:
    txfifo/rxfifo: changes do nothing
    TX_RING_SIZE/RX_RING_SIZE: changes do nothing
    TX_TIMEOUT: increasing this number decreases the frequency of the
      timeouts until the number reaches roughly double its original value;
      beyond that, the interfaces are not usable until an ifdown/ifup.

  v1.09:
    txfifo/rxfifo: changes do nothing
    TX_RING_SIZE/RX_RING_SIZE: changes do nothing
    TX_TIMEOUT: increasing this number at all makes the interfaces unusable
      until an ifdown/ifup.

Also, I ported the function "static void speedo_tx_timeout(struct net_device *dev)"
from v1.09 to v1.06 to see what happens: the new "hybrid" driver exhibited
the characteristics of the v1.09 timeouts.

Lastly, changing txqueuelen via ifconfig does nothing.

Conclusion:

v1.06 of the driver seemed to recover from the TX timeouts quicker than
v1.09, but in v1.09 they were less frequent. I tried to compile v1.10 and the
experimental v1.11, but I got all kinds of compile errors and did not have
the motivation to port them to v2.2.16 of the kernel after all my failures
above.

I have NO IDEA what is causing these TX timeouts. If any of the gurus here
would be so kind as to aid me in my efforts to figure this out, I would
greatly appreciate it! I will grant accounts on the troublesome machine if
that will aid in troubleshooting, and I will code whatever I can if anyone
can give me a direction to go in.

Is there anything special that I have to set in the kernel for 64-bit PCI,
BTW? Could the fact that this card is a 64-bit PCI card be the issue? Are
there any special parameters that I could try tweaking that are
Alpha-specific?
Thank you for any help!!

--Chris

From kallol@bugula.fpk.hp.com Sun, 30 Jul 2000 10:41:29 EDT
Date: Sun, 30 Jul 2000 10:41:29 EDT
From: Kallol Biswas kallol@bugula.fpk.hp.com
Subject: [eepro100] Transmitter Timeout -- addednum

I don't know about the latest eepro100 driver, but the version I saw had a
fundamental design problem. Again, I will try to explain:

The 82559 prefetches the next command from the command ring. Suppose the
command unit (CU) is executing the i-th command and has already prefetched
the next one, i.e. the (i+1)-th. The driver now sets up the (i+1)-th
command, sets the S bit and sends RESUME. Then, if the CU is:

  * in the Suspended state, it goes to the Active state. It does not re-read
    the next link pointer (the address of the (i+1)-th command); it re-reads
    the S bit of the i-th command. If the S bit of the i-th command is
    cleared, it executes the (i+1)-th command; otherwise it goes back to the
    Suspended state.
  * Active, it checks the validity of the S bits of the next ((i+1)-th) and
    present (i-th) commands (PCI command 0x6, Memory Read, is used to re-read
    the S bit of a TxCB; I saw it on an analyzer).

Please note that it does not say it re-analyzes the next ((i+1)-th) command,
only the S bit. So if the (i+1)-th command was, say, a previously executed
transmit command, and the driver now sets it up as, say, a multicast command,
the card executes the (i+1)-th command with invalid parameters, and the card
stalls.

Our initial version of the 82559 driver would hang on an Itanium-based system
because of this problem, but adding a NOP after a command solved the
problem. Now our stress tests run for days without any problem on the 82559.

I hope I have made this clear. If you have any questions, please feel free to
call me at 973-443-7469/973-442-0164. I will try to explain as much as I can.

Regards,
Kallol
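Kallol's description can be made concrete with a toy simulation. It models the behaviour only as he describes it: the CU keeps a full copy of the next block, and on RESUME from the Suspended state it re-reads only the S bit of the current block. This is not taken from Intel documentation, the Active-state path is not modelled, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the CU prefetch behaviour as Kallol describes it; not
 * documented Intel behaviour. All names are hypothetical. */
enum { RING = 4 };
enum cmd_type { CMD_NOP = 0, CMD_MULTICAST = 3, CMD_TX = 4 };

struct cb {                 /* one command block in host memory */
    enum cmd_type cmd;
    int suspend;            /* S bit */
    uint32_t param;         /* stands in for command-specific parameters */
};

struct cu {                 /* the chip's command unit */
    int cur;                /* index of the block being executed */
    struct cb prefetched;   /* copy of block cur+1 taken while executing cur */
    int suspended;
};

/* Execute block i and, per the claim, prefetch block i+1 at the same time. */
static void cu_execute(struct cu *u, const struct cb *ring, int i)
{
    u->cur = i;
    u->prefetched = ring[(i + 1) % RING];
    u->suspended = ring[i].suspend;
}

/* RESUME: only the S bit of block cur is re-read. If it is clear, the CU
 * runs its cached copy of block cur+1 without re-reading host memory.
 * Returns the block actually run, or a NOP if nothing new runs. */
static struct cb cu_resume(struct cu *u, const struct cb *ring)
{
    struct cb none = { CMD_NOP, 0, 0 };
    if (!u->suspended)
        return none;            /* Active-state path not modelled here */
    if (ring[u->cur].suspend)
        return none;            /* S bit of block cur still set: stay suspended */
    u->suspended = 0;
    return u->prefetched;       /* stale if the driver rewrote block cur+1 */
}

/* The driver reuses slot 1 (an old transmit) for a multicast setup after
 * the CU has already prefetched it, then clears S on slot 0 and resumes. */
static struct cb stale_prefetch_scenario(void)
{
    struct cb ring[RING] = {
        { CMD_TX, 1, 100 },     /* block i: transmit with S bit set */
        { CMD_TX, 0, 200 },     /* block i+1: previously used transmit */
        { CMD_NOP, 0, 0 },
        { CMD_NOP, 0, 0 },
    };
    struct cu u = { 0, { CMD_NOP, 0, 0 }, 0 };

    cu_execute(&u, ring, 0);
    ring[1].cmd = CMD_MULTICAST;   /* driver rewrites block i+1 ... */
    ring[1].param = 300;
    ring[1].suspend = 1;
    ring[0].suspend = 0;           /* ... then clears S on block i */
    return cu_resume(&u, ring);
}
```

In this model the intended multicast block never runs; the CU instead executes the transmit it cached earlier, with leftover parameters, which is the stall Kallol reports. Andrey questions below whether the hardware really behaves this way, so the model is only a restatement of the claim.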
-- 
Phone: 973-443-7469
Telnet: 1-443-7469
www.kallolbiswas.com
kallol_biswas@hp.com

From chris@soma.978.org Mon, 31 Jul 2000 02:30:13 -0700
Date: Mon, 31 Jul 2000 02:30:13 -0700
From: chris chris@soma.978.org
Subject: [eepro100] Transmitter Timeout -- addednum

Thank you very much, Kallol, I appreciate it!

It seems that the regular->multicast transmit command issue was resolved in
the v1.09 driver. In set_rx_mode():

    /* Change the command to a NoOp, pointing to the CmdMulti command. */
    sp->tx_skbuff[entry] = 0;
    sp->tx_ring[entry].status = cpu_to_le32(CmdNOp);
    sp->tx_ring[entry].link = virt_to_le32desc(mc_setup_frm);

I could not see the driver accounting for any other cases. Would you be so
kind as to send me a copy of your "modified" driver so that I may see what
you did?

The code for the eepro100 driver is confusing me a bit. The RX ring seems
pretty clear and concise:

  * ethX is discovered and pci_dev is set up
  * pci_dev functions point to speedo functions, and a pointer is made to
    struct speedo_private
  * a ring of RX_RING_SIZE sk_buffs is set up, and for each sk_buff an
    RxFD->rx_buf_addr is pointed to sk_buff->tail
  * in speedo_private, an array rx_skbuf[] is set up pointing to the sk_buffs
  * in speedo_private, an array rx_ringp[] is set up pointing to the RxFDs
  * the eepro100 card DMAs the incoming data into the sk_buff->tail pointed
    to by the RxFD
  * the kernel knows how to deal with the sk_buff and takes the data
  * I'm not too sure how the RxFDs are marked dirty and dealt with, but that
    is not the issue

As I said, straight out of the textbook DMA-oriented driver. But I can't
figure out for the life of me how the TX ring is dealt with.
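Chris's RX walkthrough above, and his TX question, can be illustrated with a small user-space model. As we understand Linux network drivers of this era, the TX side does not own a pool of preallocated buffers: the stack hands the driver a fresh sk_buff for each frame, the driver parks it in the next ring slot, and it frees the buffer only after the NIC reports that frame complete, which is why a garbage-collection routine frees buffers at all. Here struct pkt stands in for an sk_buff, and ring_xmit(), ring_gc() and the other names are hypothetical. The cur_tx/dirty_tx pair is meant to resemble the free-running counters that we assume the eepro100 driver prints as "at A/B" in its timeout messages.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space model of a transmit descriptor ring. "struct pkt" stands in for
 * an sk_buff; all names here are hypothetical, not the eepro100 source. */
#define TX_RING_SIZE 8

struct pkt { int len; };

struct tx_desc {
    int complete;           /* set by the (simulated) NIC once the frame is sent */
    struct pkt *buf;        /* buffer handed down by the stack for this frame */
};

struct tx_ring {
    struct tx_desc desc[TX_RING_SIZE];
    unsigned cur_tx;        /* total frames queued by the driver */
    unsigned dirty_tx;      /* total frames reclaimed after completion */
};

/* Transmit path: the stack passes in a fresh buffer for every frame, and the
 * driver keeps it until the NIC has finished with it. */
static int ring_xmit(struct tx_ring *r, struct pkt *p)
{
    if (r->cur_tx - r->dirty_tx >= TX_RING_SIZE)
        return -1;                              /* ring full: caller must wait */
    struct tx_desc *d = &r->desc[r->cur_tx % TX_RING_SIZE];
    d->buf = p;
    d->complete = 0;
    r->cur_tx++;
    return 0;
}

/* Interrupt path: reclaim completed descriptors in order and free their
 * buffers, since ownership passed to the driver at transmit time. */
static int ring_gc(struct tx_ring *r)
{
    int freed = 0;
    while (r->dirty_tx != r->cur_tx) {
        struct tx_desc *d = &r->desc[r->dirty_tx % TX_RING_SIZE];
        if (!d->complete)
            break;                              /* NIC still owns this slot */
        free(d->buf);
        d->buf = NULL;
        r->dirty_tx++;
        freed++;
    }
    return freed;
}
```

In this model cur_tx - dirty_tx is the number of frames the NIC still owns, and reducing either counter modulo TX_RING_SIZE gives its slot.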
I'm assuming that the data to be sent is held in the same sk_buff ring
as the received data, but I can't even find where tx_ring[] is set up,
nor can I grep out the structure definition for sk_buff.

In fact, the only TX-oriented functions I could find are:
speedo_tx_timeout(), which only seems to deal with resetting the card
after a timeout and is only called by the kernel through the
pci_dev->tx_timeout() pointer; speedo_start_xmit(), which seems to be
called only after the card changes transmit modes; and
speedo_tx_buffer_gc(), which seems to free dirty TX sk_buffs and
increment the packet counter. I also don't understand why the driver
frees sk_buffs, because as I see it the ring of sk_buffs is allocated at
init time and marked "dirty" so that new data can be put into it.

I appreciate your patience and help in this matter. I am fresh out of
college and have never written a driver for Linux before, so it is a
little tricky for me to understand some of the very OS-oriented
routines. I did write a USB driver for a pure-hardware setup on an
Ascend 550 series ATM switch during an internship, but that was very
easy since I did not have to deal with an OS.

I think after I figure this all out I'm going to go out and find an
obscure network card that nobody has written a Linux driver for yet and
give it a shot from scratch :). Can you recommend any good books?

Thanks,
Chris

From saw@saw.sw.com.sg Mon, 31 Jul 2000 17:30:21 +0800
Date: Mon, 31 Jul 2000 17:30:21 +0800
From: Andrey Savochkin saw@saw.sw.com.sg
Subject: [eepro100] Re: Transmitter Timeout -- addednum

On Sun, Jul 30, 2000 at 10:41:29AM -0400, Kallol Biswas wrote:
> I don't know about the latest eepro100 driver, but the version
> I saw had a fundamental design problem; again, I will try to explain.
> The 82559 prefetches the next command from the command ring.
> Suppose the command unit is executing the ith command and
> has prefetched the next one, i.e.
> (i+1)th already. The driver sets up the (i+1)th command, sets the
> S bit and sends RESUME. Then:
> * If the CU is in the Suspended state, it goes to the Active state.
>   It does not re-read the next link pointer (the address of the
>   (i+1)th command); it re-reads the S bit of the ith command. If the
>   S bit of the ith command is cleared, it executes the (i+1)th command;
>   otherwise it goes back to the Suspended state.
> * If the CU is Active, it checks the validity of the S bits of the next
>   ((i+1)th) and present (ith) commands (PCI command 0x6, Memory Read,
>   is used to re-read the S bit of a TxCB; I saw it on an analyzer).
> Please note that it does not say it re-analyzes the next ((i+1)th)
> command, only the S bit.

If I understand right, you are saying that the hardware reads and caches
the command from the (i+1)th slot while it processes the (i)th one, even
if the (i)th descriptor has the S bit set, aren't you?
If it does so, it's a very broken piece!
I don't know what the documentation states about TX ring processing, but
this policy clearly contradicts common sense!

> So if the (i+1)th command was a previously executed command, say a
> transmit, and the driver now sets it up as, say, a multicast command,
> then the card executes the (i+1)th command with invalid parameters, and
> the card stalls.
>
> Our initial version of the 82559 driver would hang on an Itanium
> processor based system because of this problem, but adding a NOP after
> a command solved the problem. Now our stress tests run for days without
> any problem on the 82559.
>
> Hope I could make this clear; if you have any questions, please feel
> free to call at 973-443-7469/973-442-0164.
> I will try to explain as much as I can.

Best regards
		Andrey V. Savochkin

From kallol@bugula.fpk.hp.com Mon, 31 Jul 2000 8:31:02 EDT
Date: Mon, 31 Jul 2000 8:31:02 EDT
From: Kallol Biswas kallol@bugula.fpk.hp.com
Subject: [eepro100] Re: Transmitter Timeout -- addednum

>
> If I understand right, you are saying that the hardware reads and caches
> the command from the (i+1)th slot while it processes the (i)th one, even
> if the (i)th descriptor has the S bit set, aren't you?
> If it does so, it's a very broken piece!
> I don't know what the documentation states about TX ring processing, but
> this policy clearly contradicts common sense!

Why does it contradict common sense? The S bit does not stop prefetching;
you can see that on a PCI logic analyzer. Many other I/O cards use the
same S-bit policy, but those cards also support a NOP command. You don't
have to put a NOP after each command, only after every command with the
S bit set.

From kallol@bugula.fpk.hp.com Mon, 31 Jul 2000 9:53:44 EDT
Date: Mon, 31 Jul 2000 9:53:44 EDT
From: Kallol Biswas kallol@bugula.fpk.hp.com
Subject: [eepro100] Transmitter Timeout -- addednum

Chris,

I would be happy to share my modifications, or rather the design of my
driver, if I were allowed to, but the driver is for HP-UX on an
Itanium-based system. You could change eepro100 to insert a NOP command
after each command on which you set the S bit; Intel's Linux driver
probably already does this.

About developing a new driver: you could use the 3C905CTXM card from
3Com, whose programming manual is publicly available at:
support.3com.com/partners/developer/license.html

Bye,
Kallol

> I could not see the driver accounting for any other cases. Would you be
> so kind as to send me a copy of your "modified" driver so that I can see
> what you did?
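Kallol's suggested workaround (a NOP after every command that carries the S bit) can be sketched in C roughly as follows. This is a minimal, hypothetical model of the queuing discipline only: each command is queued with the S (suspend) bit set and is immediately followed by a plain NOP, so the one block a suspended command unit may already have prefetched is a NOP that the driver does not touch when it appends the next command. The names `struct cmd_ring`, `ring_init()`, `queue_cmd()` and `RING_SIZE` are made up for illustration. The command-word values are assumed to mirror the `CmdNOp`, `CmdMulticastList`, `CmdTx` and `CmdSuspend` constants in eepro100.c. The prefetch behavior itself is Kallol's description of the 82559, which Andrey disputed; this code neither models nor verifies it.

```c
#include <assert.h>
#include <stdint.h>

/* Command-word bits, assumed to mirror eepro100.c's `enum commands`:
 * opcode in bits 16-18, suspend (S) flag in bit 30. */
#define CMD_NOP       0x00000000u
#define CMD_MULTICAST 0x00030000u
#define CMD_TX        0x00040000u
#define CMD_SUSPEND   0x40000000u

/* Hypothetical ring size; even, so each command/NOP pair fits without
 * straddling the wrap point. */
#define RING_SIZE 8
_Static_assert(RING_SIZE % 2 == 0, "RING_SIZE must be even");

/* Simplified command block: a real 8255x block also has a status word and
 * command-specific parameters, and its link is a bus address, not an index. */
struct cmd_block {
    uint32_t command;
    unsigned link;          /* index of the next block in the ring */
};

struct cmd_ring {
    struct cmd_block blk[RING_SIZE];
    unsigned cur;           /* running count of slots handed out */
    int last_susp;          /* slot that currently carries the S bit, or -1 */
};

static void ring_init(struct cmd_ring *r)
{
    for (unsigned i = 0; i < RING_SIZE; i++) {
        r->blk[i].command = CMD_NOP;
        r->blk[i].link = (i + 1) % RING_SIZE;   /* circular linked list */
    }
    r->cur = 0;
    r->last_susp = -1;
}

/* Queue `opcode` with the S bit set, followed by a plain NOP, and return the
 * slot used. No completion tracking is done, so the caller must not wrap
 * onto pairs the hardware has not yet finished with. */
static unsigned queue_cmd(struct cmd_ring *r, uint32_t opcode)
{
    assert((opcode & CMD_SUSPEND) == 0);        /* pass a bare opcode */

    unsigned slot = r->cur % RING_SIZE;
    unsigned nop = (slot + 1) % RING_SIZE;

    r->blk[slot].command = opcode | CMD_SUSPEND;
    /* The NOP is the block a CU suspended on `slot` may already have
     * prefetched; the next call writes only the two slots after it. */
    r->blk[nop].command = CMD_NOP;
    r->cur += 2;

    /* Release the previous suspend point only once the new pair is in place. */
    if (r->last_susp >= 0)
        r->blk[r->last_susp].command &= ~CMD_SUSPEND;
    r->last_susp = (int)slot;

    /* A real driver would now issue a CU resume command to the chip. */
    return slot;
}
```

For example, queuing a transmit and then a multicast setup leaves slots 0-3 holding Tx, NOP, Multicast|S, NOP, with the S bit on the Tx block already cleared. A real driver would also reclaim completed blocks before reusing slots and would issue a CU resume after each call; both are left out of this sketch.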