[eepro100] wait_for_cmd_done timeout

Donald Becker becker@scyld.com
Tue Mar 5 14:58:01 2002


On Tue, 5 Mar 2002, Wilson, John wrote:

> I am seeing a problem with wait_for_cmd_done that is very similar to the
> timeout issue that I found on GeoCrawler.
...
> In summary: It appears to me that the network is being flooded with ICMP
> traffic (and possibly other traffic) and that the eepro100 may not be
> handling the errors/traffic. (I'm new to Linux device drivers, so please
> bear with me here).

There are a bunch of errors reported here.  The device driver does not
cause the errors -- it only reports them.
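
For readers new to the driver, the "wait_for_cmd_done timeout" message
comes from a bounded polling loop: the driver waits for the chip to
accept the previous command and gives up after a fixed number of polls.
Below is a rough simulation of that idea, not the driver source. The
names read_cmd_reg, max_polls and FakeChip are hypothetical, and the
limit of 1000 polls is an assumption for illustration.

```python
def wait_for_cmd_done(read_cmd_reg, max_polls=1000):
    """Poll until the command register reads 0 or the poll budget runs out.

    read_cmd_reg is a hypothetical callable standing in for the port read;
    max_polls is an assumed limit, not necessarily the driver's constant.
    Returns True if the chip accepted the command, False on timeout.
    """
    for _ in range(max_polls):
        if read_cmd_reg() == 0:
            return True
    # The loop only reports that the chip never became ready;
    # it has no way of knowing why.
    print("eepro100: wait_for_cmd_done timeout!")
    return False


class FakeChip:
    """Simulated chip: reads as busy a fixed number of times, then idle."""

    def __init__(self, busy_reads):
        self.busy_reads = busy_reads

    def read(self):
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return 1
        return 0


healthy = wait_for_cmd_done(FakeChip(busy_reads=5).read)
wedged = wait_for_cmd_done(FakeChip(busy_reads=10**6).read)
```

In this sketch the chip that stays busy is what triggers the message;
the polling code itself is the same in both calls.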

> I'm running:
> 	RH 7.2
> 	Kernel 2.4.9-13 modified to support the ATM device drivers (eni and
> FORE (Marconi))
> 	ATM on Linux support software: linux-atm-2.4.0
> 	Samba
...
> Mar  5 09:05:14 sla2 kernel: eepro100: wait_for_cmd_done timeout!
> Mar  5 09:05:46 sla2 last message repeated 24 times
> Mar  5 09:05:48 sla2 last message repeated 3 times
> Mar  5 09:05:49 sla2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar  5 09:05:49 sla2 kernel: eth0: Transmit timed out: status 0050  0c80 at
> 48699/48728 command 00030000.

You should run eepro100-diag to see more chip status information.
Nothing is obviously wrong from this report.

> Mar  5 09:06:22 sla2 kernel: eni(itf 0): TX DMA full
> Mar  5 09:06:23 sla2 last message repeated 7 times
> Mar  5 09:06:24 sla2 kernel: eni(itf 0): TX DMA full
>
> At this point both the eth0 interface and atm0 interface stop working.  Note
> that the eepro100 times out first and then the eni driver also dies with TX
> DMA full error.

Yup.  That indicates that there is a system problem that affects both
devices.
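
To make the ordering explicit, here is a small sketch that tallies the
quoted syslog lines, folding "last message repeated N times" back into
the message it refers to.  It works only on the excerpt quoted above
(the original report is elided in places), and the function name tally
is just for illustration.

```python
import re

# Log lines as quoted above; the wrapped "Transmit timed out" line is rejoined.
LOG = """\
Mar  5 09:05:14 sla2 kernel: eepro100: wait_for_cmd_done timeout!
Mar  5 09:05:46 sla2 last message repeated 24 times
Mar  5 09:05:48 sla2 last message repeated 3 times
Mar  5 09:05:49 sla2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  5 09:05:49 sla2 kernel: eth0: Transmit timed out: status 0050  0c80 at 48699/48728 command 00030000.
Mar  5 09:06:22 sla2 kernel: eni(itf 0): TX DMA full
Mar  5 09:06:23 sla2 last message repeated 7 times
Mar  5 09:06:24 sla2 kernel: eni(itf 0): TX DMA full
"""

LINE = re.compile(r"^(\w+\s+\d+ [\d:]+) (\S+) (.*)$")
REPEAT = re.compile(r"^last message repeated (\d+) times$")


def tally(log_text):
    """Return {message: [first_seen_timestamp, count]} in order of first appearance."""
    counts = {}
    last = None  # most recent real message, target of "repeated N times"
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        stamp, _host, msg = m.groups()
        rep = REPEAT.match(msg)
        if rep:
            if last is not None:
                counts[last][1] += int(rep.group(1))
            continue
        if msg.startswith("kernel: "):
            msg = msg[len("kernel: "):]
        if msg in counts:
            counts[msg][1] += 1
        else:
            counts[msg] = [stamp, 1]
        last = msg
    return counts


counts = tally(LOG)
for msg, (first_seen, n) in counts.items():
    print("%s  x%-3d %s" % (first_seen, n, msg))
```

The tally shows the eepro100 timeouts starting about a minute before the
first eni message, which matches the order described in the report.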

> ifconfig shows:

> eth0      Link encap:Ethernet  HWaddr 00:50:8B:D3:92:7C  
>           RX packets:504622 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:47444 errors:289 dropped:0 overruns:0 carrier:0
>           collisions:1416 txqueuelen:100 
>           RX bytes:50644863 (48.2 Mb)  TX bytes:10479503 (9.9 Mb)

> Note the collisions are on eth0.

What type of link partner?  What does 'mii-diag' or 'eepro100-diag -m'
report?
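
Collisions only occur in half-duplex operation, so a non-zero count says
something about the link (a hub, or possibly a duplex mismatch with a
switch), which is why the link partner matters.  As a rough way to put
the quoted counters in proportion, the sketch below expresses TX errors
and collisions as a percentage of TX packets.  The function name
tx_rates is hypothetical, and using TX packets as the denominator is a
simplifying assumption.

```python
import re

# Counter lines copied from the ifconfig output quoted above.
IFCONFIG = """\
RX packets:504622 errors:0 dropped:0 overruns:0 frame:0
TX packets:47444 errors:289 dropped:0 overruns:0 carrier:0
collisions:1416 txqueuelen:100
"""


def tx_rates(text):
    """Return TX error and collision counts as percentages of TX packets."""
    tx = re.search(r"TX packets:(\d+) errors:(\d+)", text)
    col = re.search(r"collisions:(\d+)", text)
    packets = int(tx.group(1))
    errors = int(tx.group(2))
    collisions = int(col.group(1))
    return {
        "tx_error_pct": 100.0 * errors / packets,
        "collision_pct": 100.0 * collisions / packets,
    }


rates = tx_rates(IFCONFIG)
print("TX errors:  %.2f%% of TX packets" % rates["tx_error_pct"])
print("Collisions: %.2f%% of TX packets" % rates["collision_pct"])
```

These percentages alone do not identify the cause; the mii-diag output
is still needed to see what the link actually negotiated.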

> I wanted to point out that the eepro100 is timing out and is affecting the
> ATM device driver too.

That's not likely what is happening.  While the eepro100 driver is
stuck on a timeout it is moving less traffic, so the system workload
goes down, not up.  Even so, the ATM device driver is reporting a
problem.  It seems more likely that both problems are caused by a
third source.

> The eepro100 version is:
> "eepro100.c:v1.09j-t 9/29/99 Donald Becker
> http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html\n"

Grrr, they still refuse to update the URL.

> "eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
> <saw@saw.sw.com.sg> and others\n";
> 
> I know there is a lot of info here, but after reading the thread on the
> wait_for_cmd_done, I thought this might shed some light on the problem and
> that it may not be confined to the newer/experimental kernels.
> 
> Any help would be much appreciated.

Have you tried the driver from
   http://www.scyld.com/network/eepro100.html
      ftp://www.scyld.com/pub/network/eepro100.c

It might not solve the system problem, but it is more likely to report
useful diagnostic information.

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993