[eepro100] Transmit timed out with high Tx load

Donald Becker becker@scyld.com
Sun Feb 3 23:57:00 2002


On Thu, 31 Jan 2002, Andrew Pam wrote:

> I have a router with six Intel PCI EtherExpress Pro100 adapters,
> eth0 through eth5, interrupts as follows:
> 
> eth0 IRQ5, eth1 IRQ12, eth2 IRQ10, eth3 IRQ11, eth4 IRQ5, eth5 IRQ12
...
> The system is RedHat 7.2 with kernels 2.4.9-7 and 2.4.17.  eth4 is not
> presently in use, and IRQ5 is also shared with USB.  eth1,2,3 and eth5
> have no problems whatsoever even under fairly heavy load.  eth0 however
> constantly has transmit timeouts and errors, regardless of whether the
> usb driver module is loaded or not.
> 
> With the stock eepro100 driver from kernels 2.4.9 and 2.4.17 (v1.09j-t)
> the following errors are logged:
> 
> Jan 31 13:23:56 statistix kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jan 31 13:23:56 statistix kernel: eth0: Transmit timed out: status ffff  ffff at 9179585/9179613 command 0001a000.

This status (0xffff) indicates that the eepro100 device is not
responding.  It might be powered off, or the PCI address decoding isn't
working correctly.
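
As a rough illustration of why an all-ones status is suspicious: a read
that no device answers on the PCI bus typically comes back with every bit
set. The helper below is a hypothetical sketch, not code from the
eepro100 driver, and the function name is my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* A 16-bit read that no device answers typically returns all ones. */
#define STATUS_ALL_ONES 0xffffu

/*
 * Hypothetical helper (not part of the eepro100 driver): report whether
 * a 16-bit status word looks like a read from a device that is not
 * responding at all.
 */
static bool status_looks_unresponsive(uint16_t status)
{
    return status == STATUS_ALL_ONES;
}
```

With the logged value, status_looks_unresponsive(0xffff) is true, which
points at power or address decoding rather than at Tx load.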

> I compiled and installed the latest v1.19 drivers from www.scyld.com
> and now get the following errors:
> 
> Jan 31 15:48:57 statistix kernel: Command 00ff was not immediately accepted, 10001 ticks!

This indicates a similar problem: the chip is not responding to the driver.

> Jan 31 15:49:01 statistix kernel: eth0: IRQ 5 is physically blocked! Failing back to low-rate polling.

This message is misleading. The status value 0xffff has every interrupt
status bit set, so it looks as if the chip is trying to raise an interrupt.
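
To make the point concrete, here is a minimal sketch of how an all-ones
read can be mistaken for a pending interrupt. The mask value 0xfc00 is an
assumption on my part about which bits of the status word carry interrupt
status, and both function names are hypothetical. This is not the
driver's actual check.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: the upper bits of the status word are interrupt status. */
#define INTR_STATUS_MASK 0xfc00u

/* Hypothetical: true if any assumed interrupt status bit is set. */
static bool interrupt_appears_pending(uint16_t status)
{
    return (status & INTR_STATUS_MASK) != 0;
}

/* Hypothetical: true if the word is all ones, i.e. likely no device. */
static bool device_appears_absent(uint16_t status)
{
    return status == 0xffffu;
}
```

For status 0xffff both tests are true, so a check that only looks at the
interrupt bits would report a blocked IRQ even though the real problem is
that the device is not answering.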

We will need more info to track this down.


Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993