More DFE-570TX Tx Hung Problems

Donald Becker becker@scyld.com
Mon Apr 17 19:06:25 2000


On Mon, 17 Apr 2000, Tim Dixon wrote:

> Subject: More DFE-570TX Tx Hung Problems
> 
> We'd been having a number of problems with certain ports reporting "Tx
> Hung", but these seemed to go away with the 0.91g variant that attempted to
> add "INTR MITIGATION". But they didn't - they still occur, only less
> frequently.

I suspect that interrupt mitigation only minimized the rapid-fire interrupts
that triggered the real problem.

> We're still running Red Hat 5.x - hence Kernel 2.0.34 for a
> variety of reasons....

2.0.30 and 2.0.34 were very stable and reliable systems.  I still run
uniprocessor systems based on those kernels, and understand that "upgrade to
2.3.half-broken" is not an acceptable answer in most environments.

> When they do occur, it's quite drastic because an ifconfig up/down doesn't
> restore the interface to working order, only a hardware reboot does the job
> (which tends to inconvenience people attached to other ports!).

Hmmm, that's an important point.  Are you using modules?  If so, does removing
and reinserting the module restore operation?  (I suspect not.)

> Index #8: Found a Digital DS21143 Tulip adapter at 0xbc80.
>  Port selection is MII, half-duplex.
>  Transmit started, Receive started, half-duplex.
>   The Rx process state is 'Suspended -- no Rx buffers'.
>   The Tx process state is 'Idle'.
>   The transmit threshold is 128.
>  Interrupt sources are pending!  CSR5 is f06980c7.

This is curious: the hardware thinks that it's raising an interrupt.

>    Tx done indication.
>    Tx complete indication.
>    Tx out of buffers indication.
>    Rx Done indication.
>    Receiver out of buffers indication.

These are all consistent with the interrupt handler not being run.
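
As a cross-check, the indications above can be recovered from the CSR5 value
by hand.  The following is a minimal sketch, not the diagnostic tool's actual
code.  The bit names are copied from the quoted output, and the bit positions
and the state-field offsets (bits 17-19 for Rx, bits 20-22 for Tx) are
assumptions based on the usual 21143 status register layout.  `decode_csr5`
is a hypothetical helper name.

```python
# Assumed bit positions for the five indications quoted in the report.
CSR5_BITS = {
    0: "Tx done",
    1: "Tx complete",
    2: "Tx out of buffers",
    6: "Rx Done",
    7: "Receiver out of buffers",
}

def decode_csr5(csr5):
    """Return (pending indication names, Rx state number, Tx state number)."""
    pending = [name for bit, name in sorted(CSR5_BITS.items())
               if csr5 & (1 << bit)]
    rx_state = (csr5 >> 17) & 7   # assumed Rx process state field
    tx_state = (csr5 >> 20) & 7   # assumed Tx process state field
    return pending, rx_state, tx_state

pending, rx_state, tx_state = decode_csr5(0xf06980c7)
print(pending, rx_state, tx_state)
```

Under these assumptions, Rx state 4 corresponds to the 'Suspended -- no Rx
buffers' line and Tx state 6 to the 'Idle' line in the report.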

> Index #8: Found a Digital DS21143 Tulip adapter at 0xbc80.
> Digital DS21143 Tulip chip registers at 0xbc80:
>   f8a08000 ffffffff ffffffff 00ffd028 00ffd228 f06980c7 b20e2002 fbfffbff
>   e0000002 ffffcbf8 ffffffff 00000000 000000c6 ffff0000 fff80000 8ff10000

The mask register, CSR7, is 0xfbfffbff, which is the normal setting for a
21143.  (The setting is actually 0x0801fbff and the unimplemented bits read
back as '1'.)  You are *not* encountering a situation where the driver has
disabled the hardware because the interrupt rate was too high.  Instead,
something else is stopping the interrupt from reaching the handler.
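
The mask arithmetic can be checked directly.  This is a small sketch using
only the values quoted above.  The helper names are hypothetical, and treating
the low 17 bits as the interrupt source and summary bits is an assumption.

```python
WRITTEN_MASK = 0x0801fbff   # value the driver writes to CSR7, per the text
READ_BACK = 0xfbfffbff      # CSR7 as it appears in the register dump
CSR5 = 0xf06980c7           # status register from the same dump

def unimplemented_bits(written, read_back):
    """Bits that read back as 1 although the driver never set them."""
    return read_back & ~written & 0xffffffff

def enabled_pending(csr5, csr7, source_bits=0x0001ffff):
    """Pending status bits that the mask register leaves enabled."""
    return csr5 & csr7 & source_bits

# Every bit the driver wrote is still set in the value read back.
assert READ_BACK & WRITTEN_MASK == WRITTEN_MASK
print(hex(unimplemented_bits(WRITTEN_MASK, READ_BACK)))
print(hex(enabled_pending(CSR5, READ_BACK)))
```

A nonzero result from `enabled_pending` means the chip has unmasked
interrupt sources outstanding, so the mask is not what is keeping the
handler from running.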

Do any of the interfaces on the 4-port card work when this happens?
Check whether the interrupt count in /proc/interrupts stops increasing
when this happens.
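
One way to make that comparison less error-prone is to take two snapshots of
/proc/interrupts a few seconds apart and compare the counts.  The sketch below
assumes each line has the form "IRQ: count [count ...] names", which varies
between kernel versions.  `parse_interrupts` and `irq_advanced` are
hypothetical helper names, and the sample text is made up for illustration.

```python
def parse_interrupts(text):
    """Return {irq_label: total_count} from /proc/interrupts-style text."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue              # e.g. a CPU header line with no colon
        total, seen = 0, False
        for field in rest.split():
            if not field.isdigit():
                break             # reached the flags or device names
            total += int(field)   # sum per-CPU columns if present
            seen = True
        if seen:
            counts[label.strip()] = total
    return counts

def irq_advanced(before, after, irq):
    """True if the count for the given IRQ label increased."""
    return after.get(irq, 0) > before.get(irq, 0)

# Made-up snapshots; on a real system read /proc/interrupts twice.
first = parse_interrupts(" 0:  1234567   timer\n11:    40210 + tulip\n")
second = parse_interrupts(" 0:  1234999   timer\n11:    40210 + tulip\n")
print(irq_advanced(first, second, "0"), irq_advanced(first, second, "11"))
```

If the count for the card's IRQ stays flat while CSR5 shows pending sources,
that supports the idea that the interrupt is not reaching the handler.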

The v0.92 driver on ftp.scyld.com improves the interrupt mitigation setting,
but this change will just further reduce the symptoms, not remove the source.

Donald Becker				becker@scyld.com
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403

