[vortex] Problems with 3c59x

Donald Becker becker@scyld.com
Fri Dec 13 10:49:00 2002


On Fri, 13 Dec 2002, Lists (lst) wrote:

> On Fri, 6 Dec 2002, Lists (lst) wrote:
> > > Try debug=8, which should show only error packets.
> 
> Now I'm using 3c59x.c:v0.99Xg with option debug=8
...
> Dec 12 17:35:22 lapd 3c59x.c:v0.99Xg 11/27/2002 Donald Becker, becker@scyld.com
> Dec 12 17:35:22 lapd eth0: 3Com 3c905C Tornado at 0xe800,  00:50:da:8a:1a:8b, IRQ 10
> Dec 12 17:35:22 lapd MII transceiver found at address 24, status 782d.
> Dec 12 17:35:22 lapd Using bus-master transmits and whole-frame receives.

...
> This is the normal print (last normal Yesterday at 17:29:28):
> Dec 12 17:29:24 lapd eth0: Media selection timer tick happened, Autonegotiate full duplex.
> Dec 12 17:29:24 lapd eth0: MII transceiver has status 782d.
> Dec 12 17:29:24 lapd eth0: Media selection timer finished, Autonegotiate.
> Dec 12 17:29:28 lapd eth1: Media selection timer tick happened, Autonegotiate full duplex.
> Dec 12 17:29:28 lapd eth1: MII transceiver has status 782d.
> Dec 12 17:29:28 lapd eth1: Media selection timer finished, Autonegotiate.

You should be getting far more information than this.

Try adding this line to /etc/syslog.conf:

kern.*		-/var/log/debug

> Yesterday my eth1 was broken. The peer machine (it's an crossover link 
> from my eth1) not responding to my pings. I shutdown the interface and in 
> my kernel logs appears:
> 
> Dec 12 17:29:37 lapd eth1: vortex_close() status e601, Tx status 00.

Bingo!

What were the messages before this one?  Something about "interrupt
blocked"?  What does 'vortex-diag -af' report?  My guess is that
interrupts are posted that are not being handled.

> Dec 12 17:29:37 lapd eth1: vortex close stats: rx_nocopy 1112698 rx_copy 
> 8498224 tx_queued 3 Rx pre-checksummed 9294542.

Hmmm, you are passing a bunch of traffic over this link.

> When i'm trying to bring the interface up again I receive this:
> 
> Dec 12 17:32:34 lapd eth1: Initial media type Autonegotiate half-duplex.
> Dec 12 17:32:34 lapd eth1: MII #24 status 782d, link partner capability 41e1, setting full-duplex.
> Dec 12 17:32:34 lapd eth1: vortex_open() irq 11 media status 8880.
> Dec 12 17:32:34 lapd eth1: Tx Ring full, refusing to send buffer.
> Dec 12 17:32:34 lapd eth1: Tx Ring full, refusing to send buffer.

Ahhh, buglet.  This can never happen in normal use, but it can when the
Tx queue was full before being shut down.

Fixed for the next version.

>  . . . Lots of: eth1: Tx Ring full, refusing to send buffer. :(

What was the status?  Something with the low bit set?

> Dec 12 17:34:31 lapd eth1: vortex_close() status e401, Tx status 00.

Again, interrupts active.
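
For readers following along, the low-bit check on the two vortex_close() status words quoted above can be sketched in a few lines of Python. This is only an illustration. It assumes that bit 0 of the status word is the interrupt-latch flag, as the question about "the low bit" suggests. The name INT_LATCH and the helper interrupt_pending() are made up for this sketch and are not part of the driver or of vortex-diag.

```python
# Minimal sketch: test the low bit of the status words from the log above.
# Assumption: bit 0 flags a latched (posted but unhandled) interrupt.
# INT_LATCH and interrupt_pending() are hypothetical names for illustration.

INT_LATCH = 0x0001  # low bit of the 16-bit status word


def interrupt_pending(status_hex: str) -> bool:
    """Return True if the low bit is set in a hex status string such as 'e601'."""
    return bool(int(status_hex, 16) & INT_LATCH)


# The two status values reported by vortex_close() in this thread.
for status in ("e601", "e401"):
    print(status, "low bit set:", interrupt_pending(status))
```

Both values have the low bit set, which is consistent with the "interrupts active" reading.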

> What can I do? For more than 3 days the machine runs perfect. For up.down 
> operations I using ifconfig.

This appears to be the old-time IOAPIC bug, where interrupts suddenly
stop working.  You will have to verify this by running 'vortex-diag -af'
when the interface hangs.
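
As a complementary check (not a replacement for 'vortex-diag -af'), one could also watch whether the interrupt count for the interface stops increasing in /proc/interrupts. The Python sketch below is a rough illustration under stated assumptions. The helper irq_count() is hypothetical, it assumes the usual "IRQ: count(s) controller device" line layout, and the sample text is made up for the example rather than taken from the machine in this thread.

```python
# Rough sketch: extract the interrupt count for a device from text in the
# /proc/interrupts layout. irq_count() is a hypothetical helper and the
# SAMPLE text below is invented for illustration only.

def irq_count(proc_interrupts_text, device):
    """Sum the per-CPU counts on the line that lists `device`, or return None."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything not starting with "N:".
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        # Device names come last and may be comma-separated on a shared IRQ.
        if device not in (f.rstrip(",") for f in fields[1:]):
            continue
        total = 0
        for f in fields[1:]:
            if not f.isdigit():
                break  # reached the controller name; counts are done
            total += int(f)
        return total
    return None


SAMPLE = """\
           CPU0
 10:       1200          XT-PIC  eth0
 11:       3400          XT-PIC  eth1
"""

print(irq_count(SAMPLE, "eth1"))
```

Reading the real file twice, a few seconds apart, while traffic is being sent would show whether the count is still moving. A count that stays frozen while the link should be busy would point toward the interrupt problem described above.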

Passing 'noapic' is the only known work-around.

Bogdan, do you know where there is a listing of which APIC bugs were
fixed when?

> I'm running an Andreea A. kernel 2.4.20-aa1 (with some more patches but 
> without efect over the network).

I guess I can't suggest trying a newer kernel ;->.
Ask AA about "APIC interrupt failures".

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993