[tulip] RE: problem with tulip card ceasing to function - requires ifdown/ifup to fix (Dani Roisman)

Moti Haimovsky motih@cisco.com
Wed Jan 16 06:00:03 2002


My name is Moti Haimovsky and I've worked as a software engineer in the
group that developed the 2114x family of chips at Digital Semiconductor.
My experience with the 2114x family runs from the early days of the 21040
device through the 21143. As such I'm willing to take the heat (and the
hit) from users of those products, answering device-related questions and
more ...

:) Thanks
 motih@cisco.com

Regarding Dani Roisman's problem with a tulip card ceasing to function
(requires ifdown/ifup to fix):
The register dump of the tulip device at 0xc800:
 CSR5 val is: f0680000
 CSR7 val is: fbfffbff
 The Rx process state is 'Suspended -- no Rx buffers'.
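
For anyone who wants to check the dump by hand, here is a minimal sketch of
how those two registers decode. The RI/RU masks (0x40, 0x80) are the ones I
believe the Linux tulip driver uses for RxIntr and RxNoBuf; the state field
positions (Rx in bits 19:17, Tx in bits 22:20) are an assumption that is
consistent with both register dumps in the quoted message. The helper names
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CSR5_RI 0x00000040u   /* receive interrupt */
#define CSR5_RU 0x00000080u   /* receive buffer unavailable */

/* Receive process state: CSR5 bits 19:17 (assumed layout). */
unsigned csr5_rx_state(uint32_t csr5) { return (csr5 >> 17) & 7u; }

/* Transmit process state: CSR5 bits 22:20 (assumed layout). */
unsigned csr5_tx_state(uint32_t csr5) { return (csr5 >> 20) & 7u; }

/* Nonzero if RU is pending in CSR5. */
int csr5_ru_pending(uint32_t csr5) { return (csr5 & CSR5_RU) != 0; }

/* Nonzero if the RU interrupt is enabled in CSR7 (same bit position). */
int csr7_ru_enabled(uint32_t csr7) { return (csr7 & CSR5_RU) != 0; }

/* Only the two Rx states that appear in the tulip-diag output are named. */
const char *rx_state_name(unsigned state)
{
    assert(state < 8);   /* the field is three bits wide */
    switch (state) {
    case 3:  return "Waiting for packets";
    case 4:  return "Suspended -- no Rx buffers";
    default: return "(state not shown in this thread)";
    }
}
```

With the values above, csr5_rx_state(0xf0680000) is 4 and
csr5_ru_pending(0xf0680000) is 0, while csr7_ru_enabled(0xfbfffbff) is 1:
RU is enabled but not pending, and the receive process is suspended.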

Although the Rx state reports that no free Rx buffer was found, the RU
(receive buffer unavailable) interrupt is not set in CSR5. This suggests
that the driver acknowledged the interrupt, but the receive process still
did not see a free descriptor to work with. In such a situation (where the
RU interrupt was set by the tulip), the chip initiates no further RI or RU
interrupts until it recognizes a chip-owned Rx descriptor.
The philosophy behind this strange behavior (if I remember correctly) is
that an RU condition indicates a busy system, and if the chip kept
interrupting the system it would only make things worse.
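
The behavior described above can be put into a toy model. This is only a
sketch of the description in this mail (which is from memory), not something
taken from the datasheet; the structure and function names are hypothetical,
and the ring is reduced to one ownership flag per descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4
#define CSR5_RI   0x40u   /* receive interrupt */
#define CSR5_RU   0x80u   /* receive buffer unavailable */

struct rx_model {
    bool     chip_owned[RING_SIZE]; /* true: descriptor belongs to the chip */
    unsigned cur;                   /* descriptor the chip looks at next */
    bool     suspended;             /* receive process is suspended */
    uint32_t csr5;                  /* pending status bits */
    unsigned dropped;               /* frames lost for lack of a buffer */
};

/* One frame arrives from the wire. */
void model_frame_arrives(struct rx_model *m)
{
    assert(m->cur < RING_SIZE);
    if (m->chip_owned[m->cur]) {
        m->chip_owned[m->cur] = false;      /* filled buffer goes to the host */
        m->cur = (m->cur + 1) % RING_SIZE;
        m->suspended = false;
        m->csr5 |= CSR5_RI;
        return;
    }
    m->dropped++;
    if (!m->suspended) {
        m->suspended = true;
        m->csr5 |= CSR5_RU;   /* raised once, on entering the suspended state */
    }
    /* Already suspended: stay silent until a chip-owned descriptor is seen. */
}

/* The driver acknowledges status bits (write-one-to-clear). */
void model_ack(struct rx_model *m, uint32_t bits)
{
    m->csr5 &= ~bits;
}
```

Starting from an empty ring, the first frame raises RU. Once the driver
acknowledges it without handing back a descriptor, later frames are dropped
with no status bit set at all, which matches the register dump.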

This stall may very well arise in the tulip driver ISR in either of two
ways:
1.  In a heavily loaded system, the following code may be executed when
rx > maxrx:
		if (tx > maxtx || rx > maxrx || oi > maxoi) {
			if (tulip_debug > 1)
				printk(KERN_WARNING "%s: Too much work during an interrupt, "
					   "csr5=0x%8.8x. (%lu) (%d,%d,%d)\n",
					   dev->name, csr5, tp->nir, tx, rx, oi);

			/* Acknowledge all interrupt sources. */
			outl(0x8001ffff, ioaddr + CSR5);
			if (tp->flags & HAS_INTR_MITIGATION) {
				/* Josip Loncaric at ICASE did extensive experimentation
				   to develop a good interrupt mitigation setting. */
				outl(0x8b240000, ioaddr + CSR11);
			} else {
				/* Mask all interrupting sources, set timer to
				   re-enable. */
				outl(((~csr5) & 0x0001ebef) | AbnormalIntr | TimerInt,
					 ioaddr + CSR7);
				outl(0x0012, ioaddr + CSR11);
			}
			break;
		}

Suppose the tulip device sets the RU interrupt just before the above
section runs. The command outl(0x8001ffff, ioaddr + CSR5) then clears RU
without the driver ever having seen it, silencing our receive path for good.

2. When no skbufs are available for refilling the Rx ring.
 The chip sets CSR5 RU and the driver acknowledges it:

	/* Let's see whether the interrupt really is for us */
	csr5 = inl(ioaddr + CSR5);

	if ((csr5 & (NormalIntr|AbnormalIntr)) == 0)
		return;

	tp->nir++;

	do {
		/* Acknowledge all of the current interrupt sources ASAP. */
		outl(csr5 & 0x0001ffff, ioaddr + CSR5);
		if (csr5 & (RxIntr | RxNoBuf)) {
			rx += tulip_rx(dev);
			tulip_refill_rx(dev);
		}

tulip_refill_rx(dev) then fails to get any skbufs, and the rest is history.
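
Scenario 1 can be reproduced on paper with a model of the write-one-to-clear
status register. This is a sketch under stated assumptions, not a tested
driver change: the function names are hypothetical, and "acknowledge only
what was read" is shown purely for contrast with the blanket acknowledge,
mirroring the outl(csr5 & 0x0001ffff, ...) form used in scenario 2.

```c
#include <assert.h>
#include <stdint.h>

#define CSR5_RU     0x00000080u   /* receive buffer unavailable */
#define CSR5_STATUS 0x0001ffffu   /* status bits the driver acknowledges */

/* Write-one-to-clear: every 1 written clears the matching pending bit. */
uint32_t csr5_after_write(uint32_t pending, uint32_t written)
{
    return pending & ~(written & CSR5_STATUS);
}

/*
 * snapshot: what the driver last read from CSR5.
 * late:     bits the chip raised after that read.
 * Returns what is still pending after the blanket acknowledge.
 */
uint32_t blanket_ack(uint32_t snapshot, uint32_t late)
{
    assert((snapshot & late) == 0);   /* late bits were not in the snapshot */
    return csr5_after_write(snapshot | late, 0x8001ffffu);
}

/* Same situation, but acknowledging only the bits that were actually read. */
uint32_t snapshot_ack(uint32_t snapshot, uint32_t late)
{
    assert((snapshot & late) == 0);
    return csr5_after_write(snapshot | late, snapshot);
}
```

If the driver read NIS|RI (0x00010040) and RU arrives afterwards, then
blanket_ack(0x00010040, CSR5_RU) leaves nothing pending, whereas
snapshot_ack(0x00010040, CSR5_RU) leaves RU set for the next pass.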




Why does ifdown eth0 ; ifup eth0 help?
The Rx ring is reinitialized, making the tulip a happy chip again.
This may suggest that the no-skbuf scenario is the less probable one.
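
To make the ring argument concrete, here is a small ownership model. It is a
sketch only: the allocator hook, the stand-in allocators and the retry check
are hypothetical and are not part of the driver code quoted above. A fully
successful refill leaves the ring in the same ownership state that
reinitializing it does (every descriptor chip-owned), while a refill that
hands over nothing has to be retried by some other means, since the chip
will not interrupt again to ask for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 8

struct rx_ring {
    bool chip_owned[RX_RING_SIZE];   /* true: buffer attached, chip may fill it */
};

typedef bool (*alloc_fn)(void);      /* hypothetical buffer allocator hook */

bool alloc_ok(void)   { return true; }    /* stand-in: memory available */
bool alloc_fail(void) { return false; }   /* stand-in: no skbufs available */

/* Count the descriptors the chip can use. */
unsigned ring_chip_owned(const struct rx_ring *r)
{
    unsigned n = 0;
    for (size_t i = 0; i < RX_RING_SIZE; i++)
        n += r->chip_owned[i] ? 1u : 0u;
    return n;
}

/* Try to hand every host-owned slot back to the chip. */
unsigned ring_refill(struct rx_ring *r, alloc_fn alloc)
{
    assert(r != NULL && alloc != NULL);
    for (size_t i = 0; i < RX_RING_SIZE; i++)
        if (!r->chip_owned[i] && alloc())
            r->chip_owned[i] = true;
    return ring_chip_owned(r);
}

/* A refill that leaves the chip nothing to receive into must be retried. */
bool refill_needs_retry(unsigned owned_after_refill)
{
    return owned_after_refill == 0;
}
```

A driver could use a check like refill_needs_retry() to arm a timer that
calls the refill again later; whether that is the right fix for this
particular hang is something only testing on the affected box can show.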


Hope I didn't mess things up
 Moti.

-----Original Message-----
From: tulip-admin@scyld.com [mailto:tulip-admin@scyld.com]On Behalf Of
tulip-request@scyld.com
Sent: Tuesday, January 15, 2002 9:05 AM
To: tulip@scyld.com
Subject: tulip digest, Vol 1 #455 - 3 msgs


Send tulip mailing list submissions to
	tulip@scyld.com

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.scyld.com/mailman/listinfo/tulip
or, via email, send a message with subject or body 'help' to
	tulip-request@scyld.com

You can reach the person managing the list at
	tulip-admin@scyld.com

When replying, please edit your Subject line so it is more specific
than "Re: Contents of tulip digest..."


Today's Topics:

   1. Re: problem with tulip card ceasing to function - requires ifdown/ifup to fix (Dani Roisman)
   2. 2.4.17 tulip multiport-patch (Christoph Dworzak)
   3. Re: 2.4.17 tulip multiport-patch (Donald Becker)

--__--__--

Message: 1
Date: Mon, 14 Jan 2002 11:52:47 -0800
From: Dani Roisman <dani-post@roisman.com>
To: tulip@scyld.com
Subject: [tulip] Re: problem with tulip card ceasing to function - requires
ifdown/ifup to fix

... following up my problems from 11/2001 ....

We had a couple of recurrences of the problem, where one interface on a
DLINK DFE-570TX hangs and requires an ifdown eth0 ; ifup eth0 to get going
again.

This time, I was able to get on and run some tulip-diags before bringing up
the interface. I know that Donald B. wanted to see these. First the
detection messages (FYI, since I'm only using eth0 and eth1, I've cut out
the messages for the other 2 interfaces to keep this email a bit shorter).

I'll take suggestions, including what I should run next time to offer
better troubleshooting information.

Thank you!

from dmesg:
eth0: Digital DS21143-xD Tulip rev 65 at 0xc800, 00:80:C8:B9:98:4D, IRQ 12.
eth0:  EEPROM default media type Autosense.
eth0:  Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
eth0:  MII transceiver #1 config 3100 status 7869 advertising 01e1.
eth1: Digital DS21143-xD Tulip rev 65 at 0xc400, 00:80:C8:B9:98:4E, IRQ 5.
eth1:  EEPROM default media type Autosense.
eth1:  Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
eth1:  MII transceiver #1 config 3100 status 7869 advertising 01e1.
<snip>
tulip.c:v0.92t 1/15/2001  Written by Donald Becker <becker@scyld.com>
  http://www.scyld.com/network/tulip.html

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:80:C8:B9:98:4D
          inet addr:WW.XX.YY.11  Bcast:WW.XX.YY.15  Mask:255.255.255.248
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2147483647 errors:0 dropped:94727 overruns:0 frame:0
          TX packets:2147483647 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:12 Base address:0xc800

# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:80:C8:B9:98:4E
          inet addr:WW.XX.YY.1  Bcast:WW.XX.YY.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2147483647 errors:0 dropped:3 overruns:0 frame:0
          TX packets:2147483647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:5 Base address:0xc400


# tulip-diag -aa
tulip-diag.c:v2.06 1/8/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Digital DS21143 Tulip adapter at 0xc800.
 * A potential Tulip chip has been found, but it appears to be active.
 * Either shutdown the network, or use the '-f' flag to see all values.
Digital DS21143 Tulip chip registers at 0xc800:
 0x00: f8a08000 ffffffff ffffffff 07fee800 07feea00 f0680000 b20e2202 fbfffbff
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Suspended -- no Rx buffers'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  The NWay status register is 000000c6.
Index #2: Found a Digital DS21143 Tulip adapter at 0xc400.
 * A potential Tulip chip has been found, but it appears to be active.
 * Either shutdown the network, or use the '-f' flag to see all values.
Digital DS21143 Tulip chip registers at 0xc400:
 0x00: f8a08000 ffffffff ffffffff 07fee000 07fee200 f0660000 b20e6202 fbfffbff
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 256.
  The NWay status register is 000000c6.

# tulip-diag -mm
tulip-diag.c:v2.06 1/8/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Digital DS21143 Tulip adapter at 0xc800.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Suspended -- no Rx buffers'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  The NWay status register is 000000c6.
 MII PHY found at address 1, status 0x786d.
 MII PHY #1 transceiver registers:
   1100 786d 2000 5c10 01e1 41e1 0007 2801
   0000 0000 0000 0000 0000 0000 0000 0000
   0a25 0000 0000 0000 0000 0000 0020 0000
   0080 0001 00a3 0100 0006 0f00 0000 0000.
 Basic mode control register 0x1100: Auto-negotiation enabled.
 Basic mode status register 0x786d ... 786d.
   Link status: established.
   Capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.
   Able to perform Auto-negotiation, negotiation complete.
 Vendor ID is 08:00:17:--:--:--, model 1 rev. 0.
   No specific information is known about this transceiver type.
 I'm advertising 01e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT
   Advertising no additional info pages.
   IEEE 802.3 CSMA/CD protocol.
 Link partner capability is 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT.
   Negotiation  completed.
  Internal autonegotiation state is 'Autonegotiation disabled'.
Index #2: Found a Digital DS21143 Tulip adapter at 0xc400.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 256.
  The NWay status register is 000000c6.
 MII PHY found at address 1, status 0x786d.
 MII PHY #1 transceiver registers:
   1100 786d 2000 5c10 01e1 41e1 0007 2801
   0000 0000 0000 0000 0000 0000 0000 0000
   0a25 0000 0000 0000 0000 0000 0020 0000
   0080 0001 00a3 0100 0006 0f00 0000 0000.
 Basic mode control register 0x1100: Auto-negotiation enabled.
 Basic mode status register 0x786d ... 786d.
   Link status: established.
   Capable of  100baseTx-FD 100baseTx 10baseT-FD 10baseT.
   Able to perform Auto-negotiation, negotiation complete.
 Vendor ID is 08:00:17:--:--:--, model 1 rev. 0.
   No specific information is known about this transceiver type.
 I'm advertising 01e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT
   Advertising no additional info pages.
   IEEE 802.3 CSMA/CD protocol.
 Link partner capability is 41e1: 100baseTx-FD 100baseTx 10baseT-FD 10baseT.
   Negotiation  completed.
  Internal autonegotiation state is 'Autonegotiation disabled'.

# tulip-diag -ee
tulip-diag.c:v2.06 1/8/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Digital DS21143 Tulip adapter at 0xc800.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Suspended -- no Rx buffers'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  The NWay status register is 000000c6.
EEPROM 64 words, 6 address bits.
PCI Subsystem IDs, vendor 1186, device 1112.
CardBus Information Structure at offset 00000000.
Ethernet MAC Station Address 00:80:C8:B9:98:4D.
EEPROM transceiver/media description table.
Leaf node at offset 30, default media type 0800 (Autosense).
 1 transceiver description blocks:
  Media MII, block type 3, length 13.
   MII interface PHY 0 (media type 11).
   21143 MII initialization sequence is 0 words:.
   21143 MII reset sequence is 0 words:.
    Media capabilities are 7800, advertising 01e1.
    Full-duplex map 5000, Threshold map 1800.
    No MII interrupt.
EEPROM contents (64 words):
0x00:  1186 1112 0000 0000 0000 0000 0000 0000
0x08:  0067 0103 8000 b9c8 4d98 1e00 0000 0800
0x10:  8d01 0003 0000 7800 01e0 5000 1800 0000
0x18:  0000 0000 0000 0000 0000 0000 0000 0000
0x20:  0000 0000 0000 0000 0000 0000 0000 0000
0x28:  0000 0000 0000 0000 0000 0000 0000 0000
0x30:  0000 0000 0000 0000 0000 0000 0000 0000
0x38:  0000 0000 0000 0000 0000 0000 0000 19ed
 ID block CRC 0x67 (vs. 0x67).
  Full contents CRC 0x19ed (read as 0x19ed).
 MII PHY found at address 1, status 0x786d.
  Internal autonegotiation state is 'Autonegotiation disabled'.
Index #2: Found a Digital DS21143 Tulip adapter at 0xc400.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 256.
  The NWay status register is 000000c6.
EEPROM 64 words, 6 address bits.
PCI Subsystem IDs, vendor 1186, device 1112.
CardBus Information Structure at offset 00000000.
Ethernet MAC Station Address 00:80:C8:B9:98:4E.
EEPROM transceiver/media description table.
Leaf node at offset 30, default media type 0800 (Autosense).
 1 transceiver description blocks:
  Media MII, block type 3, length 13.
   MII interface PHY 0 (media type 11).
   21143 MII initialization sequence is 0 words:.
   21143 MII reset sequence is 0 words:.
    Media capabilities are 7800, advertising 01e1.
    Full-duplex map 5000, Threshold map 1800.
    No MII interrupt.
EEPROM contents (64 words):
0x00:  1186 1112 0000 0000 0000 0000 0000 0000
0x08:  0067 0103 8000 b9c8 4e98 1e00 0000 0800
0x10:  8d01 0003 0000 7800 01e0 5000 1800 0000
0x18:  0000 0000 0000 0000 0000 0000 0000 0000
0x20:  0000 0000 0000 0000 0000 0000 0000 0000
0x28:  0000 0000 0000 0000 0000 0000 0000 0000
0x30:  0000 0000 0000 0000 0000 0000 0000 0000
0x38:  0000 0000 0000 0000 0000 0000 0000 8fed
 ID block CRC 0x67 (vs. 0x67).
  Full contents CRC 0x8fed (read as 0x8fed).
 MII PHY found at address 1, status 0x786d.
  Internal autonegotiation state is 'Autonegotiation disabled'.


# tulip-diag -e
tulip-diag.c:v2.06 1/8/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Digital DS21143 Tulip adapter at 0xc800.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Suspended -- no Rx buffers'.
  The Tx process state is 'Idle'.
  The transmit threshold is 128.
  The NWay status register is 000000c6.
EEPROM 64 words, 6 address bits.
PCI Subsystem IDs, vendor 1186, device 1112.
CardBus Information Structure at offset 00000000.
Ethernet MAC Station Address 00:80:C8:B9:98:4D.
EEPROM transceiver/media description table.
Leaf node at offset 30, default media type 0800 (Autosense).
 1 transceiver description blocks:
  Media MII, block type 3, length 13.
   MII interface PHY 0 (media type 11).
   21143 MII initialization sequence is 0 words:.
   21143 MII reset sequence is 0 words:.
    Media capabilities are 7800, advertising 01e1.
    Full-duplex map 5000, Threshold map 1800.
    No MII interrupt.
 MII PHY found at address 1, status 0x786d.
  Internal autonegotiation state is 'Autonegotiation disabled'.
Index #2: Found a Digital DS21143 Tulip adapter at 0xc400.
 Port selection is MII, full-duplex.
 Transmit started, Receive started, full-duplex.
  The Rx process state is 'Waiting for packets'.
  The Tx process state is 'Idle'.
  The transmit threshold is 256.
  The NWay status register is 000000c6.
EEPROM 64 words, 6 address bits.
PCI Subsystem IDs, vendor 1186, device 1112.
CardBus Information Structure at offset 00000000.
Ethernet MAC Station Address 00:80:C8:B9:98:4E.
EEPROM transceiver/media description table.
Leaf node at offset 30, default media type 0800 (Autosense).
 1 transceiver description blocks:
  Media MII, block type 3, length 13.
   MII interface PHY 0 (media type 11).
   21143 MII initialization sequence is 0 words:.
   21143 MII reset sequence is 0 words:.
    Media capabilities are 7800, advertising 01e1.
    Full-duplex map 5000, Threshold map 1800.
    No MII interrupt.
 MII PHY found at address 1, status 0x786d.
  Internal autonegotiation state is 'Autonegotiation disabled'.


--__--__--

Message: 2
Date: Tue, 15 Jan 2002 02:04:13 +0100
From: Christoph Dworzak <linuxkernel@amazing.ch>
To: linux-kernel@vger.kernel.org
Cc: tulip@scyld.com
Subject: [tulip] 2.4.17 tulip multiport-patch

Hi

After lots of Headscratching, I found this little bug:

replace line 1642 of tulip_core.c:

		    irq = last_irq;

with

		    dev->irq = irq = last_irq;


It's a hack for Multiport-NICs where only the first one contains
an EEPROM (I have an Adaptec ANA-6944A/TX). It puts the other ports
on the same irq as the first one, but it forgot to actually set
it in the dev-structure...

With this correction my Firewall works like a champ now (before
it crashed immediately when activating the second port of the
multiport-nic).


While searching for this Bug, I also tried the de4x5-driver.
It worked, but with troubles. It sets the MAC-Address of all
the other ports to the MAC of the first port + 1. ALL of them
to the same MAC!
I tried to find out why, but I didn't find this in the code
(I found where it adds this 1, but didn't see why it doesn't
increase it further for further ports...)



Don't know if this is related:
While using the de4x5-driver, my system-load climbed steadily
up. After 3 Days it was at 99.5%.

Top didn't show any processes using this time, but the Computer
was reaaaaaly slow (pressing a key took several seconds until
it appeared on the console...)

This was repeatable. -> Reboot every other day :(

If this happens again, how do I find out what's using the CPU?
(I tried top, vmstat, free, but nothing unusual showed up beside
the system-% in top).


bye
 dworz

Config (two Computers A and B):
A Amd-k6/300, 64MB
B Dual PIII-600, 512MB
both with ANA-6944A/TX + 2 other tulips
both RH7.2 with all updates as of 1.1.02
kernel 2.4.9-13 (the tulip-bug is still in 2.4.18pre3)

Computer B would not slow down with DE4x5, but maybe it wasn't
running long enough yet...
Both crashed with tulip.

--__--__--

Message: 3
Date: Mon, 14 Jan 2002 21:03:00 -0500 (EST)
From: Donald Becker <becker@scyld.com>
To: Christoph Dworzak <linuxkernel@amazing.ch>
cc: linux-kernel@vger.kernel.org, tulip@scyld.com
Subject: Re: [tulip] 2.4.17 tulip multiport-patch

On Tue, 15 Jan 2002, Christoph Dworzak wrote:

> After lots of Headscratching, I found this little bug:
>
> replace line 1642 of tulip_core.c:
>
> 		    irq = last_irq;
>
> with
>
> 		    dev->irq = irq = last_irq;

Hmmm, a little bit of bad conversion here.  The tulip.c code follows
this section by

     dev->irq = irq;

a few lines later.

> While using the de4x5-driver, my system-load climbed steadily
> up. After 3 Days it was at 99.5%.

Check the interrupt count in /proc/interrupts.

> This was repeatable. -> Reboot every other day :(

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993



--__--__--

End of tulip Digest