[eepro100] Transmitter Timeout -- addendum

chris chris@soma.978.org
Sun, 30 Jul 2000 06:04:26 -0700


A quick re-cap of my hardware:

* i82557 quad 64-bit PCI (33 MHz) Ethernet card
* DEC PC164 Motherboard with 21164 EV56 processor.

I've been messing with eepro100 drivers for about 32 hours straight now
(with a few hours off for pizza), and as an addendum to my last e-mail,
this is what I have tried and found thus far:

* The TX timeout is not dependent on what the card is connected to
after all.  Whether it is connected to a 3c905, a Bay 350T, a UB 100-tx
hub, or a tulip card, the "TX-timeout" still happens.  It just happens
a little quicker when connected via crossover cable to a 905b.
* All cabling is tried and true on other network cards.
* The TX timeout occurs on just about all heavy traffic.  The initial
timeout (initial meaning the first timeout since boot) takes a little
while to happen, but afterwards the successive timeouts come quicker.
Here is a quick table of the occurrence of the timeouts with regard to
the different driver versions:

Traffic                   Driver   Kernel       Initial       Successive    Recovery
                          version  version      timeout (s)   timeouts (s)  time (s)
------------------------  -------  -----------  ------------  ------------  -----------
heavy NFS read/writes     1.06     2.2.14       25-30         8-10          1-2
mpeg streaming via SAMBA  1.06     2.2.14       35-40         12-15         1-2
HEAVY FTP                 1.06     2.2.14       IMMEDIATE     1-2           4-5
telnet/ssh/http           1.06     2.2.14       NONE          -             -
heavy NFS read/writes     1.09     2.2.14       30-45         10-12         8-10
mpeg streaming via SAMBA  1.09     2.2.14       115-140       15-20         8-10
HEAVY FTP                 1.09     2.2.14       IMMEDIATE     <1            1-2
telnet/ssh/http           1.09     2.2.14       NONE          -             -
heavy NFS read/writes     1.09     2.2.16       30-45         10-12         8-10
mpeg streaming via SAMBA  1.09     2.2.16       115-140       15-20         8-10
HEAVY FTP                 1.09     2.2.16       IMMEDIATE     <1            1-2
telnet/ssh/http           1.09     2.2.16       30 minutes    ???           a long time
ALL                       1.09     2.4.0-test5  N/A*

* = The OS locks up IMMEDIATELY, with NO ERROR MESSAGES, after reaching
the eepro100 code when the driver is compiled into the kernel, or upon
insmod when it is built as a module.

MESSAGES:

On v1.06 of the driver, this is what /var/log/messages says:
Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: status 0050 0000 at 322796/322810 command 000c0000.
Jul 25 09:59:12 fosters kernel: eth0: Trying to restart the transmitter...

On v1.09 of the driver this is what /var/log/messages says:
Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: status 0050 0c00 at 107640/107670 command 200c0000.
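
To compare the two messages above side by side, here is a minimal Python
sketch that pulls the fields out of each line.  The field names are my
assumptions about what the driver prints (SCB status word, SCB command
word, the dirty_tx/cur_tx ring counters, and the status word of the oldest
Tx descriptor); they are not stated in the log itself, so check them
against the driver source for your version.

```python
import re

# Matches the "Transmit timed out" line as it appears in the logs above.
# The group names are assumed meanings, not something the log states.
TIMEOUT_RE = re.compile(
    r"(?P<iface>eth\d+): Transmit timed out: status (?P<scb_status>[0-9a-f]{4})\s+"
    r"(?P<scb_cmd>[0-9a-f]{4}) at (?P<dirty_tx>\d+)/(?P<cur_tx>\d+) "
    r"command (?P<desc_status>[0-9a-f]{8})"
)

def parse_timeout(line):
    """Return the fields of one timeout line as a dict, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    dirty = int(m.group("dirty_tx"))
    cur = int(m.group("cur_tx"))
    return {
        "iface": m.group("iface"),
        "scb_status": int(m.group("scb_status"), 16),
        "scb_cmd": int(m.group("scb_cmd"), 16),
        "dirty_tx": dirty,
        "cur_tx": cur,
        # Difference between the two counters in the "at N/M" field.
        "pending": cur - dirty,
        "desc_status": int(m.group("desc_status"), 16),
    }

# The two log lines quoted above (v1.06 first, then v1.09).
SAMPLES = [
    "Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: "
    "status 0050 0000 at 322796/322810 command 000c0000.",
    "Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: "
    "status 0050 0c00 at 107640/107670 command 200c0000.",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(parse_timeout(sample))
```

If the counter interpretation is right, the v1.06 line shows 14 packets
outstanding at the time of the timeout and the v1.09 line shows 30.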

BOOT MESSAGE:

Jul 29 22:39:31 fosters kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0x9000, 00:08:C7:91:08:72, IRQ 17.
Jul 29 22:39:31 fosters kernel:   Board assembly 009542-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel:   Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel:   General self-test: passed.
Jul 29 22:39:31 fosters kernel:   Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel:   Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel:   ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel:   Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0x9800, 00:08:C7:91:08:73, IRQ 24.
Jul 29 22:39:31 fosters kernel:   Board assembly 009542-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel:   Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel:   General self-test: passed.
Jul 29 22:39:31 fosters kernel:   Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel:   Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel:   ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel:   Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth2: OEM i82557/i82558 10/100 Ethernet at 0xa000, 00:08:C7:66:80:F7, IRQ 28.
Jul 29 22:39:31 fosters kernel:   Board assembly 009545-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel:   Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel:   General self-test: passed.
Jul 29 22:39:31 fosters kernel:   Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel:   Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel:   ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel:   Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth3: OEM i82557/i82558 10/100 Ethernet at 0xa800, 00:08:C7:66:80:0F, IRQ 32.
Jul 29 22:39:31 fosters kernel:   Board assembly 009545-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel:   Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel:   General self-test: passed.
Jul 29 22:39:31 fosters kernel:   Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel:   Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel:   ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel:   Receiver lock-up workaround activated.

PCI:

There don't seem to be any PCI conflicts, and I tried both enabling and
disabling "PCI quirks" in the kernel to no avail.

Here is a cat of my /proc/pci:

PCI devices found:
  Bus  0, device   7, function  0:
    PCI bridge: DEC DC21154 (rev 2).
      Medium devsel.  Fast back-to-back capable.  Master Capable.  Latency=32.  Min Gnt=4.
  Bus  0, device   8, function  0:
    Non-VGA device: Intel 82378IB (rev 67).
      Medium devsel.  Master Capable.  No bursts.
  Bus  0, device   9, function  0:
    VGA compatible controller: Matrox Millennium (rev 1).
      Medium devsel.  Fast back-to-back capable.  IRQ 19.
      Non-prefetchable 32 bit memory at 0x9000000 [0x9000000].
      Non-prefetchable 32 bit memory at 0x9800000 [0x9800000].
  Bus  0, device  11, function  0:
    IDE interface: CMD 646 (rev 1).
      Medium devsel.  Fast back-to-back capable.  IRQ 21.  Master Capable.  Latency=64.  Min Gnt=2.Max Lat=4.
      I/O at 0x8000 [0x8001].
  Bus  1, device   4, function  0:
    Ethernet controller: Intel 82557 (rev 5).
      Medium devsel.  Fast back-to-back capable.  IRQ 17.  Master Capable.  Latency=32.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xa000000 [0xa000000].
      I/O at 0x9000 [0x9001].
      Non-prefetchable 32 bit memory at 0xa100000 [0xa100000].
  Bus  1, device   5, function  0:
    Ethernet controller: Intel 82557 (rev 5).
      Medium devsel.  Fast back-to-back capable.  IRQ 24.  Master Capable.  Latency=32.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xa200000 [0xa200000].
      I/O at 0x9800 [0x9801].
      Non-prefetchable 32 bit memory at 0xa300000 [0xa300000].
  Bus  1, device   6, function  0:
    Ethernet controller: Intel 82557 (rev 5).
      Medium devsel.  Fast back-to-back capable.  IRQ 28.  Master Capable.  Latency=32.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xa400000 [0xa400000].
      I/O at 0xa000 [0xa001].
      Non-prefetchable 32 bit memory at 0xa500000 [0xa500000].
  Bus  1, device   7, function  0:
    Ethernet controller: Intel 82557 (rev 5).
      Medium devsel.  Fast back-to-back capable.  IRQ 32.  Master Capable.  Latency=32.  Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xa600000 [0xa600000].
      I/O at 0xa800 [0xa801].
      Non-prefetchable 32 bit memory at 0xa700000 [0xa700000].


And there don't seem to be any I/O issues.  Here is a cat of /proc/ioports:

0060-006f : keyboard
0070-007f : timer
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(auto)
8000-8007 : ide0
8008-800f : ide1
a000000-a00001f : Intel Speedo3 Ethernet
a200000-a20001f : Intel Speedo3 Ethernet
a400000-a40001f : Intel Speedo3 Ethernet
a600000-a60001f : Intel Speedo3 Ethernet
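
For what it's worth, "no I/O issues" here just means that no two ranges
in the listing overlap.  A rough Python sketch of that check follows; the
function names parse_ranges and find_overlaps are made up for
illustration, and it only assumes the "start-end : name" hex layout shown
above.

```python
def parse_ranges(text):
    """Parse 'start-end : name' lines (hex addresses) into (start, end, name)."""
    ranges = []
    for line in text.splitlines():
        if ":" not in line or "-" not in line:
            continue  # skip blank lines or anything that is not a range entry
        span, name = line.split(":", 1)
        start, end = span.strip().split("-")
        ranges.append((int(start, 16), int(end, 16), name.strip()))
    return ranges

def find_overlaps(ranges):
    """Return every pair of ranges that shares at least one address."""
    ordered = sorted(ranges)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[0] > a[1]:
                break  # sorted by start address, so nothing later overlaps a
            overlaps.append((a, b))
    return overlaps

# The /proc/ioports listing quoted above.
IOPORTS = """\
0060-006f : keyboard
0070-007f : timer
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(auto)
8000-8007 : ide0
8008-800f : ide1
a000000-a00001f : Intel Speedo3 Ethernet
a200000-a20001f : Intel Speedo3 Ethernet
a400000-a40001f : Intel Speedo3 Ethernet
a600000-a60001f : Intel Speedo3 Ethernet
"""

if __name__ == "__main__":
    print(find_overlaps(parse_ranges(IOPORTS)))
```

On the listing above this finds no overlapping pairs.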

TRIAL-N-ERROR:

Forcing different interface speeds via mii-diag does not fix anything:
100baseTX-FD -- timeout still occurs
100baseTX-HD -- timeout still occurs
10baseT-FD   -- timeout still occurs
10baseT-HD   -- timeout still occurs

eepro100-diag:

eepro100-diag.c:v2.02 7/19/2000 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9000.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9800.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #3: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa000.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #4: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa800.

Changing MACROS:

v1.06:
txfifo/rxfifo: changes do nothing
TX_RING_SIZE/RX_RING_SIZE: changes do nothing
TX_TIMEOUT:  Increasing this number decreases the frequency of the
timeouts until the number reaches roughly double its original value;
beyond that, the interfaces are not usable until an ifdown/ifup.

v1.09:
txfifo/rxfifo: changes do nothing
TX_RING_SIZE/RX_RING_SIZE: changes do nothing
TX_TIMEOUT:  Increasing this number at all makes the interfaces unusable
until an ifdown/ifup.

Also, I ported the function "static void speedo_tx_timeout(struct
net_device *dev)" from v1.09 into v1.06 to see what would happen --
the new "hybrid" driver exhibited the characteristics of the v1.09
timeouts.

Lastly, changing txqueuelen via ifconfig does nothing. . .

Conclusion:

v1.06 of the driver seemed to recover from the TX timeouts a bit quicker
than v1.09, but in v1.09 they were less frequent.  I tried to compile
v1.10 and experimental v1.11, but I got all kinds of compile errors and
did not have the motivation to port them to v2.2.16 of the kernel after
all my above failures.

I have NO IDEA what is causing these TX timeouts.  If any of the gurus
here would be so kind as to help me figure this out, I would greatly
appreciate it!  I will grant accounts on the troublesome machine if that
will help with troubleshooting, and I will code whatever I can if anyone
can give me a direction to go in.

Is there anything special that I have to set in the kernel for 64-bit
PCI, BTW?
Could the fact that this card is a 64-bit PCI card be the issue?
Are there any special parameters that I could try tweaking that are
alpha-specific?


Thank you for any help!!

--Chris