[tulip] Problem with smc 1255 card -- netdev watchdog timed out

Donald Becker becker@scyld.com
Fri Jul 19 06:25:01 2002


On Wed, 17 Jul 2002 erik.ostlyngen@telelogic.com wrote:
> > On Wed, 17 Jul 2002 erik.ostlyngen@telelogic.com wrote:
> > 
> > > I have problems getting my smc 1255 card to work...
> > ... everything seems to be ok until it receives an nfs mount
> > request.
> > The nfs mounting works, but shortly afterwards, the network becomes
> > extremely slow.

The working theory is that something about the NFS mount request causes
a network card reset, and the newly-reset card has some changed
configuration.

> tulip.c:v0.95a 6/27/2002  Written by Donald Becker <becker@scyld.com>
>   http://www.scyld.com/network/tulip.html
> eth0: Accton EN1217/EN2242 (ADMtek Comet) rev 17 at 0xd0912000, 00:04:E2:33:0A:D9, IRQ 11.

> The chip on SMC 1255 is called SMC-EN-5251-BE. The behaviour is the
> same as with the 2.4.18 driver, but now the dmesg trace is different:

Hmmm, I thought that the SMC-1255 board used a Comet-II chip and had a
device ID 0x1255.  Accton and SMC are pretty much the same company, thus
the same PCI ID.

The EN1217 ID was added in November 2000.  When was your board built?
Note that most chips have a date code of WWYY, week+year.  So 1402 is
the 14th week of 2002.

> eth0: MII link partner 0000, negotiated 0000.
> eth0: No link beat on the MII interface, status 7849.
...
> eth0: Comet link status 786d partner capability 4081.
> eth0: MII link partner 4081, negotiated 0081.

Did you have the board plugged in the whole time?
What is the link partner?  A 100baseTx repeater, or some other machine?

> eth0: Comet link status 7869 partner capability 41e1.

Errkkk!  It looks as if autonegotiation completes, but you are not able
to establish a good 100baseTx link.  What kind of cables are you using?

> eth0: Comet link status 786d partner capability 41e1.
> eth0: MII link partner 41e1, negotiated 01e1.
> eth0: Comet link status 786d partner capability 41e1.
> eth0: MII link partner 41e1, negotiated 01e1.
> eth0: Comet link status 786d partner capability 41e1.

OK, now the link is stable.

> The transmitter is stopped, but no watchdog timeout anymore.

We must stop the transmitter to switch to full duplex.
We might get a transmitter stopped interrupt if the transmitter was
previously active.

> Here is what tulip-diag says before the failure:

> tulip-diag.c:v2.11 6/17/2002 Donald Becker (becker@scyld.com)
>  http://www.scyld.com/diag/index.html
> Index #1: Found a Accton EN1217/EN2242 (ADMtek Comet) adapter at 0xe000.
> Accton EN1217/EN2242 (ADMtek Comet) chip registers at 0xe000:
>  0x00: fff98000 ffffffff ffffffff 0efa1800 0efa1a00 fc664010 ff972113 ffffffff
...
>  Comet duplex is reported in the MII status registers.
>  Transmit started, Receive started, half-duplex.
>   The Rx process state is 'Waiting for packets'.
>   The Tx process state is 'Idle'.
>   The transmit threshold is 128.
>   Comet MAC address registers 33e20400 ffffd90a
>   Comet multicast filter 0000000040000000.
...
>  MII PHY found at address 1, status 0x7849.

This is curious -- no link beat is reported.

> And after:
> 
> tulip-diag.c:v2.11 6/17/2002 Donald Becker (becker@scyld.com)
>  http://www.scyld.com/diag/index.html
> Index #1: Found a Accton EN1217/EN2242 (ADMtek Comet) adapter at 0xe000.
> Accton EN1217/EN2242 (ADMtek Comet) chip registers at 0xe000:
>  0x00: fff98000 ffffffff ffffffff 0eee2800 0eee2a00 fc664010 ff976113 ffffffff

>  Comet duplex is reported in the MII status registers.
>  Transmit started, Receive started, half-duplex.
>   The Rx process state is 'Waiting for packets'.
>   The Tx process state is 'Idle'.
>   The transmit threshold is 256.
>   Comet MAC address registers 33e20400 ffffd90a
>   Comet multicast filter 0000000040000000.

The device appears to still be operating with an almost identical
configuration.  The only change is the Tx threshold, which an underrun
raised from 128 to 256.

The Comet does have an automatic Tx underrun feature that might be
coming into play.  The feature should be optional, but the chip doesn't
handle underruns correctly unless it's turned on.  We, of course, turn
it on.  (Dan Hollis tracked this down; the fix was added in tulip.c:v0.92q
1/9/2001.)

>  MII PHY found at address 1, status 0x786d.

And now we have link beat.

> EEPROM 64 words, 6 address bits.
>   Ethernet MAC Station Address 00:04:e2:33:0a:d9.
>   Default connection type 'Autosense'.
>   PCI IDs Vendor 1113 Device 1216  Subsystem 10b8 1255
>   PCI min_grant 255 max_latency 255.
>   CSR18 power-up setting 0xa4dc****.

> > That's a curious setting for the PCI parameters.  That might be what
> > caused the Tx threshold to increase.
> 
> How are these parameters set? Can they be configured?

What does /proc/pci or 'lspci -v' report about the latency timer
settings for this device?
The BIOS configures the bus arbitration and timers based on what the
device requests through the PCI min_grant / max_latency settings.

You shouldn't need to manually tune for a 100Mbps bus master device.
The best way to tune is to change the EEPROM setting, rather than
second-guess the BIOS.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993