3c597 stops

Brian D. Winters brianw@alumni.caltech.edu
Thu Dec 31 16:08:19 1998


I've got a problem with a 3c597 (EISA 10/100).  It seemed to work ok
under 2.0.35 and .36.  Recently I switched to the 2.1 series, running
2.1.131, .132, and 2.2.0-pre1.  Under any of the 2.1/2.2pre kernels,
after a while the 3c597 will just stop working.

It will fail on its own under normal use (transferring big files under
NFS is the worst), but the easiest way to make it fail is to flood
ping using fairly large (1200 to 1400 byte) packets.  It can take
anywhere from a few hundred to tens of thousands of packets, but
eventually (almost always less than 100,000 packets) it will stop
sending and receiving, and the only way I have found to wake it back
up is to "ifdown eth1 ; ifup eth1".  I've left it for hours, with
other traffic on the network, and I have never seen it unstick
itself.  (I tried to nail down a particular packet size threshold, but
there isn't one.  Sometimes 800 bytes is enough to kill it in a few
hundred packets, and sometimes I can run a million 1000 byte packets
without a problem.)

Even with the module loaded with debug=6 I see no indication of
errors; it just stops transmitting and receiving.  ifconfig still
reports the card as up, and if I try to send packets ifconfig updates
its TX count, but I never see a nonzero TX "errors", "dropped",
"overruns", or "carrier" for this card.  While it is stuck I see no
indication from the lights on the hub or from the RX count on the
system I'm pinging that it is actually transmitting, even though the
TX count is increasing.  When I run ping with -v, it shows "no route
to host" errors for every packet.
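
The symptom described above (the TX count keeps climbing while nothing
comes back in) can be checked for mechanically.  What follows is only a
minimal sketch: it assumes the present-day /proc/net/dev column layout
(the 2.1/2.2pre kernels may format the file differently), and the
helper names parse_proc_net_dev and looks_stuck are made up for
illustration.

```python
def parse_proc_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev text.

    Assumes the modern layout: 8 receive columns followed by
    8 transmit columns after the "iface:" prefix.
    """
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 10:
            continue  # not a counter line in the assumed layout
        # Column 1 is RX packets, column 9 is TX packets.
        counters[name.strip()] = (int(values[1]), int(values[9]))
    return counters


def looks_stuck(before, after, iface):
    """True if TX packets increased between snapshots but RX did not."""
    rx0, tx0 = before[iface]
    rx1, tx1 = after[iface]
    return tx1 > tx0 and rx1 == rx0
```

To use it, read /proc/net/dev twice a few seconds apart while pinging
through the interface, parse both snapshots, and call
looks_stuck(first, second, "eth1").  A quiet network can also leave
the RX count unchanged, so a True result is a hint, not proof.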

I upgraded the 3c59x driver to 0.99H, and the problem persists.
Currently I'm running 2.2.0-pre1-ac3, SMP (dual Pentium 166).  I
borrowed a 3c905B to run as a control, and the 3c905B runs fine
alongside the 3c597, continuing to work even when the 3c597 is dead.

I've included a little debugging info at the end of this message, but
I'm not sure what is useful.  Any suggestions for things I can check
are welcome.


I've noticed a few other problems with the card too, although they are
not quite as big a deal as the "totally stops working" problem.

One is that if I unload the 3c59x module so I can reload it with a
different debug setting or load a different driver version, when I
reload it the IRQ for the 3c597 is misdetected, showing up as 3
instead of the 9 it is actually on.  The only way I've found to fix
this and get the card back up is to reboot the system.
This seems to be a symptom of a greater problem though, because even
after I hacked the driver to replace IRQ 3 with IRQ 9, the card still
wouldn't work, even though the driver and ifconfig now display the
correct configuration information.  Again, no problems with the
3c905B.

I've also had some problems with intermittent data corruption over the
network under the 2.0 kernels, but I never had the time to diagnose it
properly.  The corruption seemed to show up mostly under heavy
NFS loads (similar to my 2.1 freeze-up problem), and like the
freeze-ups I can't reproduce the problem with the 3c905B.  (When
running the 2.0 kernels I've always just used the 3c59x driver that
shipped with them.)

Lastly, setting the MTU with ifconfig doesn't seem to affect anything.
I'm not sure if my MTU setting is being ignored, or if I don't
understand what MTU does.  Attempting to ping with packets larger than
the MTU for that interface should cause the packets to be dropped,
right?  What am I missing here?
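
One possible explanation, offered as an assumption and not verified
on this machine: IPv4 normally fragments a datagram that is larger
than the outgoing interface's MTU instead of dropping it, unless the
Don't Fragment bit is set.  If that is what is happening, oversized
pings would still get through, just split into several frames.  The
sketch below estimates how many fragments one echo request would
need; the function name fragments_needed is hypothetical, and it
assumes a 20-byte IPv4 header with no options.

```python
import math

IP_HEADER = 20   # IPv4 header without options (assumption)
ICMP_HEADER = 8  # ICMP echo request header


def fragments_needed(icmp_payload, mtu):
    """Estimate how many IPv4 fragments one ping of this size needs."""
    ip_payload = icmp_payload + ICMP_HEADER
    if ip_payload + IP_HEADER <= mtu:
        return 1  # fits in a single frame, no fragmentation
    # Every fragment but the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)
```

For example, fragments_needed(1400, 1500) is 1, while the same ping
with the MTU lowered to 576 would need 3 fragments.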


Sorry to dump all of these problems at once.  Help on any of them
would be greatly appreciated.  TIA.

Brian


======

From dmesg:

eth1: 3Com 3Com Vortex at 0x5000,  00:20:af:f7:b4:b1, IRQ 9
  64K word-wide RAM 3:1 Rx:Tx split, autoselect/100baseTX interface.


Output of vortex-diag -aamm while dead:

vortex-diag.c:v1.05 5/22/98 Donald Becker (becker@cesdis.gsfc.nasa.gov)
The Vortex chip may be active, so FIFO registers will not be read.
To see all register values use the '-f' flag.
Initial window 7, registers values by window:
  Window 0: 6d50 7059 0a01 8000 9000 00bf 0000 0011.
  Window 1: FIFO FIFO 0000 2011 05a2 00ff 3ffc 2011.
  Window 2: 2000 f7af b1b4 0000 0000 0000 00de 4011.
  Window 3: 001b 0141 0000 0000 e10a 0003 3fff 6011.
  Window 4: 0000 01d0 8800 0c80 0001 8882 0000 8011.
  Window 5: 1ffc 1ffc 00de 1ffc 0007 02de 00de a011.
  Window 6: 0000 0000 0000 0000 0000 0000 0062 c011.
  Window 7: 0000 0000 0000 0000 05a2 00ff 0000 e011.
Vortex chip registers at 0x5000
  0x5010: **FIFO** **FIFO** 000005a2 00003ffc
  0x5020: ffffffff ffffffff ffffffff ffffffff
  0x5030: ffffffff ffffffff ffffffff ffffffff
 Interrupt sources are pending.
   Interrupt latch indication.
   Rx Complete indication.
 Transceiver/media interfaces available:  100baseTx 10baseT.
 MAC settings: half-duplex.
 ***WARNING***: No MII transceivers found!


Note: I didn't see much point in including vortex-diag -aamm for the
card when it isn't stuck.  The only interesting difference between
vortex-diag -aamm when stuck and when not stuck seems to be that when
it is not stuck it says "No interrupt sources are pending."
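
For anyone reading the dump above, here is a small sketch of how two
parts of it can be decoded.  The last word of each window (0011,
2011, ... e011) looks like the status register: the top three bits
track the window number and the low bits are interrupt flags.  The
bit names below follow my reading of the status enum in the Linux
3c59x driver and should be treated as an assumption, as should the
function names.  The first three words of Window 2 appear to hold the
station address, low byte first.

```python
# Low status bits, named as in the 3c59x driver's status enum
# (assumption: only the low eight bits are listed here).
STATUS_BITS = {
    0x0001: "IntLatch",
    0x0002: "AdapterFailure",
    0x0004: "TxComplete",
    0x0008: "TxAvailable",
    0x0010: "RxComplete",
    0x0020: "RxEarly",
    0x0040: "IntReq",
    0x0080: "StatsFull",
}


def decode_status(word):
    """Split a 16-bit status word into (window number, flag names)."""
    window = (word >> 13) & 0x7
    flags = [name for bit, name in STATUS_BITS.items() if word & bit]
    return window, flags


def station_address(words):
    """Format three 16-bit words as a MAC address, low byte first."""
    octets = []
    for w in words:
        octets.append(w & 0xFF)         # low byte comes first
        octets.append((w >> 8) & 0xFF)  # then the high byte
    return ":".join("%02x" % o for o in octets)
```

With the values from the dump, decode_status(0x2011) gives window 1
with IntLatch and RxComplete set, which agrees with the "Interrupt
latch" and "Rx Complete" lines printed by vortex-diag, and
station_address([0x2000, 0xf7af, 0xb1b4]) gives 00:20:af:f7:b4:b1,
the address reported in dmesg.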