weird lockups on KNE100TX (DEC DC21142 (rev 65))

Ruediger Oberhage ruediger@next12.Theo-Phys.Uni-Essen.DE
Fri Oct 1 04:12:09 1999


Hello!

Although I'm not the one addressed in this thread, let me add some
observations, as I suspect they point to a more generic problem
than assumed.

Donald Becker <becker@cesdis1.gsfc.nasa.gov> kindly answered:
 On Thu, 30 Sep 1999 scottk@plover.atdesk.com wrote:

> > We have a 3com SuperStack II Switch 3900-36 that seems to reboot
> > spontaneously.  Whenever it does, it takes down networking on all
> > the machines on it that have DEC DC21142 (rev 65).

We, too, have DEC 21142/3 (rev 65) chips on Adaptec 6911A/TX boards.
I already told the linux-tulip list that we had difficulties
getting this card to work under 2.0.36 and several 2.2.x Linux
kernels with the v0.89 and v0.91 drivers (from no revision letter
up to revision "g", iirc). A list member (thanks again) then kindly
pointed me to a v0.90 driver that works for us.

>
> Note: that's actually a 21143-TD chip.  Digital didn't want to
> change the device ID, and kept throwing in (sometimes incompatible)
> features based on the revision number.

Yes, shame on them for this behaviour.

>
> > when this happens, these machines are no longer able to transfer
> > any data. The card is basically locked up.  I have tried [...]

Now the interesting thing is that with OPENSTEP 4.2's generic
21x4x driver the card basically runs well. But as soon as the
physical connection ("link") to the "other side" is lost, whether
by pulling the network cable on either side or by a power loss on
the other side, the card stops working until the machine is
rebooted (remember, this is OPENSTEP's driver, not the Linux
tulip one!).

The "other side" has been various hubs and switches, most
prominently Cisco ones, as our computing center builds its
infrastructure from Cisco components. The result is always the
same. If only the logical connection is lost, i.e. the card hangs
off a hub whose uplink is temporarily down while the hub itself is
unaffected, the card keeps working properly. Thus the loss of link
state seems to be the problem.

What I find remarkable here is the following: there seems to be a
more generic link-loss problem with this chip, affecting clearly
different (and independent) drivers. The tip to force
renegotiation, e.g. by pulling the plug, fails badly, at least
here, for both the OPENSTEP driver and our version of the Linux
tulip driver. Such an attempt might actually provoke the "hanging"
problem.

If no one has a counterexample of working (automatic)
renegotiation, this might be a fundamental problem with that
chip(set).

Hopefully this gives you some idea of what might be happening here
and how to work around it.

Thanks for listening and greetings,
 Ruediger Oberhage
--
H.-R. Oberhage
Mail: Univ.-GH Essen	       E-Mail: phy070@sp2.power.Uni-Essen.DE
      Fachbereich 7 (Physik)	      ruediger@Theo-Phys.Uni-Essen.DE
      S05 V07 E88
      Universitaetsstrasse 5	Phone:  {+49|0} 201 / 183-2493
      D-45117 Essen, Germany	FAX:    {+49|0} 201 / 183-2120