Transmit timed out

Osma Ahvenlampi oa@razorfish.fi
Tue Feb 9 02:19:18 1999


bam@student.net (Bret Martin) writes:
> Can anyone point to particular messages from this mailing list (or
> somewhere else) that definitively describe why this problem occurs,
> why the workaround fixes it, and why it still happens?

No one definitely knows why the problem occurs (if we did, someone
could fix it, right?), but here's some data:

On a uniprocessor 2.0.36 kernel with the eepro100 1.05 driver, the bug
occurs within 5 minutes of starting netatalk (the AppleTalk/AppleShare
server for UNIX) on a network with 6 MacOS clients in fairly heavy use.
AppleTalk (EtherTalk, actually) uses Ethernet multicast for address
negotiation (AppleTalk has automatic address assignment by way of
broadcasting "tell me your address" to all machines on the network
and picking an unused address).
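
To make the "pick an unused address" step concrete, here is a rough
sketch in C. It only models the outcome described above (choose a node
number nobody else claims), not the actual on-wire exchange. The
function name pick_unused_node, the NODE_MIN/NODE_MAX range and the
"0 means nothing free" convention are assumptions made for
illustration, not taken from netatalk or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed usable node-number range; adjust to the real protocol limits. */
#define NODE_MIN 1u
#define NODE_MAX 253u

/*
 * Hypothetical helper: given the node numbers other machines reported
 * as taken, return the first free one at or after `start`, wrapping
 * around the range. Returns 0 if every number is in use.
 */
uint8_t pick_unused_node(const uint8_t *in_use, size_t n_in_use, uint8_t start)
{
    bool taken[256] = { false };
    unsigned span = NODE_MAX - NODE_MIN + 1u;
    unsigned first = start;

    assert(in_use != NULL || n_in_use == 0);

    /* Mark every address that some other machine already answered for. */
    for (size_t i = 0; i < n_in_use; i++)
        taken[in_use[i]] = true;

    /* Fall back to the bottom of the range if the hint is out of range. */
    if (first < NODE_MIN || first > NODE_MAX)
        first = NODE_MIN;

    /* Walk the whole range once, starting at the hint. */
    for (unsigned k = 0; k < span; k++) {
        unsigned cand = NODE_MIN + (first - NODE_MIN + k) % span;
        if (!taken[cand])
            return (uint8_t)cand;
    }
    return 0;
}
```

The relevant point for this bug is that every such negotiation is
multicast traffic that the card's filter has to let through.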

It appears that the exact reason for the crash is a (multicast?)
packet received during hardware multicast filter reconfiguration,
when the configuration data doesn't fit in one of eepro100's
configuration frames. The first 3 addresses fit there; for more, up to
the upper limit of 64, extension frames have to be allocated. The
symptoms are a lockup of the interface, during which it randomly
spews out (mostly broadcast) traffic onto the network, creating a huge
load on switches and routers. After a while (10-30 seconds? can't
recall for sure), the kernel detects that the eepro100 driver or
hardware is no longer responding and forces a hardware reset. The
interface then works again for a while until the same incident
recurs.
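
The size limits above can be written down as a small decision
function. This is only a sketch of the numbers quoted in this message
(3 addresses inline, 64 at most), not code from the eepro100 driver;
the enum and function names are made up, and what the driver really
does above 64 addresses is not covered here.

```c
#include <stddef.h>

#define MC_INLINE_ADDRS 3   /* addresses that fit in a regular config frame */
#define MC_MAX_ADDRS    64  /* upper limit of the hardware filter list */

/* Hypothetical classification of a multicast filter update. */
enum mc_plan {
    MC_PLAN_INLINE,     /* fits in the regular configuration frame */
    MC_PLAN_EXTENSION,  /* needs a separately allocated extension frame */
    MC_PLAN_TOO_MANY    /* exceeds the filter list limit */
};

enum mc_plan plan_multicast_setup(size_t n_addrs)
{
    if (n_addrs <= MC_INLINE_ADDRS)
        return MC_PLAN_INLINE;
    if (n_addrs <= MC_MAX_ADDRS)
        return MC_PLAN_EXTENSION;
    return MC_PLAN_TOO_MANY;
}
```

If the theory is right, only the MC_PLAN_EXTENSION case is exposed to
the lockup, which would fit a netatalk server triggering it while a
machine with few multicast memberships never does.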

My guess is interrupt reentry into a handler that isn't reentrant:
some static data or an eepro100 register gets corrupted when two
instances of the handler run at once, and the interface crashes. Why
this is so difficult to trace and fix I can't say, because I have
absolutely no experience writing Ethernet or Linux drivers, save for
the time taken to read through (and not completely understand) the
eepro100 code to figure out even this much.
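
For what it's worth, the kind of guard that would catch such a reentry
looks roughly like the following. This is a user-space simulation in
C, not driver code: struct fake_dev, fake_interrupt and the
simulate_nested switch are all invented for illustration, and a plain
flag like this is only adequate on a uniprocessor (an SMP kernel would
need an atomic test-and-set).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state; not the kernel's real device struct. */
struct fake_dev {
    volatile int in_interrupt;  /* nonzero while the handler is running */
    unsigned long handled;      /* interrupts serviced to completion */
    unsigned long reentries;    /* nested entries that were refused */
};

/*
 * Returns 0 if the interrupt was serviced, -1 if the handler was
 * already running and this entry was refused. simulate_nested stands
 * in for a second interrupt arriving while the first is in progress.
 */
int fake_interrupt(struct fake_dev *dev, int simulate_nested)
{
    assert(dev != NULL);

    if (dev->in_interrupt) {
        /* Someone is already inside: count it and back out untouched. */
        dev->reentries++;
        return -1;
    }
    dev->in_interrupt = 1;

    if (simulate_nested)
        (void)fake_interrupt(dev, 0);  /* nested entry hits the guard above */

    /* ... the real work on shared state and registers would go here ... */
    dev->handled++;

    dev->in_interrupt = 0;
    return 0;
}
```

A nonzero reentries count after a lockup would support the guess
above; a count of zero would point somewhere else.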

-- 
All things considered, insanity may be the only reasonable alternative.
Osma Ahvenlampi <oa@iki.fi>