Transmit timed out

Tue Feb 9 17:54:29 1999

In article <36C09554.A344AB6C@razorfish.fi>,
	oa@razorfish.fi (Osma Ahvenlampi) writes:
> After all, it does work for me reliably once I figured out this
> workaround.

Likewise: it worked quite reliably for me until I found I needed the
workaround. :)

> Do you have machines that never experience this problem, even though
> based on their configuration they "should"?

Yes, in the sense that they are identical in configuration, network
location, and use to machines that HAVE experienced it.

I upgraded all of my production machines to v1.05 today (previously I
was running v0.99B, which worked great for months) and added the
multicast_filter_limit option.  I assume that will cure the problem.

> If that's so, perhaps we should try to figure out the common
> denominator in the systems that do show this behaviour, and their
> difference to the immune systems. I can tell you right now that in
> none of the Linux machines I have with eepro100 cards is the card
> sharing interrupts with other hardware, so this isn't just a
> shared-interrupt problem.

The two machines that started exhibiting the problem recently were
both SMP machines, running kernel 2.0.35 with driver v0.99B.  These
cards are not sharing interrupts either.  When the driver on eth0 hung 
with the "transmit timed out" error, I was able to log in to the
machine through a different interface (eth1, also an EEPro100, but
functioning normally) and lower and raise eth0, which cleared up the
problem (until it recurred a few days later).  I did this two or
three times until today, when I was able to upgrade the driver and
supply the workaround options.

I can't expect anyone to do much tracking down of things since I was
not running the latest driver.  But I am certainly interested in doing 
whatever I can in terms of providing hardware information, running an
instrumented version of the driver, or whatever, to get Don (or
whoever) the information needed to get rid of this problem.  My only
problem is that I can't, so far, reliably reproduce it -- it just
happened randomly after months of uninterrupted, smooth operation.

When I started investigating the problem and looked for (and found)
similar reports on this list, I was surprised to find a report that it
still happened with v1.05, even though Don's change history suggests
this problem was solved at v1.02.  This is what most perplexed me, and
makes me wonder if I will still see the problem anyway after upgrading
from v0.99B to v1.05: perhaps not, since I went ahead and specified
the multicast_filter_limit at 0.

--Bret

-- 
Bret Andrew Martin      Student.Net Publishing
bam@student.net         http://www.student.com + http://www.tvgrid.com