bug in speedo_tx_timeout

Yisrael yhersch@allot.com
Thu Feb 3 10:55:49 2000


Hi again,

I wrote previously...

> I don't need to write to the ISOLATE bit of the control register. I don't
> see any difference in the behavior of the NICs. Maybe something very
> subtle is occurring that I'm not spotting.

Well, after more experimenting I've come to the conclusion that writing to
this bit (the ISOLATE bit, or bit 10 of the MDI control register) causes
problems for me. I had everything running for a couple of hours using a
driver that doesn't write to this bit. Then when I changed the driver so
that it would write to this bit, the NIC locked up after a very short time.
Then, every time I tried it, the same thing happened: the NIC locked up
shortly after.
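
For anyone checking their own copy of the driver, here is a minimal sketch of
what "bit 10" works out to as a mask. The macro and helper names are made up
for illustration and are not taken from the eepro100 source; only the bit
position comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* ISOLATE is bit 10 of the MDI control register, i.e. mask 0x0400.
 * The macro name is hypothetical, not taken from the driver. */
#define MDI_CTRL_ISOLATE 0x0400u

/* Hypothetical helper: strip ISOLATE from a value before it is written
 * to the control register, leaving all other bits untouched. */
static uint16_t mdi_ctrl_without_isolate(uint16_t ctrl)
{
    uint16_t cleared = (uint16_t)(ctrl & ~MDI_CTRL_ISOLATE);

    assert((cleared & MDI_CTRL_ISOLATE) == 0);  /* sanity check */
    return cleared;
}
```

A driver that wants to leave the bit alone could pass the value through
something like mdi_ctrl_without_isolate() just before the register write.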

Fred, you're still a genius in my book, (make sure your mother's looking
over your shoulder when you read this), but this bit just doesn't work for
me.

Richard @ iguana wrote...

> Damn thats some good work there :) Loved the explanation.

About the work I'm not sure. I enjoyed writing up the explanation though :-)

I've really got to get the 559 book. That should help. My distributor says
it's on the way.

> Sounds like several serious advances have been made on this whole weird
> transmit timeout thing. Complicated ones too.

I debated posting my findings since I didn't want anyone to get their hopes
up too high or too soon. However, I decided in the end to post since I'm
also desperate for an answer and I thought that maybe my findings could help
out.

"Weird" and "complicated" seem to be the keywords here. Among other
things I tried out...

The Intel documentation is quite clear about the PORT software reset:

"The PORT RESET command should not be used during normal operation when the
82557 is active."

And that's exactly what the driver is doing.

Intel: "This command will reset the 82557 unconditionally."

Duh. Isn't that what we want?

Intel: "In some cases this can cause PCI protocol violation and hang the
bus."

Cute.

Intel: "To avoid the problem, issue a Selective Reset PORT command, wait for
the PORT register to be cleared (completion of the Selective Reset command)
and then issue a PORT reset command."

What, and I don't even have to bother standing on my head?? Well, I tried
exactly what Intel recommends and guess what? It doesn't work. Even when
standing on my head. (Just makes it more difficult to read the screen).
Donald obviously had already come to this conclusion, which is why he uses
the PORT Reset command in the driver.

Just for the record, the throughput seems to be the same for both
techniques. It's just that the Intel recommended way locks up after a very
short time. Go figure.
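
For reference, the three steps Intel describes can be sketched roughly as
below. This is only an illustration under stated assumptions, not the code
from any driver: the port_ops accessors and the fake_nic device are
hypothetical stand-ins so the sequence can be run without hardware (a real
driver would use outl()/inl() on ioaddr + SCBPort), the command codes 0 and 2
are assumed to match the driver's PORT reset and selective reset values, and
"cleared" is taken to mean that the register reads back as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed PORT command codes: 0 = full reset, 2 = selective reset. */
enum port_cmd { PORT_RESET = 0, PORT_SELECTIVE_RESET = 2 };

/* Hypothetical register accessors, so the sketch runs without a NIC. */
struct port_ops {
    void     (*write_port)(void *ctx, uint32_t val);
    uint32_t (*read_port)(void *ctx);
    void     (*delay_us)(void *ctx, unsigned us);
    void     *ctx;
};

/* Returns 0 on success, -1 if the PORT register never cleared. */
static int intel_recommended_reset(const struct port_ops *ops,
                                   unsigned max_polls)
{
    assert(ops != NULL);

    /* Step 1: issue the Selective Reset PORT command. */
    ops->write_port(ops->ctx, PORT_SELECTIVE_RESET);

    /* Step 2: wait for the PORT register to be cleared. */
    while (ops->read_port(ops->ctx) != 0) {
        if (max_polls-- == 0)
            return -1;              /* give up instead of spinning forever */
        ops->delay_us(ops->ctx, 1);
    }

    /* Step 3: only now issue the full PORT reset. */
    ops->write_port(ops->ctx, PORT_RESET);
    ops->delay_us(ops->ctx, 10);    /* same 10 us pause the driver uses */
    return 0;
}

/* Fake device: the register clears after a fixed number of reads. */
struct fake_nic {
    uint32_t port;
    unsigned reads_until_clear;
    unsigned writes;
    uint32_t last_write;
};

static void fake_write(void *ctx, uint32_t val)
{
    struct fake_nic *nic = ctx;
    nic->port = val;
    nic->last_write = val;
    nic->writes++;
}

static uint32_t fake_read(void *ctx)
{
    struct fake_nic *nic = ctx;
    if (nic->reads_until_clear > 0)
        nic->reads_until_clear--;
    else
        nic->port = 0;
    return nic->port;
}

static void fake_delay(void *ctx, unsigned us)
{
    (void)ctx;
    (void)us;                       /* no real waiting in the simulation */
}
```

The poll limit is a design choice of the sketch, not something the Intel text
asks for; it just keeps a wedged chip from hanging the caller as well.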

I also tried the transmit restart. Just in case you don't remember the
code segment from speedo_tx_timeout ...

01 speedo_show_state(dev);
02 if ((status & 0x00C0) != 0x0080
03     &&  (status & 0x003C) == 0x0010  &&  0) {
04     /* Only the command unit has stopped. */
05     printk(KERN_WARNING "%s: Trying to restart the transmitter...\n",
06             dev->name);
07     outl(virt_to_bus(&sp->tx_ring[sp->dirty_tx % TX_RING_SIZE]),
08           ioaddr + SCBPointer);
09     outw(CUStart, ioaddr + SCBCmd);
10 } else {
11     /* Reset the Tx and Rx units. */
12     outl(PortReset, ioaddr + SCBPort);
13     if (speedo_debug > 0)
14         speedo_show_state(dev);
15     udelay(10);
16     speedo_resume(dev);
17 }

In line 03, I removed the "&&  0" (which forces the test to always fail) to
let the driver try restarting the transmitter. No dice. Seems to me that this
section of code
could be removed altogether. I now agree with Donald on this (for what
it's worth).
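
To make the condition in lines 02-03 easier to read, here is a small sketch
of the same test pulled out into helpers. The masks and values are copied
from the code above; the macro and function names are hypothetical, and the
reading of bits 7:6 as the command unit state and bits 5:2 as the receive
unit state is an assumption based on the "Only the command unit has stopped"
comment.

```c
#include <assert.h>
#include <stdint.h>

/* Masks and values copied from the condition in speedo_tx_timeout.
 * Names are hypothetical; assumed reading: bits 7:6 are the command
 * unit state, bits 5:2 the receive unit state. */
#define SCB_CU_MASK  0x00C0
#define SCB_CU_BUSY  0x0080
#define SCB_RU_MASK  0x003C
#define SCB_RU_OK    0x0010

/* The test with the "&& 0" removed. */
static int only_cu_stopped(uint16_t status)
{
    return (status & SCB_CU_MASK) != SCB_CU_BUSY
        && (status & SCB_RU_MASK) == SCB_RU_OK;
}

/* The test as shipped: the trailing "&& 0" makes it always false,
 * so the driver always falls through to the PortReset branch. */
static int only_cu_stopped_as_shipped(uint16_t status)
{
    return only_cu_stopped(status) && 0;
}
```

With a status of 0x0050, for example, only_cu_stopped() is true while the
shipped form is false, which is why the restart path never runs in the
unmodified driver.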

Richard @ iguana continued...

> Anyone know if/when this is likely to be rolled into the drivers? please
> let us know if it is, some of us are suffering a bit at the moment.

and Mark Hagger wrote...

> I too would like to see a complete version of the modified driver to give
> it a testing here.
>
> Perhaps you could mail the driver to the list?

I'm not so sure that the members of the list would like to have their mail
box stuffed with large attachments. Not to mention that I'm not the proper
address for this. I think it's best that we all toss this around and let
Donald be the final arbiter on what the driver should look like. I'm pretty
sure that the driver that we'll be using here at Allot will look very close
to the one that I've described in my last few ramblings. Especially since it
seems to work :-}. It's just that I would much prefer that the "official"
offering come from Donald. We certainly don't need "unauthorized" or
"bootleg" drivers floating around muddying up the waters.

If anyone wants to try out my fixes for themselves before Donald decides
what to do, it's easy enough to apply the changes. I used the 1.09t version
of the driver, and my last email described in nauseating detail the fixes.
Then just compile the sucker and try it out.

Anyone else have some ideas? I'm really desperate. I'm ready to switch
to 3Com. In fact, I'm even ready to switch to ArcNet. (Does anyone have the
email address for the "Parents of ArcNet users" support group?)

Our weekend here in Israel is Friday/Saturday, so I won't be back in the
office until Sunday. Thank God it's Thursday :-))

Thanks again for listening, and have a good weekend,

Yisrael
