AW: Some comments...

Adrian B Yee abyee@sfu.ca
Fri Jan 14 00:23:23 2000


Hi,

I currently have 4 of the Intel 10/100 management adapters (i82559) and I
haven't had a single problem with them.  Though I don't think I've been
having any multicast traffic (that was the culprit in most of the problems
I recall).  I have them running on Windows NT 4.0, Windows 98 SE, and
Slackware Linux 7.0 (2.2.x kernels).  I also have two other Intel-based
cards (i82557 and i82558); I haven't used those as extensively, but I
haven't had any problems with those either.

My gateway/server currently uses one of these NICs and had been up for 199
days without any problems (it really only did IP masq'ing).  Now it's doing
some http/ftp/ip masq/xhosting but still no probs.  One thing to note is
that I've only used them on a long-term basis on Pentium 1 machines,
and briefly used one in a Celeron 466 (also no probs).  Maybe you guys are
just plain unlucky, or you have something in your kernel the driver just
doesn't like, or it's one of those dreaded
this-hardware-don't-like-that-hardware problems.  Just my two cents.

Adrian

On Thu, 13 Jan 2000, Stauffer, Walter (Exchange) wrote:

> Am I the only person on earth having the following problem with
> i82558B NICs?
> 
>  - put a box on the net which waits for requests (i.e. a server)
> 
>  - send a request (connect to FTP, WWW, even Ping)
> 
> -> 50% of the NICs stop responding after some time!
> 
> They can be recovered by having the server create some network
> traffic by itself (ping something).
> 
> I have observed this with on-board NICs and also with PCI NICs
> from IBM, under NT, Win95/98, Linux, and even DOS ... so don't
> tell me it's a driver issue.
> 
> Regards,
> Walter
> 
> >I have been in contact with these NICs in various forms under NT for
> >quite some time and I have never seen any errors or problems there.    I
> >also have never seen any problems under Linux but my server is a
> >Quake2/Quake3 server with some ftp/http.  I'm on a half duplex 10BaseT
> >link as well.  Am assuming my bandwidth hasn't reached a critical level
> >or it is full duplex that gives the NIC fits.   My Machine is a Dual
> >PII400 (Gigabyte MB) running redhat 6.0, kernel 2.2.13 and ver 1.06 of
> >the NIC driver.
> >My guess is that Intel has 1) worked around issues with the chipset in
> >their windows drivers to hide design problems and/or 2) Not released
> >complete specs on the board and this is causing problems.
> >If I knew what I was doing when it came to C or networking drivers I'd
> >create a driver that followed Intel's specs 100% and then work off of
> >that (not that Donald has not done this).  That way you eliminate any
> >deviations from the specs as the culprit.  Just my $0.02.
> >If anyone has a way for me to test to see if I can crap out my NIC I'd
> >be willing to do that and feed the results back to the list.
> >
> >
> >Scott
> >
> >
> >----- Original Message -----
> >From: "yhersch" <yhersch@allot.com>
> >To: <linux-eepro100@beowulf.gsfc.nasa.gov>
> >Sent: Wednesday, September 08, 1999 7:09 AM
> >Subject: Some comments...
> >
> >
> >> Hi,
> >>
> >> I've been following the various discussions concerning the operation (or
> >> inoperation?) of the eepro100. Until now I haven't had much to contribute.
> >> However, things got hairy and I had no choice but to figure out what's
> >> going on. Some observations...
> >>
> >> 1) My feeling (OK, this isn't an observation) all along has been that the
> >> Intel chip itself has some basic flaw. It seems to get confused and there
> >> is no way to recover gracefully. I have no proof, but look at the topics
> >> discussed in this mailing list (receive hangs, transmit timeouts, etc). On
> >> second thought, maybe this IS an observation.
> >>
> >> 2) We (Allot Communications) started experiencing crashes when we upgraded
> >> to a faster system board. I made an assumption (yes, I know what ass-u-me
> >> means), at least for this exercise (other possibilities of course exist)
> >> that the problem was timing based. More specifically, the new system board
> >> is TOO fast, and the NIC can't keep up. This could be caused by an improper
> >> board design, which doesn't allow certain signals to stabilize properly
> >> (quickly enough), or it could be a bug in the NIC itself (see #1 above).
> >> Another possibility is that the chip just isn't designed to operate in
> >> high-speed systems, and either certain hardware or software design changes
> >> or workarounds are necessary. Workarounds make me nervous - they often
> >> translate into reduced performance.
> >>
> >> 3) So, I got my hands dirty and started mucking around with the driver.
> >> Most of my experiments involved various delays and code shuffling in the
> >> driver's interrupt routine. Yeah, you all read correctly, delays in an
> >> interrupt routine - If any of my computer science instructors were dead
> >> today they'd be rolling in their graves. Of interest:
> >> ==> The proper delay inserted between reading the interrupt status and
> >> acking the interrupts (writing back to the same register) keeps the board
> >> from crashing. The size of the delay is particularly sensitive - if too
> >> low, the system crashes; if too high, the ISR is overworked. Performance
> >> results were varied based on different delay values.
> >> Acking the interrupts twice (two sequential writes to the status register)
> >> also kept the system from crashing, however performance suffered
> >> significantly.
> >> I was unsuccessful in my attempts at removing the delay by shuffling the
> >> code around. The system continued to crash. More research and
> >> experimentation is necessary to find another solution to the delay. In my
> >> opinion, adding a delay is an evil workaround due to faulty hardware
> >> behavior and it will negatively affect performance.
> >>
> >> 4) I discovered some potential problems with the driver itself. The Intel
> >> User's Guide clearly RECOMMENDS that all accesses to the command and status
> >> registers be limited to byte-wide access to avoid any side-effects.
> >> However, the driver uses only word-wide access to these registers. There
> >> might be nothing more sinister in this than the fact that Intel is
> >> recommending good programming practice. However, I know what it means when
> >> my wife RECOMMENDS that I tackle some chores around the house. It might be
> >> that there is in fact a problem with word-wide access, and the driver needs
> >> to be rewritten, or seriously massaged.
> >>
> >> 5) The loop in the wait_for_cmd_done() routine might be too short for very
> >> fast boards. I changed the loop from 100 to 10000. Is this too high, or too
> >> low? It seems that this keeps the system more stable, but I don't have any
> >> positive proof (yet).
> >>
> >> 6) Intel documentation states clearly that the CU Start and RU Start should
> >> only be executed when the unit is in either the idle or no resources state.
> >> This is not always checked. For example, in the ISR, the RxStart command
> >> (RX_START in older drivers) is issued without first invoking
> >> wait_for_cmd_done(). It seems to me that unless it's 100% sure that the
> >> receive unit is idle here, wait_for_cmd_done() should be called. Also as I
> >> recall, there are one or two other places in the driver where either the
> >> RxStart or CuStart commands are issued without first invoking
> >> wait_for_cmd_done().
> >>
> >> 7) The transmit routine has a somewhat lengthy section of code in which
> >> interrupts are disabled. It seems to me that perhaps it would be worthwhile
> >> seeing if there is a way to redesign this area to eliminate (or at least
> >> shorten the duration of) the interrupts being disabled.
> >>
> >>
> >> Using version 1.05 of the driver, I was able to come up with a stable
> >> working version of the driver. This was accomplished by doing the
> >> following:
> >> - In the speedo_interrupt() routine, I added a delay - udelay(2) - right
> >> after reading the interrupt status.
> >> - Changed the wait_for_cmd_done() loop to 10000.
> >> - Made sure that wait_for_cmd_done() was invoked every place that the
> >> RxStart or CuStart commands are issued.
> >>
> >> I hope that I've contributed some useful ideas and haven't just wasted
> >> mailing list bandwidth. I'm continuing my experiments and maybe something
> >> will come of all this. I'll keep you all posted.
> >>
> >> Thanks of course goes to Donald Becker. Along with Daniel Veillard, I too
> >> find it amazing that just about every NIC driver has Donald's name as the
> >> author. Doesn't the guy ever sleep?!
> >>
> >> Regards,
> >>
> >> Yisrael (Russ) Hersch
> >> Allot Communications
> >> yhersch@allot.com
> >>
> >>
> 
> -------------------------------------------------------------------
> To unsubscribe send a message body containing "unsubscribe"
> to linux-eepro100-request@beowulf.org
> 
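
Two of the driver changes listed in Yisrael's quoted message (a longer
wait_for_cmd_done() loop, and waiting before every RxStart or CuStart) can
be sketched in user-space C.  This is only an illustration under stated
assumptions, not the eepro100 code: the fake_scb struct, fake_read_cmd(),
the *_sketch function names and the -1 timeout return value are all
hypothetical, and the command register is simulated rather than read with
inb().  The udelay(2) change is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the NIC's command register: it reads non-zero
 * while a command is pending and 0 once the chip has accepted it. */
struct fake_scb {
    uint8_t cmd;         /* non-zero while a command is pending */
    int latency;         /* busy reads the fake chip needs per command */
    int busy_reads_left; /* busy reads remaining for the pending command */
};

/* Simulated register read: each read while busy moves the fake chip one
 * step closer to accepting the pending command. */
static uint8_t fake_read_cmd(struct fake_scb *scb)
{
    uint8_t value = scb->cmd;
    if (scb->cmd != 0 && --scb->busy_reads_left <= 0)
        scb->cmd = 0;
    return value;
}

/* Bounded poll in the spirit of wait_for_cmd_done(): returns 0 once the
 * command byte reads 0, or -1 if max_polls reads were not enough.
 * Reporting the timeout is an addition made for this sketch. */
static int wait_for_cmd_done_sketch(struct fake_scb *scb, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (fake_read_cmd(scb) == 0)
            return 0;
    }
    return -1;
}

/* Issue a start command only after the previous one has been accepted. */
static int issue_cmd_sketch(struct fake_scb *scb, uint8_t cmd, int max_polls)
{
    if (wait_for_cmd_done_sketch(scb, max_polls) != 0)
        return -1; /* previous command still pending: do not overwrite it */
    scb->cmd = cmd;
    scb->busy_reads_left = scb->latency;
    return 0;
}
```

With a fake chip that needs 500 reads per command, a limit of 100 gives up
while a limit of 10000 succeeds.  Whether 10000 is the right number on real
hardware is exactly the open question in item 5.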

