[eepro100] Re: card reports no resources / RX buffers

Bertsch, Charles CBertsch@microtest.com
Mon, 9 Oct 2000 15:28:42 -0700


This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C03240.47AC8940
Content-Type: text/plain;
	charset="iso-8859-1"

Roeland,
For the eepro100 driver and the "no RX buffers/no resources" problem,
further
examination of the sequence of changes that you had made before I clobbered
them
with the code from 2.2.16 kernel showed a udelay(30) after each
wait_for_cmd_done()
within speedo_resume().  While considering all the changes between the
versions,
I saw this and failed to appreciate the significance (or maybe I assumed,
and failed to
check, that the udelay() had been moved into wait_for_cmd_done), and dropped
them.

Further reflection on why this change would make a difference led to the
following
experiment, keeping two udelay(30) which were between (or could be between)
consecutive commands that would update the SCBPointer register.
This code has been tested on three different systems, two in reboot loops
resulting
in over 500 reboots without any occurrences of "no RX buffers/no resources",
and
the third board in an interrupt-handler stress test, with access by 5 PCs
causing over
5000 interrupt/second for 6 days.

I hope this will be the return to the rock-solid code situation that you
have described.

--Chuck Bertsch


*** eepro100.c	2000/08/15 23:06:30	1.13
--- eepro100.c	2000/10/02 17:01:11
***************
*** 1015,1048 ****
--- 1157,1207 ----
  	if ((sp->phy[0] & 0x8000) == 0)
  		mdio_read(ioaddr, sp->phy[0] & 0x1f, 0);
  
  	return 0;
  }
  
+ /* note delays below.  Routine wait_for_cmd_done() actually waits for */
+ /* the cmd to be accepted - apparently means that NIC has gotten the cmd,
*/
+ /* a new value could be safely stored as a cmd, and the NIC will start */
+ /* interpreting.  If the NIC needs to read the value in SCBPointer, */
+ /* it starts off to do that.  If the CPU updates SCBPointer too soon, */
+ /* the race can lead to syslog msgs "card reports no RX buffers" */
+ /* alternating with "card reports no resources" */
+ /* How the NIC manages to recover (sometimes) from this condition is
unclear. */
+ /* With these two udelay(30) in place, several hundred reboots have been
done */
+ /* with no "card reports" messages and no NIC initialize failures. */
+ /* (A better approach than using udelay might be to put a non-SCBPointer
cmd */
+ /* between any two cmds that use SCBPointer, even a no-op.  When the no-op
cmd */
+ /* has been accepted, then it is known that the SCBPointer has been read
*/
+ /* by the NIC for the prior cmd, and can be safely updated for a later
cmd.) */
+ 
  /* Start the chip hardware after a full reset. */
  static void speedo_resume(struct net_device *dev)
  {
  	struct speedo_private *sp = (struct speedo_private *)dev->priv;
  	long ioaddr = dev->base_addr;
  
  	/* Start with a Tx threshold of 256 (0x..20.... 8 byte units). */
  	sp->tx_threshold = 0x01208000;
  
  	/* Set the segment registers to '0'. */
  	wait_for_cmd_done(ioaddr + SCBCmd);
+     udelay(30);		/* Might prev command have used SCBPointer
?? */
  	outl(0, ioaddr + SCBPointer);
  	outb(RxAddrLoad, ioaddr + SCBCmd);
  	wait_for_cmd_done(ioaddr + SCBCmd);
  	outb(CUCmdBase, ioaddr + SCBCmd);
  	wait_for_cmd_done(ioaddr + SCBCmd);
  
  	/* Load the statistics block and rx ring addresses. */
  	outl(virt_to_bus(&sp->lstats), ioaddr + SCBPointer);
  	outb(CUStatsAddr, ioaddr + SCBCmd);
  	sp->lstats.done_marker = 0;
  	wait_for_cmd_done(ioaddr + SCBCmd);
+     udelay(30);		/* must pause, ensure NIC gets Pointer value
*/
  
  	if (sp->rx_ringp[sp->cur_rx % RX_RING_SIZE] == NULL) {
  		if (speedo_debug > 2)
  			printk(KERN_DEBUG "%s: NULL cur_rx in
speedo_resume().\n",
  					dev->name);
  	} else {
***************

> -----Original Message-----
> From:	Krawl, Roeland 
> Sent:	Thursday, August 24, 2000 6:46 PM
> To:	Bertsch, Charles
> Cc:	Morgan, Mike
> Subject:	FW: [eepro100] Re: card reports no resources / RX buffers
> 
> 
> 
> -----Original Message-----
> From:	Andrey Savochkin [SMTP:saw@saw.sw.com.sg]
> Sent:	Tuesday, July 25, 2000 7:19 PM
> To:	Krawl, Roeland; 'eepro100@scyld.com'
> Subject:	[eepro100] Re: card reports no resources / RX buffers
> 
> Hello Roeland,
> 
> On Tue, Jul 25, 2000 at 01:13:50PM -0700, Krawl, Roeland wrote:
> > Since our solution to this problem has been rock solid, I have not
> examined
> > the eepro100 driver or read the chip documentation since March. However
> I
> > clearly recollect that the "wait_for_cmd_done()" routine actually waits
> for
> > command acceptance, not command completion. Are you absolutely sure that
> the
> > contents of the System Control Block General Pointer can be harmlessly
> > overwritten after the chip has accepted the command but not yet executed
> it?
> 
> No, I'm not sure.
> There are reports about card/driver misbehavior which may be explained as
> a
> bad consequence of overwriting the general pointer.
> So far, all workarounds, including your own, have consisted of an
> additional
> delay somewhere.  The problems also go away in PIO instead of MMIO mode,
> which slowdown the operations a bit.
> 
> The accurate wait for command completion depends on the command.
> For example, for CUStart command it is waiting for CU leaving idle state.
> I'm sure that Intel's driver doesn't do it.
> 
> Nevertheless, I'll take an additional look on all existing drivers to get
> the
> idea what may be done for the proper initialization.
> 
> > I saw evidence to the contrary. The driver does not "wait for command
> done"
> > before overwriting the System Control Block General Pointer in
> preparation
> > for issuing the next command.
> > 
> > I have heard that you already have a fix for this problem. That is
> great. No
> 
> No, I have only workarounds.
> 
> > more dialog is necessary. The Linux community should be grateful to be
> rid
> > of this difficult and elusive problem.
> 
> Best regards
> 					Andrey V.
> 					Savochkin
> 
> _______________________________________________
> eepro100 mailing list
> eepro100@scyld.com
> http://www.scyld.com/mailman/listinfo/eepro100

------_=_NextPart_001_01C03240.47AC8940
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">



RE: [eepro100] Re: card reports no resources / RX =
buffers



Roeland,
For the eepro100 = driver and the "no RX buffers/no resources" problem, = further
examination of the = sequence of changes that you had made before I clobbered them
with the code from = 2.2.16 kernel showed a udelay(30) after each wait_for_cmd_done()
within = speedo_resume().  While considering all the changes between the = versions,
I saw this and = failed to appreciate the significance (or maybe I assumed, and failed = to
check, that the = udelay() had been moved into wait_for_cmd_done), and dropped = them.

Further reflection = on why this change would make a difference led to the following
experiment, keeping = two udelay(30) which were between (or could be between)
consecutive = commands that would update the SCBPointer register.
This code has been = tested on three different systems, two in reboot loops resulting
in over 500 reboots = without any occurrences of "no RX buffers/no resources", = and
the third board in = an interrupt-handler stress test, with access by 5 PCs causing = over
5000 = interrupt/second for 6 days.

I hope this will be = the return to the rock-solid code situation that you have = described.

--Chuck = Bertsch


*** eepro100.c  2000/08/15 = 23:06:30     1.13
--- eepro100.c  2000/10/02 = 17:01:11
***************
*** 1015,1048 ****
--- 1157,1207 ----
  =       if ((sp->phy[0] & 0x8000) =3D=3D = 0)
  =       =         mdio_read(ioaddr, = sp->phy[0] & 0x1f, 0);
 
  =       return 0;
  }
 
+ /* note delays below.  = Routine wait_for_cmd_done() actually waits for */
+ /* the cmd to be accepted - = apparently means that NIC has gotten the cmd, */
+ /* a new value could be = safely stored as a cmd, and the NIC will start */
+ /* interpreting.  If the = NIC needs to read the value in SCBPointer, */
+ /* it starts off to do = that.  If the CPU updates SCBPointer too soon, */
+ /* the race can lead to = syslog msgs "card reports no RX buffers" */
+ /* alternating with = "card reports no resources" */
+ /* How the NIC manages to = recover (sometimes) from this condition is unclear. */
+ /* With these two udelay(30) = in place, several hundred reboots have been done */
+ /* with no "card = reports" messages and no NIC initialize failures. */
+ /* (A better approach than = using udelay might be to put a non-SCBPointer cmd */
+ /* between any two cmds that = use SCBPointer, even a no-op.  When the no-op cmd */
+ /* has been accepted, then it = is known that the SCBPointer has been read */
+ /* by the NIC for the prior = cmd, and can be safely updated for a later cmd.) */
+
  /* Start the chip = hardware after a full reset. */
  static void = speedo_resume(struct net_device *dev)
  {
  =       struct speedo_private *sp =3D (struct = speedo_private *)dev->priv;
  =       long ioaddr =3D = dev->base_addr;
 
  =       /* Start with a Tx threshold of 256 = (0x..20.... 8 byte units). */
  =       sp->tx_threshold =3D = 0x01208000;
 
  =       /* Set the segment registers to '0'. = */
  =       wait_for_cmd_done(ioaddr + = SCBCmd);
+     = udelay(30);       =         /* Might prev command have = used SCBPointer ?? */
  =       outl(0, ioaddr + SCBPointer);
  =       outb(RxAddrLoad, ioaddr + = SCBCmd);
  =       wait_for_cmd_done(ioaddr + = SCBCmd);
  =       outb(CUCmdBase, ioaddr + SCBCmd);
  =       wait_for_cmd_done(ioaddr + = SCBCmd);
 
  =       /* Load the statistics block and rx ring = addresses. */
  =       outl(virt_to_bus(&sp->lstats), = ioaddr + SCBPointer);
  =       outb(CUStatsAddr, ioaddr + = SCBCmd);
  =       sp->lstats.done_marker =3D 0;
  =       wait_for_cmd_done(ioaddr + = SCBCmd);
+     = udelay(30);       =         /* must pause, ensure NIC = gets Pointer value */
 
  =       if (sp->rx_ringp[sp->cur_rx % = RX_RING_SIZE] =3D=3D NULL) {
  =       =         if (speedo_debug > = 2)
  =       =         =         printk(KERN_DEBUG "%s: = NULL cur_rx in speedo_resume().\n",
  =       =         =         =         =         dev->name);
  =       } else {
***************

------_=_NextPart_001_01C03240.47AC8940--