Race condition in tulip.c 0.91

Keith Owens kaos@ocs.com.au
Sun Jun 13 09:56:29 1999


This race is horribly timing-dependent.  Adding diagnostics to track
another problem slowed my system down enough to trip the race.  The
symptom is that the card stops responding to its own MAC address:
packets go out but nothing is received.  The race tends to occur under
load when the MAC filters are reset, so it can be triggered by
multicast changes, promiscuous mode, and IPv6 (which uses multicast).

Sound familiar :) ?

From Intel 27807401.pdf, table 4-6, TDES0 bit 31:

OWN-Own Bit
When set, indicates that the descriptor is owned by the 21143. When
cleared, indicates that the descriptor is owned by the host. The 21143
clears this bit either when it completes the frame transmission or when
the buffers allocated in the descriptor are empty.  The ownership bit
of the first descriptor of the frame should be set after all subsequent
descriptors belonging to the same frame have been set. This avoids a
possible race condition between the 21143 fetching a descriptor and the
driver setting an ownership bit.
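
The ordering rule quoted above can be sketched in isolation.  This is
only an illustration under stated assumptions, not code from tulip.c:
struct tx_desc, queue_frame(), DESC_OWNED and the ring size of 16 are
hypothetical names and values, and the C11 atomic_thread_fence() call
stands in for whatever write barrier a real driver would use on its
platform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TX_RING_SIZE 16u            /* assumed ring size, for illustration */
#define DESC_OWNED   0x80000000u    /* TDES0 bit 31 (OWN) */

/* Simplified, hypothetical descriptor; not the layout used by tulip.c. */
struct tx_desc {
	volatile uint32_t status;   /* TDES0 */
	uint32_t length;
	uint32_t buffer1;
};

/*
 * Queue a frame spanning `count` consecutive descriptors starting at
 * ring index `first`.  Returns the index of the next free descriptor.
 */
static unsigned int queue_frame(struct tx_desc *ring, unsigned int first,
				unsigned int count)
{
	unsigned int i;

	assert(count >= 1 && count <= TX_RING_SIZE);

	/* Hand over every descriptor except the first one. */
	for (i = 1; i < count; i++)
		ring[(first + i) % TX_RING_SIZE].status = DESC_OWNED;

	/* Order the stores above before the store that releases the frame. */
	atomic_thread_fence(memory_order_release);

	/* Only now may the chip start fetching the frame. */
	ring[first % TX_RING_SIZE].status = DESC_OWNED;

	return (first + count) % TX_RING_SIZE;
}
```

Calling queue_frame(ring, 15, 2) on a zeroed ring sets OWN on entry 0
first and on entry 15 last, so the chip cannot see a frame whose first
descriptor is owned while a later descriptor still belongs to the host.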

--- tulip.c.0.91	Mon Jun  7 23:41:44 1999
+++ tulip.c	Sun Jun 13 23:26:09 1999
@@ -3022,7 +3022,7 @@
 			/* Same setup recently queued, we need not add it. */
 		} else {
 			unsigned long flags;
-			unsigned int entry;
+			unsigned int entry; int dummy = -1;
 
 			save_flags(flags); cli();
 			entry = tp->cur_tx++ % TX_RING_SIZE;
@@ -3033,7 +3033,8 @@
 				tp->tx_ring[entry].length =
 					(entry == TX_RING_SIZE-1) ? DESC_RING_WRAP : 0;
 				tp->tx_ring[entry].buffer1 = 0;
-				tp->tx_ring[entry].status = DescOwned;
+				/* race with chip, set DescOwned later */
+				dummy = entry;
 				entry = tp->cur_tx++ % TX_RING_SIZE;
 			}
 
@@ -3048,6 +3049,8 @@
 				set_bit(0, (void*)&dev->tbusy);
 				tp->tx_full = 1;
 			}
+			if (dummy >= 0)
+				tp->tx_ring[dummy].status = DescOwned;
 			restore_flags(flags);
 			/* Trigger an immediate transmit demand. */
 			outl(0, ioaddr + CSR1);