linuxppc vs. LinkSys, v0.91

Brian Warner warner@lothar.com
Sun May 16 05:01:35 1999


> I'm having a problem with a LinkSys 10/100 card that I picked up at Fry's
> cheap ($25). It connects fine to a cheap 10baseT hub when in my x86 linux box
> (stock 2.2.6 kernel, tulip driver), but I can't get it to send packets on my
> powerpc linux box. Here's the data:
> 
>  linux-2.2.8 with tulip.c v0.91 copied into place, built as a module
>  LinkSys "EtherFast" 10/100 card, the kind with the PNIC from what I can tell
>  10baseT hub, nothing fancy.

Well, thanks to a clue from Donald I managed to make it work. On big-endian
machines (ppc, sparc?) the tulip driver takes advantage of a feature in real
2114[0123] chips that lets the xmit/recv descriptors be in big-endian format
(normally they are always little-endian). This is convenient for big-endian
processors, because the driver's critical path reads and writes those
descriptors a lot, and not having to swap bytes saves a little time on every
access.

The problem is that some clones don't implement this feature. The PNIC used in
my cheap LinkSys card, apparently, does not have the bit in CSR0 that would
turn this on (it stays zero even though we write a 1 to it, indicating an
"unimplemented bit"). In fact it seems to not implement several bits, mostly
relating to PCI bus speedups.
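
To make the "write a 1, read back a 0" behaviour concrete, here is a minimal
sketch of such a read-back probe. It does not touch real hardware: the
struct fake_csr0 type and the csr0_write(), csr0_read() and
chip_swabs_descriptors() helpers are all made up for illustration, and the
bit value 0x00100000 is simply the difference between the 0x01B00000 and
0x01A00000 constants in the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* The bit that differs between 0x01B00000 and 0x01A00000 in the patch. */
#define CSR0_DESC_SWAB 0x00100000u

/* Hypothetical stand-in for a chip's CSR0 register. Only the bits set in
 * `implemented` can ever be stored; the rest always read back as zero. */
struct fake_csr0 {
	uint32_t value;
	uint32_t implemented;
};

static void csr0_write(struct fake_csr0 *r, uint32_t v)
{
	r->value = v & r->implemented;	/* unimplemented bits are dropped */
}

static uint32_t csr0_read(const struct fake_csr0 *r)
{
	return r->value;
}

/* Try to set the descriptor-swab bit on top of the base CSR0 value and
 * report whether it stuck: 1 if the chip kept it, 0 if it read back zero. */
static int chip_swabs_descriptors(struct fake_csr0 *r, uint32_t base_csr0)
{
	assert((base_csr0 & CSR0_DESC_SWAB) == 0);	/* base must not have it set */
	csr0_write(r, base_csr0 | CSR0_DESC_SWAB);
	return (csr0_read(r) & CSR0_DESC_SWAB) != 0;
}
```

With a fake chip that implements every bit the probe returns 1; with one whose
`implemented` mask lacks CSR0_DESC_SWAB (the behaviour I see on the PNIC) it
returns 0. A real driver would do the write and read with its own register
accessors instead of these stand-ins.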

I went through the driver and changed every assignment to a descriptor field
from something like:

 rx_ring[i].status = foo;

with something like

 rx_ring[i].status = cpu_to_le32(foo);

and made the matching change for the reads:

 status = le32_to_cpu(rx_ring[i].status);

After that, the driver works well for me.
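
For anyone who wants to see what those conversions do outside the kernel, here
is a small user-space sketch. The sketch_cpu_to_le32() and
sketch_le32_to_cpu() functions are stand-ins written for this example (the
kernel has its own cpu_to_le32/le32_to_cpu); struct sketch_desc only mirrors
the four field names used in the patch, and the 0x80000000 ownership bit is
the one the patched tulip_rx() loop tests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for cpu_to_le32(): returns a value whose in-memory bytes are the
 * little-endian encoding of v, whatever the host byte order is. */
static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)((v >> 24) & 0xff)
	};
	uint32_t out;

	memcpy(&out, b, sizeof out);
	return out;
}

/* Stand-in for le32_to_cpu(): reads the in-memory bytes as little-endian. */
static uint32_t sketch_le32_to_cpu(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof b);
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Illustrative descriptor with the field names used in the patch. */
struct sketch_desc {
	uint32_t status, length, buffer1, buffer2;
};
static_assert(sizeof(struct sketch_desc) == 16, "four 32-bit words expected");

#define SKETCH_DESC_OWNED 0x80000000u	/* top bit: descriptor owned by chip */

/* Ownership test done on the converted value, as in the patched rx loop. */
static int sketch_chip_owns(const struct sketch_desc *d)
{
	return (sketch_le32_to_cpu(d->status) & SKETCH_DESC_OWNED) != 0;
}
```

On a little-endian host both helpers are effectively no-ops; on a big-endian
host they reverse the four bytes. Either way the chip sees the same bytes in
memory, which is the whole point.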

This wouldn't be an easy thing to patch into the mainstream driver. Because of
the performance cost, you wouldn't want to do those swabs when your real
DECchip can do it for you. But you can't find out that you have the
less-capable chip until runtime, and at that point a variable telling you
whether to swab doesn't really help: you'd waste more time checking the
variable on each access than actually doing the swab. So you'd need a
compile-time macro that selects one of:

 A) not turn on the CSR0 bit, swap bytes on all accesses (works on all cards,
    but is slightly slower)

 B) turn on the CSR0 bit, don't swap bytes (faster, but won't work on PNIC)

Folks who have a PNIC-based card on big-endian systems would just have to know
that they need to flip the switch before compiling.
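
A rough sketch of what such a switch could look like follows. None of these
names exist in tulip.c: SKETCH_SWAB_DESCRIPTORS, SKETCH_CSR0,
sketch_cpu_to_desc(), sketch_desc_to_cpu() and sketch_give_to_chip() are
invented for this example, and sketch_le32() stands in for the kernel's
cpu_to_le32/le32_to_cpu so the fragment builds in user space. The CSR0
constants are the powerpc values from the attached patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical switch: 1 selects option (A), 0 selects option (B). */
#ifndef SKETCH_SWAB_DESCRIPTORS
#define SKETCH_SWAB_DESCRIPTORS 1
#endif

/* The two powerpc CSR0 values differ only in the descriptor-swab bit. */
static_assert((0x01B00000u ^ 0x01A00000u) == 0x00100000u,
	      "expected a single-bit difference");

/* Host <-> little-endian conversion. On a big-endian host this reverses the
 * bytes, on a little-endian host it is the identity, so the same function
 * serves for both directions. */
static uint32_t sketch_le32(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof b);
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

#if SKETCH_SWAB_DESCRIPTORS
/* (A) leave the CSR0 bit off, swap in software: works on all cards. */
#define SKETCH_CSR0		(0x01A00000u | 0x8000u)
#define sketch_cpu_to_desc(x)	sketch_le32(x)
#define sketch_desc_to_cpu(x)	sketch_le32(x)
#else
/* (B) turn the CSR0 bit on, no software swap: won't work on PNIC. */
#define SKETCH_CSR0		(0x01B00000u | 0x8000u)
#define sketch_cpu_to_desc(x)	(x)
#define sketch_desc_to_cpu(x)	(x)
#endif

struct sketch_desc {
	uint32_t status, length, buffer1, buffer2;
};

/* Fill in a length and hand the descriptor to the chip (ownership bit). */
static void sketch_give_to_chip(struct sketch_desc *d, uint32_t len)
{
	d->length = sketch_cpu_to_desc(len);
	d->status = sketch_cpu_to_desc(0x80000000u);
}
```

Every descriptor access in the driver would go through the two macros, so in
the (B) build they compile away to nothing and the fast path is unchanged.
Building with -DSKETCH_SWAB_DESCRIPTORS=0 selects (B).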

Anyway, just wanted to throw that out there. The patch attached below
basically implements (A) above, but without a compile-time switch. It turns
off the "hey chip please swab descriptors" bit (changing 0x01B0 to 0x01A0 in
the upper half of csr0 clears that bit, 0x00100000), so it ought to work for
real DECchips too, just slightly slower. Note that this patch is against v0.91
of tulip.c, which is more recent than the version included in the 2.2.8
kernel. Grab it off the tulip page:
<http://cesdis.gsfc.nasa.gov/linux/drivers/tulip.html> .


Note that the performance difference is entirely theoretical. I don't know how
big it would be in real life (and, not having the more-capable chip, I'm not
in a position to find out). I think the PPC can do that kind of swab in a
single instruction, and it would be about 10 extra instructions in a receive
routine of maybe 100 lines. No idea how that translates into the real world,
if it would even be measurable.

 -Brian
   warner@lothar.com

-------------------- patch start ------------------------------
--- Tulip-stuff/tulip.c.v0.91.orig	Fri May 14 00:49:30 1999
+++ tulip.c	Sun May 16 01:41:59 1999
@@ -76,9 +76,9 @@
 #if defined(__alpha__)
 static int csr0 = 0x01A00000 | 0xE000;
 #elif defined(__powerpc__)
-static int csr0 = 0x01B00000 | 0x8000;
+static int csr0 = 0x01A00000 | 0x8000;
 #elif defined(__sparc__)
-static int csr0 = 0x01B00080 | 0x8000;
+static int csr0 = 0x01A00080 | 0x8000;
 #elif defined(__i386__)
 static int csr0 = 0x01A00000 | 0x8000;
 #else
@@ -1414,9 +1414,9 @@
 		*setup_frm++ = eaddrs[1]; *setup_frm++ = eaddrs[1];
 		*setup_frm++ = eaddrs[2]; *setup_frm++ = eaddrs[2];
 		/* Put the setup frame on the Tx list. */
-		tp->tx_ring[0].length = 0x08000000 | 192;
-		tp->tx_ring[0].buffer1 = virt_to_bus(tp->setup_frame);
-		tp->tx_ring[0].status = DescOwned;
+		tp->tx_ring[0].length = cpu_to_le32(0x08000000 | 192);
+		tp->tx_ring[0].buffer1 = cpu_to_le32(virt_to_bus(tp->setup_frame));
+		tp->tx_ring[0].status = cpu_to_le32(DescOwned);
 
 		tp->cur_tx++;
 	}
@@ -2339,14 +2339,14 @@
 	if (tulip_debug > 3) {
 		int i;
 		for (i = 0; i < RX_RING_SIZE; i++) {
-			u8 *buf = (u8 *)(tp->rx_ring[i].buffer1);
+			u8 *buf = (u8 *)(le32_to_cpu(tp->rx_ring[i].buffer1));
 			int j;
 			printk(KERN_DEBUG "%2d: %8.8x %8.8x %8.8x %8.8x  "
 				   "%2.2x %2.2x %2.2x.\n",
-				   i, (unsigned int)tp->rx_ring[i].status,
-				   (unsigned int)tp->rx_ring[i].length,
-				   (unsigned int)tp->rx_ring[i].buffer1,
-				   (unsigned int)tp->rx_ring[i].buffer2,
+				   i, (unsigned int)le32_to_cpu(tp->rx_ring[i].status),
+				   (unsigned int)le32_to_cpu(tp->rx_ring[i].length),
+				   (unsigned int)le32_to_cpu(tp->rx_ring[i].buffer1),
+				   (unsigned int)le32_to_cpu(tp->rx_ring[i].buffer2),
 				   buf[0], buf[1], buf[2]);
 			for (j = 0; buf[j] != 0xee && j < 1600; j++)
 				if (j < 100) printk(" %2.2x", buf[j]);
@@ -2354,10 +2354,10 @@
 		}
 		printk(KERN_DEBUG "  Rx ring %8.8x: ", (int)tp->rx_ring);
 		for (i = 0; i < RX_RING_SIZE; i++)
-			printk(" %8.8x", (unsigned int)tp->rx_ring[i].status);
+			printk(" %8.8x", (unsigned int)le32_to_cpu(tp->rx_ring[i].status));
 		printk("\n" KERN_DEBUG "  Tx ring %8.8x: ", (int)tp->tx_ring);
 		for (i = 0; i < TX_RING_SIZE; i++)
-			printk(" %8.8x", (unsigned int)tp->tx_ring[i].status);
+			printk(" %8.8x", (unsigned int)le32_to_cpu(tp->tx_ring[i].status));
 		printk("\n");
 	}
 #endif
@@ -2385,14 +2385,14 @@
 	tp->dirty_rx = tp->dirty_tx = 0;
 
 	for (i = 0; i < RX_RING_SIZE; i++) {
-		tp->rx_ring[i].status = 0x00000000;
-		tp->rx_ring[i].length = PKT_BUF_SZ;
-		tp->rx_ring[i].buffer2 = virt_to_bus(&tp->rx_ring[i+1]);
+		tp->rx_ring[i].status = cpu_to_le32(0x00000000);
+		tp->rx_ring[i].length = cpu_to_le32(PKT_BUF_SZ);
+		tp->rx_ring[i].buffer2 = cpu_to_le32(virt_to_bus(&tp->rx_ring[i+1]));
 		tp->rx_skbuff[i] = NULL;
 	}
 	/* Mark the last entry as wrapping the ring. */
-	tp->rx_ring[i-1].length = PKT_BUF_SZ | DESC_RING_WRAP;
-	tp->rx_ring[i-1].buffer2 = virt_to_bus(&tp->rx_ring[0]);
+	tp->rx_ring[i-1].length = cpu_to_le32(PKT_BUF_SZ | DESC_RING_WRAP);
+	tp->rx_ring[i-1].buffer2 = cpu_to_le32(virt_to_bus(&tp->rx_ring[0]));
 
 
 	for (i = 0; i < RX_RING_SIZE; i++) {
@@ -2404,8 +2404,8 @@
 		if (skb == NULL)
 			break;
 		skb->dev = dev;			/* Mark as being used by this device. */
-		tp->rx_ring[i].status = DescOwned;	/* Owned by Tulip chip */
-		tp->rx_ring[i].buffer1 = virt_to_bus(skb->tail);
+		tp->rx_ring[i].status = cpu_to_le32(DescOwned);	/* Owned by Tulip chip */
+		tp->rx_ring[i].buffer1 = cpu_to_le32(virt_to_bus(skb->tail));
 	}
 	tp->dirty_rx = (unsigned int)(i - RX_RING_SIZE);
 
@@ -2413,10 +2413,10 @@
 	   do need to clear the ownership bit. */
 	for (i = 0; i < TX_RING_SIZE; i++) {
 		tp->tx_skbuff[i] = 0;
-		tp->tx_ring[i].status = 0x00000000;
-		tp->tx_ring[i].buffer2 = virt_to_bus(&tp->tx_ring[i+1]);
+		tp->tx_ring[i].status = cpu_to_le32(0x00000000);
+		tp->tx_ring[i].buffer2 = cpu_to_le32(virt_to_bus(&tp->tx_ring[i+1]));
 	}
-	tp->tx_ring[i-1].buffer2 = virt_to_bus(&tp->tx_ring[0]);
+	tp->tx_ring[i-1].buffer2 = cpu_to_le32(virt_to_bus(&tp->tx_ring[0]));
 }
 
 static int
@@ -2442,7 +2442,7 @@
 	entry = tp->cur_tx % TX_RING_SIZE;
 
 	tp->tx_skbuff[entry] = skb;
-	tp->tx_ring[entry].buffer1 = virt_to_bus(skb->data);
+	tp->tx_ring[entry].buffer1 = cpu_to_le32(virt_to_bus(skb->data));
 
 	if (tp->cur_tx - tp->dirty_tx < TX_RING_SIZE/2) {/* Typical path */
 		flag = 0x60000000; /* No interrupt */
@@ -2458,8 +2458,8 @@
 	if (entry == TX_RING_SIZE-1)
 		flag |= 0xe0000000 | DESC_RING_WRAP;
 
-	tp->tx_ring[entry].length = skb->len | flag;
-	tp->tx_ring[entry].status = DescOwned;	/* Pass ownership to the chip. */
+	tp->tx_ring[entry].length = cpu_to_le32(skb->len | flag);
+	tp->tx_ring[entry].status = cpu_to_le32(DescOwned);	/* Pass ownership to the chip. */
 	tp->cur_tx++;
 	if ( ! tp->tx_full)
 		clear_bit(0, (void*)&dev->tbusy);
@@ -2518,7 +2518,7 @@
 			for (dirty_tx = tp->dirty_tx; tp->cur_tx - dirty_tx > 0;
 				 dirty_tx++) {
 				int entry = dirty_tx % TX_RING_SIZE;
-				int status = tp->tx_ring[entry].status;
+				int status = le32_to_cpu(tp->tx_ring[entry].status);
 
 				if (status < 0)
 					break;			/* It still hasn't been Txed */
@@ -2548,7 +2548,7 @@
 					if (status & 0x0001) tp->stats.tx_deferred++;
 #endif
 #if LINUX_VERSION_CODE > 0x20127
-					tp->stats.tx_bytes += tp->tx_ring[entry].length & 0x7ff;
+					tp->stats.tx_bytes += le32_to_cpu(tp->tx_ring[entry].length) & 0x7ff;
 #endif
 					tp->stats.collisions += (status >> 3) & 15;
 					tp->stats.tx_packets++;
@@ -2654,14 +2654,14 @@
 
 	if (tulip_debug > 4)
 		printk(KERN_DEBUG " In tulip_rx(), entry %d %8.8x.\n", entry,
-			   tp->rx_ring[entry].status);
+			   le32_to_cpu(tp->rx_ring[entry].status));
 	/* If we own the next entry, it's a new packet. Send it up. */
-	while (tp->rx_ring[entry].status >= 0) {
-		s32 status = tp->rx_ring[entry].status;
+	while ((le32_to_cpu(tp->rx_ring[entry].status) & 0x80000000) == 0) {
+		s32 status = le32_to_cpu(tp->rx_ring[entry].status);
 
 		if (tulip_debug > 5)
 			printk(KERN_DEBUG " In tulip_rx(), entry %d %8.8x.\n", entry,
-				   tp->rx_ring[entry].status);
+				   le32_to_cpu(tp->rx_ring[entry].status));
 		if (--rx_work_limit < 0)
 			break;
 		if ((status & 0x38008300) != 0x0300) {
@@ -2705,22 +2705,22 @@
 				skb->dev = dev;
 				skb_reserve(skb, 2);	/* 16 byte align the IP header */
 #if ! defined(__alpha__)
-				eth_copy_and_sum(skb, bus_to_virt(tp->rx_ring[entry].buffer1),
+				eth_copy_and_sum(skb, bus_to_virt(le32_to_cpu(tp->rx_ring[entry].buffer1)),
 								 pkt_len, 0);
 				skb_put(skb, pkt_len);
 #else
 				memcpy(skb_put(skb, pkt_len),
-					   bus_to_virt(tp->rx_ring[entry].buffer1), pkt_len);
+					   bus_to_virt(le32_to_cpu(tp->rx_ring[entry].buffer1)), pkt_len);
 #endif
 				work_done++;
 			} else { 	/* Pass up the skb already on the Rx ring. */
 				char *temp = skb_put(skb = tp->rx_skbuff[entry], pkt_len);
 				tp->rx_skbuff[entry] = NULL;
 #ifndef final_version
-				if (bus_to_virt(tp->rx_ring[entry].buffer1) != temp)
+				if (bus_to_virt(le32_to_cpu(tp->rx_ring[entry].buffer1)) != temp)
 					printk(KERN_ERR "%s: Internal fault: The skbuff addresses "
 						   "do not match in tulip_rx: %p vs. %p / %p.\n",
-						   dev->name, bus_to_virt(tp->rx_ring[entry].buffer1),
+						   dev->name, bus_to_virt(le32_to_cpu(tp->rx_ring[entry].buffer1)),
 						   skb->head, temp);
 #endif
 			}
@@ -2744,10 +2744,10 @@
 			if (skb == NULL)
 				break;
 			skb->dev = dev;			/* Mark as being used by this device. */
-			tp->rx_ring[entry].buffer1 = virt_to_bus(skb->tail);
+			tp->rx_ring[entry].buffer1 = cpu_to_le32(virt_to_bus(skb->tail));
 			work_done++;
 		}
-		tp->rx_ring[entry].status = DescOwned;
+		tp->rx_ring[entry].status = cpu_to_le32(DescOwned);
 	}
 
 	return work_done;
@@ -2788,9 +2788,9 @@
 	for (i = 0; i < RX_RING_SIZE; i++) {
 		struct sk_buff *skb = tp->rx_skbuff[i];
 		tp->rx_skbuff[i] = 0;
-		tp->rx_ring[i].status = 0;		/* Not owned by Tulip chip. */
-		tp->rx_ring[i].length = 0;
-		tp->rx_ring[i].buffer1 = 0xBADF00D0; /* An invalid address. */
+		tp->rx_ring[i].status = cpu_to_le32(0);		/* Not owned by Tulip chip. */
+		tp->rx_ring[i].length = cpu_to_le32(0);
+		tp->rx_ring[i].buffer1 = cpu_to_le32(0xBADF00D0); /* An invalid address. */
 		if (skb) {
 #if LINUX_VERSION_CODE < 0x20100
 			skb->free = 1;
@@ -3031,9 +3031,9 @@
 				/* Avoid a chip errata by prefixing a dummy entry. */
 				tp->tx_skbuff[entry] = 0;
 				tp->tx_ring[entry].length =
-					(entry == TX_RING_SIZE-1) ? DESC_RING_WRAP : 0;
-				tp->tx_ring[entry].buffer1 = 0;
-				tp->tx_ring[entry].status = DescOwned;
+					cpu_to_le32((entry == TX_RING_SIZE-1) ? DESC_RING_WRAP : 0);
+				tp->tx_ring[entry].buffer1 = cpu_to_le32(0);
+				tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
 				entry = tp->cur_tx++ % TX_RING_SIZE;
 			}
 
@@ -3041,9 +3041,9 @@
 			/* Put the setup frame on the Tx list. */
 			if (entry == TX_RING_SIZE-1)
 				tx_flags |= DESC_RING_WRAP;		/* Wrap ring. */
-			tp->tx_ring[entry].length = tx_flags;
-			tp->tx_ring[entry].buffer1 = virt_to_bus(tp->setup_frame);
-			tp->tx_ring[entry].status = DescOwned;
+			tp->tx_ring[entry].length = cpu_to_le32(tx_flags);
+			tp->tx_ring[entry].buffer1 = cpu_to_le32(virt_to_bus(tp->setup_frame));
+			tp->tx_ring[entry].status = cpu_to_le32(DescOwned);
 			if (tp->cur_tx - tp->dirty_tx >= TX_RING_SIZE - 2) {
 				set_bit(0, (void*)&dev->tbusy);
 				tp->tx_full = 1;
@@ -3184,7 +3184,7 @@
 /*
  * Local variables:
  *  SMP-compile-command: "gcc -D__SMP__ -DMODULE -D__KERNEL__ -Wall -Wstrict-prototypes -O6 -c tulip.c `[ -f /usr/include/linux/modversions.h ] && echo -DMODVERSIONS`"
- *  compile-command: "gcc -DMODULE -D__KERNEL__ -Wall -Wstrict-prototypes -O6 -c tulip.c `[ -f /usr/include/linux/modversions.h ] && echo -DMODVERSIONS`"
+ *  compile-command: "gcc -D__KERNEL__ -I../../include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -D__powerpc__ -fsigned-char -msoft-float -pipe -fno-builtin -ffixed-r2 -Wno-uninitialized -mmultiple -mstring -DMODULE   -c -o tulip.o tulip.c"
  *  cardbus-compile-command: "gcc -DCARDBUS -DMODULE -D__KERNEL__ -Wall -Wstrict-prototypes -O6 -c tulip.c -o tulip_cb.o -I/usr/src/pcmcia-cs-3.0.9/include/"
  *  c-indent-level: 4
  *  c-basic-offset: 4

-------------------- patch end -----------------------------------