[realtek] Optimising the rtl8139 driver for low latency!

Donald Becker becker@scyld.com
Fri, 29 Jun 2001 01:08:23 -0400 (EDT)


On Thu, 28 Jun 2001 korahs@vsnl.com wrote:

> I've been studying the realtek driver (v. 1.07) to try and optimize it
> for low latency operation. I have a few doubts about the code and the
> working of the chip which I hope someone can clarify. I am listing
> them out below. 
> 
> 1. What is the minimum value to which TX_FIFO_THRESH and
> RX_FIFO_THRESH can be set? I want transmission to start as soon as
> bytes are transferred to the chip.

The Tx FIFO threshold is dynamically adjusted on Tx underruns.  This
results in a mostly self-tuning driver.

You can reduce the initial Tx threshold value below the default value of
256 bytes, but decreasing it below the Tx PCI burst length has very
little latency benefit.  The Tx PCI burst length is set at 256 by

#define TX_DMA_BURST	4		/* Calculate as 16<<val. */

I chose 256 bytes as a tradeoff between latency, PCI efficiency, and
avoiding Tx underruns on most machines.

> 2. Which line starts transmission in rtl8129_start_xmit()? Is it
> 	outl (virt_to_bus(tp->tx_buf[entry],...)
>   or
> 	outl (tp->tx_flag | (skb->len....).

Setting the length register starts transmission.

> 3. Is there a maximum value that RX_BUF_LEN can be set to and what is it?

There are four permitted settings:
0==8K, 1==16K, 2==32K, 3==64K
The datasheet describes slightly different wrap semantics for 64KB.

Recent driver versions, e.g. v1.14, have dynamically sized Rx rings.

> 4. How exactly does rtl8129_rx() work? How does the chip know where in rx_ring the next packet is to be copied (is it RxBufPtr)? Could somebody explain the updating of the cur_rx pointer below?
>               cur_rx = (cur_rx + rx_size + 4 + 3)& ~3;
>               outw (cur_rx - 16, ioaddr + RxBufPtr);

The chip does the obvious thing: it writes packets one after another.
Each packet is preceded by a 4-byte header (status and length), which
the "+ 4" skips.  The chip only works on whole 32-bit words, thus the
rounding-up code.

> Also how does the single linear ring architecture compare with the
> descriptor based arch of most other drivers. It seems that we can't
> have a window with a dirty_rx and cur_rx pointer? Doesn't that mean
> that packets have to be processed immediately? 

Yes.  Other chips can receive directly into a skbuff.
With the rtl8129 design we must instead do a copy into a skbuff.
We can't be clever and avoid the copy: a packet left in place holds back
RxBufPtr, so the chip can't reuse that space and the Rx ring might clog.

> 5. How do I discard/pop the top packet in the rx_ring?

Advance the ring-write-block pointer "RxBufPtr" past the packet, exactly
as the cur_rx update above does, but without copying the data out.

> I understand that there are a lot of questions. But from reading the
> archives, this seems like the best place to ask. Thanks in advance. 

It's easy to answer specific questions such as these.
A question like "how does the chip work?" is sure to get the response
"RTFM".


Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993