[vortex-bug] DPD FSH for Cyclone/Tornado

Donald Becker becker@scyld.com
Tue, 16 May 2000 14:07:40 -0400 (EDT)


On Tue, 16 May 2000, Bogdan Costescu wrote:

> The drivers are all using in boomerang_start_xmit:
> 
> vp->tx_ring[entry].status = cpu_to_le32(skb->len | TxIntrUploaded);
> 
> This is correct for Boomerang cards (90x), but not for Cyclone/Tornado
> (90xB and 90xC) - page 86 in C docs. Setting the FSH to the above value is
> harmles (see the comment about rounding at page 89 in C docs), as this
> field was redesigned with compatibilty in mind; it should be set
> to TxIntrUploaded only, for the current use.

The new format is backward-compatibe.
With the previous code, checking the chip type would have added a test and a
branch, so unconditionally setting the bit for all chip types was the best
approach. 

The current code structure would allow this to be changed without an obvious
impact on the C code.  But it still results in two register spills on the
x86: 

Original code
________________
	vp->tx_ring[entry].addr = virt_to_le32desc(skb->data);
	vp->tx_ring[entry].length = cpu_to_le32(skb->len | LAST_FRAG);
	vp->tx_ring[entry].status = cpu_to_le32(skb->len | TxIntrUploaded);

	if (vp->drv_flags & IS_BOOMERANG) {
           ...
		outw(DownStall, ioaddr + EL3_CMD);
           ...
	} else {
		vp->tx_desc_tail->next = virt_to_le32desc(&vp->tx_ring[entry]);
           ...

Your suggested change
________________
	vp->tx_ring[entry].addr = virt_to_le32desc(skb->data);
	vp->tx_ring[entry].length = cpu_to_le32(skb->len | LAST_FRAG);

	if (vp->drv_flags & IS_BOOMERANG) {
	   	vp->tx_ring[entry].status = cpu_to_le32(skb->len | TxIntrUploaded);
           ...
		outw(DownStall, ioaddr + EL3_CMD);
           ...
	} else {
		vp->tx_ring[entry].status = cpu_to_le32(skb->len);
		vp->tx_desc_tail->next = virt_to_le32desc(&vp->tx_ring[entry]);
           ...
		

They look similar, but the calculation of
      if (vp->drv_flags & IS_BOOMERANG)
adds instructions to shuffle values on the register-poor x86.

Hmmm, now that I think about it, beyond the cache impact of loading the few
extra instruction there likely is no performance impact.
It *is* important to make certain vp->tx_ring[entry] is written as a whole
cache line, rather than as individual words, but the code path change
should still permit that to be done.


Donald Becker				becker@scyld.com
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403