3c905B fun and games at 100M on NT

Ted_Rule@flextech.co.uk
Mon Aug 24 05:13:29 1998


We've been having some fun and games with the new Dell machines, Optiplex
GX1s and Workstation 410s.

These are fitted with the latest 3Com 3C905B 10/100 motherboard NIC,
otherwise known as the Cyclone chipset, as opposed to the 3C905A Boomerang
present in earlier PCI motherboards and the Vortex present in earlier EISA
cards.

When we had the earlier cards linked up to our Cabletron MMAC+ switch at
100M, we found definite problems with the speed/duplex Auto-Detect,
especially the duplex detection. As a general policy, therefore, we nailed
down the configuration on the cards using the DOS configuration disk that
came with the machines.
For most machines this was 10Base-T half-duplex, and the corresponding hub
ports were mostly shared 10Base-T half-duplex as well. Those switch ports
that were duplex capable were nailed down to advertise only one speed in
their auto-negotiation, i.e. the speed to which we'd nailed the 905.

Earlier experience with the EISA machines led us to believe that the cards
were not 100Base-T FD capable, but we now think that was wrong and simply
a consequence of the earlier chipset: the later Boomerangs do appear to be
100Base-T FD capable.

Some recent machines set up for testing, fitted with 3C905Bs, were required
to run at 100Base-T (FD if possible, but not absolutely required).

Since the previous tests, the MMAC+ had had a firmware upgrade, and I had
noticed specific references in the release notes to better handling of
Auto-Negotiation when talking to 3Com cards. I still didn't really trust it.

These GX1s were configured using the NICDIAG utility supplied as part of
the driver kit. Again, following previous policy, we nailed down the speeds
of the cards. The first tests were carried out at 100Base-T HD.

When we brought them up, it was immediately apparent that something was
very wrong. The login process took forever, although it did eventually
succeed.

Falling back to a simple FTP file transfer from a 100Base-T FD connected
server, we found that roughly one in every 30 or 40 packets from the server
was being dropped by the workstation (as monitored by Netmon on the
workstation itself). This led to enormous TCP retransmission timeouts and
an appalling overall transfer rate.

We went back to Dell support to complain about the situation; by this time
we were pretty annoyed that they might have supplied a duff set of cards.
They actually suggested re-enabling Auto-Detect, as their support notes
indicated that the NICDIAG utility's speed setting didn't actually work.
Moreover, they said that NT performed a complete card reset at reboot,
which would have wiped our hard-coded settings anyway.

We did this, and were amazed to find all our problems vanish. We can only
presume that different parts of the 905 driver assume the card is running
at different speeds. It's almost as if the card always performs a correct
Auto-Detect, but the upper layers assume duplex is turned on when one tries
to configure the speed statically. That would explain the dropped packets
we saw: our presumption is that one of the receiver's ACKs collided on the
way out with the sender's next DATA packet, but neither end detected the
collision because the 3Com card had its collision detection circuitry
disabled, believing it was in duplex mode.

Anyway, the moral of the story seems to be: leave the 3C905B to its own
devices, and always let it auto-detect.

Meanwhile, of course, our older machines have to be told NOT to
auto-detect. Nothing like consistency, is there?

As a further point of note, when connected at 100Base-T HD, we found that
the sender overran the receiver's 8K TCP window quite often during our FTP
tests. Burying ourselves in Registry handbooks, we found that adding the
Registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TcpIp\Parameters\TcpWindowSize=35040

significantly improved the transfer speed for large files. The weird number
is the TCP Ethernet MSS * 24, i.e. 1460 * 24. The normal window size on the
machine was the TCP Ethernet MSS * 6 = 8760.
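The arithmetic behind these numbers can be checked with a short Python sketch. It assumes the standard 1500-byte Ethernet MTU and option-free 20-byte IP and TCP headers, which is where the 1460-byte MSS comes from; the helper names are purely illustrative. Since throughput can be no higher than window / RTT, it also computes, for each window, the round-trip time above which the window rather than the link becomes the limit, ignoring Ethernet framing overhead.

```python
# Hypothetical helpers illustrating the TcpWindowSize arithmetic.
# Assumes a 1500-byte Ethernet MTU and option-free 20-byte IP/TCP headers.
ETHERNET_MTU = 1500
IP_HEADER = 20
TCP_HEADER = 20
MSS = ETHERNET_MTU - IP_HEADER - TCP_HEADER  # 1460 bytes of payload per segment


def window_for_segments(segments: int, mss: int = MSS) -> int:
    """Receive window in bytes that holds an exact number of full segments."""
    return segments * mss


def window_limited_rtt_ms(window_bytes: int, link_bps: int) -> float:
    """RTT above which window/RTT falls below the link rate, i.e. the
    window rather than the wire becomes the bottleneck (framing ignored)."""
    return window_bytes * 8 / link_bps * 1000


default_window = window_for_segments(6)   # the NT default seen on the workstation
tuned_window = window_for_segments(24)    # the value written to TcpWindowSize

for label, window in (("default", default_window), ("tuned", tuned_window)):
    print(
        f"{label}: {window} bytes; window-limited above "
        f"{window_limited_rtt_ms(window, 100_000_000):.2f} ms RTT at 100M, "
        f"{window_limited_rtt_ms(window, 10_000_000):.2f} ms RTT at 10M"
    )
```

At 100M the default 8760-byte window is clocked out in about 0.7 ms, so any round trip longer than that leaves the sender idle waiting for ACKs; that is one plausible reading of the overruns seen in the FTP tests. Choosing a whole multiple of the MSS simply means the window is filled by full-size segments.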

We think that, because of the difference in speed between the server and
the workstation, and because the intervening switches would have been
buffering packets during TCP slow start, this registry change actually made
things worse when the workstation was connected to a 10Base-T hub.
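The suspected slow-start effect can be illustrated with a rough Python model. It assumes idealized slow start (the congestion window starts at one segment and doubles every round trip, with no losses or delayed ACKs) capped by the receive window, and assumes each round's burst leaves the 100M server back-to-back and has to squeeze through a 10M port. The switch's real buffer size is not known here, and the function names are hypothetical. The sketch estimates how much the switch would need to queue for the largest burst under each window setting.

```python
# Rough, hypothetical model of switch queueing during TCP slow start.
# Assumes idealized slow start and a back-to-back burst from a fast port
# into a slower one; Ethernet framing overhead is ignored.
MSS = 1460


def slow_start_rounds(rwnd_segments: int, initial_cwnd: int = 1) -> list[int]:
    """Congestion window, in segments, for each round trip: doubling every
    RTT until it is capped by the receiver's advertised window."""
    rounds = []
    cwnd = initial_cwnd
    while cwnd < rwnd_segments:
        rounds.append(cwnd)
        cwnd *= 2
    rounds.append(rwnd_segments)
    return rounds


def peak_queue_bytes(burst_bytes: int, in_bps: int, out_bps: int) -> int:
    """Bytes a switch must hold when a burst arrives at in_bps and leaves
    through a port running at out_bps.

    While the burst arrives, the output drains only out_bps/in_bps of it,
    so the remainder accumulates in the switch buffer.
    """
    if out_bps >= in_bps:
        return 0  # the output port keeps up, nothing accumulates
    return burst_bytes * (in_bps - out_bps) // in_bps


for rwnd in (6, 24):  # default (8760 bytes) and tuned (35040 bytes) windows
    rounds = slow_start_rounds(rwnd)
    largest_burst = rounds[-1] * MSS
    queued = peak_queue_bytes(largest_burst, 100_000_000, 10_000_000)
    print(f"rwnd={rwnd} segments, cwnd per RTT={rounds}, peak queue={queued} bytes")
```

Under these assumptions the tuned window lets the largest burst grow to 24 segments, leaving roughly 31.5 KB queued at the 10M port against under 8 KB with the default window. A switch with modest per-port buffering is therefore far more likely to have to drop frames, which would fit the slowdown described above.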


I realise these tests don't apply to a Linux box, but I wonder whether any
of this odd behaviour matches other people's experience with the 3c59x
driver, especially when running at 100M and especially when trying to be
clever and nailing down the auto-detect. I'm also concerned that Red Hat
mention specific problems running the card at 100M. Maybe our saga is
relevant, or maybe the driver included with 5.1 is not the latest release
and therefore not truly 905B capable?



Ted Rule,
Flextech Television