[vortex] Re: 3c905 errors:

Steven Timm timm@fnal.gov
Thu Oct 9 15:29:30 2003


PS--the switch in question, a Cisco 4000 series switch,
doesn't show any network errors at all with this node.
That has been the pattern throughout on a variety of switches.

Steve


------------------------------------------------------------------
Steven C. Timm (630) 840-8525  timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Core Support Services Dept.
Assistant Group Leader, Scientific Computing Support Group
Lead of Computing Farms Team

On Thu, 9 Oct 2003, Steven Timm wrote:

> In an earlier post I reported errors on a Tyan 2466
> motherboard with the on-board 3c905C-TX interface.
> Donald Becker's response was that I should run the new (9/6/03)
> version of mii-diag --monitor.
>
> Attached below is the output from one of my most persnickety nodes.
> This covers about two hours of running.  The machine is running
> a normal cpu load plus the "nettest" program which is
> sending and receiving packets to another node as fast as it can,
> continuously.
>
> Over the course of this test, which is still ongoing, the node
> logged 33 receive errors in the "errs" column of /proc/net/dev,
> 14 FIFO receive errors, and 29 "frame" errors.
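
For anyone wanting to track the same counters across many nodes, here is a minimal sketch of pulling the receive "errs", "fifo" and "frame" columns out of /proc/net/dev. The function name parse_net_dev and the SAMPLE text are hypothetical; only the three eth0 counter values (33, 14, 29) come from the message above, and the other numbers are filler.

```python
def parse_net_dev(text):
    """Map each interface name to its receive error counters."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 6:
            continue
        # Receive columns, in order: bytes packets errs drop fifo frame ...
        stats[name.strip()] = {
            "errs": int(fields[2]),
            "fifo": int(fields[4]),
            "frame": int(fields[5]),
        }
    return stats


# Hypothetical sample; only errs/fifo/frame for eth0 are taken from the post.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:5000000   40000   33    0   14    29          0         0  6000000   45000    0    0    0     0       0          0
"""

if __name__ == "__main__":
    print(parse_net_dev(SAMPLE)["eth0"])
```

On a live node the same function could be fed the contents of /proc/net/dev directly, for example parse_net_dev(open("/proc/net/dev").read()).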
>
> We did not see any messages like the following during this time:
>
> Oct  3 21:32:37 fnd0196 kernel: eth0: Setting half-duplex based on MII #24
> link partner capability of 0000.
> Oct  3 21:33:37 fnd0196 kernel: eth0: Setting full-duplex based on MII #24
> link partner capability of 41e1.
>
> So far I haven't been able to observe one of these errors as it
> happens, but over the course of this week there has been an average
> of two or three of them per node in a 240-node cluster.
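
The two "link partner capability" words in those kernel messages can be decoded by hand. The sketch below uses the standard MII link partner ability bits (register 5). The duplex rule in driver_duplex is an assumption about how a driver of this family picks the setting (full duplex if 100baseTx-FD is advertised, or if 10baseT-FD is advertised without either 100 Mbit mode); it is not copied from the driver source. Both function names are hypothetical.

```python
# Standard MII link partner ability bits (register 5).
LPA_BITS = [
    (0x0020, "10baseT-HD"),
    (0x0040, "10baseT-FD"),
    (0x0080, "100baseTx-HD"),
    (0x0100, "100baseTx-FD"),
]


def decode_link_partner(lpa):
    """List the media types the link partner advertises."""
    return [name for bit, name in LPA_BITS if lpa & bit]


def driver_duplex(lpa):
    """Assumed rule: full duplex if 100baseTx-FD is offered, or if
    10baseT-FD is offered without either 100 Mbit mode."""
    full = bool(lpa & 0x0100) or (lpa & 0x01C0) == 0x0040
    return "full" if full else "half"


if __name__ == "__main__":
    for word in (0x0000, 0x41E1):
        print(hex(word), decode_link_partner(word), driver_duplex(word))
```

Under that rule 0x41e1 gives full duplex and 0x0000 gives half duplex, which matches the two log lines quoted above. A partner word of 0x0000 advertises nothing at all, so it more likely means the register was read while the link was down than that the switch really changed its advertisement.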
>
> The remote node I was sending the packets to logged an error:
>
> eth0: Updating statistics failed, disabling stats as an interrupt source.
>
> but showed no increase in its error counters.
>
> The down and up messages in the log below follow each
> other in very rapid succession. The link light appears to blink
> out but it is very hard to tell.
>
> Any advice on what might be happening?
>
> Thanks,
>
> Steve Timm
>
>
> [root@fnd0196 bin]# mii-diag --monitor
> Using the default interface 'eth0'.
> up           0x782d 0x41e1
> negotiating  0x7821 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> down         0x0f00 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> up           0x0f05 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> negotiating  0x0061 0x0000
> up           0x782d 0x41e1
> up           0x6305 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> negotiating  0x7821 0x0000
> up           0x782d 0x41e1
> up           0x6305 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> up           0x7104 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> negotiating  0x0061 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> negotiating  0x0061 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x4f05 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> down         0x0300 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> down         0x0300 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> down         0x0f00 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> down         0x7c10 0x0000
> up           0x782d 0x41e1
> negotiating  0x7828 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x4f05 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> down         0x0000 0x41e1
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
> up           0x782d 0x41e1
> negotiating  0x7821 0x0000
> up           0x782d 0x41e1
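
To make a long monitor log like this easier to read, here is a rough sketch that tallies the excursions away from the steady state. It assumes the first hex column is the MII basic status register (register 1) and the second is the link partner word. The "up" / "negotiating" / "down" rule in classify is inferred from the log itself (link status bit 0x0004, then autonegotiation complete bit 0x0020), not taken from the mii-diag source. The names classify, summarize and HEALTHY are hypothetical, with 0x782d chosen as the steady-state value because every other line of the log returns to it.

```python
from collections import Counter

BMSR_LSTATUS = 0x0004       # link status bit
BMSR_ANEGCOMPLETE = 0x0020  # autonegotiation complete bit
HEALTHY = 0x782D            # steady-state status word seen in the log


def classify(bmsr):
    """Label a status word the way the log above appears to."""
    if bmsr & BMSR_LSTATUS:
        return "up"
    if bmsr & BMSR_ANEGCOMPLETE:
        return "negotiating"
    return "down"


def summarize(log_text):
    """Count non-steady-state lines per label; collect label mismatches."""
    counts = Counter()
    mismatches = []
    for line in log_text.splitlines():
        parts = line.lstrip("> ").split()
        if len(parts) != 3 or not parts[1].startswith("0x"):
            continue  # not a monitor line
        label, status = parts[0], int(parts[1], 16)
        if classify(status) != label:
            mismatches.append(line)
        if status != HEALTHY:
            counts[label] += 1
    return counts, mismatches


# First few monitor lines from the log above.
SAMPLE = """\
> up           0x782d 0x41e1
> negotiating  0x7821 0x0000
> up           0x782d 0x41e1
> down         0x0000 0x0000
> up           0x782d 0x41e1
> up           0x7805 0x0000
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Lines such as "up 0x7805 0x0000" stand out in this kind of tally: the link bit is set but autonegotiation is not marked complete and the partner word is empty, which suggests the register was sampled in the middle of a renegotiation.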