[vortex] can't unload module

David Fries dfries@umr.edu
Fri, 22 Sep 2000 15:39:00 -0500


On Fri, Sep 22, 2000 at 11:36:30PM +1100, Andrew Morton wrote:
> [ added netdev ]
> 
> David Fries wrote:
> > 
> > I don't know if it is a motherboard/network card combination, but I
> > see networking going down the tube, with NFS traffic the first to show
> > things slowing down.  Sometimes networking stops completely,
> > without any error message.
> > 
> > In previous kernels I just did ifconfig eth0 down, rmmod 3c59x, insmod
> > 3c59x, ifconfig ... and it was back to normal.
> 
> This is unfamiliar and it does sound like a driver problem.  Does it do
> this under kernel 2.2?

yes

I'm going to say there are two problems: 'net drop out' and
'unregister_netdevice'.

> Could you please do some further investigation?  Try increasing the
> driver debug level, look at the `ifconfig' output and /proc/interrupts
> when it's happening, etc?  Thanks.

'net drop out' problem:
There are two stages: reduced network and no network.  For example,
when I run `ping -s 15000 aerospace` from spacedout (the troubled
computer) to aerospace (another machine), I get response times of
either 4ms or 3000ms.

When networking stops I don't get any packets received or any
interrupts, but I am seeing RX overruns increment.  When I ping from
spacedout, spacedout shows an ARP request going out, aerospace sees
the ARP request, but spacedout never sees the reply.

> > Now I booted the 2.4.0-test9 pre4 kernel (test8 did it also, but
> > I think I could unload under test7).  This is with a 3c905B network
> > card.
> > 
> > With 2.4.0-test9 it gives me:
> > 
> > unregister_netdevice: waiting for eth0 to become free.  Usage count = 2
> > 
> > and that keeps scrolling on the screen.
> 
> And does it continue to do this for more than thirty seconds?

yes.

> If so, then someone may be leaking some skbuffs.
> 
> Is this an NFS server or a client?

Both problems: it is a client, but it doesn't matter whether I have
NFS mounted or not.  They happen without NFS, so NFS shouldn't matter.

> What NFS mount options are you using (specifically, rsize and wsize)?
> 
> I've just tried an ifdown/rmmod in the middle of heavy NFS client
> traffic and everything seems to hang together, although the application
> which is using NFS gets errors when the interface is brought back up.  
> (Is NFS client supposed to be able to recover from a local interface
> outage??)

I'm not sure.  I think it should work, but it would depend on your
mount options.

> Are you able to provide a set of steps with which others can reproduce
> this?

'net drop out':
I'll just say no.  AeroSpace is running SMP; spacedout is not.
AeroSpace is a dual Pentium MMX and spacedout is a K6-2.  They have
basically identical network cards in them (3c905B).  I have swapped the
network cards in the past, and the problems follow the computer, not
the card.

I would suggest getting a FIC VA 503+ motherboard, a K6-2 processor
and a 3c905B network card; start X, have something rapidly updating the
video card (an rxvt running `locate \*` worked fine), and send a ton of
network data to the system at 100BaseT.

If you REALLY twisted my arm, you might get me to put one of my Pentium
processors in the system, but I would rather not do that.

I can reproduce the new 'unregister_netdevice: waiting ...' problem
with:
insmod 3c59x
ifconfig eth0 ...
(on another console) ping -s 15000 -f aerospace
ifconfig eth0 down; rmmod 3c59x

That usually prints about two lines of 'unregister_netdevice...' before
the module can be removed.

The odd thing about the 'unregister_netdevice' problem is that I was
still able to unload the module until I inserted my ne2000 card and
brought it up with ifconfig.

I did:
insmod 3c59x
modprobe ne io=0x300 irq=111
ifconfig eth0 ...
ifconfig eth1 ...
ifconfig eth0 down
rmmod 3c59x
and it kept printing the 'unregister_netdevice' message over and over
until I rebooted.

This is with init running sulogin as about the first thing it does, so
nothing else is running on the system yet.
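A rough way to picture the symptom: unregister_netdevice() cannot finish until every holder has dropped its reference to the device, so a reference that is never released (for example a leaked skbuff, as Andrew suggests below) keeps the warning repeating forever. The following is a minimal userspace model under that assumption, not the kernel's code; `struct fake_dev`, `wait_for_free()` and the tick-based release schedule are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model (not kernel code): a device whose usage count
 * must reach zero before it can be freed. */
struct fake_dev {
    const char *name;
    int refcnt;
};

/* release_tick[i] is the tick at which holder i drops its reference;
 * a negative value models a holder that never lets go (a leak).
 * Returns the tick at which the device became free, or -1 if it was
 * still busy after max_ticks.  Prints one warning per busy tick when
 * log is non-NULL, mimicking the repeating console message. */
int wait_for_free(struct fake_dev *dev, const int *release_tick,
                  int holders, int max_ticks, FILE *log)
{
    assert(dev != NULL && release_tick != NULL);
    for (int tick = 0; tick < max_ticks; tick++) {
        for (int i = 0; i < holders; i++)
            if (release_tick[i] == tick)
                dev->refcnt--;          /* this holder drops its reference */
        if (dev->refcnt == 0)
            return tick;                /* device can now be unregistered */
        if (log)
            fprintf(log, "unregister_netdevice: waiting for %s to become "
                         "free.  Usage count = %d\n", dev->name, dev->refcnt);
    }
    return -1;                          /* still waiting: a reference leaked */
}
```

With two holders that release at ticks 1 and 2, the model prints two warnings and returns 2; if one holder never releases, it returns -1 with the usage count stuck at 1.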

> Are you running SMP?  I assume not, because if you were, your kernel
> would have locked up good and tight.
> 
> Alexey, Dave: that wait-for-ever crap which went into
> unregister_netdevice() happens under the lock_kernel() in
> sys_delete_module()!  This means that the kernel lock will be held until
> all the frags expire.  Ugly.  Is sys_delete_module() the only user of
> unregister_netdevice who can get bitten by this?
> 
> > If I run ifconfig it hangs; strace shows it hanging in the
> > ioctl(SIOCGIFCONF) call.
> 
> Possibly the rtnetlink semaphore. Not sure.
> 
> > /proc/net/dev does not list an eth0 device
> > /proc/modules lists 3c59x 21992  0 (deleted)
> 
> OK, you're not using SMP :)
> 
> > I need the latest kernel or USB crashes, but if networking goes down
> > I'm SOL and have to reboot, which is just as annoying.
> > 
> > I'm open to suggestions.
> 
> David, it would be useful if you could play with this for an hour or so
> and characterise it a bit more.  It sounds nicely reproducible, which is
> good.  But how do I reproduce it here?

Where is here?  I'm in Missouri, USA.

For the 'unregister_netdevice' problem, I'm guessing that putting an
ne2000 card in might be enough.

For the 'net drop out' problem you might need the same model
motherboard I have.

The driver for this network card (3c905B) uses a circular buffer list
for its receive buffers, right?  Could you modify the receive routine
to build a histogram that records which buffer each packet is received
from?  Once a second it would spit out a list giving each buffer number
and the total number of packets received from that buffer.

I'm wondering whether, in my case, buffers are getting stuck, leaving
fewer buffers to work with.  That would cause lower network
performance, and sometimes everything is full and it stops receiving.

Then again, I haven't looked at the card or the driver in detail.
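The histogram idea could look roughly like the sketch below. It is a minimal, hypothetical sketch, not code from 3c59x: the ring size of 32, `struct rx_histogram` and the `rx_hist_*` helpers are assumptions. The receive routine would call `rx_hist_record()` with the ring index of each received packet, and a once-per-second timer would call `rx_hist_dump_and_reset()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed ring length for illustration; the real driver defines its own. */
#define RX_RING_SIZE 32

/* One packet counter per receive-ring slot (hypothetical helper type). */
struct rx_histogram {
    unsigned long count[RX_RING_SIZE];
};

void rx_hist_reset(struct rx_histogram *h)
{
    memset(h->count, 0, sizeof h->count);
}

/* Called from the receive path with the slot the packet arrived in. */
void rx_hist_record(struct rx_histogram *h, unsigned int slot)
{
    assert(slot < RX_RING_SIZE);
    h->count[slot]++;
}

/* Number of slots that received nothing this interval; a slot that stays
 * at zero while the others keep counting is a candidate "stuck" buffer. */
unsigned int rx_hist_idle_slots(const struct rx_histogram *h)
{
    unsigned int idle = 0;
    for (unsigned int i = 0; i < RX_RING_SIZE; i++)
        if (h->count[i] == 0)
            idle++;
    return idle;
}

/* Print "slot: packets" for every slot, then clear for the next interval. */
void rx_hist_dump_and_reset(struct rx_histogram *h, FILE *out)
{
    for (unsigned int i = 0; i < RX_RING_SIZE; i++)
        fprintf(out, "%2u: %lu\n", i, h->count[i]);
    rx_hist_reset(h);
}
```

If the stuck-buffer theory is right, the affected slot should show a count that stays at zero, or far below its neighbours, interval after interval.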

-- 
		+---------------------------------+
		|      David Fries                |
		|      dfries@umr.edu             |
		+---------------------------------+