Too much work at interrupt?

Chris Sterling lemmy@eaze.net
Tue Oct 27 19:39:53 1998



	I've got several Linux servers running SMP kernels on PR440FX
boards. They have been stable for months. In the last week, however,
3 out of 6 have 'crashed' on me. Of course, the ones that went down
are critical parts of my network, serving email, DNS, and database
functions. 

All hardware is similar: 

large amounts of 50ns ECC RAM, one or two PPro 200 CPUs, most recent BIOS
rev from Intel, running Slackware Linux 3.5. All kernels are 2.0.35. All
are plugged into several BayStack 350T switches, all running 100 Mbps/full
duplex. 

I'm not sure whether a DoS attack is going on when this happens; I'm
still tracking down other variables that could cause this. 

The problem: the machine drops off the network but does not dump core or
otherwise 'crash'. All of them are still fine when accessed from the console. 

What I know:

From /proc: 

No IRQ sharing or other odd stuff that I noticed. 

From syslog: 

kernel: eth0: Too much work at interrupt, status=0x4050
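
For anyone who wants to check their own logs for the same message, here is
a minimal Python sketch. The function name and the regular expression are
assumptions modelled only on the single line quoted above. It just counts
occurrences per interface and status value; it does not try to interpret
the status bits, which are driver-specific.

```python
import re
from collections import Counter

# Assumption: the pattern is modelled on the one syslog line quoted above.
PATTERN = re.compile(
    r"kernel: (?P<iface>\w+): Too much work at interrupt, "
    r"status=(?P<status>0x[0-9a-fA-F]+)"
)


def count_interrupt_warnings(lines):
    """Count (interface, status) pairs for matching syslog lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match.group("iface"), int(match.group("status"), 16))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the quoted line plus one unrelated, made-up line.
    sample = [
        "kernel: eth0: Too much work at interrupt, status=0x4050",
        "kernel: some unrelated message",
    ]
    for (iface, status), n in count_interrupt_warnings(sample).items():
        print(f"{iface} status={status:#06x} seen {n} time(s)")
```

To run it against a real log, pass an open file object instead of the
sample list; the location of the log file depends on the syslog setup.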

Physical: 

The 'link' light goes dark on the switch and the card. 



	Has anyone seen this?  

	I would have asked Donald Becker at ALS, but this only became
noticeable hours after I returned home. Great talk on Beowulf, BTW. 


Thanks!

--------------------------------
Chris Sterling					
System Administrator
EazeNet					     
lemmy@eaze.net
Office:	817-557-3038
Fax:	817-557-3468