2.0.3[345]/SMP/Boomerang - big IRQ reentry problems :-(

Adam Spiers adam@thelonious.new.ox.ac.uk
Wed Jul 22 14:39:35 1998


[sent to linux-smp and linux-vortex-bug]

Cyril Chaboisseau (Cyril.Chaboisseau@Obs.CoE.int) wrote:
> Matthias Reineke wrote:
> > 
> > Hi,
> > 
> > today I got several system crashes with Linux SMP 2.0.35 (dual PII-333,
> > Intel Etherexpress PRO10/100+, Motherboard ASUS P2B-DS, ICP Vortex
> > GDT6537RP SCSI Raid-Controller). The system message was:
> > 
> > eth1: SMP simultaneous entry of an interrupt handler
> 
> I switched to kernel 2.0.35 yesterday, and by the end of the day the
> system had also gone crazy, with a continuous stream of messages like:
> eth0: Re-entering the interrupt handler with proc 0, proc 0 already handling
> ....
> (on the console)
> and nothing short of a reset could stop it.

I've also had terrible problems getting a stable SMP system with
a 3Com 3c900 Boomerang adapter installed. I've tried numerous kernel
releases and pre-patches from 2.0.33 to 2.0.35, which between
them seem to cover all the recent releases of the 3c59x.c
driver, but every one of them has eventually hung the system
completely while the dreaded

   eth0: Re-entering the interrupt handler

streams past on the console.

Donald Becker (the driver's author) said this on linux-vortex-bug:

-------- 8< -------- 8< --------
On 10 Mar 1998, luka wrote:

> I'm using kernel 2.0.33 with the 0.49 3c59x driver. It reports the following:
> 
> 3c59x.c:v0.49 1/2/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
> loading device 'eth0'...
...
> eth0: Re-entering the interrupt handler.
> 
> Right now it is running a single CPU kernel, and has not had a problem.
> I would greatly appreciate any advice regarding this matter. Thanks. :)

Until a kernel fix is found for the interrupt-dispatch problem, you must use
the "smpcheck" version at
http://cesdis.gsfc.nasa.gov/linux/drivers/test/3c59x.c

This does not fix the problem, but the driver will continue to operate.
-------- 8< -------- 8< --------

Unfortunately, this doesn't appear to be true: I've tried that
version with 2.0.35 and it still crashed in under a day. Things
are now a bit desperate, so I'm running 2.1.109ac1, which has
been OK so far.

More comforting was a very recent post to linux-vortex-bug from
Jerry Sweet:

-------- 8< -------- 8< --------
After pounding on our dual-400MHz Pentium-II system for the last 48
hours without a crash, it looks as though this "Re-entering interrupt
handler" problem has been cured by applying the "mtrr-fix" patch to
our 2.0.34 kernel. Before we applied the mtrr-fix, the dual-CPU system
wasn't staying up longer than 24 hours without wedging itself with the
"Re-entering interrupt handler" problem.

The "mtrr-fix" compensates for broken SMP BIOSes which do not
synchronize MTRRs (memory type range registers) across all
CPUs. Mathias Froehlich has written a patch for 2.0.3x which has a
similar BIOS fix; it is available from this URL:

http://na.uni-tuebingen.de/~frohlich/mtrr-stuff/mtrr-fix-2.0.33.gz

Since this patch has apparently worked for us, it appears that the
ASUS P2B-DS motherboard's BIOS has the cache synchronization problem.

I found the above URL two links away from the SMP FAQ at this URL:

http://www.phy.duke.edu/brahma/smp-faq/smp-faq-5.html
-------- 8< -------- 8< --------

However, I applied this to 2.0.35 and exactly the same thing
happened within a day. So it's back to 2.1.x, and I'm struggling
to get a working iBCS module (needed for backups)...

If anyone can help, I would be VERY VERY grateful.  It's driving
me up the wall!

-- 
/- Adam Spiers, Computing Officer, New College, Oxford University, UK -/,
#!perl -l .sig 'cello, jazz, cycling, juggling, Linux, security, anti-M$
open$[;$;=q,,,$-++?$?:($#=lc<0>),$==$=>>++$*,$- ++, map {($k=ord)-=$=+$*,  
$c=($k &$-+$*)<<$*,$k>>=$-;$;.=($#=~m[^.{$k}(.{$c})])[$[],$#=$'}(split//  
=>(q,.";7=/43+':,)[$===$[]);;s;;$\;;, y$k, /u-$c@.kau$&&s&||/ \&&print&e;