kernel oopses

Bill Hilf bill at hilfworks.com
Tue Jan 29 08:44:46 PST 2002


Robert Latham wrote:
> 
> On Mon, Jan 21, 2002 at 06:13:44PM -0800, Martin Siegert wrote:
> > This is somewhat off topic - sorry for that.
> 
> it's a great topic for clusters.  in an ideal world, the kernel never
> oopses, but when you have N kernels and possibly dodgy hardware, it
> happens.
> 
> i get frustrated with this list because topics like Martin's get
> ignored, while topics like cooling with LN2, game console clusters
> and anything athlon get multi-day discussions.
> 
> [snip problem report ]
> 
> > The first thing I would like to do is to log the oops message. Right now
> > it goes to the console only - it does not appear in the log files
> > although syslog sends everything of severity *.info to /var/log/messages.
> 
> i guess you've read Documentation/oops-tracing.txt , but if not, it's
> a good start.
> 
> depending on where the panic happens, the part of the kernel that
> would normally write that oops out to disk doesn't run.
> 
> So you've got a few options:
> 
> . typing off the screen:  sucks.  a lot.  and is highly error prone.
>   and the kernel console blanking mechanism might kick in ( and since
>   the kernel has panicked, it won't listen for input signals and unblank
>   itself ) but if you've got no other option...
> 
>   ( one time a guy took a picture of the oops with a digital camera and
>   sent that to me. that was fun.  I don't have any character recognition
>   software, but if someone knows of a linux OCR tool that won't mind a
>   screenful of hex, i'd like to hear about it )
> 
> . serial console:  not bad.  if it's just one machine, you can pass
>   parameters to your kernel and capture all kernel messages over the
>   serial port.  Documentation/serial-console.txt has all the info you
>   need.
> 
> . netconsole: http://people.redhat.com/mingo/netconsole-patches/
>   like a serial console, but using your network device instead of a
>   serial device.  It's a kernel patch and a convenience script for the
>   sender and a userspace tool for the receiver to display the messages.
>   Patching a kernel and setting up yet another tool might be a bit much,
>   but man is it cool to see it work :>
> 
> . patch your kernel to support "dump log to swapfile" or "dump log to
>   disk".  I haven't set something like this up, but always meant to
>   try it out...

To expand on this, there is the Linux Kernel Crash Dump (LKCD) package:

http://lkcd.sourceforge.net/

and Dprobes (from IBM's Linux Technology Center):

http://oss.software.ibm.com/developer/opensource/linux/projects/dprobes/

... which can also be used with Opersys's Linux Trace Toolkit
(http://www.opersys.com/LTT/).

And for the truly brave, use gdb ;)
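
As a rough illustration of the netconsole idea quoted above (this is not
the receiver that ships with the patches), here is a minimal Python sketch
of a UDP listener that appends whatever a crashing node sends to a log
file. It assumes the sender emits plain-text console lines in UDP
datagrams. The port number 6666, the log file name, and the function names
are all assumptions made for this example.

```python
import socket
import time


def format_record(sender_ip, payload, now):
    """Turn one UDP datagram into timestamped log lines tagged with the sender."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    # Console output should be ASCII; replace anything that is not,
    # so a corrupted packet cannot crash the logger.
    text = payload.decode("ascii", errors="replace")
    return "".join(f"{stamp} {sender_ip} {line}\n" for line in text.splitlines())


def serve(port=6666, logfile="oops.log", bind_addr="0.0.0.0"):
    """Listen forever and append every received console line to logfile.

    port and logfile are hypothetical defaults; use whatever the sending
    side is configured for.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    with open(logfile, "a") as log:
        while True:
            payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
            log.write(format_record(sender_ip, payload, time.time()))
            # Flush after every datagram: the interesting lines are the last ones.
            log.flush()
```

Calling serve() on a machine that stays up gives one log file for all N
nodes, with each line tagged by the address it came from.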

-Bill

> Basically the name of the game is to get that oops into a form you can
> feed to ksymoops, then hope the backtrace it prints out gives you a
> clue.  ( like "oh, the last thing it called was do_scsi_service... maybe
> i have a dodgy scsi controller" ).
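
Once the console output has been captured (serial, network, or typed in),
the oops still has to be cut out of the surrounding boot chatter before it
goes to ksymoops. A minimal Python sketch of that step follows. The start
and end markers are assumptions based on the usual i386 oops layout, which
ends with a "Code:" line; other architectures and kernel versions may
differ, and the function name is made up for this example.

```python
# Strings that typically open an oops report (assumed, not exhaustive).
START_MARKERS = ("Unable to handle kernel", "Oops:", "kernel BUG at")
# The "Code:" line is assumed to be the last line of a report.
END_MARKER = "Code:"


def extract_oopses(lines):
    """Return a list of oops reports, each one a list of log lines."""
    reports = []
    current = None
    for line in lines:
        text = line.rstrip("\n")
        if current is None:
            # Not inside a report yet: look for something that starts one.
            if any(marker in text for marker in START_MARKERS):
                current = [text]
        else:
            current.append(text)
            if END_MARKER in text:
                reports.append(current)
                current = None
    if current is not None:
        # The capture ended mid-report; keep what was captured.
        reports.append(current)
    return reports
```

Each returned report can be written to its own file and given to ksymoops
as input.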
> 
> Anybody else know of good ways ( even funny bad ways might be
> entertaining) to capture an oops?
> 
> ==rob
> 
> --
> Rob Latham
>                                              A215 0178 EA2D B059 8CDF
>                                              B29D F333 664A 4280 315B
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


-- 
-Bill	
<bill at hilfworks.com>
PGP Fingerprint: 4CE0 D72C C7A2 89B2 6B23  03DC B5E9 77CB E6F3 0D2A	
http://pgpkeys.mit.edu:11371/pks/lookup?op=get&exact=on&search=0xE6F30D2A


