kernel oopses

Robert Latham robl at mcs.anl.gov
Tue Jan 29 07:39:33 PST 2002


On Mon, Jan 21, 2002 at 06:13:44PM -0800, Martin Siegert wrote:
> This is somewhat off topic - sorry for that.

it's a great topic for clusters.  in an ideal world, the kernel never
oopses, but when you have N kernels and possibly dodgy hardware, it
happens.  

i get frustrated with this list because topics like Martin's get
ignored, while topics like cooling with LN2, game console clusters
and anything athlon get multi-day discussions.

[snip problem report ]

> The first thing I would like to do is to log the oops message. Right now
> it goes to the console only - it does not appear in the log files
> although syslog sends everything of severity *.info to /var/log/messages.

i guess you've read Documentation/oops-tracing.txt, but if not, it's
a good start.

depending on where the panic happens, the part of the system that
would normally write that oops out to disk (klogd handing it to
syslogd) may never get to run.

So you've got a few options:

. typing off the screen:  sucks.  a lot.  and is highly error prone.
  also, the kernel console blanking mechanism might kick in ( and since
  the kernel has panicked, it won't listen for keyboard input and unblank
  itself ), but if you've got no other option...
  
  ( one time a guy took a picture of the oops with a digital camera and
  sent that to me. that was fun.  I don't have any character recognition
  software, but if someone knows of a linux OCR tool that won't mind a
  screenful of hex, i'd like to hear about it )

. serial console:  not bad.  if it's just one machine, you can pass
  parameters to your kernel and capture all kernel messages over the
  serial port.  Documentation/serial-console.txt has all the info you
  need.  

. netconsole: http://people.redhat.com/mingo/netconsole-patches/
  like a serial console, but using your network device instead of a
  serial device.  It's a kernel patch plus a convenience script for the
  sender, and a userspace tool for the receiver to display the messages.
  Patching a kernel and setting up yet another tool might be a bit much,
  but man is it cool to see it work :>  

. patch your kernel to support "dump log to swapfile" or "dump log to
  disk".  I haven't set anything like this up, but I've always meant to
  try it out...
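
For the serial console route, the kernel side is just a boot parameter
such as console=ttyS0,9600n8 (Documentation/serial-console.txt has the
details).  On the other machine you need something that reads the
serial line and saves every byte to a file, so the oops survives the
reboot of the crashed box.  A terminal program with logging turned on
will do.  Below is a minimal sketch in Python using only the standard
termios module.  It assumes the null-modem cable is on /dev/ttyS0 of
the logging machine and that the console runs at 9600 baud, 8N1; the
script name and log file are hypothetical, so match them to your own
console= settings.

```python
import os
import sys
import termios


def open_serial(path, baud=termios.B9600):
    """Open a serial device read-only, in raw mode, at `baud` with 8N1 framing."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    attrs = termios.tcgetattr(fd)  # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] = 0                   # iflag: no CR/NL translation, no flow control
    attrs[1] = 0                   # oflag: no output post-processing
    attrs[2] = termios.CS8 | termios.CREAD | termios.CLOCAL  # 8 bits, no parity, ignore modem lines
    attrs[3] = 0                   # lflag: non-canonical, no echo, no signals
    attrs[4] = attrs[5] = baud     # input and output speed
    attrs[6][termios.VMIN] = 1     # read() blocks until at least one byte arrives
    attrs[6][termios.VTIME] = 0    # ... with no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def capture(fd, log):
    """Copy everything read from `fd` into the binary file `log` until EOF.

    Each chunk is flushed right away instead of sitting in Python's buffer.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            break
        log.write(chunk)
        log.flush()
        total += len(chunk)
    return total


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python3 serialcap.py /dev/ttyS0 console.log
    with open(sys.argv[2], "ab") as log:
        capture(open_serial(sys.argv[1]), log)
```

Flushing every chunk matters here: the last few lines before the box
dies are exactly the ones you want.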

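For netconsole, the receiving end boils down to listening on a UDP port
and saving whatever arrives.  The sketch below is not the userspace
tool that ships with mingo's patches.  It assumes the sender emits
plain-text UDP datagrams to port 6666, which is how the netconsole later
merged into the mainline kernel behaves; the 2002 patch may well use its
own format.  Each line is tagged with its arrival time and the sender's
IP address, so one receiver can collect messages from every node in the
cluster into a single file.  Script and file names are hypothetical.

```python
import socket
import sys
import time


def listen(port=6666, addr="0.0.0.0"):
    """Bind a UDP socket (6666 is mainline netconsole's default target port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((addr, port))
    return sock


def receive(sock, log, max_packets=None):
    """Append each datagram's lines to binary file `log`, tagged with the
    receive time and sender IP.  Runs forever unless `max_packets` is
    given; returns the number of datagrams handled."""
    count = 0
    while max_packets is None or count < max_packets:
        data, (host, _port) = sock.recvfrom(65535)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S").encode()
        for line in data.splitlines():
            log.write(stamp + b" " + host.encode() + b" " + line + b"\n")
        log.flush()  # keep the file current in case this box goes down too
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python3 netcons_rx.py 6666 netconsole.log
    with open(sys.argv[2], "ab") as log:
        receive(listen(int(sys.argv[1])), log)
```
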
Basically the name of the game is to get that oops into a form you can
feed to ksymoops, then hope the backtrace it prints out gives you a
clue.  ( like "oh, the last thing it called was do_scsi_service... maybe
i have a dodgy scsi controller" ).
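
Whichever capture method you use, the log will hold a lot more than the
oops.  A small filter that cuts out just the oops and strips any
syslog-style "... kernel: " prefix gives ksymoops a clean input.  The
start markers below ("Unable to handle kernel", "kernel BUG at",
"Oops:", "invalid operand:") and the Code: terminator are assumptions
based on what i386 oopses from 2.4 kernels typically look like; adjust
them for other architectures.  The script name is hypothetical.

```python
import re
import sys

# Assumed start markers, modelled on typical i386 oopses of this era:
# page faults, BUG() hits, and the "Oops:" / "invalid operand:" headers.
START = re.compile(r"Unable to handle kernel|kernel BUG at|^Oops:|^invalid operand:")
# The "Code:" line (raw instruction bytes) is normally the last line of an oops.
END = re.compile(r"^Code:")
# Strip a syslog-style "Jan 29 07:39:33 host kernel: " prefix if present.
PREFIX = re.compile(r"^.*?kernel: ")


def extract_oopses(lines):
    """Return every oops found in `lines` as a list of prefix-free lines,
    running from a start marker through the Code: line."""
    oopses, current = [], None
    for raw in lines:
        line = PREFIX.sub("", raw.rstrip("\r\n"), count=1)
        if current is None:
            if START.search(line):
                current = [line]
        else:
            current.append(line)
            if END.search(line):
                oopses.append(current)
                current = None
    if current:
        # The box may have died before printing Code:; keep the partial oops.
        oopses.append(current)
    return oopses


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (hypothetical script name): python3 cut_oops.py console.log > oops.txt
    with open(sys.argv[1], errors="replace") as f:
        for oops in extract_oopses(f):
            print("\n".join(oops) + "\n")
```

Then run ksymoops on oops.txt.  An oops that stops short of its Code:
line is still printed, since a partial trace beats none.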

Anybody else know of good ways ( even funny bad ways might be
entertaining) to capture an oops?

==rob

-- 
Rob Latham
                                             A215 0178 EA2D B059 8CDF  
                                             B29D F333 664A 4280 315B



More information about the Beowulf mailing list