[Beowulf] kdump / kexec to obtain crash dumps from randomly crashing nodes.

Paolo Supino paolo.supino at gmail.com
Thu Oct 9 12:59:16 PDT 2008


Hi Rahul

  Could it be that console messages aren't being sent to the VTY (the
standard PC monitor) but to some other output line?
  I don't know which KVM you're using, but most KVMs keep an open line
only to the server they are currently displaying (and just send back a
signal if they get a probe from another system they are connected to);
they don't buffer display output from the other connected systems. What
I suggested is to connect a real serial terminal to the Linux system.
Make sure that all console messages (including the kernel's) are sent to
the serial console (the default is to send them to the VTY, which is the
standard display on a PC). If you can't connect a real serial terminal,
then the closest thing to it would be a PC running PuTTY (or any other
terminal emulation software) listening on COM1 and connected to the
server's serial port.
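
  As a quick way to check where console output is currently going, here
is a minimal Python sketch. It assumes the console is selected with
console= parameters on the kernel command line (for example
console=tty0 console=ttyS0,115200n8, where ttyS0 is the port a PC calls
COM1). The function names are made up for illustration, and the script
only reads /proc/cmdline; it does not change any setting.

```python
#!/usr/bin/env python3
"""Report which console= targets appear on the kernel command line."""


def console_targets(cmdline):
    """Return the value of every console= parameter, in order of appearance."""
    return [tok.split("=", 1)[1]
            for tok in cmdline.split()
            if tok.startswith("console=")]


def serial_console_enabled(cmdline):
    """True if at least one console= parameter names a serial port (ttyS*)."""
    return any(target.startswith("ttyS") for target in console_targets(cmdline))


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line; return '' if it is unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    cmdline = read_cmdline()
    targets = console_targets(cmdline)
    if targets:
        print("console= targets:", ", ".join(targets))
    else:
        print("no console= parameter found (kernel default console in use)")
    if not serial_console_enabled(cmdline):
        print("no serial console configured; crash messages will not reach a serial terminal")
```

If no ttyS entry is reported, adding one to the kernel line in grub.conf
and rebooting should make the kernel messages show up on the serial
terminal.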







--
ttyl
Paolo





Rahul Nabar wrote:
> Hi Paolo,
> 
> The funny thing is that the console remains blank. We have all these
> systems connected to a KVM, and the KVM shows the system as actually
> disconnected after the crash.
> 
> That is what makes it so hard to debug. No screen output at all.
> 
> -Rahul
> 
> On Thu, Oct 9, 2008 at 2:07 PM, Paolo Supino <paolo.supino at gmail.com> wrote:
>> Hi Rahul
>>
>>  Did you try redirecting the console to a serial port? If all console
>> messages (including the kernel's) are sent to a serial console and the
>> system crashes, the terminal will keep displaying the messages it
>> received until the system is power cycled ...
>>
>>
>>
>>
>>
>> --
>> ttyl
>> Paolo
>>
>>
>>
>> Rahul Nabar wrote:
>>> On my CentOS system I installed kexec/kdump to investigate the cause of
>>> some random system crashes by getting access to a crash dump. I installed
>>> the rpm for kexec and then made the change to grub.conf that reserves the
>>> additional memory for the new kernel.
>>>
>>> I also configured kdump.conf. I started the kexec service and then tried
>>> to simulate a crash by echoing c to sysrq-trigger.
>>>
>>> The system does crash and then after a while reboots itself. But I see no
>>> vmcore when it comes back up. /var/crash is empty. This is when I tried to
>>> write to the local drive.
>>>
>>> I also tried an NFS write, but still no success.
>>>
>>> Any idea what could be missing in my steps? Or any other debug
>>> suggestions? Any other kdump users on Beowulf?
>>>
>>



