[Beowulf] kdump / kexec to obtain crash dumps from randomly crashing nodes.

Rahul Nabar rpnabar at gmail.com
Thu Oct 9 12:19:20 PDT 2008


Hi Paolo,

The funny thing is that the console remains blank. We have all these
systems connected to a KVM, and the KVM shows the system as actually
disconnected after the crash.

That is what makes it so hard to debug. No screen output at all.

-Rahul

On Thu, Oct 9, 2008 at 2:07 PM, Paolo Supino <paolo.supino at gmail.com> wrote:
> Hi Rahul
>
>  Did you try redirecting the console to a serial port? If the system
> crashes, all console messages (including kernel messages) are sent to
> the serial console, which keeps displaying the messages it received
> until the system is power cycled ...
>
>
>
>
>
> --
> ttyl
> Paolo
>
>
>
> Rahul Nabar wrote:
>> On my CentOS system I installed kexec/kdump to investigate the cause of
>> some random system-crashes by getting access to a crash-dump. I installed
>> the rpm for kexec and then made the change to grub.conf that reserves the
>> additional memory for the new kernel.
>>
>> I also configured kdump.conf. I started the kexec service and then tried
>> to simulate a crash by echoing c to sysrq-trigger.
>>
>> The system does crash and then, after a while, reboots itself. But I see
>> no vmcore when it comes back up. /var/crash is empty. This was when I
>> tried to write to the local drive.
>>
>> I also tried writing the dump over NFS, but still had no success.
>>
>> Any idea what could be missing in my steps? Or any other debug
>> suggestions? Any other kdump users on Beowulf?
>>
>
>
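
For anyone retracing the steps in this thread, here is a minimal sketch
of a pre-crash sanity check. It is an illustration, not something the
posters ran. It assumes the running kernel exposes /proc/cmdline and
/sys/kernel/kexec_crash_loaded (older kernels may not have the latter).
The helper names parse_cmdline, kdump_report and read_text are
hypothetical. It reports whether a crashkernel= reservation is on the
kernel command line, whether a serial console (console=ttyS...) is
configured, and whether a crash kernel is currently loaded.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: [values]}.

    Bare flags such as 'ro' map to ['']; repeated keys such as
    console= keep every value in order.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def kdump_report(cmdline: str, crash_loaded: str) -> dict:
    """Summarise the three preconditions discussed in the thread."""
    params = parse_cmdline(cmdline)
    consoles = params.get("console", [])
    return {
        # Memory reserved for the capture kernel at boot (grub.conf change).
        "crashkernel_reserved": "crashkernel" in params,
        # Console output redirected to a serial port.
        "serial_console": any(v.startswith("ttyS") for v in consoles),
        # Assumption: the sysfs file contains "1" when a crash kernel is loaded.
        "crash_kernel_loaded": crash_loaded.strip() == "1",
    }


def read_text(path: str, default: str = "") -> str:
    """Read a small text file, returning default if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return default


if __name__ == "__main__":
    report = kdump_report(
        read_text("/proc/cmdline"),
        read_text("/sys/kernel/kexec_crash_loaded", "0"),
    )
    for name, ok in report.items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```

If crashkernel_reserved or crash_kernel_loaded comes back NO, the
sysrq-trigger test would reboot without a capture kernel to write the
vmcore, which would match the empty /var/crash described above. That is
only one possible cause, though.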


