[Beowulf] Re: recommendation on crash cart for a cluster room: full cluster KVM is not an option I suppose?
lindahl at pbm.com
Fri Oct 9 11:44:06 PDT 2009
> > 1) Console logging. Your machine just crashed. No clue in
> > /var/log/messages. "I wonder if it printed something on the console?"
> > Answer: ipmi and conman (available in an rpm in Red Hat distros).
> I was "planning" on using kdump and a crash-kernel for that.
That is complicated enough to set up that I've never tried it.
IPMI doesn't give you the same functionality as kdump: without the dump,
you can't do any further debugging. But you do get the oops via
IPMI and conman, which is roughly the equivalent of getting a stack trace
when a program segfaults. Personally, I'm not going to debug the kernel
much beyond staring hard at the oops, and an oops is what you're expected
to include when filing a bug against the kernel.
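Once the console is being captured, you still have to find the oops in it. Here is a minimal Python sketch, assuming conman has been set up to log each node's console to a plain-text file. It pulls out blocks that start with common kernel oops/panic markers and end at a `---[ end trace` line or a blank line. The marker strings are a heuristic rather than a complete list, and `extract_oopses` and the sample text are hypothetical.

```python
import re
from typing import Iterable, List

# Heuristic start markers for kernel oops/panic output (an assumption,
# not an exhaustive list of what the kernel can print).
START_PATTERN = re.compile(
    r"(BUG: unable to handle kernel|Oops:|general protection fault|Kernel panic)"
)
# Many kernels close an oops with a line like "---[ end trace 0123abcd ]---".
END_PATTERN = re.compile(r"---\[ end trace")


def extract_oopses(lines: Iterable[str]) -> List[List[str]]:
    """Return each oops found in a console log as a list of its lines."""
    oopses: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not current:
            # Not inside an oops yet: only start collecting at a marker.
            if START_PATTERN.search(line):
                current = [line]
            continue
        if line.strip() == "":
            # A blank line ends the block without being kept.
            oopses.append(current)
            current = []
            continue
        current.append(line)
        if END_PATTERN.search(line):
            oopses.append(current)
            current = []
    if current:
        # Console output stopped mid-oops, e.g. the node hung hard.
        oopses.append(current)
    return oopses


if __name__ == "__main__":
    # Hypothetical console capture; in practice, read the file conman logs to.
    sample = [
        "node042 login:",
        "BUG: unable to handle kernel NULL pointer dereference at 0000000000000010",
        "IP: [<ffffffff8123abcd>] some_driver_fn+0x17/0x40",
        "Oops: 0000 [#1] SMP",
        "---[ end trace 0123456789abcdef ]---",
    ]
    for oops in extract_oopses(sample):
        print("\n".join(oops))
```

Matching on a few well-known strings keeps the sketch simple. It will miss oddly formatted output, but the goal is only to get the oops text in front of a human or into a bug report.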
> I see. Yes, you read me correctly: I was putting full faith in
> lm_sensors to do this.
lm_sensors isn't going to tell you about something that happened
between scans. IPMI gives you access to the event log (SEL), which will
show you all transient events.
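The sampling argument can be made concrete with a toy simulation. This is not how lm_sensors or a BMC works internally. It only contrasts a poller that reads every N samples with a log that records every threshold crossing, which is the role the SEL plays (read with, e.g., `ipmitool sel list`). The function names, the threshold, and the temperature trace are all hypothetical.

```python
from typing import List, Tuple


def polled_max(readings: List[float], interval: int) -> float:
    """Highest value a poller sees if it only reads every `interval` samples."""
    return max(readings[::interval])


def threshold_events(readings: List[float], threshold: float) -> List[Tuple[int, str]]:
    """Record every crossing of `threshold`, like an always-watching event log."""
    events: List[Tuple[int, str]] = []
    above = False
    for t, value in enumerate(readings):
        if value > threshold and not above:
            events.append((t, "asserted"))
            above = True
        elif value <= threshold and above:
            events.append((t, "deasserted"))
            above = False
    return events


if __name__ == "__main__":
    # Hypothetical 10-sample temperature trace with a short spike at t=3..4.
    temps = [60.0] * 10
    temps[3] = temps[4] = 95.0
    # The poller reads only t=0 and t=5, so it never sees the spike.
    print("poller saw max:", polled_max(temps, 5))
    # The event log catches both the assertion and the deassertion.
    print("event log:", threshold_events(temps, 90.0))
```

The spike lies entirely between two polls, so the poller reports a normal maximum while the event log still shows that the threshold was crossed and when.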
The two do look at the same bus and counters. lm_sensors also works on
systems that lack IPMI.