[Beowulf] Re: Monitoring crashing machines

Robert G. Brown rgb at phy.duke.edu
Tue Sep 9 16:41:28 PDT 2008


On Tue, 9 Sep 2008, David Mathog wrote:

> word.  In the old days some of those crash events spewed garbage to the
> printer, and that resulted in a ream of nonsense on the floor, and more
> often than not, the paper mashed into an accordion behind a pinfeed jam.

Nobody said it was EASY back then, right?  Even when a system DIDN'T
crash, it dumped reams of fanfold into the takeup box, most of it never
examined by a human mind. ;-)

The real issue is whether the kernel dies a hard death or dies gently
enough to issue messages.  Some crashes give you a hint at the console,
in log files, wherever.  If the kernel lives long enough to do this, you
can find SOME way to get access to it.  If it dies hard, though, it
doesn't really matter what you put on the system; there won't be any
messages no matter what medium you manage to attach.

Beyond that there are many ways to get a non-dead kernel to write
something to where you can see it on a crash.  If one is difficult, try
another.
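
As one illustration of "some way to see it": if the dying node can still
push kernel messages onto the network (for instance via the Linux
netconsole module, which sends them as UDP datagrams, by default to port
6666), a small collector on another machine can save whatever arrives.
The sketch below is only that, a sketch.  The function names and the log
file name are hypothetical, and it assumes the nodes are already
configured to send to the collector's address and port.

```python
import socket
import time


def format_record(sender_ip, payload, timestamp):
    """Turn one UDP datagram into timestamped log lines tagged with the sender."""
    # Crash output can contain garbage bytes, so never fail on decoding.
    text = payload.decode("utf-8", errors="replace")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return [f"{stamp} {sender_ip} {line}"
            for line in text.splitlines() if line.strip()]


def listen(bind_addr="0.0.0.0", port=6666, log_path="crash-messages.log"):
    """Append every message received on the UDP port to log_path, forever."""
    # Port 6666 is assumed here because it is netconsole's default target port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    # Line-buffered so each message reaches the file as soon as it arrives.
    with open(log_path, "a", buffering=1) as log:
        while True:
            payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
            for line in format_record(sender_ip, payload, time.time()):
                log.write(line + "\n")


if __name__ == "__main__":
    listen()
```

Tagging each line with the sender's address matters on a cluster, since
many nodes would be writing to the same collector.  None of this helps
with a hard death, of course: if the kernel never sends anything, the
log stays empty.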

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977


