[Beowulf] using watchdog timers to reboot a hung system automagically: Good idea or bad?

Greg Lindahl lindahl at pbm.com
Fri Oct 23 11:23:17 PDT 2009


On Fri, Oct 23, 2009 at 01:01:05PM -0500, Rahul Nabar wrote:

> 2. Some errors are hardware precipitated. Aging, out-of-warranty
> aging, hardware can sometimes need such a reboot compromise for
> one-off random errors.
> 
> Maybe all the "nice" clusters out there never have this issue but for
> me it is fairly common. Just confessing.

Why, exactly, are you assuming that your freezes are one-off random
errors due to aging hardware? Sounds like you're either guessing, or
you _are_ doing forensics, but aren't calling it forensics.

-- greg