[Beowulf] Re: [Linux-HA] Couldn't get watchdog to work

Alex Vrenios alex at DSRLab.com
Tue Dec 28 09:14:16 PST 2004


> -----Original Message-----
> Paul Chen wrote:
> > Both nodes did restart 
> > heartbeat but none of them reboot or shut down. Am I doing 
> > something wrong?
> >
> Alan Robertson wrote:
> The watchdog timer will only kill the system if heartbeat goes insane.
> It didn't.  So, the watchdog timer is happy.
> 
> At this point in time, the watchdog timer is not a 
> replacement for a STONITH device.
>
Which is exactly what I am looking into (the STONITH device)...

I see two solutions, one hardware and one software. The hardware solution
looks expensive, but I believe the software solution will help Mr. Chen
(above). I would appreciate comments.

I would have my "backup" system execute a command as part of its attempts to
assume the identity, responsibilities and resources of the "primary" system.
The command is run from backup, as follows:

   root@backup> ssh root@primary shutdown -h now

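This will not work in all cases: the primary must still be reachable over
the network and healthy enough to accept an ssh login. It should work in
cases like the above, where both nodes are still up. A hardware solution is
more general, but it doesn't hurt to run this command in any case.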
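
A minimal sketch of how that command might be wrapped in the backup's
takeover script, so that a dead or unreachable primary does not hang the
takeover. The names fence_primary, PRIMARY_HOST and SSH_CMD are hypothetical,
chosen for this example. It assumes OpenSSH and key-based root login from
backup to primary. It is not part of heartbeat itself.

```shell
#!/bin/sh
# Best-effort software fencing of the primary before the backup takes over.
# PRIMARY_HOST and SSH_CMD are hypothetical names used only in this sketch.

PRIMARY_HOST="${PRIMARY_HOST:-primary}"
SSH_CMD="${SSH_CMD:-ssh}"

fence_primary() {
    # BatchMode=yes     fail rather than prompt for a password
    # ConnectTimeout=10 give up if the primary does not answer within 10 s
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=10 \
            "root@${PRIMARY_HOST}" shutdown -h now; then
        # Success only means the shutdown was requested, not that the
        # node is actually down.
        echo "shutdown requested on ${PRIMARY_HOST}"
        return 0
    fi
    # Unreachable, hung, or the connection dropped early: the primary may
    # still be running, which is the case only a hardware device covers.
    echo "could not reach ${PRIMARY_HOST}; it may still be running" >&2
    return 1
}
```

The takeover script would call fence_primary just before acquiring resources
and log a failure rather than abort on it, since a failure here usually means
the primary is already unreachable.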

Alex Vrenios
DSRLab




