[Beowulf] Re: [Linux-HA] Couldn't get watchdog to work
Alex Vrenios alex at DSRLab.com
Tue Dec 28 09:14:16 PST 2004
> -----Original Message-----
> Paul Chen wrote:
> > Both nodes did restart heartbeat but none of them reboot or shut down.
> > Am I doing something wrong?
>
> Alan Robertson wrote:
> The watchdog timer will only kill the system if heartbeat goes insane.
> It didn't. So, the watchdog timer is happy.
>
> At this point in time, the watchdog timer is not a replacement for a
> STONITH device.

That is exactly what I am looking into (the STONITH device). I see two
solutions, one hardware and one software. The hardware solution looks
expensive, but I believe the software solution will help Mr. Chen (above),
and I would appreciate comments.

I would have my "backup" system execute a command as part of its attempts
to assume the identity, responsibilities and resources of the "primary"
system. The command is run from the backup, as follows:

    root@backup> ssh root@primary shutdown -h now

This will not work in all cases (for example, if the primary has hung or
cannot be reached over the network), but it should work in cases like the
one above. A hardware solution is more general, but it doesn't hurt to run
this command in any case.

Alex Vrenios
DSRLab
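The software approach described above can be sketched as a small shell function that the backup node runs before taking over. This is a minimal sketch, assuming key-based root ssh from the backup to the primary is already set up; the function name `fence_primary` and the variables `PRIMARY_HOST` and `SSH_CMD` are hypothetical, not part of heartbeat or any STONITH plugin. The ssh options `BatchMode` and `ConnectTimeout` keep the call from hanging on a password prompt or an unreachable host.

```shell
#!/bin/sh
# Hypothetical "software STONITH" step for the backup node, run before it
# takes over the primary's identity and resources. PRIMARY_HOST, SSH_CMD and
# fence_primary are illustrative names, not part of heartbeat.

PRIMARY_HOST="${PRIMARY_HOST:-primary}"   # assumed hostname of the primary
SSH_CMD="${SSH_CMD:-ssh}"                 # overridable, e.g. for testing

# Ask the primary to halt itself over ssh.
# Returns 0 if ssh reports success, 1 otherwise.
fence_primary() {
    # BatchMode=yes: fail instead of prompting for a password.
    # ConnectTimeout=5: give up quickly if the primary cannot be reached.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 \
        "root@$PRIMARY_HOST" shutdown -h now; then
        echo "fence: shutdown requested on $PRIMARY_HOST"
        return 0
    fi
    # ssh exits 255 on its own errors; a halt that drops the session early
    # can also look like a failure, so the primary's state is unknown here.
    echo "fence: no confirmation from $PRIMARY_HOST; state unknown" >&2
    return 1
}
```

To use it, source the file on the backup and call `fence_primary` before acquiring the primary's resources. As the post notes, this does not replace a hardware STONITH device: if the primary is hung or cut off from the network, the ssh call fails and the backup still cannot be sure the primary is down.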
More information about the Beowulf mailing list
