[Beowulf] disabling bad nodes
Reuti reuti at staff.uni-marburg.de
Mon Mar 27 12:19:34 PST 2006
Hi,

On 26.03.2006 at 21:07, James Rustad wrote:

> Guys
> This is a strange question, but
> Is there any way to disable a bad node in PBS without being the
> system administrator?
> I am lining up about 50 jobs in the queue and they fail
> sequentially when they hit
> the bad node. This often seems to happen on the weekends when nobody
> is around to reboot the node.
>
> Can I specify within PBS "don't use node015" or something like that.
> Thanks
> Jim Rustad
> ps
> I may be using TORQUE rather than PBS, by the way

Although I can't answer your question directly: what is causing this black hole in the cluster? I have run into this from time to time when /tmp filled up on some nodes. As we are using SGE, I use its load-sensor facility to check the free space there and, if it runs too low, put the node into alarm state, i.e. disable the queues on this node. Maybe something similar could also be implemented with Torque, to get some self-healing at weekends.

- Reuti

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
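For illustration, the load-sensor idea mentioned above might look roughly like the sketch below. It is not the script actually used on the cluster described in the message. It assumes the usual SGE load-sensor protocol (the execution daemon writes a line to the sensor's stdin for each report it wants, the sensor answers with a begin/end block of host:name:value lines, and "quit" ends the sensor). The resource name "tmpfree" and all function names are made up for this example; the administrator would still have to define such a complex and set a load threshold on it in the queue configuration so that the queue goes into alarm state when free space runs low.

```python
import shutil
import socket
import sys

# Hypothetical complex name; it must match whatever the administrator
# defines in the SGE complex configuration.
RESOURCE = "tmpfree"


def tmp_free_mb(path="/tmp"):
    """Return the free space on the filesystem holding `path`, in whole MB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def format_report(host, name, value):
    """Build one load report block in the begin / host:name:value / end form."""
    return f"begin\n{host}:{name}:{value}\nend\n"


def run_sensor(inp, out, host, probe):
    """Answer every request line on `inp` with one report; stop on 'quit' or EOF."""
    for line in inp:
        if line.strip() == "quit":
            break
        out.write(format_report(host, RESOURCE, probe()))
        out.flush()  # the daemon reads the report from a pipe, so flush each time
```

To use it as an actual sensor script, one would add a final line such as `run_sensor(sys.stdin, sys.stdout, socket.gethostname(), tmp_free_mb)`, on the assumption that `socket.gethostname()` returns the host name as SGE knows it. The probe is passed in as a function so that the loop can be tried out without touching a real filesystem.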
