[Beowulf] Compute Node OS on Local Disk vs. Ram Disk
Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Tue Sep 30 12:04:51 PDT 2008
On Tue, 30 Sep 2008, Jon Forrest wrote:

> The trouble with rebooting nodes is that this takes human energy.

When using a queueing system, rebooting nodes can be automated easily:

- the node to be rebooted is switched to "offline" state so that the scheduler doesn't attempt to start new jobs on it
- wait until the currently running job finishes
- reboot
- put the node back "online" so that the scheduler can again start jobs on it

All the steps except the reboot itself are interactions with the queueing system and can happen on the frontend/master node only. The reboot step requires some interaction with the node, either remote shell access to run /sbin/reboot or some other way to restart it (IPMI, remote power management, etc.).

> It's easier to keep nodes up as long possible

With the increasing number of nodes in clusters these days, the overall failure rate also increases. It's much easier to deal with failures when they are not seen as a catastrophe of the "twist my fingers and hope that the node is coming up properly and everything still works" kind, but rather as nodes simply going up and down.

> This is a good idea. Can you write more about this?

The e-mail from Brian Oborn has described the principle in a few words, probably better than I could have done it myself. If you want more details, ask more precise questions and I guess that any of us could answer.

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de
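
The four-step reboot procedure described in the message (offline, wait for the running job, reboot, online) could be sketched roughly as below. This is only an illustration under stated assumptions: the `Scheduler` interface, `FakeScheduler`, and the `reboot` and `is_up` callables are hypothetical names invented here and do not correspond to any particular queueing system's API. In practice they would wrap whatever commands the local scheduler and remote power management provide. The wait for the node to come back up before marking it online is also an assumption; the message does not spell it out.

```python
import time
from typing import Callable, Protocol


class Scheduler(Protocol):
    """Hypothetical queueing-system interface (not a real scheduler API)."""

    def set_offline(self, node: str) -> None: ...
    def running_jobs(self, node: str) -> int: ...
    def set_online(self, node: str) -> None: ...


def drain_and_reboot(
    node: str,
    scheduler: Scheduler,
    reboot: Callable[[str], None],
    is_up: Callable[[str], bool],
    poll_seconds: float = 30.0,
) -> None:
    """Reboot one node without disturbing running jobs."""
    # 1. Mark the node offline so no new jobs are started on it.
    scheduler.set_offline(node)
    # 2. Wait until the jobs already running on it have finished.
    while scheduler.running_jobs(node) > 0:
        time.sleep(poll_seconds)
    # 3. Reboot (remote shell, IPMI, power management, ... supplied by caller).
    reboot(node)
    # Assumption: wait until the node answers again before re-enabling it.
    while not is_up(node):
        time.sleep(poll_seconds)
    # 4. Put the node back online so the scheduler can use it again.
    scheduler.set_online(node)


class FakeScheduler:
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self, jobs_left: int) -> None:
        self.jobs_left = jobs_left
        self.log = []

    def set_offline(self, node: str) -> None:
        self.log.append(f"offline {node}")

    def running_jobs(self, node: str) -> int:
        # Pretend one job finishes between successive polls.
        remaining = self.jobs_left
        self.jobs_left = max(0, self.jobs_left - 1)
        return remaining

    def set_online(self, node: str) -> None:
        self.log.append(f"online {node}")


if __name__ == "__main__":
    sched = FakeScheduler(jobs_left=1)
    drain_and_reboot(
        "node01",
        sched,
        reboot=lambda n: sched.log.append(f"reboot {n}"),
        is_up=lambda n: True,
        poll_seconds=0,
    )
    print(sched.log)
```

Passing the scheduler and reboot mechanism in as parameters mirrors the point made in the message: everything except the reboot itself is an interaction with the queueing system and can run on the frontend/master node.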
