Beowulf and Big Brother
Jesse Becker jbecker at fryed.net
Mon Nov 11 15:47:02 PST 2002
On Mon, 11 Nov 2002, Ollisl wrote:

> We have 2000 computing nodes and 96 monitoring computers. There is a
> possibility that we have 96 different beowulf clusters there each
> having about 20 PC's but you never know(No decisions yet in that
> matter) ;) I was just wondering if it is reasonable or smart to monitor
> these master nodes with Big Brother? Or is there even ready-made shell-
> scripts for that?

I use BB to monitor several small clusters (5-8 nodes each). I used to use BB to monitor upwards of 100 servers (although not in a clustering environment).

Each cluster is a private network, and each head node acts as a BBNET and BBDISPLAY host. Each node runs the client procs and sends its various status reports back to the head node for the cluster.

The head node for each cluster does *NOT* run a webserver, and I disable the various webpage generation scripts, since I don't need them (comment them out in runbb.sh). Instead, the various BBDISPLAYs have a BBRELAY:ip.of.real.BBDISPLAY directive in bb-hosts, so all reports collected by the cluster head nodes are immediately sent to the 'real' BBDISPLAY host (which does run a webserver).

Out of the box, this gives you a 5-minute status check for each node. However, due to various timing issues, up to about 10 minutes can go by before you see a change on the bb.html and bb2.html pages. Adjust timing, especially on the 'real' BBDISPLAY, as needed.

The impact on system load is minimal, and if you really wanted to, you could rewrite the checks in C (almost everything is a shell script in BB) to reduce the load even more. Network bandwidth is trivial: a few hundred bytes every 5 minutes.

> I was thinking of something like this: A script runs every once and a
> while gathering data of the status of each slave-node, on each master
> node. Then that data is sent to Big Brother-server, whenever it is
> asked. So every master would be running a BB client.

BB is really good at reporting status. It isn't so good at storing status. Now, there is a 'data' message type that BB supports; maybe that is worth taking a look at? I'd also take a look at stuff like mrtg and orca: they both make pretty pictures of all the status data you gathered, whereas BB needs something else to do it.

--Jesse
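
For illustration only, here is a rough Python sketch of the "script on each master node" idea from the quoted question: collapse per-node results into one colour and format a single status report for the head node. It is not part of BB. The message layout ("status host.test colour timestamp", with dots in the host name written as commas) and port 1984 are my understanding of BB's conventions and should be checked against your installed version. The test name "nodes" and all host and node names are made up.

```python
import socket
import time

# Assumption: BB's conventional TCP port; verify for your installation.
BB_PORT = 1984

# Colours ordered from least to most severe (a subset of BB's colours).
SEVERITY = ["green", "yellow", "red"]


def worst_colour(node_colours):
    """Return the most severe colour among the nodes (green if there are none)."""
    return max(node_colours.values(), key=SEVERITY.index, default="green")


def build_status_message(host, test, colour, lines, stamp=None):
    """Format one status report as a single text message.

    Assumed layout: "status <host>.<test> <colour> <timestamp>" followed by
    free-form detail lines. Dots in the host name are replaced by commas.
    """
    if stamp is None:
        stamp = time.strftime("%a %b %d %H:%M:%S %Z %Y")
    header = f"status {host.replace('.', ',')}.{test} {colour} {stamp}"
    return "\n".join([header, *lines])


def send_status(display_host, message, port=BB_PORT, timeout=5.0):
    """Open a TCP connection to the display host and send one message."""
    with socket.create_connection((display_host, port), timeout=timeout) as conn:
        conn.sendall(message.encode("ascii", "replace"))


if __name__ == "__main__":
    # Hypothetical results gathered by the head node from its slave nodes.
    nodes = {"node01": "green", "node02": "red", "node03": "yellow"}
    details = [f"{name} {colour}" for name, colour in sorted(nodes.items())]
    msg = build_status_message("head1.example.org", "nodes",
                               worst_colour(nodes), details)
    print(msg)  # call send_status("bbdisplay.example.org", msg) to transmit
```

Reporting one summary per cluster keeps the number of reports at the central display small (96 rather than 2000), at the cost of having to read the detail lines to find the failing node.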
More information about the Beowulf mailing list
