Beowulf and Big Brother
jbecker at fryed.net
Mon Nov 11 15:47:02 PST 2002
On Mon, 11 Nov 2002, Ollisl wrote:
> We have 2000 computing nodes and 96 monitoring computers. There is a
> possibility that we have 96 different beowulf clusters there each
> having about 20 PC's but you never know(No decisions yet in that
> matter) ;) I was just wondering if it is reasonable or smart to monitor
> these master nodes with Big Brother? Or is there even ready-made shell-
> scripts for that?
I use BB to monitor several small clusters (5-8 nodes each). I used to
use BB to monitor upwards of 100 servers (although not in a clustering
environment). Each cluster is a private network, and each head node acts
as a BBNET and BBDISPLAY host. Each node runs the client procs and
sends its various status reports back to the head node for the cluster.
The head node for each cluster does *NOT* run a webserver, and I disable
the various webpage generation scripts, since I don't need them (comment
them out in runbb.sh). Instead, each head node's BBDISPLAY has a
BBRELAY:ip.of.real.BBDISPLAY directive in bb-hosts, so all reports
collected by the cluster head nodes are immediately sent to the 'real'
BBDISPLAY host (which does run a webserver).
Out of the box, this gives you a 5 minute status check for each node.
However, due to various timing issues, up to about 10 minutes can go by
before you see a change on the bb.html and bb2.html pages. Adjust timing,
especially on the 'real' BBDISPLAY, as needed.
The impact on system load is minimal, and if you really wanted to, you
could rewrite the checks in C (almost everything is a shell script in BB),
to reduce the load even more. Network bandwidth is trivial: a few
hundred bytes every 5 minutes.
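
As a rough back-of-the-envelope check of the bandwidth claim, here is a
small Python sketch. The 300 bytes per report is an assumption standing in
for "a few hundred bytes", and it is treated as a per-node figure; the
2000-node count comes from the original question and the 5 minute interval
is the out-of-the-box default mentioned above. The function name is made up
for illustration.

```python
# Rough estimate of the aggregate traffic from BB status reports.
# ASSUMPTION: 300 bytes per node per interval stands in for
# "a few hundred bytes"; measure your own reports for real numbers.

def report_bandwidth(nodes, bytes_per_report=300, interval_s=300):
    """Average bytes per second when each node reports once per interval."""
    return nodes * bytes_per_report / interval_s


if __name__ == "__main__":
    # One small cluster versus every node in the proposed installation.
    for nodes in (20, 2000):
        rate = report_bandwidth(nodes)
        print(f"{nodes} nodes: about {rate:.0f} bytes/s on average")
```

Even with all 2000 nodes reporting, the average under these assumptions
stays in the low kilobytes per second, which is consistent with calling the
bandwidth trivial.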
> I was thinking of something like this: A script runs every once and a
> while gathering data of the status of each slave-node, on each master
> node. Then that data is sent to Big Brother-server, whenever it is
> asked. So every master would be running a BB client.
BB is really good at reporting status. It isn't so good at storing
status. BB does support a 'data' message type, though, which may be
worth taking a look at.
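
If you do go the route of one summary report per master node, the roll-up
itself is simple. The Python sketch below takes per-slave colors, picks the
worst one, and builds a single status message. The message layout
("status host.test color date text", with dots in the host name replaced by
commas) is my understanding of what the BB client sends and should be
checked against your BB version; the function names, host names, and the
"nodes" test name are all hypothetical. Handing the string to the BB client
is left out on purpose.

```python
import time

# Severity order used to pick the worst color across all slave nodes.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst_color(node_colors):
    """Return the most severe color among the per-node results."""
    return max(node_colors.values(), key=SEVERITY.__getitem__, default="green")


def build_status_message(head_node, test_name, node_colors, now=None):
    """Build one summary status message for a whole cluster.

    ASSUMPTION: BB status messages look like
        status <host>.<test> <color> <date> <text>
    with dots in the host name replaced by commas. Verify this
    against your own BB installation before relying on it.
    """
    stamp = time.ctime(now)  # current time when now is None
    host = head_node.replace(".", ",")
    lines = [f"status {host}.{test_name} {worst_color(node_colors)} {stamp}"]
    # One line per slave node, so the detail survives the roll-up.
    lines += [f"{name}: {color}" for name, color in sorted(node_colors.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    colors = {"n01": "green", "n02": "red", "n03": "yellow"}
    print(build_status_message("head1.example.org", "nodes", colors))
```

Taking the worst color means a single failed slave turns the whole cluster
red on the display, which is usually what you want for a summary test.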
I'd also take a look at stuff like mrtg and orca--they both make pretty
pictures of all the status data you gathered, whereas BB needs something
else to do that.