Beowulf and Big Brother

Jesse Becker jbecker at fryed.net
Mon Nov 11 15:47:02 PST 2002


On Mon, 11 Nov 2002, Ollisl wrote:
> We have 2000 computing nodes and 96 monitoring computers. There is a
> possibility that we have 96 different beowulf clusters there each
> having about 20 PC's but you never know(No decisions yet in that
> matter) ;) I was just wondering if it is reasonable or smart to monitor
> these master nodes with Big Brother? Or is there even ready-made shell-
> scripts for that?

I use BB to monitor several small clusters (5-8 nodes each).  I used to 
use BB to monitor upwards of 100 servers (although not in a clustering 
environment).  Each cluster is a private network, and each head node acts 
as a BBNET and BBDISPLAY host.  Each node runs the client procs and 
sends its various status reports back to the head node for the cluster.  
The head node for each cluster does *NOT* run a webserver, and I disable 
the various webpage generation scripts, since I don't need them (comment 
them out in runbb.sh).  Instead, the various BBDISPLAYs have a 
BBRELAY:ip.of.real.BBDISPLAY directive in bb-hosts, so all reports 
collected by the cluster head nodes are immediately sent to the 'real' 
BBDISPLAY host (which does run a webserver).

Out of the box, this gives you a 5 minute status check for each node.  
However, due to various timing issues, up to about 10 minutes can go by
before you see a change on the bb.html and bb2.html pages.  Adjust timing,
especially on the 'real' BBDISPLAY, as needed.
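
One way to reason about that delay, as a rough sketch rather than a
description of BB internals: if each stage runs on its own independent
schedule, a change can arrive just after a stage has run, so every stage
can add up to one full period.  The 5 minute client check is from the
text above; the 5 minute page refresh on the display and the function
name are assumptions for illustration.

```python
def worst_case_delay_seconds(stage_periods):
    """Upper bound on staleness when each stage runs on its own schedule.

    A change can land just after a stage has run, so each stage may add
    up to one full period before the change moves on.
    """
    return sum(stage_periods)

# Assumed stages: the 5 minute client check (from the text) and a
# display page refresh that is ALSO assumed to be 5 minutes.
stages = [5 * 60, 5 * 60]
print(worst_case_delay_seconds(stages) / 60, "minutes")
```

Shortening either period lowers the bound, which is why the timing on
the 'real' BBDISPLAY is the first thing to look at.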

The impact on system load is minimal, and if you really wanted to, you 
could rewrite the checks in C (almost everything is a shell script in BB), 
to reduce the load even more.  Network bandwidth is trivial:  a few 
hundred bytes every 5 minutes.
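
As a back-of-the-envelope check of the bandwidth figure, scaled to the
2000 nodes in the question: the 300 bytes per report below is an assumed
stand-in for "a few hundred bytes", and the function name is made up for
this sketch.

```python
def aggregate_bytes_per_second(nodes, bytes_per_report, interval_seconds):
    """Average traffic if every node sends one report per interval."""
    return nodes * bytes_per_report / interval_seconds

# 2000 nodes (from the question), one report every 5 minutes.
# 300 bytes per report is an assumed stand-in for "a few hundred bytes".
rate = aggregate_bytes_per_second(
    nodes=2000, bytes_per_report=300, interval_seconds=5 * 60
)
print(f"{rate:.0f} bytes/s across all nodes")
```

If each node reports several tests, multiply accordingly; even then the
total stays small next to what a cluster network normally carries.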

> I was thinking of something like this: A script runs every once and a
> while gathering data of the status of each slave-node, on each master
> node. Then that data is sent to Big Brother-server, whenever it is
> asked. So every master would be running a BB client.

BB is really good at reporting status.  It isn't so good at storing 
status.  BB does support a 'data' message type, though, which may be 
worth taking a look at.
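
For the gathering script described in the question, here is a minimal
sketch of the summarizing step on a master node.  It only collapses the
slave-node states into one color and one line of text; handing the
result to the BB client is left out, since that depends on your install.
The severity ranking, the node names, and the function name are all
assumptions for illustration.

```python
# Severity ranking is an assumption for this sketch: higher is worse.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def summarize(node_colors):
    """Collapse per-node colors into one (color, text) pair for the master.

    node_colors maps a slave node name to "green", "yellow" or "red".
    """
    if not node_colors:
        # No data at all is treated as a problem rather than silently green.
        return "red", "no slave nodes reported"
    worst = max(node_colors.values(), key=SEVERITY.__getitem__)
    bad = sorted(name for name, color in node_colors.items() if color != "green")
    if bad:
        text = "not ok: " + ", ".join(bad)
    else:
        text = "all {} nodes ok".format(len(node_colors))
    return worst, text

# Hypothetical node names, for illustration only.
color, text = summarize({"node01": "green", "node02": "yellow", "node03": "red"})
print(color, text)
```

Reporting the worst color keeps one bad slave from being hidden behind
nineteen healthy ones.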

I'd also take a look at tools like mrtg and orca.  Both make pretty 
pictures of all the status data you gather, whereas BB needs something
else to do that.

--Jesse



