cluster monitoring (was Re: swap or not?)

Velocet math at velocet.ca
Tue Feb 5 11:25:18 PST 2002


On Tue, Feb 05, 2002 at 04:54:19PM +0100, Alan Scheinine's all...
> 
> The Big Brother URL
> 
> http://www.bb4.com/

In my old ISP days, we used SNIPS (née NOCOL) for monitoring such
things as well. It has some client-side features that Netsaint and
the like don't have: I get notified of machines with excessive page
faults and swap usage, for example. A simple Perl script (adapted for many
architectures) is at the core of this on the client side. That code
could be ripped out if you really wanted to do it yourself, but
the client reporting is easy enough to probe (it spews its status out on
connect to port 5355 on the client). Quite flexible and simple.
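
For anyone who wants to poke at a client by hand, here is a minimal sketch
in Python of the "connect and read" probe described above. The only detail
taken from this post is the port number (5355). The function name
probe_client, the timeout, and the assumption that the client sends plain
text and then closes the connection are mine, and the report format itself
is not described here, so the sketch just returns the raw text.

```python
import socket


def probe_client(host, port=5355, timeout=5.0):
    """Connect to a monitored client and return whatever it sends before closing.

    Assumption: the client writes its report as text on connect and then
    closes the socket. The report format is not parsed here.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    # Decode leniently, since the encoding of the report is an assumption too.
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling print(probe_client("node01")) with a hypothetical hostname would dump
whatever that client reports. If the client keeps the connection open instead
of closing it, the read will end with a socket timeout rather than a clean
return.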

There is a 'server' side app too, of course, which reports status in a curses
console or on a CGI-generated web page. I've actually modified mozbot, the IRC
bot that the Mozilla coders use, to listen for NOCOL status updates and spew
them into the admin IRC channel we hang out in at work. Pagers are too slow,
and we have to filter things anyway so we aren't spammed all night by
transient, unimportant 'critical' events. There will be false positives in ALL
monitoring; how much you want to wake up for them is the important part :)
Having it in IRC, where many people sit all day and glance every few
minutes (or notice when it scrolls), works quite well for us.
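
To illustrate the kind of filtering meant above, here is a small Python
sketch that only passes a 'critical' event along once it has persisted for
several consecutive polls. This is not the mozbot mod and not how SNIPS
itself escalates events. The class name TransientFilter, the threshold of 3,
and the severity strings are all assumptions made for the example.

```python
from collections import defaultdict


class TransientFilter:
    """Forward a critical event only after `threshold` consecutive critical polls."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # (host, check) -> number of consecutive polls seen as critical
        self.streak = defaultdict(int)

    def observe(self, host, check, severity):
        """Record one poll result; return True if an alert should go out now."""
        key = (host, check)
        if severity != "critical":
            self.streak[key] = 0  # recovered, so a later blip starts from zero
            return False
        self.streak[key] += 1
        # Fire exactly once, on the poll where the streak reaches the threshold.
        return self.streak[key] == self.threshold
```

A single critical reading followed by a recovery never reaches the channel,
while a condition that stays critical is announced once, on the third poll.
Raising the threshold trades slower notification for fewer false wake-ups.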

If anyone wants the code for this mozbot mod, feel free to ask.

/kc


-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 


