Beowulf and Big Brother
Leif Nixon nixon at nsc.liu.se
Wed Nov 13 01:51:58 PST 2002
canon at pookie.nersc.gov writes:

> We are using netsaint/nagios to monitor our cluster (a little over
> 300 nodes). Netsaint works well for monitoring services and basic
> host responds. [...] We just recently started using ganglia to
> monitor performance.

Sadly, there doesn't seem to be a good way of getting Nagios to
monitor data from Ganglia. You can feed the data into Nagios as
passive service checks, which sort of works, but you can't do passive
host checks.

So, if you have several clusters and want Nagios to notify you if a
node dies, you need to set up Nagios in a distributed configuration,
with a Nagios server on each cluster's front-end. That really is a
pain, since you have to duplicate much of the configuration between
the central Nagios server and the distributed ones. Or rather, you
need to duplicate it *and* subtly change it. I started to write
scripts to do this in an automated fashion, but after a while threw
my hands up in disgust. No fun.

I'm having some thoughts about hacking monitoring abilities into
Ganglia, but haven't gotten around to actually doing anything about
it yet.

-- 
Leif Nixon                                        Systems expert
------------------------------------------------------------
National Supercomputer Centre           Linkoping University
------------------------------------------------------------
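
For illustration only, here is a rough sketch of the "passive service checks" route mentioned in the message: read Ganglia's XML dump and turn one metric per host into Nagios PROCESS_SERVICE_CHECK_RESULT external commands. This is not the poster's script. The sample XML is a simplified stand-in for real gmond output, and the service name "Load", the metric choice, and the thresholds are all assumptions made for the example. In a real setup the resulting lines would be appended to the Nagios external command file, whose path depends on the installation.

```python
import time
import xml.etree.ElementTree as ET

# Nagios service status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def load_status(load, warn, crit):
    """Map a numeric metric value to a Nagios status code."""
    if load >= crit:
        return CRITICAL
    if load >= warn:
        return WARNING
    return OK


def passive_check_lines(gmond_xml, metric="load_one", service="Load",
                        warn=4.0, crit=8.0, now=None):
    """Build one passive service check command per host reporting `metric`.

    `service` must match a service description defined in the Nagios
    configuration; "Load" and the thresholds here are hypothetical.
    """
    now = int(time.time()) if now is None else now
    root = ET.fromstring(gmond_xml)
    lines = []
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") != metric:
                continue
            status = load_status(float(m.get("VAL")), warn, crit)
            lines.append(
                "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s=%s"
                % (now, host.get("NAME"), service, status,
                   metric, m.get("VAL")))
    return lines


# Simplified, hand-written sample; real gmond output carries more
# attributes and metrics per host.
SAMPLE = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
<CLUSTER NAME="demo">
<HOST NAME="n1" IP="10.0.0.1"><METRIC NAME="load_one" VAL="0.50" TYPE="float"/></HOST>
<HOST NAME="n2" IP="10.0.0.2"><METRIC NAME="load_one" VAL="9.10" TYPE="float"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    for line in passive_check_lines(SAMPLE):
        print(line)
```

Note that this covers only service results. It does nothing about the host check limitation described in the message, so a dead node would still go unnoticed unless something else reports it.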
