[Beowulf] job scheduler and health monitoring system
ajdecon at ajdecon.org
Fri Jan 10 13:22:46 PST 2014
The "common stack" seems to vary depending on what industry you're looking
at. For example, Grid Engine seems to be a really popular job scheduler in
bioinformatics, even though I get the impression that it's on the way out
in a lot of other industries.
I think most cluster management tools are fairly mature right now. Some are
more actively developed than others, but I don't think "what's hot" is
necessarily a good way to choose your tools.
More important is whether someone on your team is familiar with those
tools, or with the languages they're written in; or whether you can get
support easily if you don't have expertise yourself.
For what it's worth, my current "favorites" for scheduling and monitoring are:
* Job scheduler: SLURM
* Light-weight health checks between jobs: Warewulf NHC
* Detailed performance monitoring: Ganglia
Neither NHC nor Ganglia does temperature monitoring out of the box (last I
checked), but they're both really easy to extend with something as easy as
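
As a rough illustration of that kind of extension (a sketch, not the example from the original message): the Python below parses the human-readable output of the lm-sensors `sensors` command and pushes each temperature into Ganglia with `gmetric`. The function names, the `temp_` metric prefix, and the assumption that temperature lines look like "Core 0:  +45.0°C" are all mine; sensor labels and output layout vary by hardware and locale, so the regex may need adjusting.

```python
import re
import subprocess

# Assumed line shape from `sensors`: "Core 0:   +45.0°C  (high = +80.0°C, ...)".
# The degree sign is optional because some locales print a plain "C".
TEMP_LINE = re.compile(
    r"^(?P<label>[^:\n]+):[ \t]+(?P<temp>[+-]?\d+(?:\.\d+)?)\s*°?C\b",
    re.MULTILINE,
)


def parse_sensors(text):
    """Return {label: degrees Celsius} for each temperature line found."""
    return {
        m.group("label").strip(): float(m.group("temp"))
        for m in TEMP_LINE.finditer(text)
    }


def metric_name(label):
    """Turn a sensor label such as 'Core 0' into a name like 'temp_core_0'."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return "temp_" + slug


def publish_temperatures():
    """Read `sensors` once and send every temperature to Ganglia via gmetric."""
    out = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    ).stdout
    for label, celsius in parse_sensors(out).items():
        subprocess.run(
            [
                "gmetric",
                "--name=" + metric_name(label),
                "--value=" + str(celsius),
                "--type=float",
                "--units=Celsius",
            ],
            check=True,
        )
```

Calling `publish_temperatures()` from a cron job on each node (with gmond already running there) would be one way to get the readings into the existing Ganglia graphs. The same parsed values could also drive a between-jobs health check that fails when a threshold is exceeded.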
On Fri, Jan 10, 2014 at 12:36 PM, reza azimi <reza.c.azimi at gmail.com> wrote:
> Hello guys,
> I'm looking for a state-of-the-art job scheduler and health monitoring
> system for my Beowulf cluster. In my research I've found many of them,
> which has left me confused. Can you help by recommending the ones that are
> popular and in use in industry?
> I have the lm-sensors package on my servers and want a health monitoring
> program that records temperature as well; the ones I found mainly record
> resource utilization.
> Our workloads are mainly MPI-based benchmarks, and we want to test some
> Hadoop benchmarks in the future.