[Beowulf] cli alternative to cluster top?

Thomas Vixel tvixel at gmail.com
Mon Dec 1 15:22:35 PST 2008


That does sound interesting, but more for some of my personal projects.

It wouldn't work for the situation at hand because:
1) It sounds like it introduces a single point of failure (the head node).
2) Giving our developers cluster-wide 'killall' & 'kill' functionality
makes me cringe.
    Most of them only know just enough about Linux to be dangerous.
3) It would require completely reworking our current cluster solution;
    a daunting task to say the least.
4) There isn't much love for commercial & non-OSS software at our company.

On 11/30/08, Donald Becker <becker at scyld.com> wrote:
> On Wed, 26 Nov 2008, Thomas Vixel wrote:
>
>> I've been googling for a top-like cli tool to use on our cluster, but
>> the closest thing that comes up is Rocks' "cluster top" script. That
>> could be tweaked to work via the cli, but due to factors beyond my
>> control (management) all functionality has to come from a pre-fab
>> program rather than a software stack with local, custom modifications.
>>
>> I'm sure this has come up more than once in the HPC sector as well --
>> could anyone point me to any top-like apps for our cluster?
>
> Most remote job mechanisms only think about starting remote processes, not
> about the full create-monitor-control-report functionality.
>
> The Scyld system (currently branded "Clusterware") defaults to using a
> built-in unified process space.  That presents all of the processes
> running over the cluster in a process space on the master machine, with
> full POSIX semantics.  It neatly solves your need with... the standard
> 'top' program.
>
> Most scheduling systems also have a way to monitor processes that they
> start, but I haven't found one that takes advantage of all information
> available and reports it quickly/efficiently.
>
> There are many advantages of the Scyld implementation:
>   -- no new or modified process management tools need to be written.
>     Standard utilities such as 'top' and 'ps' work unmodified,
>     as well as tools we didn't specifically plan for e.g. GUI versions of
>     'pstree'.
>   -- The 'killall' program works over the cluster, efficiently.
>   -- All signals work as expected, including 'kill -9'.  (Most remote
>      process starting mechanisms will just kill off the local endpoint,
>      leaving the remote process running-but-confused.)
>   -- Process groups and controlling-TTY groups work properly for job
>      control and signals.
>   -- Running jobs report their status and statistics accurately -- an
>      updated 'rusage' structure is sent once per second, and a final
>      rusage structure and exit status are sent when the process terminates.
>
> The "downside" is that we explicitly use Linux features and details,
> relying on kernel-version-specific features.  That's an issue if it's a
> one-off hack, but we've been using this approach continuously for
> a decade, since the Linux 2.2 kernel and over multiple
> architectures.  We've been producing supported commercial releases
> since 2000, longer than anyone else in the business.
>
> --
> Donald Becker				becker at scyld.com
> Penguin Computing / Scyld Software
> www.penguincomputing.com		www.scyld.com
> Annapolis MD and San Francisco CA
>
>
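For readers unfamiliar with the 'rusage' structure mentioned in the quoted
message, here is a minimal single-machine sketch in Python of collecting a
final rusage structure and exit status when a child process terminates. It is
only an illustration of the standard wait4() interface, not Scyld's
implementation: the function name run_and_collect is hypothetical, and the
once-per-second updates and the cross-node transport described above are not
modeled.

```python
import os
import sys


def run_and_collect(argv):
    """Run argv as a child process; return (exit_code, final rusage).

    Hypothetical helper for illustration only. Works on a single
    POSIX machine; nothing here is cluster-aware.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)

    # Parent: wait4() returns the wait status and the child's rusage together.
    _, status, usage = os.wait4(pid, 0)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        # Killed by a signal: report it as a negative signal number.
        exit_code = -os.WTERMSIG(status)
    return exit_code, usage


if __name__ == "__main__":
    code, usage = run_and_collect([sys.executable, "-c", "sum(range(10**6))"])
    print("exit status:", code)
    print("user CPU seconds:", usage.ru_utime)
    print("system CPU seconds:", usage.ru_stime)
    print("max resident set size:", usage.ru_maxrss)
```

The fields printed at the end (ru_utime, ru_stime, ru_maxrss) are the same
kind of per-process statistics that 'top' and 'ps' display.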


