xmlsysd, wulfstat (cluster monitor apps, beta)

Robert G. Brown rgb at phy.duke.edu
Tue Apr 30 16:03:57 PDT 2002


Dearest DBUG (and beowulf list) persons,

Announcing xmlsysd and its companion application, wulfstat.

xmlsysd is a lightweight, throttleable daemon that runs either as a
forking daemon or out of xinetd (the latter by default).  When one
connects to it, it accepts a very simple command language that basically
a) configures it to deliver certain kinds of /proc and
system-call-derived information, generally throttling it so it doesn't
return anything you aren't interested in; and b) causes it to wrap up
that information in an xml-formatted message and return it to the
caller.  Security is managed in any of several ways -- by ipchains or
iptables, using tcp wrappers, or using xinetd's internal ip-level
security features (or using ssl or ssh tunnels, for the truly paranoid
or those who want to monitor across a WAN).
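
As a rough illustration of the connect/command/reply cycle described
above, here is a minimal Python sketch of a client.  It is not taken
from the xmlsysd sources: the command string, the port, and the
assumption that the daemon closes the connection after one reply are
all placeholders made up for the example.  The man page documents the
real command language.

```python
import socket
import xml.etree.ElementTree as ET


def query_daemon(host, port, command, timeout=5.0):
    """Send one command line and return the parsed XML reply.

    Assumption (hypothetical): the daemon answers a newline-terminated
    command with a single XML document and then closes the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    # Parse the accumulated bytes into an element tree.
    return ET.fromstring(b"".join(chunks))
```

A caller would then pick values out of the returned tree with, e.g.,
root.findtext("some/element"), where the element names depend on what
the daemon was configured to send.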

wulfstat is a companion client application that uses the xmlsysd daemons
running on a collection of cluster nodes or LAN workstation hosts to
gather information about the nodes or hosts and present it in a simple
tabular form viewable from any tty (e.g. xterm, konsole), updating the
table every N seconds (default 5).  Think of it as vmstat, procinfo, ifconfig,
uptime, free, date, the upper part of the top command, and a bit more
all rolled into a single application so that you can monitor whole
connected sets of this information across an entire cluster with some
reasonable granularity.
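
The poll-and-redraw idea can be pictured with the short Python sketch
below.  It only illustrates the concept and is not wulfstat's actual
code: the field names (load1, mem_free_kb) and the fetch callback that
returns one dict per host are hypothetical.

```python
import time


def format_table(rows):
    """Render a list of per-host dicts as a fixed-width text table."""
    lines = [f"{'host':<12}{'load1':>8}{'mem_free_kb':>14}"]
    for r in rows:
        lines.append(
            f"{r['host']:<12}{r['load1']:>8.2f}{r['mem_free_kb']:>14d}"
        )
    return "\n".join(lines)


def monitor(hosts, fetch, interval=5, cycles=None,
            out=print, sleep=time.sleep):
    """Call fetch(host) for every host and print the table each interval.

    cycles=None loops forever; out and sleep are injectable so the loop
    can be exercised without a terminal or real delays.
    """
    done = 0
    while cycles is None or done < cycles:
        out(format_table([fetch(h) for h in hosts]))
        done += 1
        if cycles is None or done < cycles:
            sleep(interval)
```

In a real client, fetch would query the daemon on each host and map
the returned XML onto the columns being displayed.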

Such a tool has obvious uses -- for cluster users, it allows them to
monitor host load averages, look for idle resources, monitor memory
usage and network loads, see at a glance remote cpu type, clock, and
cache size, and even see what fraction of a cluster node's up time has
been spent "doing work" instead of idle.
Most of this is equally useful to systems administrators seeking to
monitor LAN host activity -- crashing systems are often signalled by
anomalous consumption of memory or a steady rise in cpu usage, for
example.

The toolset has now been in use for some time and has been reasonably
stable for several weeks (in spite of my constant poking at it to add
new features or fix tiny problems).  I am therefore releasing it as
version 0.1.0 BETA for wider testing, although at the moment it seems to
be doing fine in production.

It is expected that wulfstat is just the first of a number of monitoring
applications that will be developed that use the daemon.  The daemon,
for example, can also be used to monitor tasks on remote nodes by
username and/or taskname and/or run status, although the application
that actually permits name and task lists to be managed on the user side
and the returned results properly displayed has yet to be written.  Full
GUI and/or web applications should also be straightforward to build,
although this time I learned my lesson and built the tty application
FIRST (for xmlsysd's predecessor, procstatd, I built a GUI application
and have regretted it ever after).  It is also expected that at least a
few more features will be added to the daemon (it lacks lm-sensors
support at this point, for example).

The daemon >>should<< have just enough power to form the basis for a
load balancing or job distribution system -- it can certainly
efficiently provide realtime monitoring of many of the components upon
which a queuing decision might be based, including load, memory and
network utilization, non-root tasks running or waiting to run, and even
CPU type, clock, and cache.  It does not run as a privileged user,
however, and is not designed to manage the actual distribution or
control of jobs.
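
To make the queuing idea concrete, here is a hedged Python sketch of
the kind of decision a scheduler could layer on top of the monitored
data.  Nothing like this ships with xmlsysd; the policy (lowest load
first, ties broken by most free memory) and the field names load1 and
mem_free_kb are assumptions chosen for the example.

```python
def pick_node(nodes, min_free_kb=0):
    """Return the host name of the best candidate node, or None.

    nodes is a list of dicts with hypothetical keys 'host', 'load1'
    and 'mem_free_kb'.  Nodes with less than min_free_kb of free
    memory are skipped; among the rest the lowest load wins, and ties
    go to the node with the most free memory.
    """
    eligible = [n for n in nodes if n["mem_free_kb"] >= min_free_kb]
    if not eligible:
        return None
    best = min(eligible, key=lambda n: (n["load1"], -n["mem_free_kb"]))
    return best["host"]
```

Actually starting the job on the chosen node would still be up to some
other, suitably privileged, piece of software.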

Still, I expect and hope that wulfstat and xmlsysd together will be
immediately useful to cluster people who install them.  The included
documentation should be adequate although not overwhelming -- there are
man pages for both xmlsysd and wulfstat that are very nearly up to date
-- and I'm available to help with installations that don't seem to work
correctly.  The one "gotcha" of wulfstat is that it does require libxml2
(and hence probably RH 7.2 or better) to run -- you will need to ensure
that this RPM is installed on the hosts where wulfstat is to run.
xmlsysd similarly requires libxml to run on the cluster nodes.

I would greatly appreciate feedback and bug reports, if any, from
anybody who chooses to install it and give it a try.

To retrieve it in RPM form, you can use the URLs below:

   http://www.phy.duke.edu/brahma/xmlsysd-0.1.0-beta.i386.rpm
   http://www.phy.duke.edu/brahma/xmlsysd-0.1.0-beta.src.rpm

   http://www.phy.duke.edu/brahma/wulfstat-0.1.0-beta.i386.rpm
   http://www.phy.duke.edu/brahma/wulfstat-0.1.0-beta.src.rpm

If anybody needs it in tarball form (not in source or binary rpm form)
they should contact me directly.  I can easily generate one (or it can
be extracted from the source rpm) but I can't guarantee the instructions
for installation or configuration -- they are encapsulated already in
the RPMs.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





