[Beowulf] cli alternative to cluster top?
Donald Becker becker at scyld.com
Sun Nov 30 09:19:29 PST 2008

On Sun, 30 Nov 2008, Robert G. Brown wrote:

> On Sat, 29 Nov 2008, Greg Kurtzer wrote:
>
> > Warewulf has a real time top like command for the cluster nodes and
> > has been known to scale up to the thousands of nodes:
> >
> > http://www.runlevelzero.net/images/wwtop-screenshot.png
> > On Wed, Nov 26, 2008 at 12:39 PM, Thomas Vixel <tvixel at gmail.com> wrote:
> >> I've been googling for a top-like cli tool to use on our cluster, but
> >> the closest thing that comes up is Rocks' "cluster top" script. That
> >> could be tweaked to work via the cli, but due to factors beyond my
> >> control (management) all functionality has to come from a pre-fab
> >> program rather than a software stack with local, custom modifications.
> >>
> >> I'm sure this has come up more than once in the HPC sector as well --
> >> could anyone point me to any top-like apps for our cluster?
> >>
> >> For reference, wulfware/wulfstat was nixed as well because of the
> >> xmlsysd dependency.
>
> That's fine, but I'm curious. How do you expect to run a cluster
> information tool over a network without a socket at both ends? If not
> xmlsysd, then something else -- sshd, xinetd, dedicated or general
> purpose, where the latter almost certainly will have higher
> overhead? Or are you looking for something with a kernel level network
> interface, more like scyld?

The theoretical architecture of our system has all of the process control
communication going over persistent TCP/IP sockets. The master node has a
'master daemon'. As compute nodes boot and join the cluster their 'slave
daemon' opens a single TCP socket to the master daemon.

Having a persistent connection is a key element to performance. It
eliminates the cost and delay of name lookup, reverse name lookup, socket
establishment and authentication. (Example: The MPICH people learned this
lesson -- MPD is much faster than MPICH v1 using 'rsh'.)
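
As a rough illustration of the persistent-connection pattern described above (not the actual Scyld implementation), here is a minimal Python sketch. Each 'slave daemon' connects once to the 'master daemon' and then reuses that one socket for every status update, so lookup and connection setup are paid a single time per node. The record layout (`RECORD`), the function names, and the loopback address are all assumptions made for the example.

```python
import socket
import struct
import threading

# Hypothetical fixed-size status record: node id, process count, load average.
# Network byte order, 8 bytes total, so parsing is a single unpack call.
RECORD = struct.Struct("!HHf")


def recv_exact(conn, size):
    """Read exactly `size` bytes, or return None if the peer closed the socket."""
    buf = b""
    while len(buf) < size:
        chunk = conn.recv(size - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def master_daemon(listener, expected_nodes, table):
    """Accept one long-lived connection per node and record its updates."""
    def serve(conn):
        with conn:
            while (data := recv_exact(conn, RECORD.size)) is not None:
                node, nprocs, load = RECORD.unpack(data)
                table[node] = (nprocs, load)  # keep only the latest status

    workers = []
    for _ in range(expected_nodes):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=serve, args=(conn,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()


def slave_daemon(address, node, updates):
    """Connect once at 'boot', then reuse the same socket for every update."""
    with socket.create_connection(address) as conn:
        for nprocs, load in updates:
            conn.sendall(RECORD.pack(node, nprocs, load))


def run_demo(num_nodes=3, updates_per_node=5):
    """Run master and slaves on the loopback interface; return the final table."""
    table = {}
    with socket.create_server(("127.0.0.1", 0)) as listener:
        address = listener.getsockname()
        master = threading.Thread(
            target=master_daemon, args=(listener, num_nodes, table)
        )
        master.start()
        slaves = [
            threading.Thread(
                target=slave_daemon,
                args=(address, node,
                      [(i, 0.25 * i) for i in range(updates_per_node)]),
            )
            for node in range(num_nodes)
        ]
        for slave in slaves:
            slave.start()
        for slave in slaves:
            slave.join()
        master.join()
    return table


if __name__ == "__main__":
    print(run_demo())
```

The contrast is with an 'rsh'-style design, where every query would repeat the connect and authenticate steps. A thread per connection is used here only to keep the sketch short; it is not meant to suggest how a daemon serving thousands of nodes would be structured.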

We optimized our system extensively, down to the number of bytes in
efficiently constructed and parsed packets. But to get scalability to
thousands of nodes and processes, we found that we needed to "cheat". While
connections are established to the user-level daemon, we optimize by
having some of the communication handled by a kernel module that shares
the socket.

The optimization isn't needed for 'only' hundreds of nodes and processes,
or if you are willing to dedicate most of a very powerful head node to
process control. But 'thousands' is much more challenging than 'hundreds'.

-- 
Donald Becker                           becker at scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com                www.scyld.com
Annapolis MD and San Francisco CA
