[Beowulf] wulfstat, wulflogger fix, new features
Robert G. Brown rgb at phy.duke.edu
Tue May 18 07:51:00 PDT 2004
- Previous message: [Beowulf] memory prefetch on Athlon64
- Next message: [Beowulf] wulfware list
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Karl Bellve reported a bug in wulflogger that caused it to miss connecting to the first host in the wulfhosts list until the second pass. He also requested a feature that would let wulflogger execute only a single time and then exit, so that it could be used in e.g. a cron script to graze for downed hosts in a cluster easily.

I found the bug. It was a legacy from wulfstat, where I closed stdin pre-curses. With descriptor 0 freed, the first descriptor SUCCESSFULLY returned from socket() was 0 (being reused), which actually doesn't work. This is a bug in socket() I personally would say, but either way, once I eliminated the close statement, wulflogger connects to the first host with no problem on the first try.

I implemented the request by adding a -c count flag to both wulflogger and wulfstat. -c 1 is the behavior requested, but somebody may have use for the greater flexibility permitted by it being a variable.

I also updated both Usage and the man page for both applications, in the case of wulflogger including an example fragment that might go into a cron job to graze for down hosts, and in both cases adding a short section on debugging (I've written the code to be tremendously self-debugging to make it relatively easy to maintain or augment).

Still to implement:

a) I want to add a ping to the connection engine to precede the xmlsysd connection attempt. ping actually is a bit of a pain -- the usual iputils implementation requires suid root. nmap, however, has three or four distinct ways of "pinging" that don't require root privileges, and eventually I'll try stealing one, although the code is a lot more complex than I'd like for a simple task. Anybody with a SHORT/SIMPLE version of userspace (e.g. ack) ping in C should feel free to let me know where to find it.

b) I need to do something about tracking running jobs in wulflogger, and figure out a better display for them in wulfstat.

c) I still have fantasies of writing gwulfstat on top of gtk. This could be a very cool application.
d) And wulfweb needs love as well, although that is straightforward web programming at this point -- wulflogger is the real tool involved.

Anyway, those of you who are using it, enjoy. Those who aren't, consider giving xmlsysd/wulf[stat,logger,web] a try. It is a fairly simple way to monitor an entire cluster (tested with order <100 hosts, don't know how or if it scales to ~1000) in a lightweight fashion with adjustable time granularity. Those of you who are also LAN managers might consider using it to monitor your LAN status as well.

The default wulfstat/wulflogger display is something like:

# Name Status Timestamp load1 load5 load15 rx byts tx byts si so pi po ctxt intr users
lilith up 1084891476.44 0.01 0.04 0.01 9761 7171 0 0 0 22 148 170
asixteencharname up 1084891476.44 0.01 0.04 0.01 9761 7171 0 0 0 22 148 170
lucifer up 1084891610.24 0.00 0.02 0.00 226 709 0 0 0 9 135 104
uriel up 1084887238.42 0.00 0.00 0.00 1030 1672 0 0 0 5 36 114
caine down
eve up 1084888284.75 0.00 0.00 0.00 685 1168 0 0 0 11 21 109
serpent up 1084877687.98 0.00 0.00 0.00 1116 1707 0 0 0 6 41 187
tyrial up 1084891762.44 0.00 0.00 0.00 3146 3064 0 0 0 9 208 218
abel down
archangel up 1084888715.71 0.00 0.00 0.00 119 1376 0 0 0 30 28 105

(used to look at my home cluster, with one machine turned off and one machine down awaiting a reinstall.)

There is a display that only looks at load, a display only for network traffic, one for network usage, even one that tells you uptime and duty cycle (cpu cycles used/cpu cycles available) from the last boot.

All GPL v2b...

http://www.phy.duke.edu/~rgb/Beowulf/beowulf.php

I suggest rebuilding the source rpm or working from tarball, although people running RH 9 can probably install the binary rpms without disaster.

rgb

-- 
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
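The descriptor reuse behind the wulflogger bug described in this post can be reproduced in a few lines of C. What follows is only a sketch, not code from wulfstat or wulflogger: the function name socket_fd_after_closing_stdin is made up for illustration, and it assumes a POSIX system, where a newly opened descriptor is given the lowest number not currently in use.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of wulfstat/wulflogger).
 * Closes stdin, opens a TCP socket, and reports which descriptor number
 * came back. Stdin is restored afterwards so the caller keeps it.
 * Returns the descriptor number, or -1 if socket() failed.
 */
int socket_fd_after_closing_stdin(void)
{
    /* Keep a copy of stdin so it can be restored at the end. */
    int saved_stdin = dup(STDIN_FILENO);

    /* Descriptor 0 is now free. */
    close(STDIN_FILENO);

    /* The lowest free descriptor is handed out, which is now 0. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int result = fd;

    if (fd >= 0)
        close(fd);

    /* Put stdin back on descriptor 0 and drop the temporary copy. */
    if (saved_stdin >= 0) {
        dup2(saved_stdin, STDIN_FILENO);
        close(saved_stdin);
    }
    return result;
}
```

Calling this from a small main() and printing the result should show 0. That is a valid descriptor, so socket() has succeeded; any code that tests for fd > 0 rather than fd >= 0, or that uses 0 to mean "not connected", will treat it as a failure. The post does not say which of these, if either, applied to wulflogger.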
