[Beowulf] Monitoring crashing machines
Loic Tortay tortay at cc.in2p3.fr
Tue Sep 9 08:52:16 PDT 2008
Carsten Aulbert wrote:
[server console management for many servers with conserver]
>

We use conserver to get serial console access to almost all our machines.

Below is the forwarded answer to your messages from my coworker, who is in charge of this. The tools he created for interfacing IPMI and conserver are in the conserver "contrib" section (this may be what you referred to as the IPMI interface for conserver).

If you want to contact him directly, his e-mail address is similar to mine, just replace 'tortay' with 'wernli'.

Loïc.

-------- Original Message --------
> Initially, conserver.com looked nice and we also found an IPMI
> interface for it, but that comes with two downsides: (1) it blocks
> IPMI access (I have yet to find out if a secondary user can use SoL
> when another user is using this already, but I doubt it) and (2) it
> simply does not catch messages appearing in dmesg (simple ones like
> plugging in a USB keyboard), but that may be a configuration problem
> on our side.

We are using conserver(.com) on 6 Linux boxes (quite old horses) to manage more than 1500 servers. Most of the latter are handled through ipmitool SOL.

On some servers, however rare, I believe IPMI access is indeed restricted to one open connection. Even if you are unlucky in this respect (which I seriously doubt), it won't be an issue for console access, as conserver is designed to let several users share a console (while logging all its output, which is what we're doing).

As for the dmesg issue, you're just missing the "console=ttySx,baudrate" kernel parameter, which should come after "console=tty0" if you want init to talk to the serial line, or before it if you want init to talk to the monitor.

> Also we tried (r)syslog but somehow this does not get all the messages
> either, even when using something like *.* @loghost.

This is true, however, and is one of the reasons we went to the trouble of keeping consoles (IPMI or other) open for all our servers at all times.
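The ordering rule for the "console=" kernel parameters mentioned above can be checked with a few lines of Python. This is only a minimal sketch: the function name `effective_console` is hypothetical, and the device `ttyS0` and the speed `115200` in the example stand in for the "ttySx,baudrate" placeholders, which depend on your hardware. It relies on the documented kernel behaviour that the last "console=" entry on the command line is the one backing /dev/console, which is where init writes.

```python
from typing import Optional


def effective_console(cmdline: str) -> Optional[str]:
    """Return the device named by the last console= parameter, if any.

    The last console= entry on the kernel command line is the one
    used for /dev/console, so it decides where init output goes.
    """
    devices = [
        # "console=ttyS0,115200" -> "ttyS0" (drop the options after the comma)
        token.split("=", 1)[1].split(",", 1)[0]
        for token in cmdline.split()
        if token.startswith("console=")
    ]
    return devices[-1] if devices else None


# Illustrative command lines; the root device and serial settings are assumptions.
serial_last = "ro root=/dev/sda1 console=tty0 console=ttyS0,115200"
monitor_last = "ro root=/dev/sda1 console=ttyS0,115200 console=tty0"

print(effective_console(serial_last))   # init talks to the serial line
print(effective_console(monitor_last))  # init talks to the monitor
```

On a running Linux machine, the same function can be applied to the contents of /proc/cmdline to see which console init is currently using.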
It can be very valuable to grep through all the console logfiles to catch that error message which was hidden everywhere else.

> For the time being we are experimenting with using "script" in many
> "screen" environment which should be able to monitor ipmitool's SoL
> output, but somehow that strikes me as inefficient as well.

conserver scales extremely well and will be your best friend (if you don't have a dog, that is).

> So, my question boils down to: How do people solve this problem?

Feel free to email me privately if you need the details.

--
| Loïc Tortay <tortay at cc.in2p3.fr> - IN2P3 Computing Centre |
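Searching all the console logfiles for a hidden error message, as described above, could look like the following sketch. It assumes one plain-text logfile per server in a single directory; that layout, the function name `grep_console_logs`, and the example path and pattern are assumptions for illustration, since the actual logfile location depends on how conserver is configured.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def grep_console_logs(log_dir: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every matching log line."""
    regex = re.compile(pattern)
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # Console output can contain stray bytes, so decode leniently.
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, lineno, line.rstrip("\n")


def report(log_dir: str, pattern: str) -> None:
    """Print every match in a grep-like 'file:line: text' format."""
    for name, lineno, text in grep_console_logs(log_dir, pattern):
        print(f"{name}:{lineno}: {text}")
```

A call such as `report("/var/consoles", r"Kernel panic|Machine check")` would then list every server whose console log contains one of those messages; both the directory and the pattern are placeholders.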
