[Beowulf] Re: [Ganglia-general] Configuring nodes on a scyld cluster

Mon Aug 24 04:53:10 PDT 2009

On Mon, Aug 24, 2009 at 05:40, Michael Muratet<mmuratet at hudsonalpha.org> wrote:
> Greetings
>
> I'm not sure if this is more appropriate for the beowulf or ganglia
> list, please forgive a cross-post. I have been trying to get ganglia
> (v 3.0.7) to record info from the nodes of my scyld cluster. gmond was

If I recall correctly, Scyld clusters (and the successor, ClusterWare)
run a modified version of Ganglia, mostly for data display rather than
collection.  For data collection, they run their own program called
'bproc', which does some of the same things as gmond.  There was a
short discussion about bproc/gmond on the Beowulf mailing list about a
year or a year and a half ago.

Also, the hacked-up version of Ganglia that they ship is based on
2.5.7, I think, so there is a good reason to upgrade.

It should still work, though, with some tweaking.

> not installed on any of the compute nodes nor was gmond.conf in /etc
> of any of the compute nodes when we got it from the vendor. I didn't
> see much in the documentation about configuring nodes but I did find a
> 'howto' at http://www.krazyworks.com/installing-and-configuring-ganglia/.
> I have been testing on one of the nodes as follows. I copied gmond
> from /usr/sbin on the head node to the subject compute node /usr/sbin.
> I ran gmond --default_config and saved the output and changed it
> thus:
>
> scyld:etc root$ bpsh 5 cat /etc/gmond.conf
> /* This configuration is as close to 2.5.x default behavior as possible
>    The values closely match ./gmond/metric.h definitions in 2.5.x */
> globals {
>   daemonize = yes
>   setuid = yes
>   user = nobody
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   host_dmax = 0 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
> }
>
> /* If a cluster attribute is specified, then all gmond hosts are
>  * wrapped inside of a <CLUSTER> tag.  If you do not specify a cluster
>  * tag, then all <HOSTS> will NOT be wrapped inside of a <CLUSTER> tag. */
> cluster {
>   name = "mendel"
>   owner = "unspecified"
>   latlong = "unspecified"
>   url = "unspecified"
> }
>
> /* The host section describes attributes of the host, like the location */
> host {
>   location = "unspecified"
> }
>
> /* Feel free to specify as many udp_send_channels as you like.  Gmond
>    used to only support having a single channel */
> udp_send_channel {
>   port = 8649
>   host = 10.54.50.150 /* head node's IP */
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
>
> /* You can specify as many tcp_accept_channels as you like to share
>    an xml description of the state of the cluster */
> tcp_accept_channel {
>   port = 8649
> }
>
> I modified gmond on the head node thus:
>
> /* This configuration is as close to 2.5.x default behavior as possible
>    The values closely match ./gmond/metric.h definitions in 2.5.x */
> globals {
>   daemonize = yes
>   setuid = yes
>   user = nobody
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   host_dmax = 0 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
> }
>
> /* If a cluster attribute is specified, then all gmond hosts are
>  * wrapped inside of a <CLUSTER> tag.  If you do not specify a cluster
>  * tag, then all <HOSTS> will NOT be wrapped inside of a <CLUSTER> tag. */
> cluster {
>   name = "mendel"
>   owner = "unspecified"
>   latlong = "unspecified"
>   url = "unspecified"
> }
>
> /* The host section describes attributes of the host, like the location */
> host {
>   location = "unspecified"
> }
>
> /* Feel free to specify as many udp_send_channels as you like.  Gmond
>    used to only support having a single channel */
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   port = 8649
> }
>
> /* You can specify as many tcp_accept_channels as you like to share
>    an xml description of the state of the cluster */
> tcp_accept_channel {
>   port = 8649
> }
>
> I started gmond on the compute node bpsh 5 gmond and restarted gmond
> and gmetad. I don't see my node running gmond. ps -elf | grep gmond on
> the compute node returns nothing. I tried to add gmond as a service on
> the compute node with the script at the krazy site  but I get:
>
> scyld:~ root$ bpsh 5 chkconfig --add gmond
> service gmond does not support chkconfig

It looks like the startup script for gmond doesn't natively support
chkconfig, most likely because it lacks the '# chkconfig:' header
comment that chkconfig reads.  This isn't a huge problem.  You will,
however, have to manually create symlinks in /etc/rc3.d that point into
/etc/init.d.  Basically, you want two links that look something like
this:

  /etc/rc3.d/S99gmond -> /etc/init.d/gmond
  /etc/rc3.d/K01gmond -> /etc/init.d/gmond

I'd do this on the head node *only*.
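
For what it's worth, the two links could be created along these lines.
This is only a sketch: the ROOT variable is my own addition so the
commands can be tried against a scratch directory first, and it assumes
the init script really lives at /etc/init.d/gmond.

```shell
# Sketch only. ROOT is a hypothetical knob, not part of any Scyld or
# Ganglia tooling: it defaults to a scratch directory so nothing on the
# real system is touched. Run with ROOT=/ on the head node to apply.
ROOT="${ROOT:-$(mktemp -d)}"
RCDIR="$ROOT/etc/rc3.d"
mkdir -p "$RCDIR"

# -f replaces a stale link of the same name; -n keeps ln from following
# an existing link. The target is the absolute path of the init script.
# These two links simply mirror the ones listed in this message; kill
# links are conventionally also placed in rc0.d and rc6.d.
ln -sfn /etc/init.d/gmond "$RCDIR/S99gmond"
ln -sfn /etc/init.d/gmond "$RCDIR/K01gmond"

# Show the result so the links can be checked by eye.
ls -l "$RCDIR"
```

With the default ROOT the links dangle, which is harmless; it just lets
you check the names and targets before touching /etc.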

Scyld clusters are a bit funny if you have never used them before.
There's almost *nothing* on the compute nodes except local data
partitions, and they don't run a 'normal' userspace either.

> and
>
> scyld:~ root$ bpsh 5 service gmond start
> /sbin/service: line 3: /etc/init.d/functions: No such file or directory

Unsurprising...  Run 'bpsh 5 ls -l /etc' and you will see why this
error occurs:  there's probably almost nothing in /etc at all.
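
To make that concrete, a small loop can report which of the files
involved are actually visible on a node.  A rough sketch: check_paths is
a made-up helper name, the file list in the example is just the paths
that have come up in this thread, and it assumes bpsh passes back the
exit status of the command it runs.

```shell
# Hypothetical helper, not a Scyld tool: report which paths exist on a node.
# Usage: check_paths NODE PATH...
check_paths() {
    node="$1"
    shift
    for p in "$@"; do
        # ls -d exits non-zero when the path is missing on that node
        if bpsh "$node" ls -d "$p" >/dev/null 2>&1; then
            echo "node $node: present $p"
        else
            echo "node $node: MISSING $p"
        fi
    done
}

# Example (run on the head node):
#   check_paths 5 /etc/init.d/functions /etc/gmond.conf /usr/sbin/gmond
```

Anything reported as MISSING is something an init script cannot rely on
when it runs on that node.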

> I am at a loss over what to try next, it seems this should work. Any
> and all suggestions will be appreciated.

Try running gmond directly, with debugging turned on (as a test):

  bpsh 5 /usr/sbin/gmond -c /etc/gmond.conf -d 2

and see what it complains about.
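
Since gmond stays in the foreground when debugging is turned on, it can
help to cap the run time and keep the output in a file.  A sketch,
assuming the coreutils 'timeout' command is available on the head node
and that stopping bpsh also stops the remote process; debug_run and the
log path are names I made up.

```shell
# Hypothetical helper: run a command for at most SECS seconds, save its
# combined stdout/stderr to LOG, then show the start of the log.
# Usage: debug_run SECS LOG COMMAND [ARGS...]
debug_run() {
    secs="$1"
    log="$2"
    shift 2
    timeout "$secs" "$@" >"$log" 2>&1
    status=$?
    head -n 40 "$log"
    # GNU timeout returns 124 when it had to stop the command, which
    # here means the command was still running when the time ran out.
    return "$status"
}

# Example (run on the head node):
#   debug_run 30 /tmp/gmond-node5.log \
#       bpsh 5 /usr/sbin/gmond -c /etc/gmond.conf -d 2
```

An exit status of 124 is actually the good case: gmond came up and kept
running.  Any other status means it exited on its own, and the log
should say why.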

-- 
Jesse Becker
GPG Fingerprint -- BD00 7AA4 4483 AFCC 82D0  2720 0083 0931 9A2B 06A2