Monitoring Ganglia with Nagios

It looks like copying the check_ganglia.py that ships with Ganglia should be all it takes, but I can't find it anywhere.


[root@L2 nagios]# find / -name '*ganglia*' -print
/usr/lib/libganglia-3.0.7.so.0
/usr/lib/libganglia.so
/usr/lib/libganglia-3.0.7.so.0.0.0
/usr/share/ganglia
/usr/share/ganglia/templates/Rocks/images/ganglia.jpg
/usr/share/ganglia/ganglia.php
/usr/share/ganglia/get_ganglia.php
/usr/share/doc/ganglia-3.0.7
/usr/share/doc/ganglia-web-3.0.7
/usr/include/ganglia.h
/usr/bin/ganglia-config
/etc/ganglia
/etc/httpd/conf.d/ganglia.conf
/var/lib/ganglia
/var/www/html/ganglia
[root@L2 nagios]#


That's because I installed Ganglia with yum...
Nothing else for it, so I'll pull the script out of the source tarball (not very elegant?)


[root@L2 tmp]# wget http://sourceforge.net/projects/ganglia/files/ganglia%20monitoring%20core/3.1.7/ganglia-3.1.7.tar.gz/download

Resolving sourceforge.net... 216.34.181.60
Connecting to sourceforge.net|216.34.181.60|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.1.7/ganglia-3.1.7.tar.gz?r=&ts=1300890563&use_mirror=jaist [following]

Resolving downloads.sourceforge.net... 216.34.181.59
Connecting to downloads.sourceforge.net|216.34.181.59|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://jaist.dl.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.1.7/ganglia-3.1.7.tar.gz [following]

Resolving jaist.dl.sourceforge.net... 150.65.7.130, 2001:200:141:feed::feed
Connecting to jaist.dl.sourceforge.net|150.65.7.130|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1278023 (1.2M) [application/x-gzip]
Saving to: `ganglia-3.1.7.tar.gz'

100%[==================================================>] 1,278,023 341K/s in 3.7s

2011-03-23 23:29:21 (341 KB/s) - `ganglia-3.1.7.tar.gz' saved [1278023/1278023]

[root@L2 tmp]#
[root@L2 tmp]# tar zxf ./ganglia-3.1.7.tar.gz
[root@L2 tmp]# cd ./ganglia-3.1.7/contrib/
[root@L2 contrib]#
[root@L2 contrib]# ls
check_ganglia.py ganglia_gmond.xml.in README.contrib
diskfree_report.php ganglia-hosts-lowercase.sh README-removespikes
ganglia_gmetad.xml ganglia-rrd-modify.pl removespikes.pl

[root@L2 contrib]# cp ./check_ganglia.py /usr/lib/nagios/plugins/

Got it!!

Registering the configuration file


[root@L2 nagios]# vi nagios.cfg

# Extended host/service info definitions are now stored along with
# other object definitions:
#cfg_file=/etc/nagios/hostextinfo.cfg
#cfg_file=/etc/nagios/serviceextinfo.cfg
cfg_file=/etc/nagios/ganglia-services.cfg

The configuration file


[root@L2 nagios]# vi ./ganglia-services.cfg

define servicegroup {
servicegroup_name ganglia-metrics
alias Ganglia Metrics
}

define command {
command_name check_ganglia
command_line $USER1$/check_ganglia.py -h $HOSTNAME$ -m $ARG1$ -w $ARG2$ -c $ARG3$
}

define service {
use linux-service
name ganglia-service
hostgroup_name L-servers
service_groups ganglia-metrics
notifications_enabled 0
}


define service {
use ganglia-service
service_description load_one
check_command check_ganglia!load_one!4!5
}


define service {
use ganglia-service
service_description disk_free
check_command check_ganglia!disk_free!10!5
}



[root@L2 nagios]#
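
For reference, here is a rough Python sketch of how a check_command such as check_ganglia!load_one!4!5 ends up as a command line. It only illustrates the macro substitution and is not Nagios's actual code. The function name expand_check_command is made up, and the $USER1$ value assumes the plugin directory used above (/usr/lib/nagios/plugins).

```python
# The command_line from the "define command" block above.
COMMAND_LINE = "$USER1$/check_ganglia.py -h $HOSTNAME$ -m $ARG1$ -w $ARG2$ -c $ARG3$"


def expand_check_command(check_command, command_line, hostname,
                         user1="/usr/lib/nagios/plugins"):
    """Illustrative expansion of a check_command into a command line.

    user1 is an assumption: $USER1$ is normally defined in resource.cfg.
    """
    # "check_ganglia!load_one!4!5" -> "check_ganglia", ["load_one", "4", "5"]
    name, *args = check_command.split("!")

    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTNAME$", hostname)

    # $ARG1$, $ARG2$, ... are filled from the "!"-separated arguments in order.
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, value)

    return name, expanded
```

Called with the load_one service and host L1, this returns /usr/lib/nagios/plugins/check_ganglia.py -h L1 -m load_one -w 4 -c 5, so the value passed to -h is the Nagios host name of each member of the hostgroup.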

Restart the service

After a while the checks start running, but...
"CHECKGANGLIA UNKNOWN: Error while getting value "Host/value not found""
errors like the one above show up.

Apparently the plugin can only see the local host.

Local


[root@L2 plugins]# /usr/lib/nagios/plugins/check_ganglia.py -h L2 -m disk_free -w 1700 -c 1600
CHECKGANGLIA OK: disk_free is 6.84

Remote


[root@L2 plugins]# /usr/lib/nagios/plugins/check_ganglia.py -h L1 -m disk_free -w 1700 -c 1600
CHECKGANGLIA UNKNOWN: Error while getting value "Host/value not found"
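
One way to see what "Host/value not found" means here is to look at which hosts the gmond being queried actually reports. gmond sends an XML dump of everything it knows to any client that connects to its TCP port (8649 by default). The following is a minimal sketch that lists hosts and metrics from that dump. The function names are my own, and the server address is an assumption (the local gmond). If L1 is missing from the result, or appears under a different name such as an FQDN, the plugin has nothing to match against.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(server="127.0.0.1", port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect (address/port are assumptions)."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(8192)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Return {host name: {metric name: value string}} from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result
```

Running sorted(metrics_by_host(read_gmond_xml())) on the Nagios server shows the host names that this gmond knows about, which can then be compared with the host names defined in Nagios.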

At this rate it looks like I'd have to install nrpe + check_ganglia.py on every target server...
But in that case, why not just monitor the resources with nrpe directly?

Hmm...