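Warning
This article was last updated on 2023-07-09; its content may be out of date.
Nagios is open-source software for monitoring the state of remote machines, built on a server-client architecture.
This article walks through the installation steps for both the server and the client.
A few points need particular attention:
- At the time of writing, NRPE's SSL support only builds against openssl-1.1.1; with other versions the build fails with errors like the following: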
|
ccxhAMXP.o: In function `get_dh2048':
/tmp/nrpe-4.1.0/src/./../include/dh.h:33: undefined reference to `DH_set0_pqg'
collect2: error: ld returned 1 exit status
make[1]: *** [nrpe] Error 1
make[1]: Leaving directory `/tmp/nrpe-4.1.0/src'
make: *** [all] Error 2
...
undefined reference to `SSL_get1_peer_certificate'
|
- Nagios depends on httpd, provided by Apache; the management page is at http://127.0.0.1:80/nagios/.
- On the server side, NRPE does not need the following install targets:
|
## below only for host-client
## make install-config && \
## make install-inetd && \
## make install-init && \
## make install-groups-users
|
- The nagcmd group and its members
|
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache
chown nagios:nagcmd /usr/local/nagios/var/rw
chown nagios:nagcmd /usr/local/nagios/var/rw/nagios.cmd
systemctl restart nagios
|
- Installing the Perl dependencies non-interactively from the command line
|
# export PERL_MM_USE_DEFAULT=1 && \
# # cpan -i Digest::MD5 && \
# # cpan -i Nagios::Config && \
# perl -MCPAN -e "install Digest::MD5" && \
# perl -MCPAN -e "install Nagios::Config" && \
yum -y install perl-App-cpanminus.noarch
export PERL_CPANM_OPT="--prompt --reinstall -l ~/perl5 --mirror http://cpan.cpantesters.org"
cpanm Digest::MD5
cpanm Nagios::Config
|
- When installing nagios-graph, make sure to let the installer modify nagios.cfg
- nagios.cfg
|
Modify the Nagios configuration? [n] y
Path of Nagios configuration file? [/usr/local/nagios/etc/nagios.cfg]
Path of Nagios commands file? [/usr/local/nagios/etc/objects/commands.cfg]
|
- apache
|
Modify the Apache configuration? [n] y
Path of Apache configuration directory? [/etc/httpd/conf.d]
|
The whole installer session looks like this:
|
checking required PERL modules
Carp...1.26
CGI...3.63
Data::Dumper...2.145
Digest::MD5...2.52
File::Basename...2.84
File::Find...1.20
MIME::Base64...3.13
POSIX...1.30
RRDs...1.4008
Time::HiRes...1.9725
checking optional PERL modules
GD...2.49
Nagios::Config...36
checking nagios installation
found nagios exectuable at /usr/local/nagios/bin/nagios
found nagios init script at /etc/init.d/nagios
checking web server installation
found apache executable at /usr/sbin/httpd
Destination directory (prefix)? [/usr/local/nagios]
Location of configuration files (etc-dir)? [/usr/local/nagios/etc/nagiosgraph]
Location of executables? [/usr/local/nagios/libexec]
Location of CGI scripts? [/usr/local/nagios/sbin]
Location of documentation (doc-dir)? [/usr/local/nagios/docs/nagiosgraph]
Location of examples? [/usr/local/nagios/docs/nagiosgraph/examples]
Location of CSS and JavaScript files? [/usr/local/nagios/share]
Location of utilities? [/usr/local/nagios/docs/nagiosgraph/util]
Location of state files (var-dir)? [/var/nagios]
Location of RRD files? [/var/nagios/rrd]
Location of log files (log-dir)? [/var/nagios]
Path of log file? [/var/nagios/nagiosgraph.log]
Path of CGI log file? [/var/nagios/nagiosgraph-cgi.log]
Base URL? [/nagios]
URL of CGI scripts? [/nagios/cgi-bin]
URL of CSS file? [/nagios/nagiosgraph.css]
URL of JavaScript file? [/nagios/nagiosgraph.js]
URL of Nagios CGI scripts? [/nagios/cgi-bin]
Path of Nagios performance data file? [/tmp/perfdata.log]
username or userid of Nagios user? [nagios]
username or userid of web server user? [apache]
Modify the Nagios configuration? [n] y
Path of Nagios configuration file? [/usr/local/nagios/etc/nagios.cfg]
Path of Nagios commands file? [/usr/local/nagios/etc/objects/commands.cfg]
Modify the Apache configuration? [n] y
Path of Apache configuration directory? [/etc/httpd/conf.d]
configuration:
ng_prefix /usr/local/nagios
ng_etc_dir /usr/local/nagios/etc/nagiosgraph
ng_bin_dir /usr/local/nagios/libexec
ng_cgi_dir /usr/local/nagios/sbin
ng_doc_dir /usr/local/nagios/docs/nagiosgraph
ng_examples_dir /usr/local/nagios/docs/nagiosgraph/examples
ng_www_dir /usr/local/nagios/share
ng_util_dir /usr/local/nagios/docs/nagiosgraph/util
ng_var_dir /var/nagios
ng_rrd_dir /var/nagios/rrd
ng_log_dir /var/nagios
ng_log_file /var/nagios/nagiosgraph.log
ng_cgilog_file /var/nagios/nagiosgraph-cgi.log
ng_url /nagios
ng_cgi_url /nagios/cgi-bin
ng_css_url /nagios/nagiosgraph.css
ng_js_url /nagios/nagiosgraph.js
nagios_cgi_url /nagios/cgi-bin
nagios_perfdata_file /tmp/perfdata.log
nagios_user nagios
www_user apache
modify_nagios_config y
nagios_config_file /usr/local/nagios/etc/nagios.cfg
nagios_commands_file /usr/local/nagios/etc/objects/commands.cfg
modify_apache_config y
apache_config_dir /etc/httpd/conf.d
apache_config_file
Continue with this configuration? [y]
|
- The permissions of the rrd directory must be changed, otherwise this error is reported: no data found in /var/nagios/rrd
|
## -------------------------------------
## Change permissions
## Fixes: no data in /var/nagios/rrd
# chmod -R 777 /usr/local/nagios
chmod -R 777 /var/nagios
|
Server
Install nagios-core
|
sudo yum -y install httpd php gd gd-devel perl postfix && \
sudo yum -y install perl perl-CGI && \
sudo useradd nagios && \
sudo groupadd nagcmd && \
sudo usermod -a -G nagcmd nagios && \
sudo usermod -a -G nagcmd apache
export NAGIOS_CORE_VERSION=4.4.13
cd /tmp && \
wget -O nagioscore.tar.gz https://github.com/NagiosEnterprises/nagioscore/archive/nagios-${NAGIOS_CORE_VERSION}.tar.gz && \
tar -xvf nagioscore.tar.gz && \
cd nagioscore-nagios-${NAGIOS_CORE_VERSION} && \
./configure && \
make all && \
make install && \
make install-daemoninit && \
make install-config && \
make install-commandmode && \
make install-webconf
|
Install nagios-plugin
|
NAGIOS_PLUGIN_VERSION=2.4.5
cd /tmp && \
wget --no-check-certificate -O nagios-plugins.tar.gz https://github.com/nagios-plugins/nagios-plugins/releases/download/release-${NAGIOS_PLUGIN_VERSION}/nagios-plugins-${NAGIOS_PLUGIN_VERSION}.tar.gz && \
tar -xvf nagios-plugins.tar.gz && \
cd nagios-plugins-${NAGIOS_PLUGIN_VERSION} && \
unset ZSH_VERSION && \
CFLAGS="-I/usr/local/openssl/include" LDFLAGS="-L/usr/local/openssl/lib64" \
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl=/usr/local/openssl && \
make -j && make install
|
Install nagios-nrpe
|
NAGIOS_NRPE_VERSION=4.1.0
cd /tmp && \
unset ZSH_VERSION && \
wget https://www.openssl.org/source/old/1.1.1/openssl-1.1.1t.tar.gz && \
tar -xvf openssl-1.1.1t.tar.gz && \
cd openssl-1.1.1t && \
./config --prefix=/usr/local --openssldir=/etc/ssl --libdir=lib enable-ssl3 enable-ssl3-method enable-weak-ssl-ciphers -DOPENSSL_NO_GOST zlib shared && \
make -j && make install && \
ln -sfn /usr/local/bin/openssl /usr/bin/openssl && \
ln -sfn /usr/local/include/openssl/ /usr/include/openssl && \
echo "/usr/local/lib/" >> /etc/ld.so.conf && \
ldconfig && \
wget --no-check-certificate -O nagios-nrpe.tar.gz https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-${NAGIOS_NRPE_VERSION}/nrpe-${NAGIOS_NRPE_VERSION}.tar.gz && \
tar -xvf nagios-nrpe.tar.gz && \
cd nrpe-${NAGIOS_NRPE_VERSION} && \
export LDFLAGS=-L/usr/local/lib && \
./configure --enable-command-args --with-nagios-user=nagios --with-nagios-group=nagios && \
make all && \
make install && \
## below only for host-client
## make install-config && \
## make install-inetd && \
## make install-init && \
## make install-groups-users
rm -rf /tmp/nagios*
|
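Because the NRPE build above is tied to the 1.1.1 series, it can help to confirm which OpenSSL is active before running `./configure`. The following is a minimal sketch, not part of the original procedure: the function names `parse_openssl_version` and `is_openssl_111` are hypothetical, and the input is assumed to be a line in the format printed by `openssl version`.

```python
import re


def parse_openssl_version(version_line):
    """Return (major, minor, patch) from a line such as 'OpenSSL 1.1.1t  7 Feb 2023'."""
    match = re.search(r"OpenSSL\s+(\d+)\.(\d+)\.(\d+)", version_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {version_line!r}")
    return tuple(int(part) for part in match.groups())


def is_openssl_111(version_line):
    """True when the line reports the 1.1.1 series that this guide builds against."""
    return parse_openssl_version(version_line) == (1, 1, 1)
```

In practice the line would come from running `openssl version` (for example through `subprocess.run`), and the NRPE build would only be started when `is_openssl_111` returns `True`.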
Start the services
|
## Start the HTTP service
sudo systemctl restart httpd
## Create the web UI password
sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
## Start Nagios
systemctl enable nagios
systemctl start nagios
systemctl status nagios
## Open the firewall ports
#firewall-cmd --add-service=http
#firewall-cmd --add-service=https
#firewall-cmd --reload
|
Check that the plugin is installed correctly
|
## The remote NRPE service listens on 192.168.1.162:5666
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.162 -p 5666
## This calls the remote client; output like the following means the connection works
NRPE v4.0.3
|
The web UI can now be reached at http://127.0.0.1:80/nagios/. Without credentials, the server answers with a 401 page:
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Unauthorized</title>
</head><body>
<h1>Unauthorized</h1>
<p>This server could not verify that you
are authorized to access the document
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
</body></html>
|
Add a client machine to monitor
Use /usr/local/nagios/etc/objects/localhost.cfg as a reference. For example, this is the client definition in /usr/local/nagios/etc/research-machines/m2.cfg:
|
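# The block below describes the monitored machine
define host {
# use names the template to inherit from (templates are covered later); here it is linux-server
use linux-server
# host_name is the machine name, which is also the name shown in the web UI
host_name M2
# alias is a longer description of the machine
alias M2@WuyaCapital
# address is the machine's IP address, used for fetching data and for passive-check requests
address 192.168.1.162
# Maximum number of attempts, i.e. how many times a failing check is re-run to collect data
max_check_attempts 3
# Time period during which checks run
check_period 24x7
# Interval between repeated notifications
notification_interval 30
# Time period during which notifications are sent
notification_period 24x7
}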
define service{
use local-service,graphed-service ; Name of service template to use
host_name M2
service_description Current Users
check_command check_nrpe!5666!check_users
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 1
notification_period 24x7
notifications_enabled 1
register 1
}
define service{
use local-service,graphed-service ; Name of service template to use
host_name M2
service_description Total Procs
check_command check_nrpe!5666!check_total_procs
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 1
notification_period 24x7
notifications_enabled 1
register 1
}
|
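The two service blocks above differ only in their description and check command, so they lend themselves to being generated. The following is a minimal sketch under stated assumptions: `render_service` is a hypothetical helper, the fixed fields simply mirror the values in the example, and the port-first form of the check command assumes the `check_nrpe` command definition from commands.cfg, which takes the port as `$ARG1$`.

```python
def render_service(host_name, description, check_command,
                   templates="local-service,graphed-service"):
    """Render one Nagios 'define service' block as text.

    The interval and period values are copied from the M2 example and are
    assumptions of this sketch, not Nagios defaults.
    """
    fields = [
        ("use", templates),
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
        ("check_interval", "1"),
        ("retry_interval", "1"),
        ("check_period", "24x7"),
        ("notification_interval", "1"),
        ("notification_period", "24x7"),
        ("notifications_enabled", "1"),
        ("register", "1"),
    ]
    # Left-align each directive name in a fixed-width column for readability.
    body = "\n".join(f"    {key:<24}{value}" for key, value in fields)
    return "define service {\n" + body + "\n}\n"
```

Calling `render_service("M2", "Current Users", "check_nrpe!5666!check_users")` and writing the result to a file under the Nagios configuration directory would give a block equivalent to the first one above.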
We also need to define the check_nrpe command:
|
vim /usr/local/nagios/etc/objects/commands.cfg
# 'check_NRPE' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p $ARG1$ -c $ARG2$
}
|
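To see what this definition means for a `check_command` value, it helps to expand the macros by hand. The following is a simplified model of that substitution, not Nagios's actual implementation: `expand_check_command` is a hypothetical function, and the value used for `$USER1$` is assumed to be /usr/local/nagios/libexec, the plugin directory used elsewhere in this article.

```python
def expand_check_command(check_command, command_lines, macros):
    """Expand 'name!arg1!arg2' against a command_line template.

    command_lines maps command_name to command_line; macros maps macro
    names such as '$HOSTADDRESS$' to their values. Arguments separated by
    '!' become $ARG1$, $ARG2$, and so on.
    """
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"$ARG{index}$"] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line
```

With the definition above, `check_nrpe!5666!check_users` puts the port into `$ARG1$` and the remote command name into `$ARG2$`; omitting the port would shift the command name into the `-p` position.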
Client
|
sudo yum -y install zlib-devel xinetd
sudo useradd nagios
|
Install nagios-plugin
Same as on the server; see "Install nagios-plugin" above.
Install nagios-nrpe
|
NAGIOS_NRPE_VERSION=4.1.0
cd /tmp && \
unset ZSH_VERSION && \
wget https://www.openssl.org/source/old/1.1.1/openssl-1.1.1t.tar.gz && \
tar -xvf openssl-1.1.1t.tar.gz && \
cd openssl-1.1.1t && \
./config --prefix=/usr/local --openssldir=/etc/ssl --libdir=lib enable-ssl3 enable-ssl3-method enable-weak-ssl-ciphers -DOPENSSL_NO_GOST zlib shared && \
make -j && make install && \
ln -sfn /usr/local/bin/openssl /usr/bin/openssl && \
ln -sfn /usr/local/include/openssl/ /usr/include/openssl && \
echo "/usr/local/lib/" >> /etc/ld.so.conf && \
ldconfig && \
wget --no-check-certificate -O nagios-nrpe.tar.gz https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-${NAGIOS_NRPE_VERSION}/nrpe-${NAGIOS_NRPE_VERSION}.tar.gz && \
tar -xvf nagios-nrpe.tar.gz && \
cd nrpe-${NAGIOS_NRPE_VERSION} && \
export LDFLAGS=-L/usr/local/lib && \
./configure --enable-command-args --with-nagios-user=nagios --with-nagios-group=nagios && \
make all && \
make install && \
## below only for host-client
make install-config && \
make install-inetd && \
make install-init && \
make install-groups-users
rm -rf /tmp/nagios*
|
Start the services
|
systemctl enable xinetd
systemctl restart xinetd
systemctl status xinetd
systemctl enable nrpe
systemctl restart nrpe
systemctl status nrpe
## Check that nrpe is listening
netstat -at | egrep "nrpe|5666"
## Check that the service is up
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v4.1.0
## Check that the plugins work
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users
USERS OK - 5 users currently logged in |users=5;5;10;0
## Edit the checks that the client exposes
vim /usr/local/nagios/etc/nrpe.cfg
# The following examples allow user-supplied arguments and can
# only be used if the NRPE daemon was compiled with support for
# command arguments *AND* the dont_blame_nrpe directive in this
# config file is set to '1'. This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.
### MISC SYSTEM METRICS ###
# command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
# The following examples allow user-supplied arguments and can
# only be used if the NRPE daemon was compiled with support for
# command arguments *AND* the dont_blame_nrpe directive in this
# config file is set to '1'. This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.
### MISC SYSTEM METRICS ###
command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
#command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
#command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
#command[check_swap]=/usr/local/nagios/libexec/check_swap $ARG1$
#command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh $ARG1$
#command[check_mem]=/usr/local/nagios/libexec/custom_check_mem -n $ARG1$
## IPs of the servers that are allowed to connect
allowed_hosts=127.0.0.1,192.168.1.162
## Log file
log_file=/usr/local/nagios/var/nrpe.log
## Allow arguments to be passed
## The server can then pass arguments: /usr/local/nagios/libexec/check_nrpe -H127.0.0.1 -p56118 -c check_users -a "2 10"
## In commands.cfg, arguments can be forwarded like this: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p $ARG1$ -c $ARG2$ -a $ARG3$
## In colo.cfg, call the command like this: check_command check_nrpe!56118!check_users!"2 10"
##
dont_blame_nrpe=1
|
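When a check fails with an unknown-command error, a quick way to see which commands the client actually exposes is to list the active `command[...]` lines in nrpe.cfg. The following is a minimal sketch: `parse_nrpe_commands` is a hypothetical helper that only handles the `command[name]=line` form shown above and skips commented-out entries; it does not follow include directives.

```python
import re

# Matches active lines of the form: command[check_users]=/path/to/plugin args
COMMAND_PATTERN = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<line>.+)$")


def parse_nrpe_commands(config_text):
    """Return {command_name: command_line} for the active entries in nrpe.cfg text."""
    commands = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out commands
        match = COMMAND_PATTERN.match(line)
        if match:
            # A later definition with the same name replaces an earlier one here.
            commands[match.group("name")] = match.group("line").strip()
    return commands
```

Feeding it the contents of /usr/local/nagios/etc/nrpe.cfg gives the names that can be passed to `check_nrpe -c`.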
Write a nagios-plugin
show-users
Code
|
#!/bin/bash
#
# Copyright Hari Sekhon 2007
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
# Nagios Plugin to list all currently logged on users to a system.
# Modified by Rob MacKenzie, SFU - rmackenz@sfu.ca
# Added the -w and -c options to check for number of users.
version=0.3
# This makes coding much safer as a variable typo is caught
# with an error rather than passing through
set -u
# Note: resisted urge to use <<<, instead sticking with |
# in case anyone uses this with an older version of bash
# so no bash bashers please on this
# Standard Nagios exit codes
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3
usage(){
echo "usage: ${0##*/} [--simple] [ --mandatory username ] [ --blacklist username ] [ --whitelist username ]"
echo
echo "returns a list of users on the local machine"
echo
echo " -s, --simple show users without the number of sessions"
echo " -m username, --mandatory username"
echo " Mandatory users. Return CRITICAL if any of these users are not"
echo " currently logged in"
echo " -b username, --blacklist username"
echo " Unauthorized users. Returns CRITICAL if any of these users are"
echo " logged in. This can be useful if you have a policy that states"
echo " that you may not have a root shell but must instead only use "
echo " 'sudo command'. Specifying '-b root' would alert on root having"
echo " a session and hence catch people violating such a policy."
echo " -a username, --whitelist username"
echo " Whitelist users. This is exceptionally useful. If you define"
echo " a bunch of users here that you know you use, and suddenly"
echo " there is a user session open for another account it could"
echo " alert you to a compromise. If you run this check say every"
echo " 3 minutes, then any attacker has very little time to evade"
echo " detection before this trips."
echo
echo " -m, -b and -a can be specified multiple times for multiple users"
echo " or you can use a switch a single time with a comma separated"
echo " list."
echo " -w integer, --warning integer"
echo " Set WARNING status if more than INTEGER users are logged in"
echo " -c integer, --critical integer"
echo " Set CRITICAL status if more than INTEGER users are logged in"
echo
echo
echo " -V --version Print the version number and exit"
echo
exit $UNKNOWN
}
simple=""
mandatory_users=""
unauthorized_users=""
whitelist_users=""
warning_users=0
critical_users=0
while [ "$#" -ge 1 ]; do
case "$1" in
-h|--help) usage
;;
-V|--version) echo $version
exit $UNKNOWN
;;
-s|--simple) simple=true
;;
-m|--mandatory) if [ "$#" -ge 2 ]; then
if [ -n "$mandatory_users" ]; then
mandatory_users="$mandatory_users $2"
else
mandatory_users="$2"
fi
shift
else
usage
fi
;;
-b|--blacklist) if [ "$#" -ge 2 ]; then
if [ -n "$unauthorized_users" ]; then
unauthorized_users="$unauthorized_users $2"
else
unauthorized_users="$2"
fi
shift
else
usage
fi
;;
-a|--whitelist) if [ "$#" -ge 2 ]; then
if [ -n "$whitelist_users" ]; then
whitelist_users="$whitelist_users $2"
else
whitelist_users="$2"
fi
shift
else
usage
fi
;;
-w|--warning) if [ "$#" -ge 2 ]; then
if [ $2 -ge 1 ]; then
warning_users=$2
fi
shift
else
usage
fi
;;
-c|--critical) if [ "$#" -ge 2 ]; then
if [ $2 -ge 1 ]; then
critical_users=$2
fi
shift
else
usage
fi
;;
*) usage
;;
esac
shift
done
mandatory_users="`echo $mandatory_users | tr ',' ' '`"
unauthorized_users="`echo $unauthorized_users | tr ',' ' '`"
whitelist_users="`echo $whitelist_users | tr ',' ' '`"
# Must be a list of usernames only.
userlist="`who|grep -v "^ *$"|awk '{print $1}'|sort`"
usercount="`who|wc -l`"
errormsg=""
exitcode=$OK
if [ -n "$userlist" ]; then
if [ -n "$mandatory_users" ]; then
missing_users=""
for user in $mandatory_users; do
if ! echo "$userlist"|grep "^$user$" >/dev/null 2>&1; then
missing_users="$missing_users $user"
exitcode=$CRITICAL
fi
done
for user in `echo $missing_users|tr " " "\n"|sort -u`; do
errormsg="${errormsg}user '$user' not logged in. "
done
fi
if [ -n "$unauthorized_users" ]; then
blacklisted_users=""
for user in $unauthorized_users; do
if echo "$userlist"|sort -u|grep "^$user$" >/dev/null 2>&1; then
blacklisted_users="$blacklisted_users $user"
exitcode=$CRITICAL
fi
done
for user in `echo $blacklisted_users|tr " " "\n"|sort -u`; do
errormsg="${errormsg}Unauthorized user '$user' is logged in! "
done
fi
if [ -n "$whitelist_users" ]; then
unwanted_users=""
for user in `echo "$userlist"|sort -u`; do
if ! echo $whitelist_users|tr " " "\n"|grep "^$user$" >/dev/null 2>&1; then
unwanted_users="$unwanted_users $user"
exitcode=$CRITICAL
fi
done
for user in `echo $unwanted_users|tr " " "\n"|sort -u`; do
errormsg="${errormsg}Unauthorized user '$user' detected! "
done
fi
if [ $warning_users -ne 0 -o $critical_users -ne 0 ]; then
unwanted_users=`who`
if [ $usercount -ge $critical_users -a $critical_users -ne 0 ]; then
exitcode=$CRITICAL
elif [ $usercount -ge $warning_users -a $warning_users -ne 0 ]; then
exitcode=$WARNING
fi
OLDIFS="$IFS"
IFS=$'\n'
for user in $unwanted_users; do
errormsg="${errormsg} --- $user"
done
IFS="$OLDIFS"
fi
if [ "$simple" == "true" ]
then
finallist=`echo "$userlist"|uniq`
else
finallist=`echo "$userlist"|uniq -c|awk '{print $2"("$1")"}'`
fi
else
finallist="no users logged in"
fi
if [ "$exitcode" -eq $OK ]; then
echo "USERS OK:" $finallist
exit $OK
elif [ "$exitcode" -eq $WARNING ]; then
echo "USERS WARNING: [users: "$finallist"]" $errormsg
exit $WARNING
elif [ "$exitcode" -eq $CRITICAL ]; then
echo "USERS CRITICAL: [users: "$finallist"]" $errormsg
exit $CRITICAL
else
echo "USERS UNKNOWN:" $errormsg"[users: "$finallist"]"
exit $UNKNOWN
fi
exit $UNKNOWN
|
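The core of any Nagios plugin is the mapping from a measured value to one of the standard exit codes. The following sketch isolates the user-count threshold logic of the script above in Python. It is a simplified restatement, not a replacement for the script: `evaluate_user_count` and `format_status` are hypothetical names, a threshold of 0 means "not set" as in the script, and the comparison is `>=` because the script uses `-ge`.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_user_count(user_count, warning=0, critical=0):
    """Map a user count to an exit code; a threshold of 0 disables that level."""
    if critical != 0 and user_count >= critical:
        return CRITICAL
    if warning != 0 and user_count >= warning:
        return WARNING
    return OK


def format_status(exit_code, user_count):
    """Build the single status line that a plugin prints before exiting."""
    labels = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}
    label = labels.get(exit_code, "UNKNOWN")
    return f"USERS {label}: {user_count} users logged in"
```

A real plugin would print the status line and then call `sys.exit` with the same code, since Nagios reads the state from the process exit status rather than from the text.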
Configuration
|
Plugin worked properly. Just set below things.
On Nagios Server:
* Create a file "show_users" in your libexec directory.
* Copy all the "show_users.txt" contents in "show_users" file
* chmod 755 show_users
* chown nagios:nagios show_users
* Open your host configuration file & type below configuration.
define service{
use generic-service ; Inherit values from a template
host_name Dell NFS Server
service_description Logged Users
check_command check_nrpe!show_users
}
On NRPE Client:
* Copy show_users file in "libexec" directory
* vim nrpe.cfg, add below line
command[show_users]=/usr/local/nagios/libexec/show_users
Save & Exit & restart NRPE/Xinetd service
That's it.
|
Configure nagios-graph
|
locate /etc/httpd/conf/httpd.conf
## cgi
## Install perl-core first
yum install -y perl-core
## Enter the CPAN shell and install the modules
perl -MCPAN -e shell
1) install Digest::MD5
2) install Nagios::Config
## Change permissions
chmod 777 -R /usr/local/nagios
## Edit templates.cfg; the URL must be changed to /nagios/cgi-bin/....
define service {
name graphed-service
action_url /nagios/cgi-bin/show.cgi?host=$HOSTNAME$&service=$SERVICEDESC$' onMouseOver='showGraphPopup(this)' onMouseOut='hideGraphPopup()' rel='/nagiosgraph/cgi-bin/showgraph.cgi?host=$HOSTNAME$&service=$SERVICEDESC$&period=week&rrdopts=-w+450+-j
register 0
}
## Test by opening http://127.0.0.1:8080/nagios/cgi-bin/showconfig.cgi
|
Configure pnp4nagios
|
sudo /etc/init.d/npcd restart
systemctl restart npcd
systemctl restart httpd
|
Install pnp4nagios
|
cd /tmp
wget -O pnp4nagios.tar.gz https://github.com/lingej/pnp4nagios/archive/0.6.26.tar.gz
tar -xvf pnp4nagios.tar.gz
cd pnp4nagios-0.6.26
./configure --with-rrdtool=/usr/bin/rrdtool --with-perfdata-dir=/usr/local/nagios/share/perfdata --with-perfdata-spool-dir=/usr/local/nagios/share/spool --with-nagios-user=nagios --with-nagios-group=nagios
make all
make install
make install-webconf
make install-config
make install-init
|
Configuration
|
cd /usr/local/pnp4nagios/etc
mv misccommands.cfg-sample misccommands.cfg
mv nagios.cfg-sample nagios.cfg
mv rra.cfg-sample rra.cfg
cd /usr/local/pnp4nagios/etc/pages/
mv web_traffic.cfg-sample web_traffic.cfg
cd ../check_commands
mv check_all_local_disks.cfg-sample check_all_local_disks.cfg
mv check_nrpe.cfg-sample check_nrpe.cfg
mv check_nwstat.cfg-sample check_nwstat.cfg
## Start the services
sudo /etc/init.d/npcd restart
systemctl restart npcd
systemctl restart httpd
## Edit /usr/local/nagios/etc/objects/templates.cfg
cp /usr/local/nagios/etc/objects/templates.cfg /usr/local/nagios/etc/objects/templates.cfg.orig
## ----------------------------------------------
vim /usr/local/nagios/etc/objects/templates.cfg
define host {
name host-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
process_perf_data 1
}
define service {
name srv-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
process_perf_data 1
}
define host{
name gchost
use host-pnp
max_check_attempts 1
normal_check_interval 2
retry_check_interval 1
check_period 24x7
contact_groups myself_group
notification_interval 2
notification_period 24x7
notification_options d,u,r
check_command check-host-alive
}
define service{
name myself_temp
use srv-pnp
max_check_attempts 2
normal_check_interval 2
retry_check_interval 1
check_period 24x7
notification_interval 2
notification_period 24x7
notification_options w,u,c,r
contact_groups myself_group
check_command check-host-alive
register 0
}
## Edit /usr/local/nagios/etc/nagios.cfg (back it up first)
cp /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/nagios.cfg.orig
## -------------------------------------
vim /usr/local/nagios/etc/nagios.cfg
process_performance_data=1
enable_environment_macros=1
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
host_perfdata_file_mode=a
service_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
service_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file
## Edit /usr/local/nagios/etc/objects/commands.cfg
cp /usr/local/nagios/etc/objects/commands.cfg /usr/local/nagios/etc/objects/commands.cfg.orig
## --------------------------------------------------
vim /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name process-service-perfdata-file
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata
}
define command{
command_name process-host-perfdata-file
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata
}
## Edit the Apache configuration /etc/httpd/conf/httpd.conf (back it up first)
cp /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.orig
## -----------------------------------
vim /etc/httpd/conf/httpd.conf
Alias /pnp4nagios "/usr/local/pnp4nagios/share"
<Directory "/usr/local/pnp4nagios/share">
AllowOverride None
Order allow,deny
Allow from all
#
# Use the same value as defined in nagios.conf
#
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
<IfModule mod_rewrite.c>
#Turn on URL rewriting
RewriteEngine On
Options symLinksIfOwnerMatch
#Installation directory
RewriteBase /pnp4nagios/
#Protect application and system files from being viewed
RewriteRule "^(?:application|modules|system)/" - [F]
#Allow any files or directories that exist to be displayed directly
RewriteCond "%{REQUEST_FILENAME}" !-f
RewriteCond "%{REQUEST_FILENAME}" !-d
#Rewrite all other URLs to index.php/URL
RewriteRule "^.*$" "index.php/$0" [PT]
</IfModule>
</Directory>
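## View the result: http://<ip>:<port>/pnp4nagios/
## Check that every installation requirement is met, then remove this page
mv /usr/local/pnp4nagios/share/install.php /usr/local/pnp4nagios/share/install.php.orig
## Copy the mouse-hover popup files from the downloaded source tree, e.g. /tmp/pnp4nagios-0.6.26/contrib
cp /tmp/pnp4nagios-0.6.26/contrib/ssi/* /usr/local/nagios/share/ssi/
## Verify the configuration and restart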
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
systemctl restart httpd
systemctl restart npcd
systemctl restart nagios
## Apply to the monitored services
## pnp4nagios and nagios-graph conflict with each other, so only one of them can be used
vim /usr/local/nagios/etc/colo-machines/colo118.cfg
define service {
# use local-service,graphed-service,srv-pnp ; Name of service template to use
use local-service,srv-pnp ; Name of service template to use
host_name Colo118
service_description Current Proc Memory valv_out_zz1
check_command check_nrpex!15118!check_proc_mem!"-w10240 -c20480 --pattern=.*valv.out.zz1.*"
check_interval 3
retry_interval 1
check_period 24x7
notification_interval 1
notification_period 24x7
notifications_enabled 1
register 1
}
|
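The `host_perfdata_file_template` and `service_perfdata_file_template` settings above write one tab-separated line of `KEY::VALUE` fields per check result. When graphs stay empty, reading such a line back is a simple way to see what was recorded. The following is a minimal sketch: `parse_perfdata_line` and `parse_metric` are hypothetical helpers, the sample values are made up for illustration, and `parse_metric` only handles a single `label=value;warn;crit;min;max` entry without units.

```python
def parse_perfdata_line(line):
    """Split one perfdata file line into a {KEY: VALUE} dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, separator, value = field.partition("::")
        if separator:  # ignore fields that do not contain '::'
            record[key] = value
    return record


def parse_metric(perfdata):
    """Parse a single metric such as 'users=5;5;10;0' into named parts."""
    label, _, rest = perfdata.partition("=")
    names = ("value", "warn", "crit", "min", "max")
    metric = {"label": label}
    # zip stops at the shorter sequence, so missing trailing parts are omitted.
    metric.update(zip(names, rest.split(";")))
    return metric
```

For the `check_users` output shown earlier, `parse_metric("users=5;5;10;0")` yields the value 5 with warning and critical thresholds of 5 and 10.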
grafana
|
cd /tmp
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.0.0-1.x86_64.rpm
## Install grafana
sudo yum install grafana-enterprise-10.0.0-1.x86_64.rpm
systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server.service
systemctl restart grafana-server.service
## Install the PNP plugin
sudo grafana-cli plugins install sni-pnp-datasource
# service firewalld restart
# firewall-cmd --permanent --add-port=3000/tcp
# firewall-cmd --reload
# service firewalld restart
## Download the API
cd /usr/local/pnp4nagios/share/application/controllers/
wget -O api.php "https://github.com/lingej/pnp-metrics-api/raw/master/application/controller/api.php"
systemctl restart grafana-server.service
## Configure pnp4nagios
cp /etc/httpd/conf.d/pnp4nagios.conf /etc/httpd/conf.d/pnp4nagios.conf.orig
sed -i '/Allow from all/a\ Allow from 127.0.0.1 ::1' /etc/httpd/conf.d/pnp4nagios.conf
sed -i '/Require valid-user/a\ Require all granted' /etc/httpd/conf.d/pnp4nagios.conf
sed -i '/Require valid-user/a\ Require ip 127.0.0.1 ::1' /etc/httpd/conf.d/pnp4nagios.conf
sed -i 's/Allow from all/#&/' /etc/httpd/conf.d/pnp4nagios.conf
sed -i 's/AuthName/#&/' /etc/httpd/conf.d/pnp4nagios.conf
sed -i 's/AuthType Basic/#&/' /etc/httpd/conf.d/pnp4nagios.conf
sed -i 's/AuthUserFile/#&/' /etc/httpd/conf.d/pnp4nagios.conf
sed -i 's/Require valid-user/#&/' /etc/httpd/conf.d/pnp4nagios.conf
systemctl restart httpd.service
## Configure grafana; on first login the account is admin and the password is admin
## Configure the pnp data source
## Set a password and use it in grafana to log in to pnp
htpasswd -b /usr/local/nagios/etc/htpasswd.users nagiosadmin *********
## Open the page: http://127.0.0.1:3000
|
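Enterprise WeChat (WeCom) notifications
Enterprise WeChat renders messages from robots and from group assistants differently, so \n has to be handled differently for each: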
|
if re.search('Robot', str(type(w)), re.IGNORECASE):
    msgx = cli.args.msg.replace('\\n', '\n')
else:
    msgx = cli.args.msg.replace('\\n', '<br>').replace('\n', '<br>')
|
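The same branching can be pulled into a small function so that it is testable without a WeChat client. The following is a minimal sketch: `normalize_newlines` and its `is_robot` flag are hypothetical names that stand in for the type check on `w` in the snippet above.

```python
def normalize_newlines(message, is_robot):
    """Convert literal '\\n' sequences for the target renderer.

    Robots get real newlines; other senders get '<br>' for both literal
    '\\n' sequences and real newline characters.
    """
    if is_robot:
        return message.replace("\\n", "\n")
    return message.replace("\\n", "<br>").replace("\n", "<br>")
```

The literal two-character sequence appears because Nagios passes the message as a command-line argument, where `\n` arrives unescaped as a backslash followed by `n`.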
Reference:
Create the Python notification script
|
import re

# CliParser, log and the named senders are expected to come from this star import
from wepy.utils.init import *

if __name__ == '__main__':
    ## -------------------------------------------
    cli = CliParser("wechat from command line")
    cli.add("who", type=str, default='wx_test')
    cli.add("msg", type=str, default="")
    cli.add("level", type=str, default='info')
    cli.show()
    ## -------------------------------------------
    # The recipient must be the name of a sender object defined at module level
    if cli.args.who not in globals():
        msg = f"""
        {cli.args.who=} does not exist
        """
        log.err(msg)
        raise Exception(msg)
    w = globals().get(cli.args.who)
    # Robots take real newlines; other senders need <br>
    if re.search('Robot', str(type(w)), re.IGNORECASE):
        msgx = cli.args.msg.replace('\\n', '\n')
    else:
        msgx = cli.args.msg.replace('\\n', '<br>').replace('\n', '<br>')
    w.send(msgx, cli.args.level)
|
This is a template; concrete contacts can be defined by inheriting from it:
|
vim /usr/local/nagios/etc/objects/templates.cfg
define contact {
    name                            generic-contact ; The name of this contact template
    service_notification_period     24x7 ; service notifications can be sent anytime
    host_notification_period        24x7 ; host notifications can be sent anytime
    # service_notification_options  w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
    service_notification_options    u,c,r,f,s ; as above, but without WARNING (w) states
    host_notification_options       d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
    # service_notification_commands notify-service-by-email,notify-service-by-wechat ; send service notifications via email and WeChat
    # host_notification_commands    notify-host-by-email,notify-host-by-wechat ; send host notifications via email and WeChat
    service_notification_commands   notify-service-by-wechat ; send service notifications via WeChat
    host_notification_commands      notify-host-by-wechat ; send host notifications via WeChat
    register                        0 ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL CONTACT, JUST A TEMPLATE!
}
|
|
vim /usr/local/nagios/etc/objects/contacts.cfg
define contact {
    contact_name                    nagiosadmin ; Short name of user
    use                             generic-contact ; Inherit default values from the generic-contact template (defined in templates.cfg)
    alias                           Nagios Admin ; Full name of user
    # email                         nagios@localhost ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    email                           william.lian.fang@gmail.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    host_notification_commands      notify-host-by-wechat
    host_notification_options       d,u,r
    host_notification_period        24x7
    service_notification_commands   notify-service-by-wechat
    service_notification_options    w,u,c,r ; can be set differently for each contact
    service_notification_period     24x7
}
define contact {
    contact_name                    test ; Short name of user
    use                             generic-contact ; Inherit default values from the generic-contact template (defined in templates.cfg)
    alias                           Nagios Dev ; Full name of user
    email                           william.lian.fang@gmail.com ;
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r,f,s,n ; caution: n (none) is documented as disabling host notifications for the contact
    service_notification_options    w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
    host_notification_commands      notify-host-by-wechat-test
    service_notification_commands   notify-service-by-wechat-test
}
|
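Nagios resolves `use` by taking directives from the template and letting values set on the object itself take precedence. The rule can be sketched roughly in Python as follows, using trimmed copies of the definitions above. `resolve` is an illustrative helper, not part of Nagios, and it ignores multiple templates and additive (`+`) inheritance.

```python
def resolve(obj: dict, templates: dict) -> dict:
    """Return the effective directives of obj after applying its template."""
    merged = {}
    parent = obj.get('use')
    if parent is not None:
        # Resolve the parent first so chains of templates also work.
        merged.update(resolve(templates[parent], templates))
    # Directives set locally override inherited ones.
    merged.update(obj)
    # Drop bookkeeping directives so only the effective settings remain.
    for key in ('use', 'name', 'register'):
        merged.pop(key, None)
    return merged


templates = {
    'generic-contact': {
        'name': 'generic-contact',
        'service_notification_options': 'u,c,r,f,s',
        'host_notification_options': 'd,u,r,f,s',
        'service_notification_commands': 'notify-service-by-wechat',
        'register': '0',
    },
}

# Trimmed contact: only one directive is overridden locally.
nagiosadmin = {
    'contact_name': 'nagiosadmin',
    'use': 'generic-contact',
    'service_notification_options': 'w,u,c,r',
}

effective = resolve(nagiosadmin, templates)
print(effective['service_notification_options'])  # the local value wins
print(effective['host_notification_options'])     # inherited from the template
```

This is why the `nagiosadmin` contact can re-enable WARNING notifications for itself even though the template leaves them out.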
Contact groups can also be defined
|
vim /usr/local/nagios/etc/objects/contacts.cfg
define contactgroup {
    contactgroup_name   admins
    alias               Nagios Administrators
    members             nagiosadmin,test
}
|
Add the notification commands
|
vim /usr/local/nagios/etc/objects/commands.cfg
# The $(...) command substitution is wrapped in double quotes so that the shell
# passes the whole message as a single --msg argument instead of splitting it into words.
##### notify-host-by-wechat command definition
define command{
    command_name    notify-host-by-wechat
    command_line    /usr/local/python3/bin/python3 /app/wechat_cli.py --who='wx_test' --msg="$(/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n")" --level='$HOSTSTATE$'
}
##### notify-service-by-wechat command definition
define command{
    command_name    notify-service-by-wechat
    command_line    /usr/local/python3/bin/python3 /app/wechat_cli.py --who='robot_secops' --msg="***** Nagios *****\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n$SERVICEOUTPUT$" --level='$SERVICESTATE$'
}
define command{
    command_name    notify-host-by-wechat-test
    command_line    /usr/local/python3/bin/python3 /app/wechat_cli.py --who='wx_test' --msg="$(/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n")" --level='$HOSTSTATE$'
}
define command{
    command_name    notify-service-by-wechat-test
    command_line    /usr/local/python3/bin/python3 /app/wechat_cli.py --who='wx_test' --msg="***** Nagios *****\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n$SERVICEOUTPUT$" --level='$SERVICESTATE$'
}
|
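Before reloading Nagios, it can help to preview what a `command_line` looks like once the `$MACRO$` placeholders are filled in. The sketch below does a plain textual substitution with sample values. `expand_macros` and the sample data are assumptions made for illustration; real Nagios macro processing has more rules (for example, `$$` stands for a literal dollar sign).

```python
import re


def expand_macros(command_line: str, values: dict) -> str:
    """Replace $NAME$ placeholders with sample values.

    Placeholders without a sample value are left untouched so they
    stay visible in the preview.
    """
    def repl(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r'\$([A-Z0-9_]+)\$', repl, command_line)


# Shortened version of the host notification message, with made-up values.
template = '--msg="Host: $HOSTNAME$\\nState: $HOSTSTATE$" --level=\'$HOSTSTATE$\''
sample = {'HOSTNAME': 'Colo118', 'HOSTSTATE': 'DOWN'}

if __name__ == '__main__':
    print(expand_macros(template, sample))
```

The preview keeps the literal `\n` sequences intact, which is exactly what the notification script later converts into newlines or `<br>`.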
Configure the monitored items
|
define host {
    ## --------------------------------
    contacts                nagiosadmin ; a contact defined above
    notifications_enabled   1 ; 0 = disabled, 1 = enabled
    ## --------------------------------
}
define service {
    ## --------------------------------
    contacts                nagiosadmin ; a contact defined above
    notifications_enabled   1 ; 0 = disabled, 1 = enabled
    ## --------------------------------
}
define service {
    # use                   local-service,graphed-service,srv-pnp ; Name of service template to use
    use                     local-service,srv-pnp ; Name of service template to use
    host_name               Colo118
    service_description     NTP time
    check_command           check_nrpex!15118!check_ntp_time!"--benchmark=$(echo $(date +%Y-%m-%dT%H:%M:%S.%6N)) --warning=300000 --critical=500000"
    check_interval          1
    retry_interval          1
    check_period            24x7
    notification_interval   1
    notification_period     24x7
    notifications_enabled   1
    register                1
    contacts                test ; the contact that receives the notifications
}
|
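In a `check_command`, the command name and its arguments are separated by `!`, and the arguments become `$ARG1$`, `$ARG2$`, and so on inside the command definition. The helper below (`split_check_command`, a hypothetical name) shows how the NTP example above breaks down. It is a simplified sketch: the `--benchmark` part is left out for brevity, and escaped `\!` characters are not handled.

```python
def split_check_command(check_command: str):
    """Split 'name!arg1!arg2...' into the command name and an ARGn mapping."""
    name, *args = check_command.split('!')
    macros = {f'ARG{i}': arg for i, arg in enumerate(args, start=1)}
    return name, macros


name, macros = split_check_command(
    'check_nrpex!15118!check_ntp_time!"--warning=300000 --critical=500000"'
)

if __name__ == '__main__':
    print(name)
    for key, value in macros.items():
        print(key, '=', value)
```

Read this way, the service passes a port (`15118`), the remote command name (`check_ntp_time`), and a quoted option string to whatever `check_nrpex` is defined as in `commands.cfg`.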
Ref