This post describes how to install and configure the Nagios monitoring system.
It is intended for system administrators who want to monitor their Linux machines.
This section provides the details of the proposed solution.
Nagios is a system and network monitoring application. It watches hosts and services that you specify, alerting you when things go bad and when they get better.
Some of the many features of Nagios include:
- Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
- Monitoring of host resources (processor load, disk usage, etc.)
- Simple plugin design that allows users to easily develop their own service checks
- Parallelized service checks
- Ability to define network host hierarchy using "parent" hosts, allowing detection of and distinction between hosts that are down and those that are unreachable
- Contact notifications when service or host problems occur and get resolved (via email, pager, or user-defined method)
- Ability to define event handlers to be run during service or host events for proactive problem resolution
- Automatic log file rotation
- Support for implementing redundant monitoring hosts
- Optional web interface for viewing current network status, notification and problem history, log file, etc.
The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. You will probably also want to have TCP/IP configured, as most service checks will be performed over the network.
You are not required to use the CGIs included with Nagios. However, if you do decide to use them, you will need to have the following software installed.
1. A web server (preferably Apache)
2. Thomas Boutell's gd library version 1.6.3 or higher (required by the statusmap and trends CGIs)
You can check for new versions of Nagios at http://www.nagios.org.
This quick start guide is intended to provide you with simple instructions on how to install Nagios from source (code) and have it monitoring your local machine inside of 20 minutes. No advanced installation options are discussed here - just the basics that will work for 95% of users who want to get started.
Quickstart installation guide for the Fedora distribution:
(Please visit the URL http://nagios.sourceforge.net/docs/3_0/toc.html for other distributions).
During portions of the installation you'll need to have root access to your machine.
Make sure you've installed the following packages on your Fedora installation before continuing.
- Apache
- PHP
- GCC compiler
- GD development libraries
You can use yum to install these packages by running the following commands (as root):
yum install httpd php
yum install gcc glibc glibc-common
yum install gd gd-devel
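Before moving on, it can help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the official quickstart; the helper name missing_tools is made up for illustration, and the binary names httpd, php and gcc are assumed to be what the packages above provide.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not something shipped with Nagios.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (run as root so that /usr/sbin is on PATH):
#   missing_tools httpd php gcc
```

If the call prints nothing, every listed command was found; any name it prints still needs to be installed.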
Become the root user.
su -l
Create a new nagios user account and give it a password.
/usr/sbin/useradd -m nagios
passwd nagios
Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache
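To double-check that both accounts ended up in the new group, the group list reported by id can be inspected. This is only a sketch; has_group is a hypothetical helper name.

```shell
# Succeed if group name $1 appears in the space-separated group list $2.
# has_group is a hypothetical helper used only for this check.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example calls:
#   has_group nagcmd "$(id -nG nagios)" && echo "nagios is in nagcmd"
#   has_group nagcmd "$(id -nG apache)" && echo "apache is in nagcmd"
```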
Create a directory for storing the downloads.
mkdir ~/downloads
cd ~/downloads
Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions). These directions use Nagios 3.2.0 and Nagios Plugins 1.4.11.
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
Extract the Nagios source code tarball.
cd ~/downloads
tar xzf nagios-3.2.0.tar.gz
cd nagios-3.2.0
Run the Nagios configure script, passing the name of the group you created earlier like so:
./configure --with-command-group=nagcmd
Compile the Nagios source code.
make all
Install binaries, init script, sample config files and set permissions on the external command directory.
make install
make install-init
make install-config
make install-commandmode
Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...
Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.
vi /usr/local/nagios/etc/objects/contacts.cfg
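If you would rather script this change than open an editor, something along these lines should work. It is a sketch under assumptions: set_contact_email is a hypothetical helper, and it rewrites the value of every email directive in the file, which is fine for the sample contacts.cfg only because that file is assumed to contain a single contact definition.

```shell
# Replace the value of each "email" directive in a Nagios object file.
#   $1 = path to the config file, $2 = new address
# set_contact_email is a hypothetical helper; trailing comments are preserved.
set_contact_email() {
    cfg="$1"; addr="$2"
    tmp="$(mktemp)" || return 1
    sed -E "s|^([[:space:]]*email[[:space:]]+)[^[:space:];]+|\1${addr}|" "$cfg" > "$tmp" \
        && cat "$tmp" > "$cfg"    # write back in place, keeping owner and mode
    rm -f "$tmp"
}

# Example call (the address is a placeholder):
#   set_contact_email /usr/local/nagios/etc/objects/contacts.cfg admin@example.com
```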
Install the Nagios web config file in the Apache conf.d directory.
make install-webconf
Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart Apache to make the new settings take effect.
service httpd restart
Extract the Nagios plugins source code tarball.
cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
Compile and install the plugins.
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
Add Nagios to the list of system services and have it automatically start when the system boots.
chkconfig --add nagios
chkconfig nagios on
Verify the sample Nagios configuration files.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, start Nagios.
service nagios start
Log in to the Web Interface
You should now be able to access the Nagios web interface at the URL below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.
http://<NagiosServer>/nagios
Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.
Congratulations! You have successfully installed Nagios.
We want to define a service template for remote servers, in case we wish to change any of the default parameters for remote servers. Edit /usr/local/nagios/etc/objects/templates.cfg and add the following lines:
# Remote service definition template - This is NOT a real service, just a template!
define service{
name remote-service ; The name of this service template
use generic-service ; Inherit default values from the generic-service definition
max_check_attempts 4 ; Re-check the service up to x times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every x minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
register 0 ; Do not register this definition, it is only a template
}
If you have ssh on a non-standard port on your localhost, you will need to open up /usr/local/nagios/etc/objects/localhost.cfg and edit the following lines:
# Define a service to check SSH on a non-standard port.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description SSH
check_command check_ssh!-p 12345
notifications_enabled 0
}
We will need to edit /usr/local/nagios/etc/objects/commands.cfg for adding the remote commands:
################################################################################
#
# REMOTE COMMANDS
#
################################################################################
# 'check_remote_disk' command definition
define command{
command_name check_remote_disk
command_line $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_disk -w $ARG2$ -c $ARG3$ -p $ARG4$'
}
# 'check_remote_users' command definition
define command{
command_name check_remote_users
command_line $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_users -w $ARG2$ -c $ARG3$'
}
# 'check_remote_load' command definition
define command{
command_name check_remote_load
command_line $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_load -w $ARG2$ -c $ARG3$'
}
# 'check_remote_procs' command definition
define command{
command_name check_remote_procs
command_line $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_procs -w $ARG2$ -c $ARG3$ -s $ARG4$'
}
# 'check_remote_swap' command definition
define command{
command_name check_remote_swap
command_line $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_swap -w $ARG2$ -c $ARG3$'
}
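To see how these definitions are used, recall that a service's check_command separates the command name and its arguments with "!", and Nagios substitutes the arguments for $ARG1$, $ARG2$ and so on. The sketch below imitates that substitution for check_remote_disk so you can preview the resulting command line. It is an illustration, not how Nagios itself is implemented; expand_remote_disk is a hypothetical helper, and $USER1$ is assumed to be /usr/local/nagios/libexec.

```shell
# Print the command line that check_remote_disk would expand to.
#   $1 = host_name, $2 = host address, $3 = the text after "check_remote_disk!"
expand_remote_disk() {
    host="$1"; addr="$2"; args="$3"
    libexec=/usr/local/nagios/libexec   # assumed value of $USER1$
    old_ifs="$IFS"
    set -f                 # no filename globbing while splitting
    IFS='!'
    set -- $args           # $1..$4 now stand in for $ARG1$..$ARG4$
    IFS="$old_ifs"
    set +f
    printf '%s\n' "$libexec/check_by_ssh -p $1 -l nagios -i /usr/local/nagios/etc/keys/$host -H $addr -C '/usr/local/nagios/libexec/check_disk -w $2 -c $3 -p $4'"
}

# Example call:
#   expand_remote_disk remote1 remote1.example.com '22!20%!10%!/'
```

The first argument is the SSH port, which is why each remote check in the service definitions starts with a port number.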
Finally, create the directory where we will store all of the SSH identity files:
mkdir /usr/local/nagios/etc/keys
chown nagios:nagios /usr/local/nagios/etc/keys
chmod 750 /usr/local/nagios/etc/keys
Remove the ssh login banner from /etc/ssh/sshd_config if it is set.
Now it's time to prepare the remote machines so that Nagios can connect to them.
First, create a user for remote commands:
/usr/sbin/useradd -m nagios
After creating the user, create a nagios directory under /usr/local:
mkdir /usr/local/nagios
chown nagios:nagios /usr/local/nagios
chmod 755 /usr/local/nagios
Next, copy the contents of the nagios directory from an existing Nagios server into /usr/local/nagios on the remote machine.
scp -rp root@<NagiosServer>:/usr/local/nagios/* /usr/local/nagios/
Next, create the SSH login key:
cd /home/nagios
mkdir .ssh
ssh-keygen -t dsa -b 1024 -f .ssh/id_dsa
(just hit enter each time it asks for the passphrase -- we want it blank)
cat .ssh/id_dsa.pub >> .ssh/authorized_keys
chown -R nagios:nagios .ssh
chmod 750 .ssh
chmod 640 .ssh/*
cat .ssh/id_dsa
Now copy the contents of .ssh/id_dsa on the remote machine into a file under /usr/local/nagios/etc/keys on the server machine. Name the file whatever you specified as the host_name for that server in the Nagios configuration (shown here as <hostname>). Then, on the server, run:
chown nagios:nagios /usr/local/nagios/etc/keys/<hostname>
chmod 600 /usr/local/nagios/etc/keys/<hostname>
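The ssh client refuses to use a private key that other users can read, so it is worth confirming the mode on the key file before testing the connection. A small sketch follows; key_mode_ok is a hypothetical helper, and it checks the permission string printed by ls rather than relying on any Nagios tool.

```shell
# Succeed only if $1 is a regular file with mode 600 (rw for the owner only).
# key_mode_ok is a hypothetical helper; it reads the first ten characters
# of the "ls -ld" listing, which hold the file type and permission bits.
key_mode_ok() {
    [ "$(ls -ld "$1" 2>/dev/null | cut -c1-10)" = "-rw-------" ]
}

# Example call, with the key file path passed as the argument:
#   key_mode_ok "$keyfile" || echo "fix permissions on $keyfile"
```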
Now accept the key once for nagios. Do this on the server, not the remote:
su - nagios
/usr/local/nagios/libexec/check_by_ssh -p 22 -l nagios -i /usr/local/nagios/etc/keys/<hostname> -H 192.168.0.x -C '/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /'
Enter 'yes' to accept the SSH host key, then rerun the command:
/usr/local/nagios/libexec/check_by_ssh -p 22 -l nagios -i /usr/local/nagios/etc/keys/<hostname> -H 192.168.0.x -C '/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /'
It should succeed the second time.
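When running a plugin by hand like this, the exit status tells you what Nagios would conclude: plugins return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The helper below is a hypothetical convenience for translating the number into a label.

```shell
# Translate a Nagios plugin exit status into its state name.
# plugin_status is a hypothetical helper for manual testing.
plugin_status() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID ($1)" ;;   # outside the range plugins should return
    esac
}

# Example: run a check, then show how its result would be classified.
#   /usr/local/nagios/libexec/check_by_ssh ... ; plugin_status $?
```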
Note: You may need to edit /etc/sysconfig/iptables and add a rule to accept ICMP pings from your Nagios server.
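As a rough illustration of such a rule, the sketch below prints a line in the format used by /etc/sysconfig/iptables. The helper name icmp_rule is hypothetical, the address in the example is a documentation placeholder, and the chain name is an assumption: some Fedora releases use a chain such as RH-Firewall-1-INPUT instead of INPUT, so check your existing file first.

```shell
# Print an iptables rule line that accepts ICMP echo requests from one source.
#   $1 = address of the Nagios server
#   $2 = chain name (assumed to be INPUT unless given)
icmp_rule() {
    printf '%s\n' "-A ${2:-INPUT} -s $1 -p icmp --icmp-type echo-request -j ACCEPT"
}

# Example call:
#   icmp_rule 192.0.2.10
```

The printed line has to appear before any rule that rejects or drops the traffic, and the firewall has to be reloaded afterwards for it to take effect.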
Here is an example configuration file to be used for each server.
Note that many of the remote commands have 22 as the first parameter. Set this to something else if SSH is not listening on port 22 on the remote machine.
###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################
# Define a host for the remote machine
define host{
use linux-server ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name remote1
alias remote1.example.com
address remote1.example.com
}
###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################
# Define an optional hostgroup for Linux machines
#define hostgroup{
# hostgroup_name linux-servers ; The name of the hostgroup
# alias Linux Servers ; Long name of the group
# members localhost ; Comma separated list of hosts that belong to this group
# }
###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################
# Define a service to "ping" the remote machine
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the remote machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Root Partition
check_command check_remote_disk!22!20%!10%!/
}
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Boot Partition
check_command check_remote_disk!22!20%!10%!/boot
}
# Define a service to check the number of currently logged in
# users on the remote machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Current Users
check_command check_remote_users!22!20!50
}
# Define a service to check the number of currently running procs
# on the remote machine. Warning if > 250 processes, critical if
# > 400 processes.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Total Processes
check_command check_remote_procs!22!250!400!RSZDT
}
# Define a service to check the load on the remote machine.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Current Load
check_command check_remote_load!22!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage on the remote machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description Swap Usage
check_command check_remote_swap!22!20%!10%
}
# Define a service to check SSH on the remote machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description SSH
check_command check_ssh!-p 22
notifications_enabled 0
}
# Define a service to check HTTP on the remote machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use remote-service ; Name of service template to use
host_name remote1
service_description HTTP
check_command check_http
notifications_enabled 0
}
Save the above file as <remoteservername1>.cfg in /usr/local/nagios/etc/objects.
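Once one host file exists, further hosts can be produced by copying it and swapping the host name. The following is a sketch only; clone_host_cfg is a hypothetical helper, and it does a plain text substitution, so it assumes the old name (remote1 in the example file) does not occur anywhere it should stay unchanged.

```shell
# Print a copy of config file $1 with every occurrence of $2 replaced by $3.
# clone_host_cfg is a hypothetical helper; $2 is treated as a sed pattern,
# so keep it to simple names without regex metacharacters.
clone_host_cfg() {
    sed "s/$2/$3/g" "$1"
}

# Example call (file names are placeholders):
#   clone_host_cfg remote1.cfg remote1 remote2 > remote2.cfg
```

Review the generated file afterwards, since ports, partitions and thresholds may differ from host to host.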
Now edit the nagios.cfg file under /usr/local/nagios/etc and add the following line so that Nagios loads the configuration for the first remote server:
cfg_file=/usr/local/nagios/etc/objects/<remoteservername1>.cfg
Similarly, create configuration files for remoteserver2, remoteserver3, ..., remoteserverN under /usr/local/nagios/etc/objects. The nagios.cfg file should then look like this:
cfg_file=/usr/local/nagios/etc/objects/<remoteservername1>.cfg
cfg_file=/usr/local/nagios/etc/objects/<remoteservername2>.cfg
.......................................................................................
cfg_file=/usr/local/nagios/etc/objects/<remoteservernamen>.cfg
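With many hosts, typing these lines by hand gets tedious. One possible shortcut, sketched here with a hypothetical helper named cfg_file_lines, is to generate them from a list of host names:

```shell
# Print one cfg_file= line per host name given as an argument.
# cfg_file_lines is a hypothetical helper; the objects path matches
# the layout used throughout this post.
cfg_file_lines() {
    for host in "$@"; do
        printf 'cfg_file=/usr/local/nagios/etc/objects/%s.cfg\n' "$host"
    done
}

# Example call, appending to nagios.cfg (host names are placeholders):
#   cfg_file_lines remote1 remote2 >> /usr/local/nagios/etc/nagios.cfg
```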
After editing the configuration files, run the following command to verify the Nagios configuration before restarting Nagios.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, restart Nagios.
service nagios stop
service nagios start
Or
service nagios restart
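The verify and restart steps can be chained so that Nagios is only restarted when the configuration check passes. This is a sketch rather than an official procedure; restart_if_valid and the three variables are hypothetical names, and it relies on nagios -v returning a non-zero exit status when it finds errors.

```shell
# Paths and commands as used in this post; override them if yours differ.
NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg
RESTART_CMD="service nagios restart"

# Restart Nagios only if the pre-flight check succeeds.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Example call (as root):
#   restart_if_valid
```

If the check fails, rerun the verify command by hand to see the full error output.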
Open the Nagios web interface and you will find the added remote servers listed under Hosts.
http://<NagiosServer>/nagios