Tuesday, March 5, 2013

Nagios Installation Guide

This post describes how to install and configure the Nagios monitoring system.
It is intended for system administrators who want to monitor their Linux machines.
It covers a basic installation on the monitoring server and the setup of remote machines that are checked over SSH.
Overview of Nagios
Nagios is a system and network monitoring application. It watches hosts and services that you specify, alerting you when things go bad and when they get better.
Some of the many features of Nagios include:
  • Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)
  • Monitoring of host resources (processor load, disk usage, etc.)
  • Simple plugin design that allows users to easily develop their own service checks
  • Parallelized service checks
  • Ability to define network host hierarchy using "parent" hosts, allowing detection of and distinction between hosts that are down and those that are unreachable
  • Contact notifications when service or host problems occur and get resolved (via email, pager, or user-defined method)
  • Ability to define event handlers to be run during service or host events for proactive problem resolution
  • Automatic log file rotation
  • Support for implementing redundant monitoring hosts
  • Optional web interface for viewing current network status, notification and problem history, log file, etc.
The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. You will probably also want to have TCP/IP configured, as most service checks will be performed over the network.
You are not required to use the CGIs included with Nagios. However, if you do decide to use them, you will need to have the following software installed.
1.     A web server (preferably Apache)
2.     Thomas Boutell's gd library version 1.6.3 or higher (required by the statusmap and trends CGIs)
Downloading The Latest Version
You can check for new versions of Nagios at http://www.nagios.org.
This quick start guide is intended to provide you with simple instructions on how to install Nagios from source (code) and have it monitoring your local machine inside of 20 minutes. No advanced installation options are discussed here - just the basics that will work for 95% of users who want to get started.
Quickstart installation guide for the Fedora distribution: 
(Please visit the URL http://nagios.sourceforge.net/docs/3_0/toc.html for other distributions).
During portions of the installation you'll need to have root access to your machine.
Make sure you've installed the following packages on your Fedora installation before continuing.
  • Apache
  • PHP
  • GCC compiler
  • GD development libraries
You can use yum to install these packages by running the following commands (as root):
yum install httpd php
yum install gcc glibc glibc-common
yum install gd gd-devel
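Before continuing, it can help to confirm that the tools these packages provide are actually on your PATH. The following is only a minimal sketch and not part of the original quick start; the helper name missing_tools is made up for illustration, and it checks for commands only (httpd, php, gcc), not for the GD development headers.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not something shipped with Nagios.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Example: list which of the build prerequisites are still missing.
# (httpd usually lives in /usr/sbin, so run this as root.)
missing_tools httpd php gcc
```

If the last command prints nothing, all three tools were found.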
Become the root user.
su -l
Create a new nagios user account and give it a password.
/usr/sbin/useradd -m nagios
passwd nagios
Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd apache
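To double-check that both accounts really ended up in the nagcmd group, something like the following could be used. It is a sketch under the assumption that the standard id utility is available; the helper name in_group is hypothetical.

```shell
# Succeed if user $1 is a member of group $2 (hypothetical helper).
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Example: report membership for the two accounts added above.
for user in nagios apache; do
    if in_group "$user" nagcmd; then
        echo "$user is in nagcmd"
    else
        echo "$user is NOT in nagcmd"
    fi
done
```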
Create a directory for storing the downloads.
mkdir ~/downloads
cd ~/downloads
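Download the source code tarballs of both Nagios and the Nagios plugins (visit http://www.nagios.org/download/ for links to the latest versions). These directions were tested with Nagios 3.1.1 and Nagios Plugins 1.4.11; the commands below download Nagios 3.2.0, so if you pick a different version, substitute its file name consistently in the steps that follow.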
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.11.tar.gz
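A download mirror sometimes returns an error page instead of the archive, so it may be worth checking the files before extracting them. This is a minimal sketch, assuming the file names used above; tarball_ok is a hypothetical helper that simply asks tar to list the archive.

```shell
# Succeed if $1 is a readable gzip-compressed tar archive (hypothetical helper).
tarball_ok() {
    tar tzf "$1" >/dev/null 2>&1
}

# Example: check both downloads in the current directory.
for f in nagios-3.2.0.tar.gz nagios-plugins-1.4.11.tar.gz; do
    if tarball_ok "$f"; then
        echo "$f looks intact"
    else
        echo "$f is missing or damaged; download it again"
    fi
done
```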
Extract the Nagios source code tarball.
cd ~/downloads
tar xzf nagios-3.2.0.tar.gz
cd nagios-3.2.0
Run the Nagios configure script, passing the name of the group you created earlier like so:
./configure --with-command-group=nagcmd
Compile the Nagios source code.
make all
Install binaries, init script, sample config files and set permissions on the external command directory.
make install
make install-init
make install-config
make install-commandmode
Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...
Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.
vi /usr/local/nagios/etc/objects/contacts.cfg
Install the Nagios web config file in the Apache conf.d directory.
make install-webconf
Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Restart Apache to make the new settings take effect.
service httpd restart
Extract the Nagios plugins source code tarball.
cd ~/downloads
tar xzf nagios-plugins-1.4.11.tar.gz
cd nagios-plugins-1.4.11
Compile and install the plugins.
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
Add Nagios to the list of system services and have it automatically start when the system boots.
chkconfig --add nagios
chkconfig nagios on
Verify the sample Nagios configuration files.
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, start Nagios.
service nagios start

Login to the Web Interface
You should now be able to access the Nagios web interface at the URL below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.

http://<your-server-address>/nagios

Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.
Congratulations! You have successfully installed Nagios.

We want to define a service template for remote servers, in case we wish to change any of the default parameters for remote servers. Edit /usr/local/nagios/etc/objects/templates.cfg and add the following lines:
# Remote service definition template - This is NOT a real service, just a template!

define service{
        name                            remote-service          ; The name of this service template
        use                             generic-service         ; Inherit default values from the generic-service definition
        max_check_attempts              4                       ; Re-check the service up to x times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every x minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        register                        0                       ;
        }

If you have ssh on a non-standard port on your localhost, you will need to open up /usr/local/nagios/etc/objects/localhost.cfg and edit the following lines:

# Define a service to check SSH on a non-standard port.

define service{
        use                             local-service         ; Name of service template to use
        host_name                       localhost
        service_description             SSH
        check_command                   check_ssh!-p 12345
        notifications_enabled           0
        }
Next, edit /usr/local/nagios/etc/objects/commands.cfg to add the remote commands:
################################################################################
#
# REMOTE COMMANDS
#
################################################################################

# 'check_remote_disk' command definition
define command{
        command_name    check_remote_disk
        command_line    $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_disk -w $ARG2$ -c $ARG3$ -p $ARG4$'
        }

# 'check_remote_users' command definition
define command{
        command_name    check_remote_users
        command_line    $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_users -w $ARG2$ -c $ARG3$'
        }

# 'check_remote_load' command definition
define command{
        command_name    check_remote_load
        command_line    $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_load -w $ARG2$ -c $ARG3$'
        }

# 'check_remote_procs' command definition
define command{
        command_name    check_remote_procs
        command_line    $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_procs -w $ARG2$ -c $ARG3$ -s $ARG4$'
        }

# 'check_remote_swap' command definition
define command{
        command_name    check_remote_swap
        command_line    $USER1$/check_by_ssh -p $ARG1$ -l nagios -i /usr/local/nagios/etc/keys/$HOSTNAME$ -H $HOSTADDRESS$ -C '/usr/local/nagios/libexec/check_swap -w $ARG2$ -c $ARG3$'
        }
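In these definitions, $ARG1$ through $ARG4$ are filled in from the check_command line of a service, where the command name and its arguments are separated by exclamation marks (for example check_remote_disk!22!20%!10%!/ in the service definitions later in this post). The sketch below only illustrates that mapping in plain shell; split_check_command is a hypothetical helper, and it ignores details such as escaped exclamation marks.

```shell
# Show how a check_command value is split into a command name and $ARGn$ values.
# Illustration only: Nagios does this internally; this helper is hypothetical.
split_check_command() (
    set -f          # disable globbing so characters like * stay literal
    IFS='!'
    set -- $1       # split the value on '!' into positional parameters
    echo "command_name=$1"
    shift
    i=1
    for arg in "$@"; do
        echo "ARG$i=$arg"
        i=$((i + 1))
    done
)

split_check_command 'check_remote_disk!22!20%!10%!/'
# command_name=check_remote_disk
# ARG1=22
# ARG2=20%
# ARG3=10%
# ARG4=/
```

So for check_remote_disk, the first argument is the SSH port, followed by the warning threshold, the critical threshold and the partition to check.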

Finally, create the directory where we will store all of the SSH identity files:
 mkdir /usr/local/nagios/etc/keys
 chown nagios:nagios /usr/local/nagios/etc/keys
 chmod 750 /usr/local/nagios/etc/keys
Remove the ssh login banner from /etc/ssh/sshd_config if it is set.
Set Up Remote Machine
Now it's time to prepare the remote machines so that Nagios can connect to them.
First, create a user for the remote commands:
 /usr/sbin/useradd -m nagios
After creating the user, create a nagios directory under /usr/local:
mkdir /usr/local/nagios
chown nagios:nagios /usr/local/nagios
chmod 755 /usr/local/nagios
Copy the contents of the nagios directory from the existing Nagios server into /usr/local/nagios on the remote machine:
scp -rp root@<NagiosServer>:/usr/local/nagios/* /usr/local/nagios/
Next, create the SSH login key:
 cd /home/nagios
 mkdir .ssh
 ssh-keygen -t dsa -b 1024 -f .ssh/id_dsa
(just hit enter each time it asks for the passphrase -- we want it blank)
 cat .ssh/id_dsa.pub >> .ssh/authorized_keys
 chown -R nagios:nagios .ssh
 chmod 750 .ssh
 chmod 640 .ssh/*
 cat .ssh/id_dsa
Now copy the contents of .ssh/id_dsa from the remote machine into a file in the /usr/local/nagios/etc/keys directory on the server. Name the file after the host_name you specified for that machine in the Nagios configuration, then set its ownership and permissions on the server:
 chown nagios:nagios /usr/local/nagios/etc/keys/<hostname>
 chmod 600 /usr/local/nagios/etc/keys/<hostname>
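The SSH client refuses to use a private key that other users can read, which is why the key files need mode 600. As a rough sketch (assuming GNU stat, as found on Fedora; key_mode_ok is a hypothetical helper), the permissions of all stored keys could be checked like this:

```shell
# Succeed if file $1 has exactly mode 600. Relies on GNU stat's -c option.
key_mode_ok() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "600" ]
}

# Example: flag every key file on the server whose mode is not 600.
for key in /usr/local/nagios/etc/keys/*; do
    [ -f "$key" ] || continue
    key_mode_ok "$key" || echo "fix permissions: $key"
done
```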
Now accept the remote machine's host key once as the nagios user. Do this on the server, not on the remote machine:
 su - nagios
 /usr/local/nagios/libexec/check_by_ssh -p 22 -l nagios -i /usr/local/nagios/etc/keys/<hostname> -H 192.168.0.x -C '/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /'
Enter 'yes' to accept the ssh key, then rerun the command:
 /usr/local/nagios/libexec/check_by_ssh -p 22 -l nagios -i /usr/local/nagios/etc/keys/<hostname> -H 192.168.0.x -C '/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /'
It should succeed the second time.
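When you run a plugin by hand like this, its exit status tells you which state Nagios would record: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The small helper below is just a sketch for translating that number into a name (plugin_state is a hypothetical name, not a Nagios tool).

```shell
# Translate a plugin exit status into the corresponding Nagios state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Example: run this immediately after a manual check_by_ssh test:
#   plugin_state $?
```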
Note: You may need to edit /etc/sysconfig/iptables and add a rule to accept ICMP pings from your Nagios server.

Here is an example configuration file to be used for each server.
Note that many of the remote commands have 22 as the first parameter. Set this to something else if SSH is not listening on port 22 on the remote machine.

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the remote machine

define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               remote1
        alias                   remote1.example.com
        address                 remote1.example.com
        }

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################

# Define an optional hostgroup for Linux machines

#define hostgroup{
#        hostgroup_name  linux-servers ; The name of the hostgroup
#        alias           Linux Servers ; Long name of the group
#        members         localhost     ; Comma separated list of hosts that belong to this group
#        }



###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################


# Define a service to "ping" the remote machine

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

# Define a service to check the disk space of the root partition
# on the remote machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Root Partition
        check_command                   check_remote_disk!22!20%!10%!/
        }

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Boot Partition
        check_command                   check_remote_disk!22!20%!10%!/boot
        }

# Define a service to check the number of currently logged in
# users on the remote machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Current Users
        check_command                   check_remote_users!22!20!50
        }

# Define a service to check the number of currently running procs
# on the remote machine.  Warning if > 250 processes, critical if
# > 400 processes.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Total Processes
        check_command                   check_remote_procs!22!250!400!RSZDT
        }

# Define a service to check the load on the remote machine.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Current Load
        check_command                   check_remote_load!22!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Define a service to check the swap usage on the remote machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             Swap Usage
        check_command                   check_remote_swap!22!20!10
        }

# Define a service to check SSH on the remote machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             SSH
        check_command                   check_ssh!-p 22
        notifications_enabled           0
        }

# Define a service to check HTTP on the remote machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                             remote-service         ; Name of service template to use
        host_name                       remote1
        service_description             HTTP
        check_command                   check_http
        notifications_enabled           0
        }

Save the above file as <remoteservername1>.cfg in /usr/local/nagios/etc/objects.
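To reuse the example for additional machines, one option is to copy it and substitute the host name. This is a minimal sketch, assuming the example was saved as remote1.cfg and that the host names contain no characters that are special to sed; clone_host_cfg and the name remote2 are hypothetical.

```shell
# Print a copy of config file $1 with every occurrence of host name $2
# replaced by $3. This is a plain textual substitution, so review the
# result before using it.
clone_host_cfg() {
    sed "s/$2/$3/g" "$1"
}

# Example (run in /usr/local/nagios/etc/objects):
#   clone_host_cfg remote1.cfg remote1 remote2 > remote2.cfg
```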
Now edit the nagios.cfg file under /usr/local/nagios/etc and add the following line to monitor the first remote server:
cfg_file=/usr/local/nagios/etc/objects/<remoteservername1>.cfg
Similarly, create configuration files for remoteserver2, remoteserver3, ..., remoteserverN under /usr/local/nagios/etc/objects, so that the nagios.cfg file looks like this:
cfg_file=/usr/local/nagios/etc/objects/<remoteservername1>.cfg
cfg_file=/usr/local/nagios/etc/objects/<remoteservername2>.cfg
.......................................................................................
cfg_file=/usr/local/nagios/etc/objects/<remoteservernamen>.cfg
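With many remote machines, typing these lines by hand gets tedious. The following sketch prints one cfg_file line per host name; cfg_lines is a hypothetical helper, and remote1 to remote3 are placeholder names.

```shell
# Print one cfg_file line per host name given as an argument.
cfg_lines() {
    for host in "$@"; do
        echo "cfg_file=/usr/local/nagios/etc/objects/$host.cfg"
    done
}

# Example: preview the lines for three remote machines.
cfg_lines remote1 remote2 remote3
```

After reviewing the output, it can be appended to nagios.cfg by redirecting it with >> /usr/local/nagios/etc/nagios.cfg.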
After editing the configuration files, verify the Nagios configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, restart Nagios.
service nagios stop
service nagios start
Or
service nagios restart
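The verify and restart steps can also be combined so that Nagios is only restarted when the configuration check passes. This is a sketch rather than an official script: restart_if_valid is a hypothetical helper, and it assumes the default paths used in this post and that nagios -v exits with a non-zero status when it finds errors.

```shell
# Paths used in this post; override them through the environment if needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Restart Nagios only if the configuration check succeeds.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        service nagios restart
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Run restart_if_valid as root after changing any configuration file. If it reports a failure, run the verify command shown above by hand to see the error messages.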
Browse to the Nagios web interface and you will find the added remote servers under Hosts.
http://<your-server-address>/nagios
