Friday, 18 July 2014

Monitoring Common Services using Nagios


In this tutorial, I'll walk you through simple steps to set up a few commonly used Nagios plugins for monitoring common services.

NOTE: In my previous post, I walked through a fast and easy way to install and configure Nagios for monitoring your infrastructure. Refer to that post for the setup steps.

Nagios comes with a wide range of built-in scripts for monitoring a variety of services. Remember, however, that the default location of these plugins differs between Linux distributions.

Default location for CentOS: /usr/lib64/nagios/plugins
Default location for Ubuntu: /usr/lib/nagios/plugins
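
If you are not sure which of the two locations applies to your machine, a small script can tell you. This is only a minimal sketch: it assumes the two default paths listed above, and the function name find_plugin_dir is my own, not part of Nagios.

```python
from pathlib import Path

# The two default plugin locations mentioned in this post.
# Other distributions or custom installs may use different paths.
CANDIDATES = ["/usr/lib64/nagios/plugins", "/usr/lib/nagios/plugins"]


def find_plugin_dir(candidates=CANDIDATES):
    """Return the first candidate path that is an existing directory, else None."""
    for path in candidates:
        if Path(path).is_dir():
            return path
    return None


print(find_plugin_dir())
```

The script prints None when neither directory exists, which usually means the plugins package is not installed or lives somewhere else.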

Based on the steps followed in my earlier tutorial, I have devised a simple three-step process to add a Nagios plugin and view it from the Nagios UI:
1) Add the respective plugin command definition in the /etc/nagios/objects/commands.cfg file
2) Add the corresponding service associated with that plugin in the /etc/nagios/servers/clients.cfg file
3) Restart the Nagios Service (service nagios restart)

I'll be following the same procedure in the rest of the tutorial as well.

There are 6 plugins that I have used in this tutorial, most of them available with the standard Nagios distribution:
1) check_by_ssh
2) check_disk
3) check_load
4) check_swap
5) check_users
6) check_tcp

Below is a screenshot of the default plugins in the /usr/lib64/nagios/plugins folder on my Nagios server.



Working with Plugins

1) check_by_ssh
This plugin is used to execute a command on your remote host using SSH.

Syntax

check_by_ssh -H <host> -C <command> [-s servicelist] [-n name] [-O outputfile]


Example:
# check_by_ssh -H 192.168.50.152 -n 1h -s c1:c2 -C uptime -C uname -O /tmp/mylogs

NOTE: Use /tmp to store the output of this plugin's commands; writing to other folders gave me a permission denied error.



If you check the output file on your Nagios Server, you should see the output of remotely executed commands as shown below:


2) check_disk
As the name implies, this plugin checks the amount of free disk space on a system. You can use it to raise alerts when the free space falls below a desired value.

Syntax
check_disk -w <warning_limit> -c <critical_limit> -p <path/ partition>

Example:
# check_disk -w 10% -c 5% -p /tmp -p /var
This example checks the /tmp and /var directories and raises a warning if free disk space falls below 10% and a critical alert if it falls below 5%.
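
To make the threshold logic concrete, here is a rough Python sketch of the same decision. It is not how check_disk is implemented; it only mirrors the rule described above (critical below the -c value, warning below the -w value). The exit codes 0, 1 and 2 are the standard Nagios plugin return codes for OK, WARNING and CRITICAL. The function names are my own, and shutil.disk_usage may report slightly different numbers than the plugin does.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_free_percent(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage to a Nagios state using 'below threshold' rules."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def free_percent(path):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total


# Same thresholds as the example: warn below 10%, critical below 5%.
print(classify_free_percent(free_percent("/"), 10, 5))
```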


Example:
# check_disk -w 100000 -c 50000 -p /
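This example checks the root (/) directory and raises a warning if free disk space falls below 100000 units and a critical alert if it falls below 50000 units. Plain integers are read in the plugin's current unit, which the standard check_disk sets with -u and defaults to MB, so pass -u kB if you mean roughly 100 MB and 50 MB.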



Example:
# check_disk -w 10% -c 5% -p /tmp -p /var -C -w 100000 -c 50000 -p /
You can also combine two or more checks into a single command. The -C (clear thresholds) flag resets the thresholds, so the -w and -c values that follow it apply only to the paths listed after it.



If you want to monitor a particular plugin from the Nagios UI, follow the three steps that I laid out earlier. For example, I want to monitor the disk usage of my remote client server (lamp.cloud.com: 192.168.50.152), in particular its /tmp directory.

So as per the steps, the first thing I do is add the check_disk command definition to my Nagios server's commands.cfg file:

# vi /etc/nagios/objects/commands.cfg

define command {
command_name    check_disk
command_line       $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    }

Once done, save the file and exit the editor.



Next, we define the service for the command:

# vi /etc/nagios/servers/clients.cfg

define service {
use    generic-service
host_name    lamp
service_description    Check tmp at 10 percent and 5 percent
check_command    check_disk!10%!5%!/tmp
    }

NOTE: The commands.cfg and clients.cfg files follow a strict syntax and are case-sensitive, so make sure you don't leave stray special characters such as # or () in these files.

Save the file and close the editor.
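
It helps to see how the check_command line maps onto the command definition. The values separated by ! are substituted, in order, for $ARG1$, $ARG2$ and so on, and $USER1$ is replaced with the plugin directory. The Python sketch below imitates that substitution for illustration only. The function name is my own, the $USER1$ value is assumed to be the CentOS path from earlier, and real Nagios handles more cases (for example escaped ! characters) than this does.

```python
def expand_check_command(check_command, command_line, macros):
    """Return (command_name, expanded_line) for a Nagios-style check_command string.

    check_command looks like 'name!arg1!arg2'; each argument replaces $ARGn$.
    macros maps other macro names (without the $ signs) to their values.
    """
    name, *args = check_command.split("!")
    expanded = command_line
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    for key, value in macros.items():
        expanded = expanded.replace(f"${key}$", value)
    return name, expanded


name, line = expand_check_command(
    "check_disk!10%!5%!/tmp",
    "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$",
    {"USER1": "/usr/lib64/nagios/plugins"},  # assumed plugin directory
)
print(name)
print(line)
```

The second line printed is the command Nagios would effectively run for this service, which is handy to paste into a shell when a check misbehaves.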



The last step is to restart the Nagios service. Remember that the Nagios service will not start if there are any errors in your configuration files, so make sure that both your commands.cfg and clients.cfg files are syntactically correct. You can check them beforehand with nagios -v /etc/nagios/nagios.cfg.

# service nagios restart
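
Since a single stray brace is enough to stop Nagios from starting, a quick structural check before restarting can save some time. The sketch below is a deliberately simplified parser of my own, not Nagios' own validation: it only understands "define <type> {" blocks, one directive per line, and # comment lines, and it ignores features such as inline ; comments.

```python
import re


def parse_definitions(text):
    """Parse simple Nagios-style object definitions.

    Returns a list of (object_type, {directive: value}) tuples and raises
    ValueError on unbalanced or misplaced braces.
    """
    definitions = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        start = re.match(r"define\s+(\w+)\s*\{$", line)
        if start:
            if current is not None:
                raise ValueError("nested or unclosed 'define' block")
            current = (start.group(1), {})
        elif line == "}":
            if current is None:
                raise ValueError("'}' without a matching 'define'")
            definitions.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        else:
            raise ValueError(f"directive outside a define block: {line!r}")
    if current is not None:
        raise ValueError("unclosed 'define' block")
    return definitions


sample = """
define service {
use    generic-service
host_name    lamp
check_command    check_disk!10%!5%!/tmp
    }
"""
print(parse_definitions(sample))
```

Treat this as a sanity check only; the Nagios binary remains the final judge of whether a configuration is valid.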



Log in to the Nagios UI and select your host. You should see the newly added plugin configured and working as shown:



You can dig deeper into the service by selecting it and viewing the service state information as shown below.



3) check_load
This plugin checks the current system's load average.

Syntax
check_load -w <warning_load1>,<warning_load5>,<warning_load15> -c <critical_load1>,<critical_load5>,<critical_load15>

Example:
# check_load -w 15,10,5 -c 30,20,10
This example generates a warning status if the load average exceeds 15 for Load1, 10 for Load5 or 5 for Load15. Similarly, it generates a critical status if the load average exceeds 30 for Load1, 20 for Load5 or 10 for Load15.
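
The comparison is done per interval: each of the three load averages is checked against its own warning and critical value. Here is a small Python sketch of that rule as I have described it above. It is an illustration, not the plugin's source, and the function names are my own. os.getloadavg() is available on Unix-like systems only.

```python
import os

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_load(load, warn, crit):
    """Classify a (load1, load5, load15) tuple against per-interval thresholds."""
    if any(value > limit for value, limit in zip(load, crit)):
        return CRITICAL
    if any(value > limit for value, limit in zip(load, warn)):
        return WARNING
    return OK


def check_local_load(warn=(15, 10, 5), crit=(30, 20, 10)):
    """Classify the current load averages of the machine running this script."""
    return classify_load(os.getloadavg(), warn, crit)
```

Calling check_local_load() with no arguments uses the same thresholds as the example command.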



To monitor this service from the Nagios UI, follow the same three-step process once again:

First, add the command definition in the commands.cfg file:

define command {
command_name    check_load
command_line       $USER1$/check_load -w $ARG1$ -c $ARG2$
    }

Once done, save the file and close the editor.



Next, add the service definition for check_load in the clients.cfg file:

define service {
use    generic-service
host_name    lamp
service_description    Check Load
check_command    check_load!15,10,5!30,20,10
    }

Save the file and close the editor.



Remember to restart the Nagios service before you check the Plugin status in the  Nagios UI. The newly added service should be visible as shown below.



4) check_swap
This plugin is used to check the availability of swap space of a system. It returns alerts if the current value of the swap space is less than that of the set threshold. 

Syntax
check_swap -w <percent_free>% -c <percent_free>%
(or)
check_swap -w <bytes_free> -c <bytes_free>

Example:
# check_swap -w 20% -c 10%
This example checks the swap space on the current host. If free swap space falls below 20%, the plugin generates a warning; if it falls below 10%, it generates a critical alert.
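
If you are curious where a "percent free" figure can come from on Linux, the SwapTotal and SwapFree fields of /proc/meminfo are enough to compute one. The sketch below parses text in that format. It is my own illustration and not necessarily how check_swap obtains its numbers.

```python
def swap_free_percent(meminfo_text):
    """Return free swap as a percentage, or None if no swap is configured.

    meminfo_text is expected to look like the contents of /proc/meminfo.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() in ("SwapTotal", "SwapFree"):
            # Values look like '2097148 kB'; keep only the number.
            fields[key.strip()] = int(rest.split()[0])
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return None
    return 100.0 * fields.get("SwapFree", 0) / total


example = "MemTotal: 4046840 kB\nSwapTotal: 2097148 kB\nSwapFree: 1048574 kB\n"
print(swap_free_percent(example))
```

On a live Linux host you could pass in open("/proc/meminfo").read() and compare the result with the 20% and 10% thresholds used above.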



To monitor this service from the Nagios UI, first add the command definition in the commands.cfg file:

define command {
command_name    check_swap
command_line       $USER1$/check_swap -w $ARG1$ -c $ARG2$
    }

Once done, save the file and close the editor.



Add its corresponding service definition as well:

define service {
use    generic-service
host_name    lamp
service_description    Check Swap Space
check_command    check_swap!20%!10%
    }

Restart the Nagios service and view the updated service from the Nagios UI.



5) check_users
This plugin checks the number of users currently logged in on the local system and generates an error if the number exceeds the thresholds specified.


Syntax
check_users -w <users> -c <users>

Example:
# check_users -w 2 -c 3
This example raises a warning if more than 2 users are currently logged in, and a critical alert if more than 3 are.


To monitor this service from the Nagios UI, first add the command definition in the commands.cfg file:

define command {
command_name    check_users
command_line       $USER1$/check_users -w $ARG1$ -c $ARG2$
    }

Once done, save the file and close the editor.


Add its corresponding service definition as well:

define service {
use    generic-service
host_name    lamp
service_description    Check Currently Logged in Users
check_command    check_users!2!3
    } 

There's a typo in the screenshot below, so please don't copy it. ;)



Restart the Nagios service and check the UI for the plugin info to show up.



6) check_tcp
This plugin is used to check TCP connections with a particular host.


Syntax
check_tcp -H <hostname> -p <TCP_port> [-w <warning_time>] [-c <critical_time>]

Example:
# check_tcp -H 192.168.50.152 -p 80
This example checks whether a TCP connection can be opened to port 80 on 192.168.50.152. The optional -w and -c values set the response times, in seconds, above which the plugin reports a warning or a critical alert.
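
Under the hood, a TCP check boils down to trying to open a connection and timing how long it takes. The Python sketch below does the same with the standard socket module. It is an illustration of the idea rather than the plugin's code, and the function name and the 10 second default timeout are my own choices.

```python
import socket
import time


def tcp_check(host, port, timeout=10.0):
    """Try to open a TCP connection; return (connected, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and connects, honouring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False, time.monotonic() - start
```

For example, tcp_check("192.168.50.152", 80) returns a pair whose second value you could compare against warning and critical response times.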
 

To monitor this service from the Nagios UI, first add the command definition in the commands.cfg file:

define command {
command_name    check_tcp
command_line       $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ 
    }

Once done, save the file and quit the editor.


Add its corresponding service definition as well:

define service {
use    generic-service
host_name    lamp
service_description    Test Port connectivity on TCP Port 80
check_command    check_tcp!80
    } 



That's all there is to it. Now restart the Nagios service and view the updated service from the Nagios UI as shown below.



Coming up next, Monitoring MySQL Databases with Nagios!! So stay tuned!!