
GnuGroup International

Nagios
IT infrastructure monitoring tool
ILG
Insight GNU/Linux Group
Reinventing the way you,
Think,
Learn,
Work

NAGIOS

MODULE - 2

www.gnugroup.org

Index of module - 2

External Commands

NagiosQL

Event Handlers

Security Considerations

Volatile Services

Freshness Checks

State Stalking

Flapping

Using Templates

Object Inheritance

Passive Check /NSCA

Clustering - Distributed Monitoring

Redundant and Failover Network Monitoring

External Commands

Nagios can process commands from external applications (including the CGIs) and alter various aspects of its monitoring functions based on the commands it receives.

External applications can submit commands by writing to the command file, which is periodically processed by the Nagios daemon.

External Commands
Enabling External Commands
In order to have Nagios process external commands, make sure you do the following:

Enable external command checking with the check_external_commands option.

Set the frequency of command checks with the command_check_interval option.

Specify the location of the command file with the command_file option.

Set up proper permissions on the directory containing the external command file, as described in the quickstart guide.

When Does Nagios Check For External Commands?

At regular intervals specified by the command_check_interval option in the main configuration file

Immediately after event handlers are executed. This is in addition to the regular cycle of external command checks and is done to provide immediate action if an event handler submits commands to Nagios.

Using External Commands

External commands can be used to accomplish a variety of things while Nagios is running.

Examples of what can be done include temporarily disabling notifications for services and hosts, temporarily disabling service checks, forcing immediate service checks, and adding comments to hosts and services.
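As a rough illustration of how an external application might submit a command, the sketch below formats one command line and writes it to the command file. The `[timestamp] COMMAND;argument;argument` layout is the external command format; the command file path is an assumed default for a source install, and the host and service names are only examples, so use the value of your own command_file option.

```python
import time

# Assumed default location; use the value of command_file from nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(name, args, now=None):
    """Build one external command line: [timestamp] NAME;arg1;arg2..."""
    timestamp = int(time.time() if now is None else now)
    fields = ";".join([name] + [str(arg) for arg in args])
    return "[{}] {}\n".format(timestamp, fields)


def submit_external_command(name, args, command_file=COMMAND_FILE):
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(name, args))


# Example (not run here): temporarily disable notifications for one service.
# submit_external_command("DISABLE_SVC_NOTIFICATIONS", ["localhost", "daemons"])
```

The write only succeeds if the calling user has permission on the command file directory, which is why the permissions step above matters.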


Event Handlers
Event handlers are optional system commands (scripts or executables) that are run whenever a
host or service state change occurs.

An obvious use for event handlers is the ability for Nagios to proactively fix problems before anyone
is notified. Some other uses for event handlers include:

Restarting a failed service

Entering a trouble ticket into a helpdesk system

Logging event information to a database

Etc

When Are Event Handlers Executed?

Event handlers are executed when a service or host:

Is in a SOFT problem state

Initially goes into a HARD problem state

Initially recovers from a SOFT or HARD problem state



Event Handlers

Event Handler Types


There are different types of optional event handlers that you can define to handle host and service state changes:

Global host event handler


Global service event handler
Host-specific event handlers
Service-specific event handlers

Enabling Event Handlers


Event handlers can be enabled or disabled on a program-wide basis by using the enable_event_handlers option in your main configuration file.
Host- and service-specific event handlers can be enabled or disabled by using the event_handler_enabled directive in your host and service definitions.
Host- and service-specific event handlers will not be executed if the global enable_event_handlers option is disabled.

Event Handlers
Example of Event Handlers
Host file directive
define service{
    use                  local-service
    host_name            localhost
    service_description  daemons
    check_command        check_nrpe!check_daemons
    event_handler        restart-services
}
In your commands.cfg file, make sure you have the event handler command defined, something like:
define command{
    command_name  restart-services
    command_line  /usr/local/nagios/libexec/eventhandlers/restart-services $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
The problem is that the event handler runs as the Nagios user, which typically will not be able to restart a service.
Edit the sudoers file (with visudo) and add something like the lines below to the end of the file.
User_Alias NAGIOS = nagios,nagcmd
Cmnd_Alias NAGIOSCOMMANDS = /sbin/service
Defaults:NAGIOS !requiretty
NAGIOS ALL=(ALL) NOPASSWD: NAGIOSCOMMANDS
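The restart-services script itself is not shown in these slides. The sketch below is one possible shape for it, assuming a policy of acting on the third SOFT CRITICAL attempt and again when the problem turns HARD. The policy, the function names, and the httpd service name are all assumptions for illustration; only the three arguments come from the command_line above.

```python
import subprocess
import sys


def should_restart(state, state_type, attempt):
    """Decide whether the handler should try to restart the service.

    Assumed policy: act on the third SOFT CRITICAL attempt, and again
    when the problem becomes HARD; ignore every other combination.
    """
    if state != "CRITICAL":
        return False
    if state_type == "SOFT":
        return int(attempt) == 3
    return state_type == "HARD"


def main(argv):
    # argv mirrors the macros passed in command_line:
    # $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
    state, state_type, attempt = argv[1:4]
    if should_restart(state, state_type, attempt):
        # Hypothetical service name; /sbin/service matches the sudoers alias.
        subprocess.call(["sudo", "/sbin/service", "httpd", "restart"])
    return 0


# When installed as the handler script, finish with: sys.exit(main(sys.argv))
```

Restarting during a SOFT state gives Nagios a chance to fix the problem before it becomes HARD and notifications go out.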

Volatile Services
Nagios has the ability to distinguish between "normal" services and "volatile"
services.

The is_volatile option in each service definition allows you to specify whether a
specific service is volatile or not.
For most people, the majority of all monitored services will be non-volatile (i.e.
"normal"). However, volatile services can be very useful when used properly...
What Are They Useful For?
Volatile services are useful for monitoring...

Things that automatically reset themselves to an "OK" state each time they
are checked

Events such as security alerts which require attention every time there is a
problem (and not just the first time)


Freshness Check
Nagios supports a feature that does "freshness" checking on the
results of host and service checks.
The purpose of freshness checking is to ensure that host and
service checks are being provided passively by external
applications on a regular basis.
Freshness checking is useful when you want to ensure that passive
checks are being received as frequently as you want.
This can be very useful in distributed and failover monitoring
environments.


Freshness Check
How Does Freshness Checking Work?

Nagios periodically checks the freshness of the results for all hosts and services that have freshness checking enabled.
A freshness threshold is calculated for each host or service.
For each host/service, the age of its last check result is compared with the freshness threshold.
If the age of the last check result is greater than the freshness threshold, the check result is considered "stale".
If the check result is found to be stale, Nagios will force an active check of the host or service by executing the command specified by check_command in the host or service definition.
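The comparison described above can be sketched in a few lines. This is only an illustration of the age-versus-threshold rule, not how Nagios implements it internally; the function names and the dictionary layout are made up for the example.

```python
def is_stale(last_check_time, freshness_threshold, now):
    """Return True when the last result is older than the threshold (seconds)."""
    age = now - last_check_time
    return age > freshness_threshold


def find_stale(results, now):
    """Return the names whose last result is stale, in sorted order.

    results maps a name to (last_check_time, freshness_threshold).
    """
    return sorted(
        name
        for name, (last_check, threshold) in results.items()
        if is_stale(last_check, threshold, now)
    )
```

For every name returned by find_stale, Nagios would force an active check using that object's check_command.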


Freshness Check
Enabling Freshness Checking

Enable freshness checking on a program-wide basis with the check_service_freshness and check_host_freshness directives.

Use the service_freshness_check_interval and host_freshness_check_interval options to tell Nagios how often it should check the freshness of service and host results.

Enable freshness checking on a host- and service-specific basis by setting the check_freshness option
in your host and service definitions to a value of 1.

Configure freshness thresholds by setting the freshness_threshold option in your host and service
definitions.

Configure the check_command option in your host or service definitions to reflect a valid command
that should be used to actively check the host or service when it is detected as stale.

The check_period option in your host and service definitions is used when Nagios determines when a
host or service can be checked for freshness, so make sure it is set to a valid timeperiod.


Freshness Check
An example of a service that might require freshness checking might be
one that reports the status of your nightly backup jobs.

Perhaps you have an external script that submits the results of the backup job to Nagios once the backup is completed.
In this case, all of the checks/results for the service are provided by an external application using passive checks.
In order to ensure that the status of the backup job gets reported every day, you may want to enable freshness checking for the service.
If the external script doesn't submit the results of the backup job, you can have Nagios fake a critical result.


Freshness Check
For example, the following service definition will accept passive checks but will report an error if they are not present:
define service
{
    use                     generic-service
    host_name               linuxbox02
    service_description     SSH
    check_command           no-passive-check-results
    check_freshness         1
    freshness_threshold     43200
    active_checks_enabled   1
    passive_checks_enabled  1
}

The freshness_threshold option specifies the number of seconds after which an active check should be
performed. In this case, it is set to 12 hours.


Freshness Check
It is also necessary to define a command that will be run if no
passive check results have been provided.
The following command will use the check_dummy plugin to report an error:
define command
{
    command_name  no-passive-check-results
    command_line  $USER1$/check_dummy 2 "No passive check results"
}


State Stalking

State "stalking" is a feature which is probably not going to be used by most users.
When enabled, it allows you to log changes in the output of service and host checks even if the state of the host or service does not change.
When stalking is enabled for a particular host or service, Nagios will watch that host or service very carefully and log any changes it sees in the output of check results.
As you'll see, it can be very helpful to you in later analysis of the log files.


State Stalking

How Does It Work?


Under normal circumstances, the result of a host or service check is only logged if the host or service has changed state since it was last checked.
There are a few exceptions to this, but for the most part, that's the rule.
If you enable stalking for one or more states of a particular host or service, Nagios will log the results of the host or service check if the output from the check differs from the output from the previous check.
Take the following example of eight consecutive checks of a service:

[Table: eight consecutive service checks]

State Stalking

Why is this? With state stalking enabled, Nagios would have examined the output from each service check to see if it differed from the output of the previous check.
If the output differed and the state of the service didn't change between the two checks, the result of the newer service check would get logged.
The decision to enable state stalking for a particular host or service will also depend on the plugin that you use to check that host or service.
If the plugin always returns the same text output for a particular state, there is no reason to enable stalking for that state.
You can enable state stalking for hosts and services by using the stalking_options directive in host and service definitions.
Volatile services are similar, but will cause notifications and event handlers to run. Stalking is purely for logging purposes.
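The logging rule can be sketched as a small function. This is a simplified model that ignores SOFT and HARD state types and never logs the very first check; the function name and the sample data in the usage note are hypothetical.

```python
def logged_checks(checks, stalked_states=()):
    """Return the 1-based numbers of the checks that would be logged.

    checks is a list of (state, output) tuples in check order.
    A check is logged when the state changed, or when the state is
    stalked and only the output changed.
    """
    logged = []
    previous = None
    for number, (state, output) in enumerate(checks, start=1):
        if previous is not None:
            prev_state, prev_output = previous
            if state != prev_state:
                logged.append(number)
            elif state in stalked_states and output != prev_output:
                logged.append(number)
        previous = (state, output)
    return logged
```

With a made-up history in which a service stays CRITICAL while its output keeps changing, only the stalked run records those extra entries.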

Flapping

Flapping is a situation where a host or service changes states very rapidly, constantly switching between working correctly and not working at all.
This can happen for various reasons: a service might crash after a short period of operating correctly, or system administrators might be performing maintenance.
Nagios can detect that a host or service is flapping, if it is configured to do so.
It does so by analyzing previous results, in terms of how many state changes have happened within a specific period of time.
Nagios keeps a history of the 21 most recent checks and analyzes changes within that history.
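To make the idea of analyzing state changes concrete, the sketch below computes the percentage of state changes across the most recent 21 results. It is a simplified, unweighted calculation; Nagios itself gives more recent changes more weight, so real values will differ somewhat. The function name is made up for the example.

```python
HISTORY_SIZE = 21  # Nagios keeps the 21 most recent check results.


def percent_state_change(states):
    """Unweighted percentage of state changes in the recent history."""
    recent = list(states)[-HISTORY_SIZE:]
    if len(recent) < 2:
        return 0.0
    # 21 results allow at most 20 transitions between neighbours.
    changes = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    return 100.0 * changes / (len(recent) - 1)
```

A host or service whose percentage stays high over this window is a candidate for being treated as flapping.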


Using Templates

Templates in Nagios allow you to create a set of parameters that can then be used in the definitions of multiple hosts, services, and contacts.
The main purpose of templates is to keep parameters that are generic to all objects, or a group of objects, in one place.
This way, you can avoid putting the same directives in hundreds of objects, and your configuration is more maintainable.
It is also good to start using templates for hosts and services, and decide how they should be used.


Object Inheritance

Basics
There are three variables affecting recursion and inheritance that are present in all object definitions:

define someobjecttype{
    object-specific variables ...
    name      template_name
    use       name_of_template_to_use
    register  [0/1]
}

The first variable is name. It's just a "template" name that can be referenced in other object definitions so they can inherit the object's properties/variables. Template names must be unique amongst objects of the same type.

The second variable is use. This is where you specify the name of the template object that you want to inherit properties/variables from. The name you specify for this variable must be defined as another object's template name (using the name variable).

The third variable is register. Defining templates in Nagios is very similar to defining actual objects. You simply define the template as the required object type. The only difference is that you need to specify the register directive and give it a value of 0. This tells Nagios that it should not treat this as an actual object, but as a template.
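The way name and use combine can be modelled with plain dictionaries. The sketch below is an illustrative model only: it follows a single use chain, does not detect loops, and treats name, use and register as belonging to the object that sets them rather than being inherited. The function and constant names are invented for the example.

```python
NOT_INHERITED = ("name", "use", "register")


def resolve(obj, templates):
    """Return obj's effective directives after following its 'use' chain.

    templates maps a template name (its 'name' variable) to its directives.
    """
    inherited = {}
    if "use" in obj:
        # Resolve the parent first; it may itself use another template.
        parent = resolve(templates[obj["use"]], templates)
        inherited = {k: v for k, v in parent.items() if k not in NOT_INHERITED}
    effective = dict(inherited)
    # Directives set on the object itself override inherited ones.
    effective.update(obj)
    return effective
```

A host that uses a template which in turn uses another template ends up with the directives of both, with the closest definition winning.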


Passive Check / NSCA

Another great feature that Nagios offers is the ability for third-party software or other Nagios instances to report information on the status of services or hosts.
This way, Nagios does not need to schedule and run checks by itself; other applications can report information as it becomes available to them.
This means that your applications can send problem reports directly to Nagios instead of just logging them.
Nagios also offers a tool for sending passive check results for hosts and services over a network. It is called NSCA (Nagios Service Check Acceptor).
It can be used to send results from one Nagios instance to another.
This mechanism includes password protection, along with encryption, to prevent injection of false results into Nagios. In this way, NSCA communication sent over the Internet is more secure.


Passive Check / NSCA

External applications or devices can also report information directly to Nagios.
This can be done to gather all critical errors in a single, central place. These types of checks are called passive checks.
For example, when a web application cannot connect to the database, it can let Nagios know about it immediately.
It can also send reports after a database recovery, or periodically, even if connectivity to the database has been consistently available, so that Nagios has an up-to-date status.
This can be done in addition to active checks, to identify critical problems earlier.
Nagios also offers a way of combining the benefits of both active and passive checks.


Passive Check / NSCA

The first thing that needs to be done in order to use passive checks for your Nagios
setup is to make sure that you have the following options in your main Nagios
configuration file:

accept_passive_service_checks=1

accept_passive_host_checks=1

It is also a good idea to enable the logging of incoming passive checks.

This makes it much easier to work out why a passive check was not processed. The following directive enables it:

log_passive_checks=1


Passive Check / NSCA

Setting up hosts or services for passive checking requires an object to be defined and set up so as not to perform active checks:

define host
{
    use                     generic-host
    host_name               linbox1
    address                 10.1.1.45
    active_checks_enabled   0
    passive_checks_enabled  1
}

Configuring services is exactly the same as with hosts:

define service
{
    use                     ping-template
    host_name               linbox1
    service_description     PING
    active_checks_enabled   0
    passive_checks_enabled  1
}

In this case, Nagios will never perform any active checks on its own and will only rely on the results that are passed to it.

We can also configure Nagios so that if no new information has been provided within a certain period of time, it will use active checks to get the current status of the host or service, by setting the active_checks_enabled option to 1.
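On the same machine as Nagios, a passive result is submitted as an external command. The sketch below builds the PROCESS_SERVICE_CHECK_RESULT and PROCESS_HOST_CHECK_RESULT lines for the linbox1 objects defined above; the function names and the example output text are assumptions, and the resulting line would still have to be written to the external command file.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT line for the command file.

    return_code uses the plugin convention:
    0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, int(return_code), output)


def passive_host_result(host, status_code, output, now=None):
    """Build a PROCESS_HOST_CHECK_RESULT line for the command file."""
    timestamp = int(time.time() if now is None else now)
    return "[{}] PROCESS_HOST_CHECK_RESULT;{};{};{}\n".format(
        timestamp, host, int(status_code), output)
```

With log_passive_checks=1, each accepted line shows up in the Nagios log, which helps confirm that the host and service names match the configuration.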


NSCA

NSCA is an application that allows the sending of results directly to the Nagios external command pipe.

NSCA consists of two parts: the server and the client.

The part responsible for receiving check results and passing them to Nagios is the server.
It listens on a specific TCP port for NSCA clients passing information.
It accepts and authenticates incoming connections and passes these results to the Nagios external command pipe.
All information is encrypted using the MCrypt library.


Clustering

One of the first bottlenecks organizations will run into is performance when monitoring a large number of hosts and services.
This can occur even earlier if you are using performance handlers on your service or host checks.
One way to resolve performance problems is to cluster Nagios; clustering is also very useful when there are a number of remote sites that need to be monitored by Nagios.
Usually, there are one or more Nagios instances that report information to a single central Nagios instance.
A server that reports information to another Nagios machine will be referred to as a slave.
A Nagios instance that receives reports from one or more slaves will be referred to as a master.


One Nagios Instance


Many Nagios Instances


Clustering
Data Flow


Clustering

Remote Site Configuration

Install Nagios as normal on the remote server and then change the following parameters in nagios.cfg to allow it to function properly in our Nagios cluster:

# We do not want this instance sending out notifications.
enable_notifications=0

# We want the remote server to obsess over services so all changes will be reported back to the master server.
obsess_over_services=1

# This is a custom script, shown next.
ocsp_command=nsca_send_result

With these configuration changes in place, the remote Nagios server will call the command nsca_send_result after every service check executed on the remote host.

Clustering

The nsca_send_result script will then forward the service check results to the master Nagios server. Place the following definition for nsca_send_result in your commands configuration file (commands.cfg by default):
define command{
    command_name  nsca_send_result
    command_line  /usr/local/nagios/libexec/nsca_send_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
}
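The nsca_send_result script is not reproduced in these slides. One possible shape for it is sketched below: it turns the four arguments from the command definition into the tab-separated line that the send_nsca client reads on standard input, then pipes it to send_nsca. The master host name and the send_nsca binary and config paths are assumptions, as are the function names.

```python
import subprocess

# Plugin return codes for the state names Nagios passes in $SERVICESTATE$.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output):
    """Build one send_nsca input line: host, service, code, output (tab separated)."""
    code = RETURN_CODES.get(state, 3)  # treat anything unexpected as UNKNOWN
    return "{}\t{}\t{}\t{}\n".format(host, service, code, output)


def send_result(line,
                master="nagios-master.example.com",           # assumed host name
                send_nsca="/usr/local/nagios/bin/send_nsca",   # assumed path
                config="/usr/local/nagios/etc/send_nsca.cfg"):  # assumed path
    """Pipe one result line to the send_nsca client and return its exit code."""
    proc = subprocess.Popen([send_nsca, "-H", master, "-c", config],
                            stdin=subprocess.PIPE)
    proc.communicate(line.encode("utf-8"))
    return proc.returncode
```

A wrapper script would call nsca_line with its four command-line arguments and pass the result to send_result.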


NagiosQL

NagiosQL is a powerful web-based GUI tool that helps you configure and manage your Nagios network monitor and carry out administration work.

NagiosQL's features include these capabilities:

Build complex configurations

Manage and use all of your configurations

Create, delete, modify, and copy settings

Create and export configuration files

Create and download configuration files

Easy configuration import

Auto backup configuration files

Consistency checks

Syntax verification

User management

Instant activation of new configurations

MySQL database platform


NagiosQL
NagiosQL's installation requirements

Web server (Apache 2.x or greater preferred)

MySQL 5.x or greater

Nagios 2.x/3.x (local or remote)

PHP 5.2.0 or greater including:

PHP Module: Session

PHP Module: MySQL

PHP Module: gettext

PHP Module: filter

PHP Module: FTP (optional)

PECL Extension: SSH (optional)

Javascript activated in Web browser


NagiosQL

Extract the downloaded file

Open a terminal.
Change to the document root with the command cd /var/www/html.
Unpack the newly downloaded tar file with the command sudo tar xvzf
nagiosql_XXX.tar.gz (XXX is the release number).
Rename the newly created nagiosql32 directory to nagiosql with the command
sudo mv nagiosql32 nagiosql.


NagiosQL

Change the permissions of the necessary folders

You must run the following commands in order to give NagiosQL the proper permissions to install and run. (Note: This assumes your web server runs under the www-data user name; if it doesn't, alter the commands to suit your setup.)

Nagios main configuration files

sudo chgrp www-data /etc/nagios

sudo chgrp www-data /etc/nagios/nagios.cfg

sudo chgrp www-data /etc/nagios/cgi.cfg

sudo chmod 775 /etc/nagios

sudo chmod 664 /etc/nagios/nagios.cfg

sudo chmod 664 /etc/nagios/cgi.cfg


NagiosQL

Change the permissions of the necessary folders

NagiosQL configuration

sudo chmod 6755 /etc/nagiosql

sudo chown www-data.nagios /etc/nagiosql

sudo chmod 6755 /etc/nagiosql/hosts

sudo chown www-data.nagios /etc/nagiosql/hosts

sudo chmod 6755 /etc/nagiosql/services

sudo chown www-data.nagios /etc/nagiosql/services


NagiosQL

NagiosQL backup configuration

sudo chmod 6755 /etc/nagiosql/backup

sudo chown www-data.nagios /etc/nagiosql/backup

sudo chmod 6755 /etc/nagiosql/backup/hosts

sudo chown www-data.nagios /etc/nagiosql/backup/hosts

sudo chmod 6755 /etc/nagiosql/backup/services

sudo chown www-data.nagios /etc/nagiosql/backup/services

Amend already existing files

sudo chmod 644 /etc/nagiosql/*.cfg

sudo chown www-data.nagios /etc/nagiosql/*.cfg

sudo chmod 644 /etc/nagiosql/hosts/*.cfg

sudo chown www-data.nagios /etc/nagiosql/hosts/*.cfg

sudo chmod 644 /etc/nagiosql/services/*.cfg

sudo chown www-data.nagios /etc/nagiosql/services/*.cfg

The Nagios binary must be executable by the Apache user

sudo chown nagios.www-data /usr/sbin/nagios

sudo chmod 750 /usr/sbin/nagios


NagiosQL
Configuration
Directory Structure
/etc/nagiosql/                 -> Common configuration files
/etc/nagiosql/hosts            -> Host configuration files
/etc/nagiosql/services         -> Service configuration files
/etc/nagiosql/backup/          -> Backups of the common configuration files
/etc/nagiosql/backup/hosts     -> Backups of the host configuration files
/etc/nagiosql/backup/services  -> Backups of the service configuration files


NagiosQL

Begin the web install


You should be able to fire up your browser and point it to
http://ADDRESS_TO_SERVER/nagiosql/install/ (ADDRESS_TO_SERVER is the
actual address of the server hosting NagiosQL), where you can begin the web-based
installation.


NagiosQL
NagiosQL will make sure everything passes muster for the installation. If anything fails, this
screen will give you plenty of information about the problem.


NagiosQL
Configure a database
The installer creates a database for you.


NagiosQL
Log in. You log in to your NagiosQL site by pointing your browser to
http://ADDRESS_TO_SERVER/nagiosql/


NagiosQL
Admin screen


Security Considerations
Best Practices

Use a Dedicated Monitoring Box


Don't Run Nagios As Root

Nagios doesn't need to run as root, so don't do it. You can tell Nagios to drop privileges after startup and run as another user/group by using the nagios_user and nagios_group directives in the main config file. If you need to execute event handlers or plugins which require root access, you might want to try using sudo.

Lock Down The Check Result Directory

Make sure that only the nagios user is able to read/write in the check result path. If users other than nagios (or root) are able to write to this directory, they could send fake host/service check results to the Nagios daemon.

Lock Down The External Command File

If you enable external commands, make sure you set proper permissions on the /usr/local/nagios/var/rw directory. You only want the Nagios user (usually nagios) and the web server user (usually nobody, httpd, apache2, or www-data) to have permissions to write to the command file.

Require Authentication In The CGIs

Use Full Paths In Command Definitions

Hide Sensitive Information With $USERn$ Macros

Secure Communication Channels

...and many more.


Thanks
&
Questions / Answers??????
