Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Nagios

Nagios was formerly known as Netsaint. It was written by Ethan Galstad.

Nagios (formerly Netsaint) is a daemon written in C that is designed to monitor networked hosts and services. It has the ability to notify contacts (via email, pager or other methods) when problems arise and are resolved. Host and service checks are performed by external "plugins", making it easy to write custom checks in your language of choice. Several CGIs are included in order to allow you to view the current and historical status via a Web browser, and a WAP interface is also provided to allow you to acknowledge problems and disable notifications from an internet-ready cellphone.

Nagios monitors a wide variety of system properties, including system performance metrics such as load average and free disk space; the presence of important services like HTTP and SMTP; and per-host network availability and reachability. It also allows the system administrator to define what constitutes a significant event on each host--for example, how high a load average is "too high"--and what to do when such conditions are detected.

In addition to detecting problems with hosts and their important services, Nagios also allows the system administrator to specify what should be done as a result. A problem can trigger an alert to a designated recipient via various communication mechanisms (such as email, Unix message, or pager). It is also possible to define an event handler: a program that is run when a problem is detected. Such programs can attempt to solve the problem encountered. When triggered by warning conditions, they can also head off more serious problems before they occur.

The information that Nagios collects is displayed in a series of automatically generated Web pages. This format is quite convenient in that it allows a system administrator to view network status information from various points throughout the network.

The narrow column on the left of the display lists links to all of the possible Nagios displays (the one for the current display has been highlighted in the illustration). The Tactical Overview shows very general statistics about the overall network status. In this case, 20 hosts are being monitored, and 16 are currently up. Three hosts are down, and one is unreachable from the monitoring system, presumably because the gateway to it is down. Of the problems on the three hosts that are down, one has been acknowledged by a system administrator. The display also indicates that there are three services that have "critical" status (probably indicating a failure), and two others are in a "warning" state.

Each of the problem indicator displays also functions as a link to another Web page giving details about that particular item.

Figure 3 illustrates the detailed display that can be obtained for an individual host (or device). Here we see some detailed information about a host named leah. Once again, there are several sections to the display. The host name and IP address appear in the upper left of the display, along with an icon that the system administrator has assigned to this host. Here, the icon suggests that the system's operating system is some version of Windows; conventionally, icons are keyed to the operating system type. The table in the upper right gives some overall uptime and reachability statistics about the host over the period that the current monitoring session has been running.

The table below the operating system icon, titled "Host State Information", provides information about the current status of the host. This includes whether or not it is up, how long it has been in that state, when it was last checked, the command used to perform the check, and the settings of various configuration parameters (such as host notifications and event handler).

The box titled "Host Commands" contains a series of links, which allow the system administrator to perform many different monitoring-related actions on this host. The various items are described in Table 1. Examining the list will give you further details about Nagios' capabilities.

Table 1. Available actions in the Nagios Host Information display

Item | Meaning
Disable checks of this host | Stop monitoring this host for availability.
Acknowledge this host problem | Respond to a current problem (discussed below).
Disable notifications for this host | Don't send alerts if this host is unavailable.
Delay next host notification | Delay the next alert for host unavailability.
Schedule downtime for this host / Cancel scheduled downtime for this host | Define or cancel scheduled downtime. During downtime, host unavailability is not considered a problem.
Disable notifications for all services on this host / Enable notifications for all services on this host | Don't/do send alerts if a service on this host fails.
Schedule an immediate check of all services on this host | Check all services as soon as possible (rather than waiting for their next scheduled time).
Disable checks of all services on this host / Enable checks of all services on this host | Disable or enable checking service health on this host.
Disable event handler for this host | Prevent the event handler from running when a problem is detected on this host.
Disable flap detection for this host | Don't try to detect flaps (rapid up-down or on-off oscillations) on this host or its services.

The second menu item allows you to acknowledge any current problem. Acknowledging simply means "I know about the problem, and it is being handled." Nagios marks the corresponding event as such, and future alerts are suppressed until the item returns to its normal state. This process also allows you to enter a comment explaining the situation, an action that is helpful when more than one administrator regularly examines the monitoring data.

If you don't like all of these table-oriented status displays, Nagios also has the capability to use graphical ones. For example, Figure 4 illustrates a map created for the small network being monitored here. The map is laid out to indicate three separate groups of hosts, with host taurus serving as a gateway between the group at the upper left and the ones at the bottom of the window.

Much more complex network topologies can be represented in an analogous way. See the Nagios Web site for example screen shots.

Configuring Nagios

Initially, configuring Nagios can seem daunting, and there is a fair amount of startup overhead to getting things going.

Nagios uses several configuration files: the main nagios.cfg file, a set of object configuration files, cgi.cfg, and resource.cfg.

The package provides sample starter versions of all of these files. We will consider some aspects of these file types in the remainder of this article.

Nagios configuration files are generally stored in /usr/local/nagios/etc.

The nagios.cfg File

This configuration file contains directives that apply to the entire Nagios monitoring system. Here is an annotated sample version illustrating some of its most important features:

# File locations
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
lock_file=/usr/local/nagios/var/nagios.lock
... 

The first part of the configuration file specifies various file locations. These include the general log file, the file holding service check command definitions (checkcommands.cfg), and the file holding notification and event handler command definitions (misccommands.cfg). The administrator uses additional cfg_file directives to specify the object definition files in use at the site (here, hosts.cfg). Locations for other types of files follow. For example, resource_file names the file holding user-defined macros, and the lock file holds the PID of the current nagios process.
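The main configuration file is a flat list of name=value directives, and some of them, such as cfg_file, may repeat. The following Python sketch is an illustration and not part of Nagios. It reads such text into a dictionary of lists so that repeated directives keep their order. The function name and the sample text are assumptions made for the example.

```python
from collections import defaultdict


def parse_main_config(text):
    """Parse nagios.cfg-style 'name=value' lines into {name: [values]}.

    Blank lines and '#' comments are skipped. Directives such as cfg_file
    may appear several times, so every value is kept, in file order.
    """
    directives = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a directive (e.g. an elision marker such as "...")
        directives[name.strip()].append(value.strip())
    return dict(directives)


SAMPLE = """
# File locations
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
lock_file=/usr/local/nagios/var/nagios.lock
...
"""

if __name__ == "__main__":
    config = parse_main_config(SAMPLE)
    for path in config["cfg_file"]:
        print("object file:", path)
```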
 

# Logging settings
log_rotation_method=d 
log_archive_path=/usr/local/nagios/var/archives
use_syslog=1
log_host_retries=1
log_event_handlers=1
...

These directives specify logging settings, including how often logs are rotated (here, daily), the archive directory for old files, whether to log significant problems to syslog as well, and whether to log individual event types.

# Global settings
nagios_user=nagios
nagios_group=nagios
date_format=us
admin_email=nagadmin
admin_pager=19995551212

These lines specify various global settings. They include the user and group under which the nagios daemon runs, the output format for dates (here, US style), and the administrator's email address. The final item sets the administrator's pager number, which is available as the $ADMINPAGER$ macro in command definitions.

# Package-wide event handlers
enable_event_handlers=1
global_host_event_handler=global-event-command
global_service_event_handler=global-svc-command

These settings relate to event handlers. If appropriate, you can optionally define in this file a single event handler for all host failures and another for all service failures. The named commands themselves are defined in an object configuration file.

# Concurrent checks and time-outs
max_concurrent_checks=0
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
...

These directives control the maximum number of checks that can run concurrently (0 means no limit), as well as time-outs for various types of commands (values in seconds).

# Retained status information
retain_state_information=1
retention_update_interval=60
use_retained_program_state=1

These lines tell Nagios to retain information about host and service status between sessions. Nagios saves the values every 60 minutes (retention_update_interval is specified in minutes) and reloads them when the facility starts up.

# Passive service checks
accept_passive_service_checks=1
check_service_freshness=1

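These directives enable "passive checks". These are status data produced by external commands or applications and submitted to Nagios, which processes them periodically. Enabling check_service_freshness also lets Nagios detect when such results have stopped arriving.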
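The sketch below shows one way an external program could hand Nagios a passive result. It formats a PROCESS_SERVICE_CHECK_RESULT line as Nagios external commands expect: a bracketed timestamp, then semicolon-separated host, service description, return code and output. It makes several assumptions. The command pipe path is a common default rather than something taken from this article, so check the command_file directive in your own nagios.cfg. External command processing must also be enabled (typically with check_external_commands=1). The host and service names are taken from the examples in this article.

```python
import time

# Return codes shared by plug-ins and passive results.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Assumed location of the external command pipe (see command_file).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_service_result(host, service, state, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    for field in (host, service):
        if ";" in field:
            raise ValueError("field may not contain ';': %r" % field)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, STATES[state], output)


def submit(command, command_file=COMMAND_FILE):
    """Write the command to the Nagios pipe (may block if Nagios is down)."""
    with open(command_file, "a") as pipe:
        pipe.write(command)


if __name__ == "__main__":
    print(passive_service_result("beulah", "Check SMTP", "CRITICAL",
                                 "Connection refused"), end="")
```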

# Save Nagios data for later use
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

These directives allow you to save Nagios data externally for long-term analysis or other purposes. The commands specified here must be defined in some object configuration file. The simplest such command just appends the output to an external file (e.g., echo $OUTPUT$ >> file). However, you can perform whatever action is appropriate (e.g., store the data in an RRDtool database or another database).
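If the saved output is to be loaded into a database, it usually has to be split into fields first. The Python sketch below assumes the performance data convention from the Nagios plug-in development guidelines: status text, a "|", then space-separated label=value[UOM];warn;crit;min;max items. Nagios 1.x setups may deliver this data differently. The parser is deliberately simple and does not handle quoted labels that contain spaces.

```python
import re

# label=value[UOM];warn;crit;min;max  (only label=value is required)
PERF_ITEM = re.compile(r"^([^=]+)=([-0-9.]+)([a-zA-Z%]*)((?:;[^;]*){0,4})$")


def parse_perfdata(plugin_output):
    """Split plug-in output into (status text, list of metric dicts)."""
    text, _, perf = plugin_output.partition("|")
    metrics = []
    for item in perf.split():
        match = PERF_ITEM.match(item)
        if not match:
            continue  # skip items this simple parser does not understand
        label, value, uom, thresholds = match.groups()
        fields = thresholds.split(";")[1:]  # warn, crit, min, max
        fields += [""] * (4 - len(fields))  # pad omitted trailing fields
        metrics.append({
            "label": label.strip("'"),
            "value": float(value),
            "uom": uom,
            "warn": fields[0], "crit": fields[1],
            "min": fields[2], "max": fields[3],
        })
    return text.strip(), metrics


if __name__ == "__main__":
    print(parse_perfdata("OK - load average: 0.15 | load1=0.150;5.000;10.000;0;"))
```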

Note that the directives appear in a slightly different order in the sample nagios.cfg file provided with the package.

Object Configuration Files

The bulk of Nagios configuration occurs in the object configuration files. These files define the hosts and services to be monitored, how various status conditions should be interpreted, and what actions should be taken when they occur. The object types they define include hosts and host groups, services, contacts and contact groups, time periods, and commands, each of which is discussed below.

Many object types will need to be defined for virtually every Nagios installation; others are optional. In the sample Nagios configuration provided with the package, each type of object is defined in a separate configuration file (named after the object type, excluding any spaces). However, you can arrange your definitions in any form that makes sense to you.

Hosts and Host Groups

These items can be defined via templates: named sets of attributes and settings that can easily be applied to any number of actual objects. For example, here is a template definition for hosts:

define host{
; Template name
   name                      normal  
; This is only a template (not a real host)
   register                       0 

; Host notifications are enabled
   notifications_enabled          1  
; Command to check if host is available
   check_command   check-host-alive  
; Recheck failures this many times
   max_check_attempts            
; Repeat failure notifications every 2 hours
   notification_interval        120  
; When to send notifications (time period name)
   notification_period         24x7  
; Notify when down, unreachable and on recovery
   notification_options       d,u,r  
; Host event handler is enabled
   event_handler_enabled          1  
; Event handler command (defined elsewhere)
   event_handler            host-eh  
; Flap detection is disabled
   flap_detection_enabled         0  
; Save performance data
   process_perf_data              1 
; Save status information across restarts 
   retain_status_information      1  
}

This template defines a variety of host-monitoring settings (which are explained in the comments following the semicolons). Here is a host definition that uses this template:

define host{
; Template on which to base host
   use                        normal  
; Note the attribute is not "name" as above
   host_name                  beulah  
; Longer description
   alias            beulah: SuSE 8.1  
; IP address
   address              192.168.1.44  
; Overrides template value
   max_check_attempts              8  
}

Other hosts may be defined in a similar way. Host definitions themselves can also be used as templates, provided that a name attribute is included.
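The effect of "use" can be modeled as a dictionary merge. The object's own attributes override inherited ones, and the template's bookkeeping attributes (name and register) are not passed on. The Python sketch below illustrates this under those assumptions. It is not Nagios's own implementation. The function name is hypothetical, and the template contents are a subset of the "normal" example above.

```python
NOT_INHERITED = {"name", "register", "use"}


def resolve(obj, templates, _seen=()):
    """Return obj's effective attributes after applying its 'use' chain.

    'templates' maps template names to attribute dicts. A template may
    itself 'use' another template; loops are reported as errors.
    """
    parent = obj.get("use")
    if parent is None:
        inherited = {}
    else:
        if parent in _seen:
            raise ValueError("template loop involving %r" % parent)
        inherited = resolve(templates[parent], templates, _seen + (parent,))
    merged = {k: v for k, v in inherited.items() if k not in NOT_INHERITED}
    merged.update(obj)  # the object's own settings win over the template's
    merged.pop("use", None)
    return merged


if __name__ == "__main__":
    templates = {
        "normal": {"name": "normal", "register": "0",
                   "check_command": "check-host-alive",
                   "notification_interval": "120",
                   "notification_options": "d,u,r"},
    }
    beulah = {"use": "normal", "host_name": "beulah",
              "address": "192.168.1.44", "max_check_attempts": "8"}
    print(resolve(beulah, templates))
```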

Once hosts have been defined, they may be placed into host groups via directives like this one:

define hostgroup{
   hostgroup_name      bldg2
   alias               Building 2
   contact_groups      admins1
   members             beulah,callisto,ariadne,leah,lovelace,valley
}

This definition creates the host group named bldg2, consisting of six hosts (all previously defined via define host directives). The contact_groups attribute specifies who to send notifications to, and it is defined elsewhere (as we'll see).

You can use as many host groups as you want. Hosts can be part of multiple host groups, and host groups themselves may be nested.

Services

Here are two service templates and a service definition:

define service{  ; Define defaults for all services
   name                  generic
   register                    0
; Check service every 30 minutes
   normal_check_interval      30  
; Retry failing checks every 3 minutes, up to 5 times
   retry_check_interval        3  
   max_check_attempts          5
   event_handler_enabled       1
   check_period             24x7
; Repeat notifications for failures every 2 hours
   notification_interval     120  
   notification_period     6to22
; Notify contacts about critical failures/recoveries
   notification_options      c,r  
   notifications_enabled       1
   contact_groups         admins
} 

define service{  ; Define the SMTP service
   use                    generic
   name              generic-smtp
   register                     0

   service_description Check SMTP
   check_command       check_smtp
   event_handler          eh_smtp
   contact_groups      mailadmins
}

define service{  ; Define services to be monitored
   use               generic-smtp
; Monitor SMTP for all hosts in this host group
   hostgroup_name       mailhosts  
}

The first template (generic) defines some settings that can be applied to a variety of service types. The second template, generic-smtp, uses the first template as a starting point and adds to it to create a generic SMTP monitoring service. Specifically, it defines a service description, a check command, an event handler, and a contact group appropriate for the SMTP service. The contact group overrides the one inherited from generic. The final define service stanza sets up SMTP monitoring for all of the hosts in the mailhosts host group.
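A service attached to a host group applies to every member of that group. The Python sketch below illustrates that expansion, assuming the service has already been resolved against its templates, so that service_description is present. The function name is hypothetical, and so is the membership of mailhosts in the example: the article does not list that group's members.

```python
def split_list(value):
    """Split a comma-separated attribute value, ignoring blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def expand_service(service, hostgroups):
    """Return the (host_name, service_description) pairs a service covers.

    'hostgroups' maps group names to dicts with a comma-separated
    'members' attribute, as in a define hostgroup stanza.
    """
    hosts = split_list(service.get("host_name", ""))
    for group in split_list(service.get("hostgroup_name", "")):
        hosts.extend(split_list(hostgroups[group]["members"]))
    unique_hosts = list(dict.fromkeys(hosts))  # drop duplicates, keep order
    return [(host, service["service_description"]) for host in unique_hosts]


if __name__ == "__main__":
    groups = {"mailhosts": {"members": "beulah,callisto"}}  # hypothetical
    smtp = {"service_description": "Check SMTP", "hostgroup_name": "mailhosts"}
    for pair in expand_service(smtp, groups):
        print(pair)
```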

Contacts and Contact Groups

Here are two stanzas defining a contact and a contact group:

define contact{
   contact_name                    nagadmin
   alias                           Nagios Admin
; When to notify about service problems
   service_notification_period     6to22  
; When to notify about host problems
   host_notification_period        24x7   
; Notify on critical problems and recoveries
   service_notification_options    c,r    
; Notify on host down and recoveries
   host_notification_options       d,r    
   service_notification_commands   notify-by-email
   host_notification_commands      host-notify-by-epager
   email                           nagios-admins@ahania.com
   pager                           $ADMINPAGER$
}

define contactgroup{
   contactgroup_name               mailadmins
   alias                           Mail Admins
   members                         mailadm,chavez,catfemme
}

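The first stanza defines a contact named nagadmin. It also defines what events to notify this contact about and the time periods during which notifications should be sent. It also specifies the commands used to generate the alerts, along with the email address and pager number those commands use (passed to them via macros such as $CONTACTEMAIL$; see below). The second stanza defines the contact group mailadmins, which consists of three contacts.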

Time Periods

Time period definitions are quite simple. Here are the definitions of the two time periods we have used so far:

define timeperiod{
   timeperiod_name 24x7
   alias           24 Hours A Day, 7 Days A Week
   sunday          00:00-24:00
   monday          00:00-24:00
   tuesday         00:00-24:00
   wednesday       00:00-24:00
   thursday        00:00-24:00
   friday          00:00-24:00
   saturday        00:00-24:00
}

define timeperiod{
   timeperiod_name 6to22
   alias           Weekdays, 6 AM to 10 PM
   monday          06:00-22:00
   tuesday         06:00-22:00
   wednesday       06:00-22:00
   thursday        06:00-22:00
   friday          06:00-22:00
}

Note that only the applicable days need be included in the definition.
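The rule that absent days are excluded can be made concrete with a small check. The sketch below, in Python, takes a time period given as a dictionary of lowercase day names mapped to HH:MM-HH:MM ranges (several ranges may be comma-separated). It reports whether a given moment falls inside the period. This is an illustrative model, not Nagios's scheduling code, and the function names are hypothetical.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]  # index matches datetime.weekday()


def minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight ('24:00' -> 1440)."""
    hours, mins = (int(part) for part in hhmm.split(":"))
    return hours * 60 + mins


def in_timeperiod(period, when):
    """True if datetime 'when' falls inside the time period definition.

    Days missing from 'period' are excluded entirely, as in 6to22.
    """
    ranges = period.get(DAYS[when.weekday()])
    if not ranges:
        return False
    now = when.hour * 60 + when.minute
    for span in ranges.split(","):
        start, end = (minutes(t) for t in span.strip().split("-"))
        if start <= now < end:
            return True
    return False


SIX_TO_22 = {day: "06:00-22:00" for day in DAYS[:5]}

if __name__ == "__main__":
    print("inside 6to22 now:", in_timeperiod(SIX_TO_22, datetime.now()))
```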

Commands

The commands referred to in many of the preceding object definitions also must be defined. For example, here is the SMTP service check command definition:

define command{
   command_name  check_smtp
   command_line  $USER1$/check_smtp -H $HOSTADDRESS$
}

This command runs the check_smtp script stored in the directory defined in the macro $USER1$ (defined in the resource.cfg file--see below); this macro conventionally holds the path to the Nagios plug-ins directory. The command is passed the option -H, followed by the IP address of the host to be checked (the latter is expanded from the built-in $HOSTADDRESS$ macro).
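Macro substitution amounts to replacing each $NAME$ token with a value before the command line is run. The Python sketch below illustrates this. Unknown macros are left unchanged, which is a choice made for this sketch and is not necessarily what Nagios does. The sketch also ignores Nagios details such as escaping a literal dollar sign. The $USER1$ value is the plug-in path from the resource.cfg example later in this article, and the address is beulah's.

```python
import re

MACRO = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(command_line, macros):
    """Replace $NAME$ tokens with values from 'macros' (a dict)."""
    def substitute(match):
        # Leave unknown macros untouched so mistakes stay visible.
        return macros.get(match.group(1), match.group(0))
    return MACRO.sub(substitute, command_line)


if __name__ == "__main__":
    print(expand_macros("$USER1$/check_smtp -H $HOSTADDRESS$",
                        {"USER1": "/usr/lib/nagiosplugins",
                         "HOSTADDRESS": "192.168.1.44"}))
```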

You can determine the syntax for any plug-in by running it with the --help option. You can also extend Nagios by adding custom plug-ins of your own. See the documentation for details on how to accomplish this.
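A custom plug-in only has to follow a few conventions. It prints a short status line, optionally followed by "|" and performance data. It also exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The Python sketch below checks free disk space and shows those conventions. The thresholds, the output wording and the idea of installing it under a name such as check_free.py are all hypothetical.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(free_pct, warn_pct, crit_pct):
    """Map a free-space percentage onto a plug-in return code."""
    if free_pct <= crit_pct:
        return CRITICAL
    if free_pct <= warn_pct:
        return WARNING
    return OK


def main(path="/", warn_pct=20.0, crit_pct=10.0):
    """Print one status line and return the code Nagios should receive."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print("DISK UNKNOWN - %s" % exc)
        return UNKNOWN
    if usage.total == 0:
        print("DISK UNKNOWN - %s reports zero size" % path)
        return UNKNOWN
    free_pct = 100.0 * usage.free / usage.total
    code = evaluate(free_pct, warn_pct, crit_pct)
    # Status text first, then optional performance data after the '|'.
    print("DISK %s - %.1f%% free on %s | free=%.1f%%;;;0;100"
          % (LABELS[code], free_pct, path, free_pct))
    return code


# Installed as a plug-in, the script would end with:
#     import sys
#     sys.exit(main(*sys.argv[1:2]))
```

Registered with a define command stanza like the check_smtp one, such a script would be called as $USER1$/check_free.py, where check_free.py is a hypothetical file name.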

Event handlers are defined in the same way, as in this example:

define command{
   command_name  eh_smtp
   command_line  /usr/local/nagios/eh/fix_mail $HOSTADDRESS$ $STATETYPE$
}

Here, we define the command named eh_smtp. It specifies the full path to a program to run, passing two arguments: the host's IP address and the value of the $STATETYPE$ macro. This macro is SOFT while Nagios is still retrying a failing check (before max_check_attempts has been reached), and HARD once the state has been confirmed.
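An event handler such as fix_mail typically checks the state type before doing anything drastic, so that it does not react to a transient failure that a retry may clear. The Python sketch below shows only that decision logic. The optional third argument is an assumption rather than part of the eh_smtp definition above. With only the two arguments shown there, a handler cannot tell a HARD failure from a HARD recovery, so a real one would usually also be passed $SERVICESTATE$. The restart action is hypothetical and is only printed.

```python
import sys


def choose_action(host_address, state_type, service_state=None):
    """Decide what a hypothetical fix_mail handler should do."""
    if service_state == "OK":
        return "no action (service recovered)"
    if state_type != "HARD":
        return "no action (soft state; Nagios is still retrying)"
    return "restart mail service on %s" % host_address


def main(argv):
    if len(argv) < 2:
        print("usage: fix_mail HOSTADDRESS STATETYPE [SERVICESTATE]")
        return 3
    # A real handler would run the corrective command here instead.
    print(choose_action(*argv[:3]))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```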

Here are the definitions of commands used for notifications (we've wrapped the command_line setting for clarity):

define command{
   command_name  notify-by-email
   command_line  /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\n
                 Notification Type: $NOTIFICATIONTYPE$\n\n
                 Service: $SERVICEDESC$\n
                 Host: $HOSTALIAS$\n
                 Address: $HOSTADDRESS$\n
                 State: $SERVICESTATE$\n\n
                 Date/Time: $DATETIME$\n\n
                 Additional Info:\n\n$OUTPUT$" | 
     /usr/bin/mail -s "** $NOTIFICATIONTYPE$
     alert - $HOSTALIAS$/$SERVICEDESC$ 
                 is $SERVICESTATE$ **" $CONTACTEMAIL$
}

This command constructs a simple email message using the printf command and many built-in Nagios macros. It then sends the message using the mail command, specifying the recipient with the $CONTACTEMAIL$ macro. That macro contains the value of the email attribute of the contact being notified.

The cgi.cfg File

The cgi.cfg configuration file has several different functions within the Nagios system. Among the most important is access control, which restricts Nagios and its data to appropriate people. Here are some sample directives related to authorization:

use_authentication=1
authorized_for_configuration_information=netsaintadmin,root,chavez
authorized_for_all_services=netsaintadmin,root,chavez,maresca

The first entry enables the access control mechanism. The next two entries specify users who are allowed to view Nagios configuration information and service status information (respectively). Note that all users must also be authenticated to the Web server using the usual Apache htpasswd mechanism.

This same configuration file is also used to store settings for icon-based status displays, as in these examples:

hostextinfo[janine]=;redhat.gif;;redhat.gd2;;168,36;,,;
hostextinfo[ishtar]=;apple.gif;;apple.gd2;;125,36;,,;

These entries specify extended attributes for the hosts defined in the entries labeled janine and ishtar. The filenames in this example specify image files for the host in status tables (GIF format; see Figure 3) and in the status map (GD2 format). The two numeric values specify the device's x and y coordinates within the 2D status map. (Figure 4 provides an example status map display.)
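The positional format of these entries is easier to see when one is split into fields. The Python sketch below assumes the field order notes URL; icon image; VRML image; GD2 image; alt tag; 2D coordinates; 3D coordinates. This order is an assumption on our part. It matches where the GIF, GD2 and coordinate values sit in the sample lines, but check it against the cgi.cfg documentation for your version.

```python
import re

ENTRY = re.compile(r"^hostextinfo\[([^\]]+)\]=(.*)$")
# Assumed field order of a cgi.cfg hostextinfo value.
FIELDS = ["notes_url", "icon_image", "vrml_image", "gd2_image",
          "alt_tag", "coords_2d", "coords_3d"]


def parse_hostextinfo(line):
    """Return (host_name, {field: value}) for one hostextinfo entry."""
    match = ENTRY.match(line.strip())
    if not match:
        raise ValueError("not a hostextinfo entry: %r" % line)
    host, rest = match.groups()
    info = dict(zip(FIELDS, rest.split(";")))
    x, _, y = info.get("coords_2d", "").partition(",")
    info["coords_2d"] = (int(x), int(y)) if x and y else None
    return host, info


if __name__ == "__main__":
    print(parse_hostextinfo("hostextinfo[janine]=;redhat.gif;;redhat.gd2;;168,36;,,;"))
```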

The resource.cfg File

The final configuration file we will consider is the resource.cfg file. It is used to define site-specific macros, conventionally named $USER1$ through $USER32$:

# $USER1$ = path to plugins directory
$USER1$=/usr/lib/nagiosplugins    
...

# Store a username and password (hidden)
$USER3$=administrator
$USER4$=somepassword

The first macro defines the path to the Nagios plug-ins directory; this usage is assumed by the supplied sample configuration files.

The other two macros are used in this case to store a username and password. These items can be used in command definitions for added security. The resource.cfg file itself can be protected against all non-root access without compromising the ability of CGI programs to run successfully.
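Because resource.cfg contains only $USERn$=value lines and comments, it is simple to read. The Python sketch below collects the macros into a dictionary keyed by the names used between the dollar signs. It rejects indexes outside the $USER1$ to $USER32$ range described above, and that limit is a parameter in case your version differs. The function name is hypothetical.

```python
import re

USER_MACRO = re.compile(r"^\$USER([0-9]+)\$=(.*)$")


def parse_resource_file(text, max_index=32):
    """Read $USERn$=value lines into {'USERn': value}."""
    macros = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = USER_MACRO.match(line)
        if not match:
            continue  # ignore anything that is not a $USERn$ definition
        index = int(match.group(1))
        if not 1 <= index <= max_index:
            raise ValueError("$USER%d$ is out of range" % index)
        macros["USER%d" % index] = match.group(2).strip()
    return macros


SAMPLE = """# $USER1$ = path to plugins directory
$USER1$=/usr/lib/nagiosplugins
# Store a username and password (hidden)
$USER3$=administrator
$USER4$=somepassword
"""

if __name__ == "__main__":
    print(sorted(parse_resource_file(SAMPLE)))
```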

Pre-Checking a Nagios Configuration

Since Nagios configuration is somewhat involved, the package provides a command that can be used to verify it prior to running the program. Here is an example of its use:

# cd /usr/local/nagios/etc
# /usr/local/nagios/bin/nagios -v nagios.cfg

This checks the Nagios configuration that uses nagios.cfg as its main configuration file and reports any errors it finds.

Old News ;-)

[Jun 25, 2008] freshmeat.net Project details for check_oracle_health

About:
check_oracle_health is a plugin for the Nagios monitoring software that allows you to monitor various metrics of an Oracle database. It includes connection time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and many more.

Release focus: Major feature enhancements

Changes:
The tablespace-usage mode now takes into account tablespaces that use autoextend. The data-buffer, library-cache and dictionary-cache hit-ratio modes are now more accurate. Sqlplus can now be used instead of DBD::Oracle.

[Jun 11, 2008] check_lm_sensors 3.1.0  by Matteo Corti

About: check_lm_sensors is a Nagios plugin to monitor the values of on-board sensors and hard disk temperatures on Linux systems.

Changes: The plugin now uses the standard Nagios::Plugin CPAN classes, fixing issues with embedded perl.

[May 6, 2008] freshmeat.net Project details for check_logfiles

Perl plugin: check_logfiles is a plugin for Nagios which checks logfiles for defined patterns

check_logfiles 2.3.3 (Default)
Added: Sun, Mar 12th 2006 15:09 PDT
Updated: Tue, May 6th 2008 10:37 PDT
About:

check_logfiles is a plugin for Nagios that checks logfiles for defined patterns. It is capable of detecting logfile rotation, and if you tell it what the rotated archives look like, it will also examine those files. Traditional logfile plugins were not aware of the gap that can occur when a log is rotated between two checks, so under some circumstances they missed what had happened between their checks. A configuration file is used to specify where to search, what to search for, and what to do if a matching line is found.

[Aug 10, 2007] {Book} Building a Monitoring Infrastructure with Nagios by David Josephsen

A short, superficial intro book (190 pages). Killer phrase from the review below: "This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used for achieving surpassing solutions."
Warning: Several reviews of this book look like plants, written by reviewers who have reviewed only a single networking book or have written just a single review.
Spot on for a well structured book with many WOW-factors,

May 17, 2007, by Nils Valentin (Tokyo, Japan)
DISCLAIMER: This is a requested review by PTR; however, any opinions expressed within the review are my personal ones.

Introduction - 6p

CHAPTER 1 Best Practices - 12p
CHAPTER 2 Theory of Operations - 26p
CHAPTER 3 Installing Nagios - 11p
CHAPTER 4 Configuring Nagios - 23p
CHAPTER 5 Bootstrapping the Configs - 10p
CHAPTER 6 Watching - 46p
CHAPTER 7 Visualization - 42p
CHAPTER 8 Nagios Event Broker Interface - 19p
APPENDIX A Configure Options - 3p
APPENDIX B nagios.cfg and cgi.cfg - 9p
APPENDIX C Command-Line Options - 10p
Index - 14p

At 190 pages (230 including appendices and index), the book is very compact. It teaches you Nagios in a way I have never heard or read before. I must assume that the author's clearly structured style, which runs through the book like a common thread, is responsible for the excellent outcome.

The book starts in the introduction with the title "Do it right the first time," and that hits the spot. What distinguishes this little portable knowledge base is its exceptionally well-thought-through content and the author's explanations. David does not fill pages by explaining each and every parameter. Rather, he shows you the big picture and explains how to approach new issues and why one technical solution is better than another.

This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used for achieving surpassing solutions.

The book itself basically is divided in two sections:

Background, setup and configuration - Chapters 1-5
Advanced Topics - Chapters 6-8

I found all of the chapters to have a nice balance in the amount of information provided, but some EXCEPTIONALLY good parts of the book were:

Chapter 1 Best practices
Chapter 2 - the part about scheduling
Chapters 6-8 as a whole

Chapter 6 has thorough explanations of monitoring the different OSes (especially the Windows part!!) and other applications.

Chapter 7 stands out for its overall thoroughness in showing how to visualize your data to reach the next level of understanding of the systems/network you are monitoring.

Chapter 8 describes a filesystem-based status interface: the NEB module writes a file with its current status code for each service. I have to admit that some technical details went over my head, but I thought that was pretty cool!!


The points featured above are what I found to be exceptionally good, and most likely the strongest selling points of this little portable knowledge base. That doesn't mean that the parts of the book not mentioned are weak, mind you.

Funnily enough, the points mentioned above were EXACTLY the ones I hadn't seen explained this thoroughly anywhere before.

So David's book was exactly spot on for me.

Summary:

To sum it all up in very simple words: This is a hell of a book !!

It's the most compact, well-structured book on Nagios that I have seen to date. It contains many WOW-factors. While reading each chapter, you can virtually "feel" how David's explanations, tips and tricks have already helped you avoid time-consuming pitfalls.

So this book is not about "to buy or not to buy"; this is an investment you don't want to miss!!

I was especially impressed by the thoroughness of the writing from the very first page. Although the contents of the first chapter weren't new to me, the way they were explained provided many of those A-ha moments.

The main asset of the book is not the description of the tools themselves. It is the thought and consideration the author put into it, shared in a way that lets the reader visualize how and why one solution is better than another, without the "luxury" of experiencing the pitfalls in a live disaster scenario.


PS: AFTER I finished reading the book I re-read the "Editorial Review" Amazon gave above and found it pretty well describing the actual book and what you should expect.


Nagios Looking Glass Getting started

With the Nagios Looking Glass (NLG) tool, developer Andy Shellam has tried to resolve a common problem for network administrators running Nagios. What happens if you want to provide access to up-to-date information from Nagios without giving users access to the full Nagios console? Providing read-only access to the Nagios console can be complicated, and can occasionally require network re-structuring or can even pose a security risk.

NLG is designed to fix those issues by taking a feed from Nagios status data via an HTTP connection and displaying it on a public Web server. It works in a client-server model with a PHP-based polling server installed on your Nagios server. A receiver client, also PHP-based, is installed on your Web server. If you want to use NLG locally, you can also run the client and the server together on your Nagios server. The receiver client creates an AJAX-enabled page based on a template. You can also customize this template to display whatever you require.

You can see a demo of NLG at http://looking-glass.andyshellam.eu/demo/.

[Jul 18, 2007] Nagios Looking Glass Getting started by James Turnbull


02.05.2007

Nagios also comes with a Web-based console; an extensible Nagios Event Broker (NEB) that allows you to integrate Nagios with other tools, like database back-ends; and a large collection of monitoring commands and capabilities. Its current release, version 2.0, is stable and production ready. You can take a look at Nagios at http://www.nagios.com.

Development of Nagios has not stopped with version 2.0, though. Nagios' principal developer, Ethan Galstad, has recently released some information on the status and potential features of the next release, version 3.0. Galstad's announcement also suggests an alpha release of version 3.0 could be scheduled as early as the end of February 2007.

Features: What's new in Nagios 3.0

So what's new with version 3.0? Well, a lot. Let's walk through the major new features and look at how some of Nagios' old features have been expanded or changed.

One of the interesting features introduced in Nagios 2.0 was adaptive monitoring. Adaptive monitoring allowed a Nagios configuration to be changed during runtime. For example, you can change the command being used to check a host, based on changing conditions in your environment. In the new version, this functionality is expanded to include the ability to change the times during which checks are scheduled to occur. This allows you to turn on/off checks at specific times according to conditions in your environment.

Notifications have also been enhanced. A delay can now be added to first notifications, and notifications can be generated when flap detection is disabled. Most importantly, notifications can now be sent out when a scheduled downtime starts, ends or is cancelled.

Objects and templates haven't been forgotten either. One particularly useful change is the ability to use multiple templates for an object. Version 2.0 allows the application of only one template to an object, and multiple templates offer greater flexibility and power, which will make a significant difference to the configuration of objects. Another change is the addition of custom variables in host, service and contact objects.

Custom variables allow you to define your own directives in object definitions and, therefore, attach additional information about an object to its definition. These variables can be retrieved and used elsewhere in your Nagios environment. For instance, you could define the SNMP community strings for a host in its definition and then use these later in a check or external command.

Other object and template changes include: merging service and host-extended information object data into service and host object definitions, and adding group member directives to the host and service group objects.

External commands have also been enhanced, including the ability to process commands read from an external file. The suggested use for this is submitting passive checks with long output or driving complicated scripts. Another change in Nagios 3.0 is that external command checking is now enabled by default; in previous versions it was disabled by default.

Host and service check logic has also changed. Most notably, host checks now run asynchronously, in parallel with each other, which should improve overall check performance. Other enhancements are the ability to cache host and service check results and support for predictive checks of dependent hosts and services.

Host and service checks can now also return multiple lines of output. Nagios 2.0 accepted only a single line of output from a check, which limited the usefulness of some checks. Now multiple lines can be received and processed by Nagios, and the maximum size of plug-in output has been increased accordingly, to 2 KB.

A number of performance optimizations have been included in Nagios 3.0, as well as enhancements to the Nagios Event Broker and the embedded Perl interpreter. Also worth mentioning are updates to macros and to status, comment and retention data.

[Jul 18, 2007] Using modules in Nagios Event Broker by James Turnbull

"The most well-known NEB module is the NDO Utilities module. The NDO Utilities module is written by Nagios' developer, Ethan Galstad, and is designed to output events and data from Nagios to standard file or a Unix socket. "

01.29.2007 | searchenterpriselinux.techtarget.com

The Nagios enterprise monitoring tool generates a variety of events. The principal events generated are the results of monitoring applications, databases, devices, services and hosts. Also generated is performance data and notification events such as outages and downtime. There are a number of ways to integrate and utilize these events. The most advanced and effective event integration mechanism is the Nagios Event Broker (NEB).

NEB uses callback routines that are executed when events occur in the Nagios server. Using NEB, you can write broker modules that process these events and output or integrate them into a variety of tools, including MySQL databases, SNMP traps and syslog, or use the event data in other applications and tools.

Nagios Event Broker functions and triggers

NEB uses shared code libraries called modules that are hooked into the Nagios server when it is executed. Each module can register callback procedures that are able to receive and process events. When an event occurs, NEB checks for the presence of a registered callback and, if detected, sends the event to the module. The module receives the event and performs whatever actions are coded into it.

The broker can process a large number of events.

You can see a full list of the callbacks in the nebcallbacks.h include file located in the include directory of the Nagios source package.

Enabling Nagios Event Broker

NEB should be enabled by default when you compile Nagios (unless you disable it). If you want to ensure that NEB gets compiled then specify the --enable-neb configure option when configuring Nagios.

# ./configure --enable-neb

Registering modules with Nagios Event Broker

Modules are included into the Nagios configuration by using broker_module configuration options in the nagios.cfg configuration file. For example:

broker_module=/usr/local/nagios/bin/testmodule.o 

This line would load a module called testmodule.o located in the /usr/local/nagios/bin directory. You can also specify a configuration file for a module like so:

broker_module=/usr/local/nagios/bin/testmodule.o config_file=/usr/local/nagios/etc/testmodule.cfg

You need to restart Nagios for any newly defined modules to take effect.

Writing modules for Nagios Event Broker

NEB modules can be written in C or C++. The Nagios package includes an example module, helloworld, located in the module directory off the root of the source tree. You can build it by compiling the helloworld.c file:

# gcc -fPIC -shared -o helloworld.o helloworld.c

You can then add this module to Nagios using the broker_module directive in the nagios.cfg configuration file. Restart Nagios and the module is now loaded.

The Helloworld module is extremely simple. Helloworld logs a message to the default Nagios log file when Nagios is started and stopped and when aggregated status updates start and finish. The message looks like:

  
[1137151111] helloworld: An aggregated status update just started.
[1137151112] helloworld: An aggregated status update just finished.

You can review the source of this module, which includes some basic inline documentation.

Available modules for Nagios Event Broker

So far, not many NEB modules are available. The best-known NEB module is the NDO Utilities module, written by Nagios' developer, Ethan Galstad, and designed to output events and data from Nagios to a standard file or a Unix socket. It also comes with a daemon, NDO2DB, that can write Nagios data to a MySQL or PostgreSQL database. Together with the helloworld module, it should provide a good introduction to NEB and help you get started writing your own modules.

You can also find the following NEB modules:

  • NEB module that logs to a socket based on client requests
  • A NEB module (as yet unreleased) that does event correlation with Nagios and SEC.
  • A NEB module that helps integrate Cacti with Nagios.

    Further help with Nagios Event Broker

    There is not a lot of documentation available for NEB thus far. The only major piece of documentation available is about the NEB API. You can also review the Nagios source code relevant to NEB, particularly the include files.

    As always, the Nagios development and user mailing lists are good places to start looking for assistance.

  • [Apr 12, 2007] Sys Admin Taming Nagios by David Josephsen

    Dec 05, 2005 (Sys Admin)

    In the past few years, Nagios has become the industry-standard open source systems monitoring tool. If you're using an open source app to monitor the availability, state, or utilization of your servers or network gear, then chances are you are using Nagios to do it. To those who have worked with it, this is no surprise. The lightweight design of Nagios offloads the actual query logic into "plug-ins", which are easily created, modified, and re-purposed by sys admins. The lack of complex query logic leaves the Nagios daemon free to manage scheduling and notifications and to handle the UI.

    Nagios's "keep it simple" approach makes it straightforward to administer, network transparent, and amazingly flexible.

    Two excellent articles by Syed Ali in previous editions of Sys Admin covered the installation and configuration of Nagios. In this article, I'll pick up where those articles left off and provide some creative solutions to problems commonly faced by sys admins working with Nagios to monitor the health and performance of systems.

    [Apr 10, 2007] Nagios network monitoring felled by SNMP false alarms By Jack Loftus

    It is still unclear why false alerts were generated. Is this just a plug for Hyperic?
    There was nothing technically wrong with the HP ProLiant servers at Mynewplace.com, an online rental services agency based in San Francisco, but the IT staff kept on getting beeped at 4 a.m. with alerts that eventually proved to be false alarms.

    So while the servers were fine, the IT staff wasn't. Entire days were being wasted each month diagnosing their clutch of 50 HP ProLiant DL145s and DL385s running Red Hat Enterprise Linux 4 AS and ES, said John Shin, Mynewplace.com's director of systems. Shin decided he needed to make some changes.

    Struggling with network monitoring

    "We were struggling with monitoring," Shin said, but that may have been an understatement. Things were so bad, in fact, that at one point last year he contemplated disabling the monitoring application altogether because it was doing more harm than good.

    The application was Nagios, a popular open source systems and network monitoring application that provides alerts for user-defined hosts and services. In Shin's network, however, it was triggering false alarms because of Simple Network Management Protocol (SNMP) incompatibilities with Mynewplace.com's open source application server, Resin 3.0. Resin is a Java application server (which also ships a Java implementation of the PHP scripting language) and is maintained and supported by San Diego-based Caucho Technology Inc.

    Nagios, JVM and Resin 3.0 woes

    Since Resin and Nagios were not directly compatible, Shin would expose the application stack's Java virtual machines (JVMs) through SNMP and monitor the environment that way. Unfortunately, response times under those conditions were sluggish, he said.

    "Nagios was not really the problem," Shin said. "It was the JVM stack not being able to respond to it correctly. It was recording events in SNMP that were then watched by Nagios and that made things crawl. There were a lot of man hours wasted, and it would trigger the 4 a.m. pages."

    In spite of its popularity on open source repositories like SourceForge.net, Nagios has its detractors. In a recent interview about Nagios with SearchEnterpriseLinux.com, Zenoss Inc. CEO Bill Karpovich criticized Nagios for its lack of enterprise-level support. "The maintainers never thought of it as a project that an IT manager would use to monitor an entire enterprise environment," he said. Zenoss is an open source startup vendor in the systems management space.

    ... ... ...

    The feature-rich, expensive offerings from HP and the other members of the "big four" – IBM, CA and BMC – have spawned the "little four" (a phrase coined by analyst firm RedMonk), comprised of Hyperic, Zenoss, Qlusters and GroundWork. Executives from those companies have bet their chips on the valuable midmarket for customer wins like Mynewplace.com.

    Compared with OpenView, offerings from the "little four" were priced approximately two-and-a-half times less on average, Shin found, although he would not cite specific dollar amounts. OpenView had another strike against it: "It did not have the framework in place to monitor some of our key applications," namely Resin and Postgres, Shin said.

    Linux Today - Linux.com Complex Service Checks with Nagios

    [Feb 05, 2007] Looking ahead to Nagios 3.0 by James Turnbull

    Nagios is a free, open source enterprise monitoring tool designed to run on Linux. It has extensive monitoring and management capabilities that allow you to check applications, databases and network devices, as well as Windows and Unix/Linux hosts and services. It is easy to install, fast to configure and highly customizable.

    To see a full list of the changes, or if you wish to try Nagios 3.0 before its alpha release, you can download a current CVS snapshot from http://www.nagios.org/development/cvs.php . The Changelog file contained in the snapshot provides a reasonably full list of the proposed changes.

    [Feb 06, 2007] SearchOpenSource: Nagios Looking Glass: Getting Started


    Notes:
    • Those pages are written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
    • This is a Spartan WHYFF (We Help You For Free) site. It cannot replace the best teachers and the best books.
    • The site contains some obsolete pages, as it develops like a living tree... Some links on older pages are broken. Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.

    Recommended Links


    In case of broken links, please try to use Google search. If you find the page, please notify us of its new location.


    Books

    Nagios: System and Network Monitoring

    Best for Nagios admins who want specific details on plug-ins, September 4, 2006
    By Richard Bejtlich "TaoSecurity.com" (Washington, DC)
    I recently received review copies of Pro Nagios 2.0 (PN2) by James Turnbull and Nagios: System and Network Monitoring (NSANM) by Wolfgang Barth. I read PN2 first, then NSANM. Both are excellent books, but I expect potential readers want to know which is best for them. The following is a radical simplification, and I could honestly recommend readers buy either (or both) books. If you are completely new to Nagios and want a very well-organized introduction, I recommend PN2. If you are somewhat familiar with Nagios and want detailed descriptions of a wide variety of Nagios plug-ins, I recommend NSANM.

    NSANM's strengths lie in the depth of coverage of certain elements when compared to PN2. PN2 devotes 7 pages to host checks, while NSANM's Ch 7 offers 21 pages. PN2 supplies 8 pages on service checks, but NSANM's Ch 6 gives 46 pages. This level of detail can be very useful. For example, NSANM's explanation of check_squid also shows how to configure Squid to allow access to its cache manager.

    NSANM shares more information on certain background protocols like SNMP. PN2's SNMP section is about 7 pages, whereas NSANM's Ch 11 is 36 pages. NSANM demonstrates more aspects of Nagios' Web interface and the CGI programs generating pages. I thought author Wolfgang Barth made very effective use of diagrams, like the network topology explanation in Ch 4, the service checks in Ch 5, and notification in Ch 12.

    NSANM includes some material not mentioned in PN2, like using Nagios with Cygwin. Sometimes the books are very complementary, as shown by PN2's discussion of NSClient++ and NSANM's overview of NSClient and NC_Net.

    NSANM lacks coverage of security, redundancy, and failover, however; PN2 does address these critical issues. Beware that some of the "chapters" in NSANM are very short -- like Ch 8 (2 pages!) and Ch 19 (barely 6 pages). I think short sections like those should have been integrated into longer chapters or moved into the appendices.

    Overall, NSANM is a very good book. I believe new Nagios readers should read PN2, and strongly consider NSANM as a complementary reference volume.

     

    Pro Nagios 2.0 (Expert's Voice in Open Source)

    Building a Monitoring Infrastructure with Nagios by David Josephsen

    A short, superficial intro book (190 pages). Killer phrase from the review below: "This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used for achieving surpassing solutions."
    Warning: several reviews of this book look like plants, written by reviewers who have only a single networking book review, or only a single review in total.
    Spot on for a well structured book with many WOW-factors,

    May 17, 2007 By Nils Valentin (Tokyo, Japan)
    --- DISCLAIMER: This is a requested review by PTR, however any opinions expressed within the review are my personal ones. ---

    Introduction - 6p

    CHAPTER 1 Best Practices - 12p
    CHAPTER 2 Theory of Operations - 26p
    CHAPTER 3 Installing Nagios - 11p
    CHAPTER 4 Configuring Nagios - 23p
    CHAPTER 5 Bootstrapping the Configs - 10p
    CHAPTER 6 Watching - 46p
    CHAPTER 7 Visualization - 42p
    CHAPTER 8 Nagios Event Broker Interface - 19p
    APPENDIX A Configure Options - 3p
    APPENDIX B nagios.cfg and cgi.cfg - 9p
    APPENDIX C Command-Line Options - 10p
    Index - 14p

    At 190 pages (230 including the appendices and index), the book is very compact. It teaches you Nagios in a way I have never heard or read before. I must assume that the author's clearly structured style, which runs through the book like a red thread, is responsible for the excellent outcome.

    The introduction is titled "Do it right the first time," and that hits the spot. What distinguishes this little portable knowledge base is its exceptionally well-thought-out content and the author's explanations. David does not fill pages by explaining each and every parameter; instead, he shows you the big picture and explains how to approach new issues and why one technical solution is better than another.

    This is the book you should pass to your manager so (s)he understands why and how an open solution like Nagios is the better choice and can be used for achieving surpassing solutions.

    The book itself is basically divided into two sections:

    Background, setup and configuration - Chapters 1-5
    Advanced Topics - Chapters 6-8

    I found all of the chapters to have a nice balance of information, but some EXCEPTIONALLY good parts of the book were:

    Chapter 1 Best practices
    Chapter 2 - the part about scheduling
    Chapters 6-8 as a whole

    Chapter 6 has thorough explanations of monitoring the different OSs (especially the Windows part!!) and other applications.

    Chapter 7 for its overall thoroughness of how to visualize your data to reach the next level of a better understanding of the systems / network you are monitoring.

    Chapter 8 describes a filesystem-based status interface: the NEB module writes a file with its current status code for each service. I have to admit that some technical details went over my head, but I thought that was pretty cool!!


    The points featured above are what I found to be exceptionally good, and most likely the strongest selling points for this little portable knowledge base. That doesn't mean that the parts of the book not mentioned here are weak, mind you.

    Funnily enough, the points mentioned above were EXACTLY the ones I hadn't seen explained this thoroughly anywhere before.

    So David's book was exactly spot on for me.

    Summary:

    To sum it all up in very simple words: This is a hell of a book !!

    It's the most compact, well-structured book on Nagios that I have seen to date. It contains many WOW-factors. While reading each chapter you can virtually "feel" how David's explanations, tips and tricks have already helped you avoid time-consuming pitfalls.

    So this book is not about "to buy or not to buy"; this is an investment you don't want to miss!!

    I was especially impressed by the thoroughness with which the book is written from the first page. Although the contents of the first chapter weren't new to me, the way they were explained already provided many of those A-ha moments.

    The main asset of the book is not the description of the tools themselves, but rather the thought and consideration the author put into it, and the sharing of those thoughts in a way that lets the reader visualize how and why one solution is better than another, without having the "luxury to experience the pitfalls" in a live disaster scenario.


    PS: AFTER I finished reading the book I re-read the "Editorial Review" Amazon gave above and found it pretty well describing the actual book and what you should expect.

    >> You can find more reviews on Nagios related books including a comparison by deploying my profile. <<



    Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright for original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    Last modified: June 25, 2008