Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Unix System Monitoring

 

Any sufficiently large and complex monitoring package written in C contains a buggy and ad hoc implementation of at least 66% of the Perl interpreter.

Reinterpretation of P. Greenspun's quote about Lisp

A computer system consists of processors, memory, and I/O devices. Users log in to the system and run applications or jobs. Some applications are started at boot or from cron. Many important system resources can be monitored for events and faults, and many system management tools are available to monitor them. The key question in monitoring is what level of complexity is optimal. I suspect that it is much lower than most current implementations demonstrate, and that rising above this level is counterproductive. That means that simpler open source monitoring systems have tremendous advantages over complex ones in the long run.

We can define several categories of monitoring:

  1. Monitoring System Configuration Changes. This category includes monitoring for changes in hardware and software configurations that can be caused by an operating system upgrade, patches applied to the system, changes to kernel parameters, or the installation of a new software application, for example. The root cause of system problems can often be traced back to an inappropriate hardware or software configuration change. Therefore, it is important to keep accurate records of these changes, because the problem that a change causes may remain latent for a long period before it surfaces. Adding or removing hardware devices typically requires the system to be restarted, so configuration changes can be tracked indirectly (in other words, remote monitoring tools would notice system status changes). However, software configuration changes, or the installation of a new application, are not tracked in this way, so reporting tools are needed. Also, more systems are becoming capable of adding hardware components online, so hardware configuration tracking is becoming increasingly important.
     

  2. Monitoring System Faults. After ensuring that the configuration is correct, the first thing to monitor is the overall condition of the system. Is the system up? Can you talk to it, ping it, run a command? If not, a fault may have occurred. Detecting system problems ranges from determining whether the system is up to determining whether it is behaving properly. If the system either isn't up or is up but not behaving properly, then you must determine which system component or application is having a problem.
     

  3. Monitoring System Resource Utilization. For an application to run correctly, it may need certain system resources such as the amount of CPU or I/O bandwidth an application is entitled to use during a time interval. Other examples include the number of open files or sockets, message segments, and system semaphores that an application has. Usually an application (and operating system) has fixed limits for each of these resources, so monitoring their use is important. If they are exhausted, the system may no longer function properly. Another aspect of resource utilization is studying the amount of resources that an application has used. You may not want a given workload to use more than a certain amount of CPU time or fixed amount of disk space. Some resource management tools, such as quota, can help with this.
     

  4. Monitoring System Security. A system's availability can be impacted through unauthorized use. Performance and resource controls are not useful if the system is used for the wrong purposes.
     

  5. Monitoring System Performance. Monitoring the performance of system resources can help to indicate problems with the operation of the system. Bottlenecks in one area usually impact system performance in another area. CPU, memory, and disk I/O bandwidth are the important resources to watch for performance bottlenecks. To establish baselines you should monitor the system during typical usage periods. Understanding what is "normal" helps to identify when system resources are scarce during particular periods (for example "rush hours"). Resource management tools are available that can help you to allocate system resources among applications and users.
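As an illustration of the first category (configuration changes), here is a minimal sketch that records checksums of the files under a directory and later reports what changed. The function names snapshot_config and report_changes are hypothetical; cksum, find, sort, diff and grep are standard tools. File names containing whitespace are not handled, so treat it as a starting point rather than a finished tool.

```shell
#!/bin/sh
# Hypothetical helpers (not from any particular monitoring package).

# Print one "checksum size path" line per regular file under directory $1.
snapshot_config() {
    find "$1" -type f -exec cksum {} + | sort -k 3
}

# Compare two snapshot files ($1 = old, $2 = new).
# Prints the differing lines and returns 1 if anything changed.
report_changes() {
    if diff "$1" "$2" >/dev/null 2>&1; then
        echo "OK: no configuration changes"
        return 0
    fi
    echo "CHANGED:"
    diff "$1" "$2" | grep '^[<>]'
    return 1
}

# Possible use from cron (paths are assumptions):
#   snapshot_config /etc > /var/tmp/etc.today
#   report_changes /var/tmp/etc.yesterday /var/tmp/etc.today
```

Keeping the snapshot as plain text means the record of changes can be reviewed, mailed or archived with ordinary tools.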

"The big troika" (OpenView, Tivoli, Unicenter) dominates the large enterprise space. These are very complex and expensive products that require dedicated staff and provide a relatively low return on investment, especially taking into account the TCO, which increases dramatically with each new version due to overcomplexity. I think it is fair to say that the dominant vendors painted themselves into a corner by raising the complexity far above the level a normal sysadmin can bear. My experience with the big troika is mainly with "classic" Tivoli before the Candle (aka Tivoli Monitoring 6.1) and Micromuse acquisitions, but I still think this statement reflects the reality of "big vendor" ESM products. Also, those are old products and their architecture has by and large become outdated.

Such vendors also face a pressure to increase complexity that they cannot resist: driven by a feeling of insecurity and the desire to protect their franchise, they are pushed by events down a road that turns their products into monsters. These products can still be very scalable (Tivoli definitely is) despite the overcomplexity, but the flexibility of the solutions and the quality of the interface leave much to be desired.

That opens some space for open source monitoring solutions, which can be much simpler and rely more on established protocols (for example HTTP and SSH). An important fact that favors simpler solutions is that in any organization the usefulness of a monitoring package is limited by the ability of the personnel to tweak it to the environment. Packages whose tuning complexity is over the heads of the personnel can actually be harmful (Tivoli Monitoring 5.1 with its complex API and JavaScript-based extensions is a nice example of the genre).

Since adequate (and very expensive) training for those products is often skipped as overhead, it's not surprising that many companies never get more than the most basic functionality out of a very expensive (and theoretically very capable) product. And basic functionality is better provided by simple free packages. So extremes meet. This situation might be called the system monitoring paradox. It is exactly what makes Tivoli consultants (and OpenView consultants) happy.

It costs quite a lot to maintain and customize those tools, even in a large enterprise environment where money for this is readily available. Keeping a good monitoring specialist on the job is also a problem: once people become really good at scripting, they tend to move to other, more interesting areas such as web development. The strong point of the big troika is support and the availability of professional services, but the costs are prohibitive. It is important to understand, however, that complex products to a certain extent reflect the complexity of the environment, and not all tasks can be performed by simple products, although 80% might be a reasonable estimate.

That means that the $3.6 billion market for enterprise system management software is ripe for competition from products that utilize scripting languages instead of trying to foresee each and every need the enterprise can have. Providing a simple scripting framework for writing probes and customizing the interface lowers the barrier to entry, and it is not in the interests of large vendors because it can lower their profits. They cannot compete in this space. What is interesting is that scripting-based monitoring solutions are pretty powerful and have proved to be competitive with much more complex "compiled" or Java-based offerings. There are multiple scripting-based offerings from startups and even individual developers that can deliver 80% of the benefits of big troika products for 20% of the cost or less, and without millions of lines of Java code, an army of consultants and IT managers, and annual conferences for the big brass.

Scripting languages beat Java in the area of monitoring hands down, and if a monitoring product is written in a scripting language this should be considered a strategic advantage, one that is worth fighting for.

First of all, this is because the codebase is more maintainable and flexible. Integration of plug-ins written in the same scripting language is simpler. Debugging problems is much simpler. Everything is simpler. But at the same time I would like to warn that open source is not a panacea and that it has its own (often hidden) costs and pitfalls. In a corporate environment, other things being equal, you are better off with an open source solution that has at least a start-up behind it. A badly configured or buggy monitoring package can be a big security risk. In no way does that mean that, say, Tivoli installations in the real world are secure, but they are more obscure, and security via obscurity works pretty well in the real world ;-)

Let's reiterate the key problems with monster, "enterprise ready", packages:

Architectural Issues

If you are designing a monitoring solution you need to solve almost a dozen pretty complex design problems. The ingenuity and flexibility of the solution to each of those problems determine the quality of the architecture. Among those that we consider the most important are:

  1. Probe architecture. The probe architecture should provide a simple and flexible way to integrate the existing capabilities of the system (especially existing system utilities) and convert them into usable alerts. Probes can communicate two major things: status information (for example, the current CPU utilization is 0.2) and event information (for example, disk utilization for a particular partition exceeded the given threshold).

    Often the interface with the "mothership" is delegated to a special agent that contains all the complex machinery necessary for transmitting events to the event server. In this case probes communicate with the agent. Such an agent can be a standalone executable that is invoked by each probe via a pipe (a sendevent type of agent). In this case HTML/XML based protocols are natural (albeit more complex and more difficult to parse than necessary), although keyword-value pairs are also pretty competitive and much simpler. For keyword-value pairs you need a special option for long multiline values, though. Unix provides the necessary syntax in "here" documents.

    For efficiency an agent can be coded in C, although on modern machines this is not strictly necessary. In the case of HTML, any command-line browser such as lynx can be used as a "poor man's agent". In this case the communication with the server needs to be organized via forms.

    SMTP mail, as bad as it is, has also proved to be a viable communication channel for transmitting events from probes to the "mothership".
     
  2. The structure of the event. The structure of the event should be convenient for transmitting information from the probe and usually consists of a certain number of predefined fields (hostname, timestamp, name of the probe, etc.) and any number of user-definable fields. Generally, C-structure based events are flexible enough to describe a large variety of events and are also convenient for representing events hierarchically, so that you can reuse more basic events to create derivatives (inheritance). The ability to create new events using inheritance is really convenient. In this sense BAROC is not that bad (although fixed length strings suck badly and should be replaced with variable length strings). The description of an event should also provide for default values (as in BAROC) and possibly for marking fields that can be ignored in duplicate detection.
     
  3. Protocol for communication between probes and the "mothership". The reliability and the cost of communication between probes and the "mothership" are important. Reuse of an existing protocol such as HTTP, SMTP or SNMP, or some combination of them, provides some important advantages over reinventing the wheel.
     
  4. The protocol for delivering probes to remote locations and running them (agents, or protocols like ssh in the case of an "agentless" design).
     
  5. Aggregation and pre-filtering of events. These are the simplest types of correlation, and due to their importance they should be considered separately and designed and implemented on a different level than a full-fledged correlation solution. Here regular expression capabilities are more than enough and you do not need anything more complex. The common solution, used, for example, in Tivoli, is to use gateways for this purpose. Gateways can be just another instance of the same "master system" or a different, more specialized version.

    One simple and effective way of aggregation is converting events into "tickets": groups of events that correspond to a serviceable entity (for example a server).
     
  6. Event correlation engine. This engine should provide a flexible way to filter and correlate events. This is a pretty complex part of the monitoring solution, as the correlation engine operates on a "window" of current events, and that window should be constantly updated and provide a view of a certain number of past events in a round robin fashion. Perl arrays are a good approximation of the functionality required for such an event window (updatable slots, the order is important, and there should be a capability for deletion after a certain amount of time even if the event was not displaced by more current events). The simplest correlation engines are usually SQL based and operate against a special database that is held entirely in memory. More complex ones are Prolog-based. I do not see why a scripting language like Perl cannot be used as a correlation engine with a proper library.
     
  7. The way to schedule and run remote probes with the ability to rerun failed ones. This can be done via local scheduling and, say, the ssh protocol, or on the local host with the possibility of remote updates of schedules, or via remote scheduling, or some combination. For example, a remote schedule can be generated for the next 24 hours, while the "master schedule" from which it is derived is maintained on the mothership to cut complexity and simplify maintenance.
     
  8. The sub-architecture for collecting information from probes and displaying it both as the status of the systems (dashboard) and as the event log. Typically a Web server is used for both the dashboard and the event log, but there are big differences between systems in implementation details. The simplest event log can be implemented via an SMTP mail browser (a mail client). And typically mail browsers are more flexible than many more specialized solutions. This is actually a strong argument for using the SMTP message format. For dashboards the most advanced monitoring packages now use AJAX, some use Java, etc. Actually finance.yahoo.com can serve as a source of inspiration for a flexible and robust dashboard.
     
  9. The way of forwarding event information to "action scripts" or other systems. This really determines the flexibility of the system, as in the current enterprise environment no system can fill all needs. So the ability to play nice on both horizontal and vertical integration levels is really important.
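The keyword-value event format mentioned under probe architecture and event structure above can be sketched in a few lines of shell. This is only one possible layout: the field names, the "detail<<END ... END" convention for the multiline value (borrowed from here-document syntax) and the function names format_event and event_field are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical event format: predefined fields first, then a multiline field.

# format_event PROBE SEVERITY SUMMARY
# The multi-line detail text is read from standard input.
format_event() {
    ev_detail=$(cat)
    printf 'host=%s\n'     "$(uname -n)"
    printf 'time=%s\n'     "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'probe=%s\n'    "$1"
    printf 'severity=%s\n' "$2"
    printf 'summary=%s\n'  "$3"
    # Multiline value, delimited the way a "here" document is.
    printf 'detail<<END\n%s\nEND\n' "$ev_detail"
}

# event_field NAME: extract a single-line field from an event on stdin.
event_field() {
    sed -n "s/^$1=//p" | head -n 1
}

# Possible use inside a probe:
#   df -P /var | format_event disk_usage WARNING "/var is nearly full"
```

A format like this can be piped to a sendevent-style agent, mailed, or posted through a form without any change to the probe itself.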

Those questions make sense for users too: if you are able to answer those nine questions for a particular monitoring solution, you pretty much understand the architecture of that system.
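To show how little machinery pre-filtering and aggregation (item 5 above) really need, here is a minimal sketch using grep and awk. It assumes a simplified one-line event format of "host severity message", with the severity names INFO, WARNING and CRITICAL; both the format and the function names prefilter and make_tickets are hypothetical.

```shell
#!/bin/sh
# Input events, one per line:  HOST SEVERITY free-text message

# prefilter REGEX: drop events that match a "noise" extended regular expression.
prefilter() {
    grep -E -v "$1"
}

# make_tickets: collapse events into one "ticket" per host,
# printed as: HOST number_of_events worst_severity
make_tickets() {
    awk '
        BEGIN { rank["INFO"] = 1; rank["WARNING"] = 2; rank["CRITICAL"] = 3 }
        {
            count[$1]++
            if (rank[$2] + 0 > rank[worst[$1]] + 0) worst[$1] = $2
        }
        END { for (h in count) print h, count[h], worst[h] }
    ' | sort
}

# Possible use on a gateway (file name and pattern are assumptions):
#   prefilter 'heartbeat|backup done' < events.log | make_tickets
```

The same two stages could sit on a gateway host, so that the "mothership" sees one ticket per server instead of every raw event.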

Not all components of the architecture need to be implemented. The most essential are probes. At the beginning everything else can be reused from other systems and protocols. Even on a larger scale you can assemble your own monitoring solution just by integrating ssh, Perl/Python/PHP and an Apache server. Both HTTP and SMTP can be used as remote communication protocols. SSH has proved to be adequate as an agent and data delivery mechanism. You can even run probes via ssh (the so-called agentless solution).
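A minimal sketch of the agentless approach: the probe script stays on the monitoring host and is fed over standard input to a shell started on the remote machine by ssh. The function name run_remote_probe and the PROBE_TRANSPORT variable are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options. Key-based authentication to the target host is assumed.

```shell
#!/bin/sh
# Transport command. It is kept in a variable so that it can be replaced,
# for example by a local stub when trying the sketch without a network.
: "${PROBE_TRANSPORT:=ssh -o BatchMode=yes -o ConnectTimeout=10}"

# run_remote_probe HOST PROBE_FILE
# Ships the local probe script over stdin and runs it with sh on HOST.
# Nothing has to be installed on the remote side beyond sshd and a shell.
run_remote_probe() {
    # Word splitting of $PROBE_TRANSPORT is intended here.
    $PROBE_TRANSPORT "$1" 'sh -s' < "$2"
}

# Possible use (host name and path are assumptions):
#   run_remote_probe web1.example.com /usr/local/monitor/probes/disk.sh
```

Because the probe is delivered on every run, updating a probe means editing one file on the monitoring host.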

The simplest script that runs the probes can look something like this:

POLLING_INTERVAL=15   # seconds between polling rounds

while true; do
    for probe in /usr/local/monitor/probes/*; do
        # execute each executable probe and send its output to the named pipe
        [ -x "$probe" ] && "$probe" >> /tmp/probe_pipe
    done
    sleep "$POLLING_INTERVAL"
done
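A probe that such a loop might run could be as small as the following sketch, which turns the output of df -P into event lines when a filesystem exceeds a threshold. The function name disk_events and the event line layout are assumptions; the column positions follow the portable output format of df -P (capacity in the fifth column, mount point in the sixth).

```shell
#!/bin/sh
# disk_events LIMIT_PERCENT
# Reads `df -P` output on stdin and prints one event line for every
# filesystem whose utilization is above the limit.
disk_events() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            used = $5
            sub(/%/, "", used)        # "95%" -> "95"
            if (used + 0 > limit + 0)
                printf "event=disk_usage mount=%s used=%s%% limit=%s%%\n", $6, used, limit
        }
    '
}

# Body of a probe file such as /usr/local/monitor/probes/disk (name assumed):
#   df -P | disk_events 90
```

Splitting the parsing from the call to df keeps the probe easy to try out against saved output from another machine.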

As for the representation of results on the "mothership" server, things are more complex: creating a convenient event viewer and dashboard is a large and complex task. Still, basic functionality can be achieved without too much effort using Apache, an SMTP mail browser and some CGI scripts. Again, modifiability is more important than fancy capabilities.

For example, you can write a Perl script that generates an HTML table containing the status of your devices. In such a table color bars can represent the status of the server (for example, Green=GOOD : Yellow=LATENCY >100ms : Red=UNREACHABLE). See Set up customized network monitoring with Perl. I actually like the http://finance.yahoo.com interface very much and consider it a good prototype for generic system monitoring, as it is customizable and fits the needs of server monitoring reasonably well. For example, the concept of portfolios is directly transferable to the concept of groups of servers or locations.
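The color-coded status table can be sketched as follows. The text above suggests Perl; this sketch uses shell and awk instead, to stay in the same language as the polling loop shown earlier. The input format ("host latency_ms" per line, with "-" meaning unreachable) and the function name status_table are assumptions; the color rules are the ones given above.

```shell
#!/bin/sh
# status_table: read "HOST LATENCY_MS" lines on stdin ("-" = unreachable)
# and print an HTML table with one color-coded row per host.
status_table() {
    echo '<table border="1">'
    echo '<tr><th>Host</th><th>Status</th></tr>'
    awk '
        {
            if ($2 == "-")         { color = "red";    text = "UNREACHABLE" }
            else if ($2 + 0 > 100) { color = "yellow"; text = "LATENCY " $2 "ms" }
            else                   { color = "green";  text = "GOOD" }
            printf "<tr><td>%s</td><td bgcolor=\"%s\">%s</td></tr>\n", $1, color, text
        }
    '
    echo '</table>'
}

# Possible use from cron (file names are assumptions):
#   status_table < /var/tmp/latency.txt > /var/www/html/status.html
```

Regenerating a static page from cron keeps the dashboard independent of the probes and leaves the Web server with nothing to execute.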

Similarly, any Web-mail implementation represents an almost complete implementation of an event log. If it is written in a scripting language it can be gradually adapted to your needs (instead of trying to reinvent the bicycle and writing the event log software from scratch). I would like to reiterate that this is a very strong argument for an SMTP-based or SMTP-compatible structure of events.

Using the paradigm of small reusable components is key to creating a flexible monitoring system. Even in a Windows environment you can now do wonders using the free Microsoft "Linux for Windows" (SFU 3.5). SSH solves the pretty complex problem of delivering and updating components over a secure channel, so other things being equal it might be preferable to installing local agents, which are often buggy and insecure (and that includes many misconfigured Tivoli installations). Actually this is not completely true: a local installation of Perl can serve as a very powerful local agent, with probe scripts sending information to, for example, a Web server. And Perl is installed by default on all major Unixes and Linux. In the most primitive form, refreshing information from probes can be implemented as automatic refresh of HTML pages in frames. But there are multiple open source monitoring packages whose authors have worked on refining those ideas for several years, and you need to analyze them critically and select the package that is most suitable for you.

Still, simplicity pays great dividends in monitoring, as you can add your own customization with much less effort.

I would recommend starting with a very simple package written in Perl (which every sysadmin should know ;-) and later, when you understand the issues and compromises inherent in the design, moving up in complexity. The return on investment in fancy graphs is usually less than expected (outside presentations to executives), but your mileage may vary. If you need graphic output then you definitely need a more complex package that does the necessary heavy lifting for you. It does not make much sense to reinvent the bicycle again and again.

The key question in adopting an open source package is where you can find the time and patience to evaluate the candidates. I hope that this page (and the relevant subpages) might provide some starting points and hints on where to look. Also, with AJAX the flexibility and quality of open source Web server based monitoring consoles have dramatically increased. Again, for the capabilities of AJAX technology you can look at finance.yahoo.com.

Even if the company anticipates buying a commercial product, creating a prototype using open source tools might pay off in a major way, giving you the ability to cut through the thick layer of vendor hype to the actual capabilities of a particular commercial application. Even in a production environment simplicity and flexibility can compensate for a less polished interface and the lack of certain more complex capabilities, so in this area open source tools look very competitive with complex and expensive commercial tools like Tivoli. The tales about the overcomplexity of Tivoli products are simply legendary and we will not repeat them here. But one lesson emerges: simple applications can compete with very complex commercial monitoring solutions for one simple reason: overcomplexity undermines both reliability and flexibility, the two major criteria for a monitoring application. Consider the criteria for a monitoring application to be close to the criteria for handguns or rifles: it should not jam in sand and water.


Classification of open source  monitoring packages based on their complexity

There are several interesting open source monitoring products, each of which tries "to reinvent the bicycle" in a different way (and/or convert it into a moped ;-) by adding heartbeat, graphic and statistical packages, AJAX, improved security and storage of events in a backend database. But again, the essence of monitoring is reliability and flexibility, not necessarily the availability of eye-popping Excel-style graphs. A Unix monitoring system is a tool by sysadmins for sysadmins and should be useful primarily to them, not for the occasional demonstration to a vice-president of the company. That means that not all open source packages belong to the same category, and we need to distinguish between them based on the implementation language and the complexity of the codebase. As in boxing, there should be several categories (usage of a scripting language and the size of the codebase are the main criteria used here):

Weight               Examples
-------------------  -----------------------------------------------------
Featherweight        mon (Perl)
Lightweight          Spong (Perl)
Middleweight         Big Sister (Perl)
Super middleweight   OpenSMART (Perl), ZABBIX (PHP, C)
Light heavyweight    Nagios (C), OpenNMS (Java)
Heavyweight          Tivoli (mainly C++, some Java), OpenView, Unicenter

Some useful features in monitoring packages

One very useful feature is the concept of server groups: servers that have similar characteristics. That gives the ability to perform group probes and/or group changes of configuration files. For example, HTTP servers have evolved into a highly specialized class of servers and can benefit from less generic scripts that monitor their key components. The same is true for DNS servers, mail servers and database servers.
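Server groups need very little support code. The sketch below assumes a hypothetical plain-text groups file with one "group: host host ..." line per group; the file format and the function name hosts_in_group are illustrations only, not taken from any particular package.

```shell
#!/bin/sh
# Groups file format (assumed), one group per line:
#   http: web1 web2
#   dns: ns1

# hosts_in_group GROUP FILE: print the members of GROUP, one host per line.
hosts_in_group() {
    awk -v g="$1" -F': *' '$1 == g { print $2 }' "$2" | tr -s ' ' '\n'
}

# Possible use: run the same specialized probe against every HTTP server
# (file name and probe name are assumptions).
#   for host in $(hosts_in_group http /usr/local/monitor/groups); do
#       /usr/local/monitor/probes/http_check "$host"
#   done
```

The same file can drive both group probes and group pushes of configuration files.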

Another useful feature is a hierarchical HTML page layout that provides a nice general picture (in the most primitive form, 3-5 animated icons for the "big picture": OK, warnings, problems, serious problems, dead), with the ability to drill down through multiple levels of detail for each icon. Generic groupings of servers can include, for example:

Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contains some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.


Old News ;-)

Always listen to experts.
They'll tell you what can't be done, and why.

Then do it.

-- Robert Heinlein

[Sep 3, 2008] TraffStats 0.11.3 by Klaus Zerwes zero-sys.net

freshmeat.net

About: TraffStats is a monitoring and traffic analysis application that uses SNMP to collect data from any enabled device. It has the ability to generate graphs (using jpgraph) with the option to compare and sum up different devices. It has a multiuser-design with rights-management and support for multiple languages.

[Aug 27, 2008] MUSCLE 4.28 by Jeremy Friesner

About: MUSCLE (Multi User Server Client Linking Environment) is an N-way messaging server and networking API. It includes client-side networking APIs for various languages, including C, C++, C#, Delphi, Java, and Python. MUSCLE lets programs communicate over a network via streams of serialized Message objects. The included server program ("muscled") lets its clients message each other and store information in its server-side hierarchical database. The database supports flexible queries via hierarchical wildcarding, and "live" updates via a subscription mechanism.

Changes: This release compiles again under Win32. A fork() vs forkpty() option has been added to the ChildProcessDataIO class. Directory and FilePathInfo classes have been added. There are other minor changes.

[Jul 17, 2008] SourceForge.net fsheal

Useful Perl-script

FSHeal aims to be a general filesystem tool that can scan and report vital "defective" information about the filesystem like broken symlinks, forgotten backup files, and left-over object files, but also source files, documentation files, user documents, and so on. It will scan the filesystem without modifying anything and reporting all the data to a logfile specified by the user which can then be reviewed and actions taken accordingly.

[Jul 16, 2008] httping 1.2.9  by Folkert van Heusden

About: httping is a "ping"-like tool for HTTP requests. Give it a URL and it will show how long it takes to connect, send a request, and retrieve the reply (only the headers). It can be used for monitoring or statistical purposes (measuring latency).

Changes: Binding to an adapter did not work and "SIGPIPE" was not handled correctly. Both of these problems were fixed.

[Jun 25, 2008] freshmeat.net Project details for check_oracle_health

About:
check_oracle_health is a plugin for the Nagios monitoring software that allows you to monitor various metrics of an Oracle database. It includes connection time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and many more.

Release focus: Major feature enhancements

Changes:
The tablespace-usage mode now takes into account when tablespaces use autoextents. The data-buffer/library/dictionary-cache-hitratio are now more accurate. Sqlplus can now be used instead of DBD::Oracle.

[Jun 11, 2008] check_lm_sensors 3.1.0  by Matteo Corti

About: check_lm_sensors is a Nagios plugin to monitor the values of on-board sensors and hard disk temperatures on Linux systems.

Changes: The plugin now uses the standard Nagios::Plugin CPAN classes, fixing issues with embedded perl.

[May 6, 2008] Ortro 1.3.0  by Luca Corbo

PHP based

About: Ortro is a framework for enterprise scheduling and monitoring. It allows you to easily assemble jobs to perform workflows and run existing scripts on remote hosts in a secure way using ssh. It also tests your Web applications, creates simple reports using queries from databases (in HTML, text, CSV, or XLS), emails them, and sends notifications of job results using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.

Changes: Key features such as auto-discovery of hosts and import/export tools are now available. The telnet plugin was improved and the mail plugin was updated. The PEAR libraries were updated.

[May 6, 2008] freshmeat.net Project details for check_logfiles

Perl plugin

check_logfiles 2.3.3 (Default)
Added: Sun, Mar 12th 2006 15:09 PDT
Updated: Tue, May 6th 2008 10:37 PDT

About:
check_logfiles is a plugin for Nagios which checks logfiles for defined patterns. It is capable of detecting logfile rotation. If you tell it how the rotated archives look, it will also examine these files. Unlike check_logfiles, traditional logfile plugins were not aware of the gap which could occur, so under some circumstances they ignored what had happened between their checks. A configuration file is used to specify where to search, what to search, and what to do if a matching line is found.

[May 5, 2008] Plash 1.19 by mseaborn

About: Plash is a sandbox for running GNU/Linux programs with minimum privileges. It is suitable for running both command line and GUI programs. It can dynamically grant Gtk-based GUI applications access rights to individual files that you want to open or edit. This happens transparently through the Open/Save file chooser dialog box, by replacing GtkFileChooserDialog. Plash virtualizes the file namespace and provides per-process/per-sandbox namespaces. It can grant processes read-only or read-write access to specific files and directories, mapped at any point in the filesystem namespace. It does not require modifications to the Linux kernel.

Changes: The build system for PlashGlibc has been changed to integrate better with glibc's normal build process. As a result, it is easier to build Plash on architectures other than i386, and this is the first release to support AMD-64. The forwarding of stdin/stdout/stderr that was introduced in the previous release caused a number of bugs that should now be fixed.

[May 5, 2008] Tcpreplay 3.3.0 (Stable) by Aaron Turner

About: Tcpreplay is a set of Unix tools which allows the editing and replaying of captured network traffic in pcap (tcpdump) format. It can be used to test a variety of passive and inline network devices, including IPS's, UTM's, routers, firewalls, and NIDS.

Changes: This release dramatically improves packet timing, introduces full fragroute support in tcprewrite, and improves Windows/Cygwin and FreeBSD support. Additionally, a number of smaller enhancements have been made and user discovered bugs have been resolved. All users are strongly encouraged to update.

[Apr 18, 2008] SourceForge.net- openQRM

Qlusters, maker of the open source systems management software OpenQRM, last week announced on SourceForge.net that the most recent release of its OpenQRM systems management software would be the last from Qlusters.

[Apr 18, 2008] managing-virtualization An Introduction to openQRM by Kris Buytaert

Imagine managing virtual machines and physical machines from the same console and creating pools of machines booted from identical images, one taking over from the other when needed. Imagine booting virtual nodes from the same remote iSCSI disk as physical nodes. Imagine having those tools integrated with Nagios and Webmin.

Remember the nightmare you ran into when having to build and deploy new kernels, or redeploy an image on different hardware? Stop worrying. Stop imagining. openQRM can do all of this.

openQRM, which just reached version 3.1, is an open source cluster resource management platform for physical and virtual data centers. In a previous life it was a proprietary project. Now it's open source and is succeeding in integrating different leading open source projects into one console. With a pluggable architecture, there is more to come. I've called it "cluster resource management," but it's really a platform to manage your infrastructure.

Whether you are deploying Xen, Qemu, VMWare, or even just physical machines, openQRM can help you manage your environment.

This article explains the key concepts of openQRM.

openQRM consists mainly of four components:

[Mar 18, 2008] Open (Source|System) Monitoring and Reporting Tool 1.2  by Ulrich Herbst

About: OpenSMART is a monitoring (and reporting) environment for servers and applications in a network. Its main features are a nice Web front end, monitored servers requiring only a Perl installation, XML configuration, and good documentation. It is easy to write more checks. Supported platforms are Linux, HP/UX, Solaris, AIX, *BSD, and Windows (only as a client).

Changes: New checks include mqconnect, which tests if a connection to a WebSphere MQ QueueManager is possible; mysqlconnect, which tests if a connection to a MySQL database is possible; readfile, which tests if a file in a (potentially network-based) filesystem is readable; and db2lck, which tests if there are critical lock situations on your DB2 database. Many bugs were fixed. A username and password can be specified. Recursive include functionality was added for osagent.conf.xml. Major performance improvements were made.

[Feb 26, 2008] dstat

freshmeat.net

dstat is a versatile replacement for vmstat, iostat, netstat, nfsstat, and ifstat. It includes various counters (in separate plugins) and allows you to select and view all of your system resources instantly; you can, for example, compare disk usage in combination with interrupts from your IDE controller, or compare the network bandwidth numbers directly with the disk throughput (in the same interval).

Release focus: Major feature enhancements

Changes:
Various improvements were made to internal infrastructure. C plugins are now possible too. New topcpu, topmem, topio/tiobio, and topoom process plugins were added along with new innodb, mysql, and mysql5 application plugins. A new vmknic VMware plugin was added. Various fixes and improvements were made to plugins and output.

Author:
Dag Wieers [contact developer]

[Feb 20, 2008] collectd 4.3.0 by Florian Forster

About: collectd is a small and modular daemon which collects system information periodically and provides means to store the values. Included in the distribution are numerous plug-ins for collecting CPU, disk, and memory usage, network interface and DNS traffic, network latency, database statistics, and much more. Custom statistics can easily be added in a number of ways, including execution of arbitrary programs and plug-ins written in Perl. Advanced features include a powerful network code to collect statistics for entire setups and SNMP integration to query network equipment.

Changes: Simple threshold checking and notifications have been added to the daemon. The hostname can now be set to the FQDN automatically. Inclusion files have been made more flexible by allowing shell wildcards and including entire directories. The new libvirt plugin is able to collect some statistics about virtual guest systems without additional software on the guests themselves. The perl plugin has been improved a lot. It can now handle multiple threads and is no longer considered experimental. The csv plugin can now convert counter values to rates.
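The counter-to-rate conversion mentioned above is easy to get wrong when a counter wraps. The sketch below is not collectd's code; it only illustrates the usual arithmetic, assuming a fixed-width unsigned counter that wraps at most once between two samples. The function name and the `width` parameter are made up for illustration.

```python
def counter_rate(prev, curr, seconds, width=32):
    """Per-second rate between two counter readings, tolerating one wrap.

    Assumption: the counter is an unsigned integer of `width` bits and
    wrapped at most once between the two samples.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = curr - prev
    if delta < 0:
        # The counter passed its maximum and restarted from zero.
        delta += 2 ** width
    return delta / seconds


# 1500 units counted in 10 seconds
print(counter_rate(1000, 2500, 10))          # 150.0
# A 32-bit counter that wrapped between the two samples
print(counter_rate(2 ** 32 - 100, 400, 10))  # 50.0
```

A counter that was reset (for example by a reboot) looks the same as a wrap to this function, so a real collector usually also applies a sanity limit on the resulting rate.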

[Feb 1, 2008] SSH Factory 3.3

About: SSH Factory is a set of Java based client components for communicating with SSH and telnet servers. Including both SSH (Secure Shell) and telnet components, developers will appreciate the easy-to-use API making it possible to communicate with a remote server using just a few lines of code. In addition, SSH Factory includes a full-featured scripting API and easy to use scripting language. This allows developers to build and automate complex tasks with a minimum amount of effort.

Changes: The SshTask and TelnetTask classes were updated so that when the cancel() method is invoked, the underlying thread is stopped without delay. Timeout support was improved in SSH and telnet related classes. The com.jscape.inet.ipclientssh.SshTunneler class was added for use in creating local port forwarding SSH tunnels. Proxy support was improved so that proxy data is no longer applied to the entire JVM. HTTP proxy support was added.

[Jan 6, 2008] sysstat 8.0.4 by Sébastien Godard

About: The sysstat package contains the sar, sadf, iostat, mpstat, and pidstat commands for Linux.

The sar command collects and reports system activity information. The statistics reported by sar concern I/O transfer rates, paging activity, process-related activities, interrupts, network activity, memory and swap space utilization, CPU utilization, kernel activities, and TTY statistics, among others.

The sadf command may be used to display data collected by sar in various formats. The iostat command reports CPU statistics and I/O statistics for tty devices and disks.

The pidstat command reports statistics for Linux processes. The mpstat command reports global and per-processor statistics.

Changes: This version takes account of all memory zone types when calculating pgscank, pgscand, and pgsteal displayed by sar -B. An XML Schema was added. NLS was updated, adding Dutch, Brazilian Portuguese, Vietnamese, and Kirghiz translations.
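Since sadf can emit sar data as semicolon-separated text (the -d option), post-processing it in a script is straightforward. The sketch below is an illustration only: the SAMPLE text is invented, real output has more columns, and the exact header differs between sysstat versions, so check the output of your own sadf before relying on any column name.

```python
import csv

# Invented sample in the style of semicolon-separated sadf output.
# Real output has more columns; column names vary between versions.
SAMPLE = """\
# hostname;interval;timestamp;CPU;%user;%system;%iowait;%idle
web1;600;2008-01-06 00:10:01 UTC;-1;2.50;1.00;0.50;96.00
web1;600;2008-01-06 00:20:01 UTC;-1;40.00;10.00;30.00;20.00
"""


def parse_sadf(text):
    """Turn the header line plus data lines into a list of dicts."""
    lines = text.splitlines()
    header = lines[0].lstrip("# ").split(";")
    reader = csv.reader(lines[1:], delimiter=";")
    return [dict(zip(header, row)) for row in reader if row]


def busy_intervals(rows, max_idle=50.0):
    """Timestamps of the intervals where CPU idle time fell below max_idle."""
    return [r["timestamp"] for r in rows if float(r["%idle"]) < max_idle]


rows = parse_sadf(SAMPLE)
print(busy_intervals(rows))
```

In practice the text would come from a pipe or a saved file rather than a string constant; the parsing stays the same.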

[Nov 6, 2007] freshmeat.net Project details for sarvant

sarvant analyzes files from the sysstat utility "sar" and produces graphs of the collected data using gnuplot. It supports user-defined data source collection, debugging, start and end times, interval counting, and output types (Postscript, PDF, and PNG). It's also capable of using gnuplot's graph smoothing capability to soften spiked line graphs. It can analyze performance data over both short and long periods of time.

[Nov 6, 2007] SYSSTAT tutorial

You will find here a tutorial describing a few use cases for some sysstat commands. The first section below concerns the sar and sadf commands. The second one concerns the pidstat command. Of course, you should really have a look at the manual pages to know all the features and how these commands can help you to monitor your system (follow the Documentation link above for that).

  1. Section 1: Using sar and sadf
  2. Section 2: Using pidstat

[Aug 20, 2007] OpenEsm - What is OpenESM

Zabbix-based monitoring solution. Has a Tivoli event adapter written in Perl: OpenESM Universal Tivoli Enterprise Console Event Adapter

Right now, OpenESM has OpenESM for Monitoring v1.3. This release of the software is a combination of Zabbix, Apache, Simple Event Correlation and MySQL. Out of the box, we provide monitoring - warehousing of monitoring data - SLA reporting - correlation and notification. We offer the source code, but we also have a VMWARE based appliance.

[Aug 10, 2007] Argus - System and Network Monitoring Software

Another Perl-based package.  It concentrates on TCP/IP-based monitoring of remote hosts.

First, thanks for writing something that seems to be clean and easy to extend. I have been using Nagios @ work for some time and am anxious to replace it.

Richard F. Rebel - whenu.com

Very nice -- we're just starting to test Argus for a small monitoring job, and so far it seems useful. Thanks for your contribution to the open source community.

Andre van Eyssen - gothconsultants.com
 

thanks great tool!!

Sorin Esanu - from.ro

I am really happy with your soft, it is probably one of the best i have never found!
I own a hosting and this tool has been really cool for my business :)

Raul Mate Galan - economiza.com 

Argus works excellently. We use it to log data about all traffic through our router so that we can produce bandwidth usage statistics for customers.

Geoff Powell - lanrex.com.au

[Aug 2, 2007] Conky - a light weight system monitor for Ubuntu Linux Systems -- Ubuntu Geek

Conky is an advanced, highly configurable system monitor for X based on torsmo. Conky is a powerful desktop app that posts system monitoring info onto the root window. It is hard to set up properly (it has unlisted dependencies, needs special command line compile options, requires a mod to xorg.conf to stop it from flickering, and the apt-get version doesn't work properly). Most people can't get it working right, but it's an AWESOME app if it can be set up right.

[Jul 25, 2007] monit

Dead-wood C-based application. It looks like it has an ad hoc language for describing checks.

Samba (windows file/domain server)

Hint: For enhanced controllability of the service it is handy to split up the samba init file into two pieces, one for smbd (the file service) and one for nmbd (the name service).

 check process smbd with pidfile /opt/samba2.2/var/locks/smbd.pid
   group samba
   start program = "/etc/init.d/smbd start"
   stop  program = "/etc/init.d/smbd stop"
   if failed host 192.168.1.1 port 139 type TCP  then restart
   if 5 restarts within 5 cycles then timeout
   depends on smbd_bin

 check file smbd_bin with path /opt/samba2.2/sbin/smbd
   group samba
   if failed checksum then unmonitor
   if failed permission 755 then unmonitor
   if failed uid root then unmonitor
   if failed gid root then unmonitor

 check process nmbd with pidfile /opt/samba2.2/var/locks/nmbd.pid
   group samba
   start program = "/etc/init.d/nmbd start"
   stop  program = "/etc/init.d/nmbd stop"
   if failed host 192.168.1.1 port 138 type UDP  then restart
   if failed host 192.168.1.1 port 137 type UDP  then restart
   if 5 restarts within 5 cycles then timeout
   depends on nmbd_bin

 check file nmbd_bin with path /opt/samba2.2/sbin/nmbd
   group samba
   if failed checksum then unmonitor
   if failed permission 755 then unmonitor
   if failed uid root then unmonitor
   if failed gid root then unmonitor
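The two rules that do most of the work above are "if failed host ... port ... then restart" and "if 5 restarts within 5 cycles then timeout". The sketch below shows one way to express the same logic in a script. It is not monit's implementation; RestartLimiter and run_cycle are hypothetical names, and only TCP is covered (a UDP check such as the nmbd one needs a protocol-level probe).

```python
import socket
from collections import deque


def tcp_port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RestartLimiter:
    """Counts restarts over a sliding window of the last `cycles` polls."""

    def __init__(self, max_restarts=5, cycles=5):
        self.max_restarts = max_restarts
        self.history = deque(maxlen=cycles)

    def record_cycle(self, restarted):
        """Record one cycle; return True when monitoring should give up."""
        self.history.append(bool(restarted))
        return sum(self.history) >= self.max_restarts


def run_cycle(is_healthy, restart, limiter):
    """One polling cycle: restart on failure, report whether to time out."""
    restarted = False
    if not is_healthy():
        restart()
        restarted = True
    return limiter.record_cycle(restarted)


# Possible wiring for the smbd check (addresses taken from the config above):
#   limiter = RestartLimiter(max_restarts=5, cycles=5)
#   give_up = run_cycle(lambda: tcp_port_open("192.168.1.1", 139),
#                       restart_smbd,   # hypothetical restart function
#                       limiter)
```

Passing the health check and the restart action in as callables keeps the window logic testable without touching the network.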

[Jul 30, 2007]  Monitoring Debian Servers Using Monit -- Debian Admin

Looks like dead wood: C-based application.
monit is a utility for managing and monitoring processes, files, directories and devices on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.

Monit Features

* Daemon mode - poll programs at a specified interval
* Monitoring modes - active, passive or manual
* Start, stop and restart of programs
* Group and manage groups of programs
* Process dependency definition
* Logging to syslog or own logfile
* Configuration - comprehensive controlfile
* Runtime and TCP/IP port checking (tcp and udp)
* SSL support for port checking
* Unix domain socket checking
* Process status and process timeout
* Process cpu usage
* Process memory usage
* Process zombie check
* Check the systems load average
* Check a file or directory timestamp
* Alert, stop or restart a process based on its characteristics
* MD5 checksum for programs started and stopped by monit
* Alert notification for program timeout, restart, checksum, stop resource and timestamp error
* Flexible and customizable email alert messages
* Protocol verification. HTTP, FTP, SMTP, POP, IMAP, NNTP, SSH, DWP, LDAPv2 and LDAPv3
* An http interface with optional SSL support to make monit accessible from a webbrowser

Install Monit in Debian

#apt-get install monit

This will complete the installation with all the required software.

Configuring Monit

The default configuration file is located at /etc/monit/monitrc. You need to edit this file to configure your options.

A sample configuration file follows. Uncomment all of the following options:

## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 120

## Set syslog logging with the ‘daemon’ facility. If the FACILITY option is
## omitted, monit will use ‘user’ facility by default. You can specify the
## path to the file for monit native logging.
#
set logfile syslog facility log_daemon

## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
#
set mailserver localhost # primary mailserver

## Monit by default uses the following alert mail format:

From: monit@$HOST # sender
Subject: monit alert — $EVENT $SERVICE # subject

$EVENT Service $SERVICE

Date: $DATE
Action: $ACTION
Host: $HOST # body
Description: $DESCRIPTION

Your faithful,
monit

## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
set mail-format { from: monit@monitorserver.com }

## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812 and
use address localhost # only accept connection from localhost
allow localhost # allow localhost to connect to the server and
allow 172.29.5.0/255.255.255.0
allow admin:monit # require user ‘admin’ with password ‘monit’

# Monitoring the apache2 web services.
# It will check process apache2 with given pid file.
# If the process name or pidfile path is wrong, monit will
# report a failure even though apache2 is running.
check process apache2 with pidfile /var/run/apache2.pid

# Actions taken by monit when the service gets stuck.
start program = "/etc/init.d/apache2 start"
stop program = "/etc/init.d/apache2 stop"
# The admin is notified by mail if any of the conditions below is satisfied.
if cpu is greater than 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group server

#Monitoring Mysql Service

check process mysql with pidfile /var/run/mysqld/mysqld.pid
group database
start program = "/etc/init.d/mysql start"
stop program = "/etc/init.d/mysql stop"
if failed host 127.0.0.1 port 3306 then restart
if 5 restarts within 5 cycles then timeout

#Monitoring ssh Service

check process sshd with pidfile /var/run/sshd.pid
start program "/etc/init.d/ssh start"
stop program "/etc/init.d/ssh stop"
if failed port 22 protocol ssh then restart
if 5 restarts within 5 cycles then timeout
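Rules such as "if cpu > 80% for 5 cycles then restart" fire only when the condition holds on several consecutive polls, which filters out short spikes. A minimal sketch of that idea follows; it is an illustration under that reading of the rule, not monit's own code, and the CycleThreshold class is a made-up name.

```python
class CycleThreshold:
    """Fires once a value has exceeded `limit` on `cycles` consecutive polls."""

    def __init__(self, limit, cycles):
        self.limit = limit
        self.cycles = cycles
        self.streak = 0  # number of consecutive polls above the limit

    def update(self, value):
        """Feed one sample; return True when the rule should trigger."""
        if value > self.limit:
            self.streak += 1
        else:
            self.streak = 0  # any sample at or below the limit resets the run
        return self.streak >= self.cycles


# Analogue of "if cpu > 80% for 5 cycles then restart"
cpu_restart = CycleThreshold(limit=80.0, cycles=5)
samples = [85, 90, 70, 81, 82, 83, 84, 85]
print([cpu_restart.update(s) for s in samples])
# [False, False, False, False, False, False, False, True]
```

With a 120-second poll interval, as set by "set daemon 120" above, five cycles means the condition has to persist for roughly ten minutes before the action is taken.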

You can also include other configuration files via include directives:

include /etc/monit/default.monitrc
include /etc/monit/mysql.monitrc

This is only a sample configuration file. The configuration file is largely self-explanatory; if you are unsure about an option, take a look at the monit documentation http://www.tildeslash.com/monit/doc/manual.php

After configuring your monit file you can check the configuration file syntax using the following command

#monit -t

Once there are no syntax errors, you need to enable the service by changing the following in the file /etc/default/monit

# You must set this variable to for monit to start
startup=0

to

# You must set this variable to for monit to start
startup=1

Now you need to start the service using the following command

#/etc/init.d/monit start

Monit Web interface

The Monit Web interface will run on port 2812. If you have a firewall in your network setup, you need to open this port.

Now point your browser to http://yourserverip:2812/ (make sure port 2812 isn’t blocked by your firewall) and log in with admin and monit. If you want a secure login you can use https (check here).

Monitoring Different Services

Here are some real-world configuration examples for monit. It can be helpful to look at the examples given here to see how a service is running, where it puts its pidfile, how to call the start and stop methods for a service, etc. Check here for more examples.

freshmeat.net Project details for Ortro

Ortro is a Web-based system for scheduling and application monitoring. It allows you to run existing scripts on remote hosts in a secure way using ssh, create simple reports using queries from databases (in HTML, text, CSV, or XLS) and email them, and send notifications of job results using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.

Release focus: Major feature enhancements

Changes:
Support for i18n was added, and English and Italian languages are now available. More plugins were added, such as zfs scrub check, svc check, and zpool check for Solaris. Session check and tablespace check for Oracle and Check Uri were added. The mail, custom_query, ping, and www plugins were updated. There are bugfixes and improvements for the GUI such as the "add" button in the toolbar. The PEAR libraries were updated to the latest stable version.

Nagios offers open source option for network monitoring

"One of the big flaws of enterprise monitoring is monitoring without context."
But wouldn't it be tough for IT managers to sell higher-ups on the virtues of an open source monitoring tool? It might be worth the effort, said James Turnbull, author of Pro Nagios 2.0. Turnbull spoke recently with SearchOpenSource.com Assistant Editor MiMi Yeh about how Nagios is different from its counterparts in the commercial world and why IT shops should give it a chance.

What sets Nagios apart from other open source network monitoring tools like Big Brother, OpenNMS, OpenView and SysMon?

James Turnbull: I think there are three key reasons why Nagios is superior to many other products in this area -- ease of use, extensibility and community. Getting a Nagios server up and running generally only takes a few minutes. Nagios is also easily integrated and extended either by being able to receive data from other applications or sending data to reporting engines or other tools. Lastly, Nagios has excellent documentation backed up with a great community of users who are helpful, friendly and knowledgeable. All these factors make Nagios a good choice for enterprise management in small, medium and even large enterprises.

... ... ...

What tips, best practices and gotchas can you offer to sys admins working with Nagios?

Turnbull: I guess the best recommendation I can give is read the documentation. The other thing is to ask for help from the community -- don't be afraid to ask what you think are dumb questions on Wikis, Web sites, forums or mailing lists. Just remember the golden rule of asking questions on the Internet -- provide all the information you can and carefully explain what you want to know.

Are there workarounds to address the complaint that Nagios has no automatic discovery, so that the IP address of each host and each service must be defined individually?

Turnbull: I think a lot of the 'automated' discovery tools are actually more of a hindrance than a help. One of the big flaws of enterprise monitoring is monitoring without context. It's all well and good to go out across the network and detect all your hosts and add them to the monitoring environment, but what do all these devices do?

You need to understand exactly what you are monitoring and why. When something you are monitoring fails, you not only know what that device is but what the implications of that failure are. Nagios is not a business context/business process tool. The fact that you have to think clearly about what you want to monitor and how means that you are more aware of your environment and the components that make up that environment.

Is there any advice you would give to users?

Turnbull: The key thing to say to new users is to try it out. All you need is a spare server and a few hours and you can configure and experiment with Nagios. Take a few problems areas you've had with monitoring and see if you can solve them with Nagios. I think you'll be pleasantly surprised.

[Jun 19, 2007] Simple System Thermometer (systher)

Systher is a small Perl tool that collects system information and presents it as an XML document. The information is collected using standard Unix tools, such as netstat, uptime and lsof.

Systher can be used in many ways:

In order to make the obtained information readable for humans, Systher is equipped with an XSLT processing stylesheet to convert the XML information into HTML. That way, the information can be made visible in a browser.
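The approach is simple enough to sketch in a few lines. The example below is not Systher (which is written in Perl and calls tools such as netstat, uptime and lsof); it is a rough Python illustration that reads only the load average, and the element and attribute names are invented for the example.

```python
import os
import socket
import time
import xml.etree.ElementTree as ET


def system_snapshot():
    """Collect a few facts about the local host into an XML element tree."""
    root = ET.Element(
        "system",
        host=socket.gethostname(),
        collected=time.strftime("%Y-%m-%dT%H:%M:%S"),
    )
    load = ET.SubElement(root, "loadavg")
    # os.getloadavg() is available on Unix-like systems only.
    for label, value in zip(("min1", "min5", "min15"), os.getloadavg()):
        load.set(label, f"{value:.2f}")
    return root


def to_xml(root):
    """Serialize the snapshot as a text string."""
    return ET.tostring(root, encoding="unicode")


print(to_xml(system_snapshot()))
```

Once the data is in XML, the presentation step can be handled separately by a stylesheet, which is the split Systher itself uses.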

[May 29, 2007] ZABBIX 1.4 (Stable) by Alexei Vladishev

About: ZABBIX is an enterprise-class distributed monitoring solution for networks and applications. Native high-performance ZABBIX agents allow monitoring of performance and availability data of all operating systems.

Changes: This release introduces support of centralized distributed monitoring, flexible auto-discovery, advanced Web monitoring, and much more.

[Apr 11, 2007] freshmeat.net Project details for Unix Server Monitoring Scripts

A collection of about a dozen scripts, some in Perl.

Unix Server Monitoring Scripts is a suite that will monitor Unix disk space, Web servers via HTTP, and the availability of SMTP servers via SMTP. It will save a history of these events to diagnose and pinpoint problems. It also sends a message via email if a Web server is down or if disk usage exceeds one of two thresholds. Each script acts independently of the others.
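The disk check with two thresholds is the kind of thing that fits in a dozen lines. The sketch below is not taken from the suite; the threshold values and the function names are assumptions chosen for illustration.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage onto one of three states."""
    if used_pct >= crit_pct:
        return "CRITICAL"
    if used_pct >= warn_pct:
        return "WARNING"
    return "OK"


def disk_status(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (state, used percentage) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    return classify(used_pct, warn_pct, crit_pct), used_pct


state, pct = disk_status("/")
print(f"/ is {pct:.1f}% full: {state}")
```

A script like this would normally run from cron and send mail only when the state is not OK. Note that the percentage may differ slightly from what df reports, because df accounts for blocks reserved for root.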

Main Scripts

Support Scripts

Tarball of all files in the Suite

[Apr 11, 2007] Open source network monitoring -- An open alternative Andrew R. Hickey

Zenoss is built on the Python-based Zope application server. Zenoss uses NetSNMP to collect data via SNMP; the data is stored in MySQL and logged by RRDtool.

Feb 08, 2007  | SearchNetworking.com

Network monitoring and management applications can be costly and cumbersome, but recently a host of companies have sprung forth offering an open source alternative to IBM Tivoli, HP OpenView, CA and BMC -- and they're starting to gain traction.

The major commercial software vendors, known as the "big four," are frequently criticized for their high cost and complexity and, in some cases, are chided for being too robust -- having too many features that some enterprise users may find completely unnecessary.

Many of the open source alternatives are quick to admit that their solutions aren't for everyone, but they bring to the table arguments in their favor that networking pros can't ignore, namely low cost and ease of use.

"Open source is a huge phenomenon," Zenoss CEO and co-founder Bill Karpovich said. "It's providing an alternative for end users."

Zenoss makes Core, an integrated IT monitoring product that lets IT admins manage the status and health of their infrastructure through a single Web-based console. The latest version of the free, open source software features automated change tracking, automatic remediation, and expanded reports and export capabilities.

According to Karpovich, Zenoss software monitors complete networks, servers, applications, services, power and related environments. The biggest benefit, however, is its openness, meaning that users can tailor it to their systems any way they choose.

"It's complete enterprise IT monitoring," Karpovich said. "It's network monitoring and management, application management, and server management all through a single pane of glass."

Flexibility included

Some users have said the Tivolis and OpenViews of the world are hard to customize and very inflexible, but open source alternatives are often the opposite. They are known for their flexibility. "You can use the product as you want," Karpovich said.

Nagios developer Ethan Galstad said flexibility is a major influence on enterprises looking to move ahead with an open source monitoring project. Nagios makes open source software that monitors network availability and the states of devices and services.

"You have as an end user much more influence on the future of the feature set," Galstad said, adding that through the open source community, end users can request a feature they want, discuss the pros and cons and, in many cases, implement that feature within a relatively short time.

And for things that Nagios and other open source monitoring tools don't do, end users can tie the tools in with other solutions to create the environment they want.

"There are a lot of hooks," Galstad said.

[Apr 10, 2007] Configure OpenNMS Step By Step by saad khan

2006-07-28 (howtoforge.com)

OpenNMS is an opensource enterprise network management tool. It helps network administrators to monitor critical services on remote machines and collects the information of remote nodes by using SNMP. OpenNMS has a very active community, where you can register yourself to discuss your problems. Normally OpenNMS installation and configuration takes time, but I have tried to cover the installation and configuration part in a few steps.

OpenNMS provides the following features.

ICMP Auto Discovery
SNMP Capability Checking
ICMP polling for interface availability
HTTP, SMTP, DNS and FTP polling for service availability
Fully distributed client server architecture
JAVA Real-time console to allow moment-to-moment status of the network
XML using XSL style web access and reporting
Business View partitioning of the network using policies and rules
Graphical rule builder to allow graphical drag/drop relationships to be built
JAVA configuration panels
Redundant and overlapping pollers and master station
Repeating and One-time calendaring for scheduled downtime

The source code of OpenNMS is available for download from sourceforge.net. There are two releases: a production release (stable) and a development release (unstable). I have used the 1.2.7 stable release in this howto. I have tested this configuration with Redhat/Fedora, Suse, Slackware, and Debian, and it works smoothly. I am assuming that readers already have a Linux background. You can use the following configuration for other distributions too. Before you start the OpenNMS installation, you need to install the following packages:

jdk1.5*
tomcat 4.*
postgres 8.*
rrdtool1.2*

[Apr 10, 2007] Network Monitoring with Zabbix  by ovis

March 10, 2006 (howtoforge.com)

Zabbix has the capability to monitor just about any event on your network, from network traffic to how many sheets of paper are left in your printer. It produces really cool graphs.

In this howto we install software that has an agent and a server side. The goal is to end up with a setup that has a nice web interface that you can show off to your boss ;)
It's a great open source tool that lets you know what's out there.
This howto will not go into setting up the network, but I might rewrite it one day, so I would really like your input on this. Much of what is covered here is in the online documentation; however, if you are new to all this, like me, it might be of some help to you.

[Apr 9, 2007] GroundWork Monitor Open Source

GroundWork unifies leading open source projects like Nagios, Ganglia, RRDtool, Nmap, Sendpage, and MySQL, and offers a wide range of support for operating systems (Linux, Unix, Windows, and others), applications, and networked devices for complete enterprise-class monitoring.

Release focus: Major feature enhancements
New features include:
 
- Incorporation of RRD data: enhancing GWMOS with other tools that use RRDs should be much easier
- Performance graphing of historical data using the RRD data
- UI improvements to give you access to information of interest, with fewer clicks, in a cleaner interface
 
In addition to the downloadable source tarball, the SVN repository is also accessible.

GroundWork Monitor Open Source (GWMOS) 5.1-01 Bootable ISO now available: this image should boot cleanly in any ix86-compatible computer, or boot the image in a virtualized environment such as VMWare or Xen. It's a simple, super fast mechanism for evaluating GWMOS while setting up temporary monitoring quickly at any site: just pop in the CD and boot!
 
The GroundWork Monitor Open Source Bootable ISO automatically boots, logs you in, launches Firefox, and starts up GroundWork with all the associated services such as apache, Nagios(R), MySQL, and RRDtool, etc. all loaded and running. 
 
The ISO is set up with included profiles to monitor the host system and two internet sites out-of-the-box, giving you some immediate data to observe without setting up any additional devices. When booted from a physical CD, everything runs in the computer's RAM: the hard drive of the host computer is never touched.
 
Have fun, and keep us posted on your experience at http://www.groundworkopensource.com/community/

[Mar 12, 2007] Linux.com Zabbix State-of-the-art network monitoring

I have used BigBrother and Nagios for a long time to troubleshoot network problems, and I was happy with them -- until Zabbix came along. Zabbix is an enterprise-class open source distributed monitoring solution for servers, network services, and network devices. It's easier to use and provides more functionality than Nagios or BigBrother.

Zabbix is a server-agent type of monitoring software, meaning you have a Zabbix server where all gathered data is collected, and a Zabbix agent running on each host.

All Zabbix data, including configuration and performance data, is stored in a relational database -- MySQL, PostgreSQL, or Oracle -- on the server.

Zabbix server can run on all Unix/Linux distributions, and Zabbix agents are available for Linux, Unix (AIX, HP-UX, Mac OS X, Solaris, FreeBSD), Netware, Windows, and network devices running SNMP v1, v2, and v3.

[Mar 05, 2007] Open Sources InfoWorld OpenNMS bests OpenView and Tivoli while Ipswitch spreads the FUD by Dave Rosenberg

I strongly doubt that this is FUD. It looks like a pretty realistic assessment of the situation.
March 05, 2007


Chalk up another victory for OSS over proprietary. OpenNMS beat out both OpenView and Tivoli in the SearchNetworking Product Leadership Awards. I wonder if that will shut up this ridiculous FUD from Ipswitch "Don't trust your network to open source."

I let Travis take the shots at this foolishness...wake up, Ipswitch, you are late to the FUD train. Javier...anything from you?

Myth #1 - Open Source is free - According to Greene, downloading open source from the Internet and then customizing to your environment "often is not a good use of your time." Greene adds that he'd "rather pay an upfront fee for software that does what I need and doesn't have any high-cost labor attached to it."
Hmmm ... what about the fact that proprietary software (and *especially* network monitoring and management products) are often tremendously difficult to install / configure / maintain ongoing? How is being held hostage to a vendor for support / installation / configuration preferable? And how is being tied to a predetermined feature set preferable to having the ability to customize an open source approach solution to meet your environment's needs?
Myth #2 - Bug fixes are faster and less expensive in an open source environment - the second "myth" that Greene exposes around open source is the notion that there are thousands of developers sitting at home contributing labor for free. Greene suggests that most of the contributing vendors are typically employed by large vendors -- and that "even when those individuals generously offer their time for free, can you really afford to wait for one to agree with you on the urgency of action if your network is down."
Hmmm ...so it's better NOT to have access to the source code when you have a bug? It's preferable to have to open a help ticket with the vendor and wait in line? It's better NOT to have general visibility into the bugs and issues being reported by the members of the user community?
Myth #3 - Your IT staff can buy a 'raw' tool and shape it to their needs - Greene's last point is that the industry has moved away from the "classic open source" model where folks download raw open source and customize to their needs - and to more of a commercial open source model, where organizations are leveraging open source distribution as a way to sell services.

Feedback:

Hi,

Not a very valid comparison as there are many products out there that do a far better job than HP OpenView or OpenNMS or Tivoli.

If you are an OSS type supporter in terms of your business model it would make financial sense to use OpenNMS, but in terms of best of breed this OSS product does not come close. Some might argue that using OSS software will cost you more as there are very few people who know how to use it, and I mean use it: not some Linux script kiddie but someone with enterprise management experience. These days it's not about implementation, it's about integration, and the comparison should be about how nicely it plays with the rest of my environment.

I don't see EMC SMARTS in the comparison list.

I am all for OSS software as long as it is not chosen as the cheapest option but rather as the best of breed option. As for NMS commercial software, I use it day in and day out and would like to see a more open model in terms of functionality and development.

Take a leaf out of Sun's book: OpenSolaris has proven to be a good business model for a commercial company and the benefits will be seen for years to come.

Posted by: James at March 8, 2007 04:34 AM

[Mar 1, 2007] Network and IT management platforms 2007 Products of the Year SearchNetworking.com

GOLD AWARD:
OpenNMS

The network is the central nervous system of the modern enterprise -- complex and indispensable. Keeping tabs on how that enterprise is functioning requires a sophisticated "big picture" management system that can successfully integrate with other network and IT products. Unfortunately, many products in this category are just too expensive for any but the largest companies (with the most generous IT budgets) to afford.

Enter OpenNMS, the gold medal winner in our network and IT management platforms category. The open source enterprise-grade network management system was designed as a replacement for more expensive commercial products such as IBM Tivoli and HP OpenView. It periodically checks that services are available, isolates problems, collects performance information, and helps resolve outages. And it's free.

In our Product Leadership survey, readers praised OpenNMS for being easy to customize, easy to integrate and -- of course -- free. These attributes are all characteristic of any open source product. Because of its open source nature, OpenNMS has a community of developers contributing to its code. The code is open for anyone to view or adapt to suit individual needs.

Consequently, users can customize OpenNMS in ways that are limited only by their abilities and imagination -- not by licensing restraints. One reader said, "It is an open source product, so we can customize it easily." With traditional proprietary products, it may be difficult to find one piece of software that can manage the network effectively for every enterprise, but OpenNMS was designed to allow users to add management features over time. Its intentional compatibility with other open source (and proprietary) products provides seamless integration, requiring less piecemeal coding to fit things together.

Users of OpenNMS can also take advantage of the user community accessible through the OpenNMS Web site for answers to questions and help in troubleshooting problems. While one survey respondent remarked that "open source is advancing slowly to address some of the manageability issues," members of the OpenNMS mailing list are quick to answer any request with a friendly, knowledgeable response. For companies whose IT personnel are not afraid of an unconventional approach, the open source community provides support that is just as reliable as that of a commercial vendor -- and in many cases, more helpful.

But OpenNMS is not a "you get what you pay for" product, either. Readers said it "works great" and "significantly helped our network's bandwidth and packet management and controlled 'rogue' clients." Others found that it "works fine for a small business network" and is an "outstanding option." Even those whose experience was less positive found that any challenges were surmountable, such as the reader who said, "Since it's free, it was worth the effort."

Sys Admin Unix Monitoring Scripts by Damir Delija        

It is impossible to do systems administration without monitoring and alerting tools. Basically, these tools are scripts, and writing such monitoring scripts is an ancient part of systems administration that's often full of dangerous mistakes and misconceptions.

The traditional way of putting systems together is very stochastic and erratic, and that same method is often followed when developing monitoring tools. It is really rare to find a system that's been properly planned and designed from the start. The usual approach when something goes wrong is just to patch the immediate problem. Often, there are strange results from people making mistakes when they're in a hurry and under pressure.

Monitoring scripts are traditionally fired from root cron and send results by email. These emails can accumulate over time, flooding people with strange mails, creating problems on the monitored system, and causing other unexpected situations. Such scenarios are often unavoidable, because few enterprises can afford better measures than firefighting. In this article, I will mention a few tips that can be helpful when developing monitoring scripts, and I will provide three sample scripts.

What is a Unix Monitoring Script?

A monitoring tool or script is part of system management and to be really efficient must be part of an enterprise-wide effort, not a standalone tool. Its purpose is to detect problems and send alerts or, rarely, to try to correct the problem. Basically, a monitoring/alerting tool consists of four different parts:

1. Configuration -- Defines the environment and does initializations, sets the defaults, etc.

2. Sensor -- Collects data from the system or fetches pre-stored data.

3. Conditions -- Decides whether events are fired.

4. Actions -- Takes action if events are fired.

If these elements are simply bundled into a script without thinking, the script will be ineffective and un-adaptable. Good tools also include an abstraction layer added to simplify things later, when modifications are done.
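The separation of the four parts can be sketched in a few lines of Python. This is only an illustration, not code from the article: the function name `run_check`, the `enabled` and `limit` configuration keys, and the callback signatures are all assumptions made for the example.

```python
def run_check(config, sensor, condition, action):
    """Run one monitoring pass with the four parts kept separate.

    config    -- dict of settings (hypothetical keys: "enabled", "limit")
    sensor    -- callable returning the current value
    condition -- callable(value, config) returning True if the event fired
    action    -- callable(value, config) executed only when the event fired
    """
    # Configuration part: decide whether monitoring is allowed at all.
    if not config.get("enabled", True):
        return None
    # Sensor part: collect the data.
    value = sensor()
    # Conditions part: decide whether the event is fired.
    fired = condition(value, config)
    # Actions part: react only if the event fired.
    if fired:
        action(value, config)
    return fired
```

Because each part is passed in as a separate callable, a sensor or an action can be replaced without touching the rest of the script.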

To begin, we have to set some values, do some sanity checks, and even determine whether monitoring is allowed. In some situations, it is good to stop monitoring through the control file to avoid false notifications, during maintenance for example. This is all done in the configuration part of the script.
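A control file check of this kind might look as follows. This is a minimal sketch; the path `/var/run/monitor.disable` and the convention "file present means monitoring is paused" are assumptions for illustration, not something the article specifies.

```python
import os

# Hypothetical location of the control file; adjust to local conventions.
DEFAULT_CONTROL_FILE = "/var/run/monitor.disable"


def monitoring_enabled(control_file=DEFAULT_CONTROL_FILE):
    """Return False while the control file exists (for example during maintenance)."""
    return not os.path.exists(control_file)
```

An operator would then create the file before maintenance and remove it afterwards, and every script run in between would exit quietly.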

The script collects values from the system -- from monitored processes or the environment. This data collecting is done by the sensor part. This data can be the output of an external command or can be fetched from previously stored values, such as the current df output or previously stored df values (see Listing 1).
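As a rough illustration of a sensor (not the article's own listing), the sketch below parses the POSIX output of `df -P` into a dictionary and can save or reload earlier readings. The function names and the use of a JSON file for stored values are assumptions chosen for the example.

```python
import json
import subprocess


def parse_df(text):
    """Parse `df -P` output into {mount_point: percent_used}."""
    usage = {}
    for line in text.splitlines()[1:]:        # skip the header line
        fields = line.split(None, 5)          # mount point is the last field
        if len(fields) < 6:
            continue
        capacity = fields[4].rstrip("%")
        if capacity.isdigit():
            usage[fields[5]] = int(capacity)
    return usage


def read_df():
    """Collect current values by running the external command."""
    result = subprocess.run(["df", "-P"], capture_output=True,
                            text=True, check=True)
    return parse_df(result.stdout)


def store_usage(path, usage):
    """Save readings so that a later run can compare against them."""
    with open(path, "w") as handle:
        json.dump(usage, handle)


def load_usage(path):
    """Fetch previously stored readings; empty dict if none exist yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

Keeping the parsing separate from the command invocation makes the sensor testable with canned output.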

The conditions part of the script defines the events that are monitored. Each condition detects whether an event has happened and whether this is the start or the end of the event (arming or rearming). This process can compare current values to predefined limits or to stored values, if we are interested in rates instead of absolute values. Events can also be based on composite or calculated values, such as "Average idle from sar for the last 5 minutes is less than 10%" (see Listing 2).
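The two kinds of condition described here, a limit on a calculated average and a limit on a rate of change, could be written along these lines. The function names and parameters are illustrative assumptions; the idle percentages are assumed to have been extracted from sar output elsewhere.

```python
def average_below(samples, limit):
    """True if the mean of the samples is below the limit.

    Example use: idle percentages for the last 5 minutes against a limit of 10.
    """
    if not samples:
        return False                      # no data, no event
    return sum(samples) / len(samples) < limit


def rate_exceeds(current, previous, elapsed_seconds, limit_per_hour):
    """True if the value grew faster than limit_per_hour since the stored reading."""
    if elapsed_seconds <= 0:
        return False                      # cannot compute a rate
    rate = (current - previous) * 3600.0 / elapsed_seconds
    return rate > limit_per_hour
```

For instance, `average_below([5, 8, 12], 10)` is true because the mean is about 8.3, and `rate_exceeds(80, 70, 1800, 15)` is true because 10 points in half an hour is 20 points per hour.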

Results at this level are logical values usually presented as some kind of empty/not-empty string, to be easily manipulated in later usage. The key is to have some point in the code where the clear status of the event is defined, so branching can be done simply and easily.

Actions consist of specific code that is executed in the context of a detected event, such as storing new values, sending traps, sending email, or performing some other automatically triggered action. It is good to put these into functions or separate scripts, since you can have similar actions for many events. Usually we want to send email to someone or send a trap. It is almost always the same code in all scripts, so keeping it separate is a good idea.
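A shared email action might be factored out roughly as below. This is a sketch under assumptions: the subject format, the `monitor@<host>` sender, the `root@localhost` recipient, and an SMTP server listening on localhost are all hypothetical choices, not taken from the article.

```python
import smtplib
from email.message import EmailMessage


def build_alert(event_name, severity, host, details, recipient="root@localhost"):
    """Build the notification message; kept separate so it can be reused and tested."""
    message = EmailMessage()
    message["Subject"] = "[%s] %s on %s" % (severity, event_name, host)
    message["From"] = "monitor@%s" % host      # hypothetical sender address
    message["To"] = recipient
    message.set_content(details)
    return message


def send_alert(message, smtp_host="localhost"):
    """Deliver the message through a local mail server (assumed to exist)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(message)
```

Every monitoring script can then call the same two functions, so a change in recipients or subject format is made in one place.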

It is important to add some state support. We are not just interested in detecting limit violations; if that were the case, we would be flooded with messages. Detecting state changes can reduce unwanted messaging. When we define an event in which we are interested, we actually want to know when the event happened and when it ended -- that is, when the monitored values passed limits and when they returned. We are not interested in full-time notification that the event is still occurring. Thus, we need to know the change of the event state and value of the monitored variable.

State support is not necessary if there is some kind of console that can correlate notifications. In the simplest implementations, like a plain monitoring script, avoiding message flooding directly in the script itself is useful.
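Reporting only state changes can be illustrated with a small tracker. The names `detect_transition` and `EventState` are made up for this sketch, and the state is held in memory; a script started from cron would have to save it to a file between runs.

```python
def detect_transition(was_active, is_active):
    """Return "start" or "end" on a state change, None if nothing changed."""
    if is_active and not was_active:
        return "start"
    if was_active and not is_active:
        return "end"
    return None


class EventState:
    """Remember which named events are currently active."""

    def __init__(self):
        self.active = {}

    def update(self, name, is_active):
        """Record the new state of an event and report the transition, if any."""
        change = detect_transition(self.active.get(name, False), is_active)
        self.active[name] = is_active
        return change
```

With a limit of 90, the readings 85, 95, 97, 92, 80 produce one "start" (at 95) and one "end" (at 80), so two messages are sent instead of three separate limit violations.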

Each event must have a unique name and severity level. Usually, three levels of severity are enough, but sometimes five levels are used. It is best to start with a simple model such as:

Info -- Just information that something has happened
Warning -- Warning of possible dangerous situation
Fatal -- Critical situation
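The three-level model maps naturally onto an ordered enumeration. The sketch below is one possible representation; the `Event` fields and the `at_least` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Three severity levels, ordered so they can be compared."""
    INFO = 1
    WARNING = 2
    FATAL = 3


@dataclass(frozen=True)
class Event:
    name: str            # unique event name
    severity: Severity
    message: str


def at_least(events, minimum):
    """Keep only events whose severity is at or above the given level."""
    return [event for event in events if event.severity >= minimum]
```

Ordering the levels lets a script decide, for example, that only Warning and Fatal events are mailed while Info events are merely logged.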

IBM Redbooks

A Practical Guide for Resource Monitoring and Control (RMC), SG24-6615-00 -- http://www.redbooks.ibm.com/redbooks/SG246615.html

Managing AIX Server Farms, SG24-6606-00 Redbook -- http://www.redbooks.ibm.com/redbooks/SG246606.html

Books

Frisch, Æleen. Essential System Administration, 3rd Edition, August 2002. O'Reilly & Associates. ISBN: 0-596-00343-9.

Powers, Shelley, J. Peek, T. O'Reilly, and M. Loukides. Unix Power Tools, 3rd Edition, October 2002. O'Reilly & Associates. ISBN: 0-596-00330-7.

Blank-Edelman, David. Perl for System Administration, 1st Edition, July 2000. O'Reilly & Associates. ISBN: 1-56592-609-9.

Links

Stokely Consulting -- http://www.stokely.com/unix.sysadm.resources/index.html

Big Brother Archive -- http://www.deadcat.net/browse.php

BigAdmin Scripts -- http://www.sun.com/bigadmin/scripts/

Shelldorado -- http://www.shelldorado.com

Damir Delija has been a Unix system engineer since 1991. He received a Ph.D. in Electrical Engineering in 1998. His primary job is systems administration, education, and other system-related activities.

Sys Admin Automating UNIX Security Monitoring by Robert Geiger and John Schweitzer

All of the scripts listed in this article are meant to be run from cron on a regular basis -- daily or hourly, depending on the routine in question -- with the output going to either email or to the systems administrator's pager. However, none of the things described in this article are foolproof. UNIX security mechanisms are only relevant if the root account has not been compromised. For example, scripts run through crontab can be easily disabled or modified if the attacker has attained root access, and most log files can be manipulated to cover tracks if the intruder has control over the root account.

[Feb 23, 2007] Re [SAGE] Mon vs BB vs


I tested out OpenNMS but found Nagios to be easier to get running, plus 
OpenNMS was very linux centric last I checked. Which is annoying since it 
looks like it's just a java application, no reason it couldn't be made to run 
elsewhere.

Anyway, as far as I can tell Nagios does everything OpenNMS does and more. As 
a network monitoring tool it's been great, I have it polling all of our SNMP 
enabled devices and receiving traps. With the host and service dependencies 
it becomes easier to see if the cause of an application failure is software, 
hardware, or network based. 

That being said I would still love to play with OpenNMS if anyone has a way to 
get it to work under FreeBSD.


On Thursday 10 October 2002 04:52 pm, Alan Horn wrote:
> On 10 Oct 2002, Stephen L Johnson wrote:
> >If you are mainly monitoring networks, network monitoring tools are
> >better. The non-commercial tools that I have looked at are OpenNMS and
> >Nagios (NetSaint). These tools are designed mainly to monitor networks.
> >Systems monitoring can be added as well.
>
> Nagios is primarily for monitoring network _services_ in its default
> install (via the nagios plugins you get with the tool). Not for monitoring
> network devices (although it'll do that too). I just wanted to clarify
> that since I read this as 'nagios for monitoring cisco kit etc...' By
> network services I mean stuff like DNS, webservers, smtp, imap, etc... All
> the services that you probably want to monitor first of all when you set
> out to do this.
>
> Adding systems monitoring with nagios is very nice indeed, using the NRPE
> (Nagios Remote Plugin Executor) module, you can run whatever arbitrary
> code you desire on your system, and return results back to the monitor. I
> have it monitoring diskspace on critical fileservers, health of some
> custom applications etc...
>
> I've used nagios, nocol, and big brother (many many moons ago.. it's
> evolved since I used it). Nagios most recently. Nagios takes a bit of work
> to setup due to its flexibility, but I've found it to be the best for my
> needs in both a single and multi-site situation (we have branch offices
> located around the world via VPN which need to be monitored).
>
> And the knowledge of network topology is great too !
>
> Hope this helps.
>
> Cheers,
>
> Al

[Feb 23, 2007] Re: Starting

David Nolan
Fri, 08 Sep 2006 05:49:55 -0700

On 9/3/06, Toddy Prawiraharjo <toddyp@...> wrote:

> Hello all,
>
> I am looking for alternative to Nagios (or should i stick with it? need
> opinions pls), and saw this Mon.

The choice between Mon and other OSS monitoring systems like Nagios,
Big Brother or any of the others is very much dependent upon your
needs.

My best summary of Mon is that it's monitoring for sysadmins.  It's not pretty, it's not designed for management, it's designed to allow a sysadmin to automate the performance monitoring that might otherwise be done ad-hoc or with cron jobs.  It doesn't trivially provide the typical statistics gathering that many bean-counters are looking for, but it's extensible and scalable in amazing ways.  (See recent posts on this list about one company deploying a network of 2400 mon servers and 1200 locations, and my mon site which runs 500K monitoring tests a day, some of those on hostgroups with hundreds of hosts.)

> Btw, i need some auto-monitoring tools to monitor basic unix and windows
> based services, such as nfs, sendmail, smb, httpd, ftp, diskspace, etc.

> I love perl so much, but then its been long time since it's been updated. Is it still around and supported?

If you love perl Mon may be perfect for you, because if there is a feature you need you can always send us a patch. :)

It's definitely still around and supported.  (I just posted a link to a mon 1.2.0 release candidate.)  There haven't been a lot of updates to the system in the last couple of years, but that's in part because the system is pretty stable as-is. There are certainly some big-picture changes we would like to make, but none of the current developers have had pressing reasons to work on the system.  Personally, most of my original patches were based on CMU's needs when we did our Mon deployment, and since that time no major internal effort has been spent on extending the system.  A review process of our monitoring systems is just starting now and that may result in either more programmer time being allocated to Mon or CMU might move away from Mon to some other system.  (Obviously I'd be unhappy with that result, but I would continue to work with Mon both personally and in my consulting work.)

> Any good reference on the web interface? (the one from the site, mon.lycos.com is dead).

I believe the most commonly used interface is mon.cgi, maintained by Ryan Clark, available at http://moncgi.sourceforge.net/

An older version of mon.cgi is included in the mon distribution.

> And most importantly, where to
> start? (any good documentation as starting point on how to use this Mon)
>

Start by reading the documentation, looking at the sample config file, and experimenting.  A small installation can be set up in a matter of minutes.  Once you've done a proof-of-concept install you can decide if Mon is right for you.

-David

[Feb 23, 2007] [BBLISA] GPL system monitoring tools (alternatives to nagios)

Mon, 27 Nov 2006 18:31:13 -0800

I'm looking for suggestions for any GPL/opensource system monitoring
tools that folks can recommend.

FYI we've been using Nagios for about 6 months now with mixed results.
While i