|
Softpanorama
(slightly skeptical)
Open Source Software Educational Society |
May the
source be with you,
but remember the KISS principle ;-)
|
Unix System Monitoring
|
Any sufficiently large and complex monitoring package
written in C contains a buggy and ad hoc implementation of at least 66%
of a Perl interpreter.
Reinterpretation of P. Greenspun's quote about Lisp
|
A computer system consists of processors, memory, and I/O devices. Users log
in to the system and run applications or jobs. Some applications are started from
RC scripts, some from cron or another scheduler. As with any complex system,
a lot of things can go wrong.
If you manage a large number of systems it is important for your sanity
to see the situation on existing boxes via a dashboard and an integrated alert stream.
While monitoring is not a one-size-fits-all solution, a lot of tasks can be standardized
and, instead of reinventing the bicycle, adopted from or implemented with some existing system.
The situation with monitoring in most large organizations is far from rational. Most
large organizations use many incompatible tools to monitor and troubleshoot different
components of the IT infrastructure. For example, one tool is used to monitor servers,
another for applications and JVMs, a specialized tool for Oracle, and yet another,
overlapping one for monitoring network devices. Often such tools contain redundant,
expensive components which don't play well with one another. Often monitoring tools
acquired or inherited via acquisitions are not used at all or, at the other
extreme, are discarded despite being superior to existing solutions.
The key question in a sound approach to monitoring is the selection of the level
of complexity that is optimal for each component and each part of the IT infrastructure.
I saw many cases in which companies used an expensive package for what is little
more than ICMP monitoring. If the complexity level exceeds a certain threshold, the monitoring
system usually becomes stagnant and people are reluctant to extend and adapt it to
new tasks. Instead of a solution it can become a part of the problem. This is a typical
situation at the level of complexity characteristic of Tivoli, CA Unicenter and
HP OpenView. For example, writing rules for Tivoli TEC requires some understanding of
Prolog (which is rare among system admins) as well as Perl or another scripting language
(which is common). Adaptability means that simpler open source monitoring
systems have tremendous advantages over the complex ones in the long run.
I suspect that the optimal level of complexity is much lower than the complexity
that most current monitoring implementations demonstrate, and
that rising above this level is counterproductive. In other words, most organizations
suffer from feature creep in monitoring systems in the same way they suffer
from feature creep in regular applications.
We can define several categories of monitoring:
-
Monitoring System Configuration Changes. This category includes monitoring
for changes in hardware and software configurations that can be caused by an
operating system upgrade, patches applied to the system, changes to kernel parameters,
or the installation of a new software application.
The root cause of system problems can often be traced back to an inappropriate
hardware or software configuration change. Therefore, it is important to keep
accurate records of these changes, because the problem that a change causes
may remain latent for a long period before it surfaces. Adding or removing hardware
devices typically requires the system to be restarted, so configuration changes
can be tracked indirectly (in other words, remote monitoring tools would notice
system status changes).
However, software configuration changes, or the installation of a new application,
are not tracked in this way, so reporting tools are needed. Also, more systems
are becoming capable of adding hardware components online, so hardware configuration
tracking is becoming increasingly important.
-
Monitoring System Faults. After
ensuring that the configuration is correct, the first thing to monitor is the
overall condition of the system. Is the system up? Can you talk to it, ping
it, run a command? If not, a fault may
have occurred. Detecting system problems ranges from determining whether the
system is up to determining whether it is behaving properly. If the system either
isn't up or is up but not behaving properly, then you must determine which system
component or application is having a problem.
-
Monitoring System Resource Utilization. For an
application to run correctly, it may need certain system resources such as the
amount of CPU or I/O bandwidth an application is entitled to use during a time
interval. Other examples include the number of open files or sockets, message
segments, and system semaphores that an application has. Usually an application
(and operating system) has fixed limits for each of these resources, so monitoring
their use is important. If they are exhausted, the system may no longer function
properly. Another aspect of resource utilization is studying the amount of resources
that an application has used. You may not want a given workload to use more
than a certain amount of CPU time or fixed amount of disk space. Some resource
management tools, such as quota, can help
with this.
-
Monitoring System Performance. Monitoring the
performance of system resources can help to indicate problems with the operation
of the system. Bottlenecks in one area usually impact system performance in
another area. CPU, memory, and disk I/O bandwidth are the important resources
to watch for performance bottlenecks. To establish baselines you should monitor
the system during typical usage periods. Understanding what is "normal" helps
to identify when system resources are scarce during particular periods
(for example "rush hours"). Resource management
tools are available that can help you to allocate system resources among applications
and users.
-
Monitoring System Security. A system's availability can be impacted
through unauthorized use. Performance and resource controls are not useful if
the system is used for the wrong purposes. The value of security tools
is often overstated, but in small doses they can be useful rather than harmful. For example,
it is easy to monitor for world-writable files and wrong permissions on home
directories and key system directories. There is no reason not to implement that.
In many cases static (configuration settings) security monitoring can be adapted
from a hardening package such as Titan.
-
Monitoring System Logs. This is an integral area that overlaps with
each of the areas described above but still deserves to be treated as a separate one.
Usually log monitoring is done along with the integration of log streams on a
special log server.
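The static security checks mentioned in the list above (world-writable files, wrong permissions) are easy to script. Below is a minimal sketch, assuming a POSIX shell and `find`; the function name `find_world_writable` and the directories in the usage comment are illustrative and are not taken from Titan or any other package.

```shell
#!/bin/sh
# find_world_writable DIR...
# Print regular files and directories under the given trees that are
# writable by "other". Directories with the sticky bit set (like /tmp)
# are skipped, since world-writable plus sticky is the normal setup there.
find_world_writable() {
    find "$@" -xdev \( -type f -perm -0002 -print \) -o \
                    \( -type d -perm -0002 ! -perm -1000 -print \) 2>/dev/null
}

# Example (hypothetical choice of directories):
#   find_world_writable /etc /usr/local /home
```

Any output at all is a finding, so a probe built on this function only needs to check whether the result is empty.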
"The big troika" - OpenView, Tivoli,
Unicenter dominate the large enterprise space. These are very complex and expensive
products which require dedicated staff and
provide relatively low return on investment, especially taking
into account the TCO, which increases dramatically with each new version due to overcomplexity.
In a way the dominant vendors painted themselves into a corner by raising the complexity
far above the level a normal sysadmin can bear.
My experience with the big troika is mainly with "classic" Tivoli before the Candle (aka
Tivoli Monitoring 6.1) and Micromuse (aka Netcool) acquisitions, but I still think
this statement reflects the reality of all "big vendor" ESM products. Also,
despite new versions, the technologies used are often outdated: those are products
with obsolete or semi-obsolete architecture and sometimes obscure protocols that are difficult to
understand and debug. In the latter case they are often the
source of hidden security vulnerabilities. In a way the agents on each server
are hidden backdoors, not that different from the backdoors used for "zombification"
of servers by hackers.
At the same time there is an objective need for such vendors to increase
complexity with each version, a need they cannot resist: the feeling of insecurity
and the desire to protect and extend their franchise and get bigger and bigger bonuses.
In a way these are very similar to the pressures that destroyed the US investment banks in
the recent "subprime mess". Due to this, vendors are logically pushed by events onto the
road which inevitably leads to converting their respective systems into monsters.
They can still be very scalable (Tivoli definitely is) despite overcomplexity, but
the flexibility of the solutions and the quality of the interface suffer greatly.
And only due to the high quality and qualification of tech support (I can attest that
IBM Tivoli tech support is really excellent) can those systems be maintained and
remain stable in a typical enterprise.
That opens some space for open source monitoring solutions, which can be much
simpler and rely more on established protocols (for example, HTTP and SSH). An important
fact which favors simpler solutions is that in any organization the usefulness of a
monitoring package is limited by the ability of personnel to tweak it to the environment.
Packages whose tuning is over the head of the personnel can actually be harmful
(Tivoli Monitoring 5.1 with its complex API and JavaScript-based extensions is a
nice example of the genre).
Since adequate (and very expensive) training for those products is often skipped
as overhead, it's not surprising that many companies never get more than
the most basic functionality out of a very expensive (and theoretically very capable)
product. And basic functionality is better provided by simple free packages. So
extremes meet. This situation might be called the system monitoring paradox.
That's exactly what makes Tivoli consultants (and OpenView consultants) happy.
|
The system monitoring paradox is that expensive
and cheap monitoring solutions usually provide very similar quality of
monitoring, and both have adequate capabilities for a typical large company
|
It costs quite a lot to maintain and customize tools like Tivoli or OpenView
in a large enterprise environment, where money for this is readily available. Keeping
a good monitoring specialist on the job is also a problem: once people become really
good at scripting they tend to move to other, more interesting areas, like
web development. There is nothing too exciting in the daily work of a monitoring
specialist, and after a couple of years the usual feeling is that his or her IQ is
underutilized. So many people move on. The strong point of the big troika is support
and the availability of professional services, but the costs are prohibitive. Still,
it is important to understand that complex products to a certain extent reflect
the complexity of the environment, and not all tasks can be performed by simple products,
although 80% might be a reasonable estimate.
That means that the $3.6 billion market for enterprise
system management software is ripe for competition from products that utilize scripting
languages instead of trying to foresee each and every need the enterprise can have.
Providing a simple scripting framework for writing probes and customizing the interface
lowers the barrier to entry and is not in the interests of large vendors, as it can
lower their profits. They cannot compete in this space. What is interesting
is that scripting-based monitoring solutions are pretty powerful and have proved to be
competitive with much more complex "compiled" or Java-based offerings. There
are multiple scripting-based offerings from startups and even individual developers
which can deliver 80% of the benefits of big troika products for 20% of the cost
or less, and without millions of lines of Java code, an army of consultants and IT
managers, and annual conferences for the big brass.
Scripting languages beat Java in the area of monitoring hands down, and if a monitoring
product is written in a scripting language this should be considered a strategic
advantage, an advantage that is worth fighting for.
First of all, this is because the codebase is more maintainable and flexible. Integration
of plug-ins written in the same scripting language is simpler. Debugging problems
is much simpler. Everything is simpler. But at the same time I would like
to warn that open source is not a panacea and that it has its own (often hidden) costs
and pitfalls. In a corporate environment, other things being equal, you are better off with
an open source solution behind which there is at least a start-up. A badly configured
or buggy monitoring package can be a big security risk. In no way does that mean that,
say, Tivoli installations in the real world are secure, but they are more obscure, and
security via obscurity works pretty well in the real world ;-)
Let's reiterate the key problems with monster, "enterprise ready", packages:
-
Licensing and Maintenance Costs. One of the most common problems is
the cost of the license. Often "the big troika" is too expensive and simply prices
smaller companies out of the market. But the picture is more complex than that.
For example, IBM sells the starter package for ITM 6.1 really cheap, and this is
actually a full-blown enterprise class monitoring system that is just limited
to a few nodes. But nodes can be aggregators of events based on some open source
package, not individual servers, so this limitation can be partially bypassed.
What you get is a first class GUI and a robust correlation engine.
The second important cost is maintenance contracts. The service provided is usually
good or excellent, but it costs money. Also, due to the level of complexity (or,
more correctly, the level of overcomplexity ;-) for some tasks you need expensive
consultants, and over five years those costs can well be comparable with the cost
of the license (see below).
-
Overcomplexity. Often small and medium size companies do
not want the whole "Christmas tree" of features and want a slimmer, more flexible
product that is more focused on their needs. They also cannot afford to use expensive
consultants on a regular basis (which is often the way Tivoli is deployed and
maintained, so upfront costs are just the tip of the iceberg). With IT
outsourcing it is not clear that using consultants is the best path: in the
absence of loyal staff there is no countervailing force in complex technical
negotiations, and the company is bound to overpay or buy unnecessary services and
solutions. I know several companies that use TEC but paradoxically do not have
specialists to write rules for TEC (TEC uses
Prolog as a rules language).
That situation makes TEC inferior to simpler packages. Also, there are companies
which use Tivoli monitoring exclusively to monitor disk space on the servers,
a task that even a simple Perl script run from cron can accomplish much better.
-
Absence of insurance in case of abrupt changes of course by the vendor.
Tivoli users now understand that the fact that TEC is closed source can
cost them a substantial amount of money. Even if they do not want to move to a Micromuse-style
solution, IBM will drag them there. That would be fine if the new solution were
clearly superior to the old one. But this is not the case.
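The disk space case mentioned in the list above shows how little code a basic probe needs. The text suggests Perl; the sketch below uses plain shell and awk instead, simply to stay dependency-free. The function name `check_disk`, the 90% default and the output format are assumptions made for illustration. It also assumes mount points without spaces.

```shell
#!/bin/sh
# check_disk [THRESHOLD]
# Reads `df -P` output on stdin and prints one line per filesystem whose
# usage is at or above THRESHOLD percent (default 90).
check_disk() {
    awk -v limit="${1:-90}" '
        NR > 1 {                           # skip the df header line
            use = $5
            sub(/%/, "", use)              # "93%" -> "93"
            if (use + 0 >= limit + 0)
                printf "DISK_FULL mount=%s used=%s%%\n", $6, use
        }'
}

# Typical use from cron:
#   df -P | check_disk 90
```

Reading `df -P` from stdin rather than calling it inside the function keeps the parsing logic easy to check against canned input.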
Architectural Issues
If you are designing a monitoring solution you need to solve almost a dozen
pretty complex design problems. The ingenuity and flexibility of the solution for
each of those problems determine the quality of the architecture. Among those that we
consider the most important are:
-
Probe architecture. Probe architecture should provide a simple
and flexible way to integrate the existing capabilities of the system (especially
existing system utilities, including classic Unix utilities) and convert them
into usable alerts. Perl is the simplest way to achieve that, as it blends very
well into the Unix environment and is often used by system administrators for other
purposes, so they do not need to learn yet another language. Probes can communicate
two major things: status information (for example, the current CPU utilization
is 0.2) and event information (for example, disk utilization for a particular
partition exceeded the given threshold).
Often the interface with the "mothership" is delegated to a special agent which
contains all the complex machinery necessary for transmitting events to the event
server using some secure or not very secure protocol. In this case probes communicate
with the agent.
In the simplest case the agent can be a standalone executable that is invoked
by each probe via a pipe (a sendevent type of agent). In this case HTML/XML
based protocols are natural (albeit more complex and more difficult to parse
than necessary), although keyword-value pairs are also pretty competitive and
much simpler. For keyword-value pairs you need a special option for long multiline values,
though. Unix provides the necessary syntax in "here" documents.
For efficiency an agent can be coded in C, although on modern machines this
is not strictly necessary. In the case of HTML any command line browser like lynx
can be used as a "poor man's agent". In this case the communication with the server
needs to be organized via forms.
SMTP mail, as bad as it is, has also proved to be a viable communication channel
for transmitting events from probes to the "mothership".
- The structure of the event. The structure of the event should be convenient
for transmitting information from the probe and usually consists of
a certain number of predefined fields (hostname, timestamp, name of the probe,
etc.) and any number of user-definable fields. Generally, C-structure based
events are flexible enough to describe a large variety of events and
are also convenient for representing events hierarchically, so that you can reuse
more basic events to create derivatives (inheritance). The ability
to create new events using inheritance is really convenient. In this sense BAROC
is not that bad (although fixed length strings suck badly and should be replaced
with variable length strings). The description of an event should also provide
for default values (like in BAROC) and possibly for marking fields that can be ignored in
duplicate detection.
- Protocol for communication between probes and "mothership". The
reliability and the cost of communication between probes and the "mothership"
are important. Reuse of an existing protocol such as HTTP, SMTP or SNMP, or
some combination of them, provides important advantages over reinventing the
wheel.
- The protocol for delivery of probes to remote locations and running them
(agents, or protocols like ssh in the case of an "agentless" design).
- Aggregation and pre-filtering of events. These are the simplest types
of correlation, and due to their importance they should be considered separately and
designed and implemented on a different level than a full-fledged correlation
solution. Here regular expression capabilities are more than enough and you
do not need anything more complex. The common solution, used, for example, in
Tivoli, is to use gateways for this purpose. Gateways can be just
another instance of the same "master system" or a different, more specialized version.
One simple and effective way of aggregation is converting events into "tickets":
groups of events that correspond to a serviceable entity (for example a server).
- Event correlation
engine. This engine should provide a flexible way to filter and correlate
events. This is a pretty complex part of the monitoring solution, as the correlation
engine operates on a "window" of current events, and that window should
be constantly updated and provide a view of a certain number of past events in a
round robin fashion. Perl arrays are a good approximation of the functionality
required for such an event window (updatable slots, the order is important, and
there should be a capability of deletion after a certain amount of time even if
the event was not displaced by more current events). The simplest correlation
engines are usually SQL based, and they operate against a special database that
is totally memory based. More complex ones are Prolog-based. I do not see why
a scripting language like Perl cannot be used as a correlation engine with a proper
library.
- The way to schedule and run remote probes with the ability to rerun failed ones.
This can be done via local scheduling and, say, the ssh protocol, or on the local
host with the possibility of remote updates of schedules, or via remote scheduling, or
some combination. For example, a remote schedule can be generated for the next
24 hours, but the "master schedule" from which it is derived can be maintained on
the mothership to cut complexity and simplify maintenance.
- The sub-architecture of collecting information from probes and displaying
it as both the status of the systems (dashboard) and the event log.
Typically a web server is used for both the dashboard and the event log, but there
are big differences between systems in implementation details. The simplest
event log can be implemented via an SMTP mail browser. And typically mail browsers are
more flexible than many more specialized solutions. This is actually a strong
argument for using the SMTP message format. For dashboards the most advanced
monitoring packages now use AJAX, some use Java, etc. Actually finance.yahoo.com
can serve as a source of inspiration for a flexible and robust dashboard.
- The way of forwarding event information to the "action scripts" or other
systems. That really determines the flexibility of the
system, as in the current enterprise environment no system can fill all needs.
So the ability to play nice on both horizontal and vertical integration levels is
really important.
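To make the keyword-value event format from the probe architecture item more concrete, here is a minimal sketch of a probe-side helper. The function name `format_event`, the field names and the `sendevent` agent in the usage comment are all hypothetical. The multiline message is passed with a here document and indented by one space, in the style of mail header continuation lines, which keeps the format close to the SMTP-compatible structure argued for above.

```shell
#!/bin/sh
# format_event PROBE SEVERITY
# Builds one event as keyword-value pairs. Predefined fields come first;
# the free-form message is read from stdin, so a probe can pass a
# multiline value with a here document. Message lines are indented by
# one space so the receiver can tell them from new keywords; a blank
# line ends the event.
format_event() {
    printf 'host: %s\n'      "$(uname -n)"
    printf 'timestamp: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'probe: %s\n'     "$1"
    printf 'severity: %s\n'  "$2"
    printf 'message:\n'
    sed 's/^/ /'
    printf '\n'
}

# A probe would pipe the result to whatever agent is in use, for example
# (sendevent is a hypothetical agent name):
#   format_event disk_space warning <<EOF | sendevent
#   /var is 93% full
#   largest directory: /var/log
#   EOF
```

User-definable fields could be added as extra `keyword: value` lines before `message:` without changing the parser on the receiving side.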
Those questions make sense for users too: if you are able to answer them
for a particular monitoring solution, that means that you pretty much understand
the architecture of that particular system.
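Pre-filtering and aggregation of the kind described in the list above really do not need more than regular expressions. The following sketch assumes a one-line event format of `HOST PROBE SEVERITY TEXT...`; that format and the function name `aggregate_events` are assumptions made for this example, not part of any existing package.

```shell
#!/bin/sh
# aggregate_events [IGNORE_REGEX]
# Reads one-line events "HOST PROBE SEVERITY TEXT..." on stdin.
# 1. pre-filter: drop events matching IGNORE_REGEX
#    (default: drop only blank lines)
# 2. de-duplicate: identical HOST+PROBE+SEVERITY are kept once
# 3. aggregate: print one "ticket" line per host with its distinct events
aggregate_events() {
    awk -v ignore="${1:-^\$}" '
        $0 ~ ignore { next }
        {
            key = $1 SUBSEP $2 SUBSEP $3
            if (key in seen) next
            seen[key] = 1
            if (!($1 in ticket)) order[++n] = $1   # remember first-seen order
            ticket[$1] = ticket[$1] " " $2 ":" $3
        }
        END {
            for (i = 1; i <= n; i++)
                print "TICKET " order[i] ticket[order[i]]
        }'
}

# Example: suppress informational events before building tickets
#   aggregate_events ' info ' < events.log
```

A gateway in the sense used above could be as simple as this filter sitting between the probes and the event server.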
Not all components of the architecture need to be implemented.
The most essential are
probes. At the beginning everything else can be reused from other systems and protocols.
Even on a larger scale you can assemble your own monitoring solution just by integrating
ssh, Perl/Python/PHP and an Apache server. Both HTTP and SMTP can be used
as remote communication protocols. SSH has proved to be adequate as an agent and data
delivery mechanism. You can even run probes via ssh (the so-called agentless solution).
The simplest script that can run probes can look something like this:
#!/bin/sh
POLLING_INTERVAL=15   # seconds between polling rounds
while true; do
    for probe in /usr/local/monitor/probes/*; do
        # run only executable files; send each probe's output to the named pipe
        [ -x "$probe" ] && "$probe" >> /tmp/probe_pipe
    done
    sleep "$POLLING_INTERVAL"
done
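The agentless variant mentioned above can be sketched in the same spirit: the probe file stays on the monitoring server and is fed to a shell on the remote host over ssh. The function name `run_remote_probe` and the `PROBE_FAILED` line are assumptions for illustration; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options. Key-based authentication is assumed to be set up already.

```shell
#!/bin/sh
# run_remote_probe HOST PROBE_FILE
# "Agentless" run: the probe script is piped to `sh -s` on the remote
# host, so nothing has to be installed there.
# SSH can be overridden (for example for testing); BatchMode makes ssh
# fail instead of prompting when key authentication is missing.
SSH=${SSH:-ssh -o BatchMode=yes -o ConnectTimeout=10}

run_remote_probe() {
    host=$1
    probe=$2
    # $SSH is intentionally unquoted so its options split into words
    if ! $SSH "$host" sh -s < "$probe"; then
        echo "PROBE_FAILED host=$host probe=$probe"
        return 1
    fi
}

# Example (host name is a placeholder):
#   run_remote_probe web1.example.com /usr/local/monitor/probes/disk
```

Reporting a failed run as an event of its own makes it possible to reschedule the probe later instead of silently losing a data point.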
As for the representation of the results on the "mothership" server, things are more
complex here, and creating a convenient event viewer and dashboard is a large
and complex task. Still, basic functionality can be achieved without too much effort
using Apache, an SMTP mail browser and some CGI scripts. Again, modifiability is more
important than fancy capabilities.
For example, you can write a Perl script that generates an HTML table which contains
the status of your devices. In such a table color bars can represent the status
of the server (for example, Green=GOOD : Yellow=LATENCY >100ms : Red=UNREACHABLE).
See
Set up customized network monitoring with Perl. I actually like the
design of the http://finance.yahoo.com
interface very much and consider it to be a good prototype for generic system monitoring, as
it is customizable and fits the needs of server monitoring reasonably well. For example,
the concept of portfolios is directly transferable to the concept of groups of servers
or locations.
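The color-coded status table described above takes only a few lines. The text suggests Perl; this sketch uses shell and awk, and assumes an input format of `HOST LATENCY_MS` per line, with `-` standing for "no reply". Both the input format and the function name `status_table` are assumptions, and host names are assumed not to need HTML escaping.

```shell
#!/bin/sh
# status_table
# Reads "HOST LATENCY_MS" lines on stdin ("-" means no reply) and prints
# an HTML table: green = GOOD, yellow = latency above 100 ms,
# red = UNREACHABLE.
status_table() {
    echo '<table border="1">'
    echo '<tr><th>Host</th><th>Status</th></tr>'
    awk '{
        if ($2 == "-")          { color = "red";    text = "UNREACHABLE" }
        else if ($2 + 0 > 100)  { color = "yellow"; text = "LATENCY " $2 "ms" }
        else                    { color = "green";  text = "GOOD" }
        printf "<tr><td>%s</td><td bgcolor=\"%s\">%s</td></tr>\n", $1, color, text
    }'
    echo '</table>'
}

# Example: regenerate a static page from the latest measurements
#   status_table < /tmp/latency.txt > /var/www/html/status.html
```

Writing a static page from cron avoids the need for any CGI machinery in the simplest setups.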
Similarly, any Web-mail implementation represents an almost complete implementation
of an event log. If it is written in a scripting language it can be gradually adapted
to your needs (instead of trying to reinvent the bicycle and writing the event log
software from scratch). I would like to reiterate that this is a very strong
argument for an SMTP-based or SMTP-compatible structure of events.
Using the paradigm of small reusable components is key to the creation of a flexible monitoring
system. Even in a Windows environment you can now do wonders using the free Microsoft
"Linux for Windows" (
SFU 3.5). SSH solves the pretty complex problem of component delivery and
updates over a secure channel, so other things being equal it might be preferable to the installation
of often buggy and insecure (and that includes many misconfigured Tivoli installations)
local agents. Actually this is not completely true: a local installation of Perl can
serve as a very powerful local agent, with probe scripts sending information, for
example, to a Web server. And Perl is installed by default on all major Unixes and
Linux. In the most primitive way, refreshing of information from probes can be implemented
as automatic refresh of HTML pages in frames. But there are multiple open source
monitoring packages where people have worked on refining those ideas for several years,
and you need to analyze them critically and select the package that is most suitable
for you.
Still, simplicity pays great dividends in monitoring, as you can add your
own customization with much less effort.
I would recommend starting with a very simple package written in Perl (which
every sysadmin should know ;-) and later, when you understand the issues and
compromises inherent in the design, moving up in complexity. Return on investment
in fancy graphs is usually less than expected (outside presentations to executives),
but your mileage may vary. If you need graphic output then you definitely need a
more complex package that does the necessary heavy lifting for you. It does not
make much sense to reinvent the bicycle again and again.
The key question with adopting an open source package is where you can find the time
and patience to evaluate the candidates. I hope that this page (and relevant subpages)
might provide some starting points and hints on where to look. Also, with AJAX
the flexibility and quality of open source Web server based monitoring consoles
has increased dramatically. Again, for the capabilities of the AJAX technology
you can look at finance.yahoo.com.
Even if the company anticipates getting a commercial product, creating a prototype
using open source tools might pay off in a major way, giving you the ability to
cut through the thick layer of vendor hype to the actual capabilities of a particular
commercial application. Even in a production environment simplicity
and flexibility can compensate for a less polished interface and the lack of certain more
complex capabilities, so in this area open source tools look very competitive with
complex and expensive commercial tools like
Tivoli. The tales about the overcomplexity
of Tivoli products are simply legendary and we will not repeat them here. But one
lesson emerges: simple applications can compete with very complex commercial monitoring
solutions for one simple reason: overcomplexity undermines both reliability
and flexibility, the two major criteria for a monitoring application.
Consider the criteria for a monitoring application to be close to the criteria for
handguns or rifles: it should not jam in sand and water.
|
Overcomplexity undermines both reliability and
flexibility
|
Classification of open source monitoring packages based on their complexity
There are several interesting open source monitoring products, each of which tries
"to reinvent the bicycle" in a different way (and/or convert it into a moped ;-)
by adding heartbeat, graphic and statistical packages, AJAX, improving security
and storing events in a backend database. But again, the essence of monitoring
is reliability and flexibility, not necessarily the availability of eye-popping Excel-style
graphs. A Unix monitoring system is a tool by sysadmins for sysadmins and should
be useful primarily to them, not for the occasional demonstration to the vice-president
of the company. That means that not all open source packages belong to the same category,
and we need to distinguish between them based on implementation language and complexity
of the codebase. Like in boxing there should be several categories (usage of a scripting
language and the size of the codebase are the main criteria used here):
Some useful features in monitoring packages
One very useful feature is the concept of server groups: servers
that have similar characteristics. That gives the ability to perform group probes
and/or configuration file changes. For example, HTTP servers have evolved into a highly
specialized class of servers and can benefit from less generic scripts that monitor
key components. The same is true for DNS servers, mail servers and database
servers.
Another useful feature is a hierarchical HTML page layout that provides a nice
general picture (in its most primitive form using 3-5 animated icons for the "big picture":
OK, warnings, problems, serious problems, dead) with the ability of more detailed
multilevel drilling "in depth" for each icon. Generic groupings of servers can include,
for example:
-
the first level icons are displaying a general health picture composed
of server groups
-
the second level displaying a specific server group information
-
the third level is an individual server level
-
the fourth level is individual sensor (CPU, disk space, etc)/script
level.
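The first-level icons in such a hierarchy only need the worst status found in each server group. A minimal sketch of that roll-up follows; the input format `GROUP SERVER STATUS`, the status keywords and the function name `rollup_groups` are assumptions chosen to match the five states named above.

```shell
#!/bin/sh
# rollup_groups
# Reads "GROUP SERVER STATUS" lines on stdin, where STATUS is one of
# ok, warning, problem, serious, dead, and prints the worst status seen
# in each group: the value a first-level icon would display.
# Lines with an unknown status are ignored.
rollup_groups() {
    awk '
        BEGIN {
            rank["ok"] = 0; rank["warning"] = 1; rank["problem"] = 2
            rank["serious"] = 3; rank["dead"] = 4
        }
        ($3 in rank) {
            if (!($1 in worst)) { order[++n] = $1; worst[$1] = $3 }
            else if (rank[$3] > rank[worst[$1]]) worst[$1] = $3
        }
        END { for (i = 1; i <= n; i++) print order[i], worst[order[i]] }'
}

# Example:
#   rollup_groups < /tmp/server_status.txt
```

The same function applied to `SERVER SENSOR STATUS` lines gives the roll-up from the sensor level to the server level, so one piece of code can serve every level of the drill-down.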
Dr. Nikolai Bezroukov
Always listen to experts.
They'll tell you what can't be done, and why.
Then do it.
-- Robert Heinlein
|
Nagios is included in SLES 11 as a tool supported by Novell.
Nagios is a popular host and service monitoring tool used
by many administrators to keep an eye on their systems.
Since I wrote a basic installation guide in Jan 2006 on
Cool Solutions many new versions were published and many
Nagios plugins are now available. Because of that I think
it's time to write a series of articles here that show you
some very interesting solutions. I hope that you find them
helpful and that you can use them in your environment. If
you are not yet a Nagios user I hope that I can inspire
you to give it a try.
I don't want to write here a full documentation about
Nagios, I prefer to give you a basic installation guide so
you can set it up very easy and play with it yourself. The
installation guide will show you how to install Nagios as
well as some interesting extensions and how they integrate
into each other. During this installation you will make many
modifications to the installation that will help to
understand how it works, how you can integrate systems and
different services. I will also provide some articles about
monitoring special services where I describe what they do
and what configuration changes are needed. All together
should give you a very good overview and documentation on
how you can enhance the Nagios installation yourself.
If you would like to read some detailed information about
Nagios visit the documentation at the project homepage at
http://www.nagios.org/docs or go through my short
article from Jan 2006 at
http://www.novell.com/coolsolutions/feature/16723.html
Munin, the monitoring tool, surveys all your computers and remembers what it
saw. It presents all the information in graphs through a web interface. Its
emphasis is on plug and play capabilities. After completing an installation, a
high number of monitoring plugins will be playing with no more effort.
Using Munin you can easily monitor the performance of your computers, networks,
SANs, applications, weather measurements and whatever comes to mind. It makes
it easy to determine "what's different today" when a performance problem crops
up. It makes it easy to see how you're doing capacity-wise on any resources.
Munin uses the excellent
RRDTool (written by Tobi Oetiker) and
the framework is written in Perl,
while plugins may be written in any language. Munin has a master/node architecture
in which the master connects to all the nodes at regular intervals and asks
them for data. It then stores the data in RRD files, and (if needed) updates
the graphs. One of the main goals has been ease of creating new plugins (graphs).
This site is a wiki as well as a project management tool. We appreciate any
contributions to the documentation. While this is the homepage of the Munin
project, we will still make all releases through Sourceforge.
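The master/node exchange described above is a plain-text protocol: the master connects to the node (TCP port 4949 by default), sends commands such as `fetch <plugin>`, and reads `field.value N` lines until a line containing a single dot. The following is a minimal Python sketch of that polling step, not Munin's own (Perl) code; the function names `parse_fetch` and `fetch_plugin` are hypothetical.

```python
import socket


def parse_fetch(lines):
    """Turn the reply lines of a 'fetch' command into {field: value}.

    Reading stops at the lone '.' line that terminates a reply.
    Values that are not numbers (e.g. 'U' for unknown) become None.
    """
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition(" ")
        if not key.endswith(".value"):
            continue
        field = key[: -len(".value")]
        try:
            values[field] = float(raw)
        except ValueError:
            values[field] = None
    return values


def fetch_plugin(host, plugin, port=4949, timeout=5.0):
    """Ask one node for the current values of one plugin (assumed helper)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        stream.readline()                  # skip the greeting banner
        stream.write(f"fetch {plugin}\n")
        stream.flush()
        reply = []
        for line in stream:
            reply.append(line)
            if line.strip() == ".":
                break
        stream.write("quit\n")
        stream.flush()
    return parse_fetch(reply)
```

A master would call something like `fetch_plugin("node1.example.com", "load")` on every polling interval and hand the result to its RRD storage layer; the host name here is only a placeholder.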
I used Nagios for health/performance monitoring of devices/servers for years
at a previous job. It has been a while, and I'm starting to look into this space
again. There are a lot more options out there for remote monitoring these days.
Here is what I have found that look good:
Do you know of any others I am missing? I'll update this list if I get replies.
The requirement is that there must be an Open Source version of the tool.
[Sep 3, 2008] TraffStats 0.11.3
by Klaus Zerwes zero-sys.net
| freshmeat.net
About: TraffStats is a monitoring and traffic analysis application
that uses SNMP to collect data from any enabled device. It has the ability to
generate graphs (using jpgraph) with the option to compare and sum up different
devices. It has a multiuser-design with rights-management and support for multiple
languages.
[Aug 27, 2008] MUSCLE 4.28
by Jeremy Friesner
About: MUSCLE (Multi User Server Client Linking Environment) is an
N-way messaging server and networking API. It includes client-side networking
APIs for various languages, including C, C++, C#, Delphi, Java, and Python.
MUSCLE lets programs communicate over a network via streams of serialized Message
objects. The included server program ("muscled") lets its clients message each
other and store information in its server-side hierarchical database. The database
supports flexible queries via hierarchical wildcarding, and "live" updates via
a subscription mechanism.
Changes: This release compiles again under Win32. A fork() vs forkpty()
option has been added to the ChildProcessDataIO class. Directory and FilePathInfo
classes have been added. There are other minor changes.
Useful Perl-script
FSHeal aims to be a general filesystem tool that can scan and report vital
"defective" information about the filesystem like broken symlinks, forgotten
backup files, and left-over object files, but also source files, documentation
files, user documents, and so on. It will scan the filesystem without modifying
anything and reporting all the data to a logfile specified by the user which
can then be reviewed and actions taken accordingly.
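The kind of read-only scan described above can be sketched in a few lines of Python. This is an illustration of the idea, not FSHeal's code: the suffix lists and the `scan` function are assumptions chosen for the example, and nothing on disk is modified.

```python
import os

# Assumed patterns for "forgotten" files; adjust to local conventions.
BACKUP_SUFFIXES = ("~", ".bak", ".orig")
OBJECT_SUFFIXES = (".o", ".obj")


def scan(root):
    """Walk `root` without changing anything and classify suspicious entries."""
    report = {"broken_symlink": [], "backup_file": [], "object_file": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A symlink whose target no longer exists is reported as broken.
            if os.path.islink(path) and not os.path.exists(path):
                report["broken_symlink"].append(path)
            elif name.endswith(BACKUP_SUFFIXES):
                report["backup_file"].append(path)
            elif name.endswith(OBJECT_SUFFIXES):
                report["object_file"].append(path)
    for paths in report.values():
        paths.sort()
    return report
```

Writing the returned dictionary to a logfile, one path per line, gives a report that can be reviewed before any cleanup is done by hand.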
About: httping is a "ping"-like tool for HTTP requests. Give it a
URL and it will show how long it takes to connect, send a request, and retrieve
the reply (only the headers). It can be used for monitoring or statistical purposes
(measuring latency).
Changes: Binding to an adapter did not work and "SIGPIPE" was not
handled correctly. Both of these problems were fixed.
About:
check_oracle_health is a plugin for the Nagios monitoring software that allows
you to monitor various metrics of an Oracle database. It includes connection
time, SGA data buffer hit ratio, SGA library cache hit ratio, SGA dictionary
cache hit ratio, SGA shared pool free, PGA in memory sort ratio, tablespace
usage, tablespace fragmentation, tablespace I/O balance, invalid objects, and
many more.
Release focus: Major feature enhancements
Changes:
The tablespace-usage mode now takes into account when tablespaces use autoextents.
The data-buffer/library/dictionary-cache-hitratio are now more accurate. Sqlplus
can now be used instead of DBD::Oracle.
About: check_lm_sensors is a Nagios plugin to monitor the values of
on-board sensors and hard disk temperatures on Linux systems.
Changes: The plugin now uses the standard Nagios::Plugin CPAN classes,
fixing issues with embedded perl.
PHP based
About: Ortro is a framework for enterprise scheduling and monitoring.
It allows you to easily assemble jobs to perform
workflows and run existing scripts on remote hosts in a secure way using ssh.
It also tests your Web applications, creates simple reports using queries from
databases (in HTML, text, CSV, or XLS), emails them, and sends notifications
of job results using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.
Changes: Key features such as auto-discovery of hosts and import/export
tools are now available. The telnet plugin was improved and the mail plugin
was updated. The PEAR libraries were updated.
Perl plugin: check_logfiles is a plugin for Nagios which checks logfiles for
defined patterns
check_logfiles 2.3.3 (Default)
Added: Sun, Mar 12th 2006 15:09 PDT
Updated: Tue, May 6th 2008 10:37 PDT
About:
check_logfiles is a plugin for Nagios which checks logfiles for defined patterns.
It is capable of detecting logfile rotation. If you tell it how the rotated
archives look, it will also examine these files. Unlike check_logfiles, traditional
logfile plugins were not aware of the gap which could occur, so under some circumstances
they ignored what had happened between their checks. A configuration file is
used to specify where to search, what to search, and what to do if a matching
line is found.
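The rotation gap mentioned above can be illustrated with a small Python sketch. It is not check_logfiles itself (which is a Perl plugin driven by a configuration file); it only shows the technique of remembering the inode and byte offset between runs and, when a rotation is detected, reading the tail of the rotated archive before the new file. The names `scan_log` and `_read_from` and the shape of the `state` dictionary are assumptions.

```python
import os
import re


def _read_from(path, offset):
    """Return the bytes of `path` from `offset` to the end of the file."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read()


def scan_log(path, pattern, state, archive=None):
    """Return (matching lines, new state) for everything written since the last run.

    `state` holds the 'inode' and 'offset' saved by the previous run
    (an empty dict on the first run). `archive` is the path of the most
    recently rotated file, if the caller knows how archives are named.
    """
    regex = re.compile(pattern)
    st = os.stat(path)
    old_offset = state.get("offset", 0)
    # A different inode or a file shorter than our offset means rotation.
    rotated = ("inode" in state and state["inode"] != st.st_ino) or st.st_size < old_offset
    data = b""
    if rotated and archive and os.path.exists(archive):
        # Close the gap: lines appended to the old file after the last check.
        data += _read_from(archive, old_offset)
    start = 0 if rotated else old_offset
    current = _read_from(path, start)
    data += current
    lines = data.decode("utf-8", errors="replace").splitlines()
    hits = [line for line in lines if regex.search(line)]
    return hits, {"inode": st.st_ino, "offset": start + len(current)}
```

The sketch assumes the archive is uncompressed and ends with a newline; a real plugin also has to persist `state` between invocations and cope with compressed archives.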
[May 5, 2008] Plash 1.19
by mseaborn
About: Plash is a sandbox for running GNU/Linux programs with minimum
privileges. It is suitable for running both command line and GUI programs. It
can dynamically grant Gtk-based GUI applications access rights to individual
files that you want to open or edit. This happens transparently through the
Open/Save file chooser dialog box, by replacing GtkFileChooserDialog. Plash
virtualizes the file namespace and provides per-process/per-sandbox namespaces.
It can grant processes read-only or read-write access to specific files and
directories, mapped at any point in the filesystem namespace. It does not require
modifications to the Linux kernel.
Changes: The build system for PlashGlibc has been changed to integrate
better with glibc's normal build process. As a result, it is easier to build
Plash on architectures other than i386, and this is the first release to support
AMD-64. The forwarding of stdin/stdout/stderr that was introduced in the previous
release caused a number of bugs that should now be fixed.
About: Tcpreplay is a set of Unix tools which allows the editing and
replaying of captured network traffic in pcap (tcpdump) format.
It can be used to test a variety of passive and inline network devices, including
IPS's, UTM's, routers, firewalls, and NIDS.
Changes: This release dramatically improves packet timing, introduces
full fragroute support in tcprewrite, and improves Windows/Cygwin and FreeBSD
support. Additionally, a number of smaller enhancements have been made and user
discovered bugs have been resolved. All users are strongly encouraged to update.
Qlusters, maker of the open source systems management software OpenQRM, last
week announced on SourceForge.net that the most recent release of its OpenQRM
systems management software would be the last from Qlusters.
Imagine managing virtual machines and physical machines from the same console
and creating pools of machines booted from identical images, one taking over
from the other when needed. Imagine booting virtual nodes from the same remote
iSCSI disk as physical nodes. Imagine having those tools integrated with Nagios
and Webmin.
Remember the nightmare you ran into when having to build and deploy new kernels,
or redeploy an image on different hardware? Stop worrying. Stop imagining. openQRM
can do all of this.
openQRM, which just reached version 3.1, is an open source cluster resource
management platform for physical and virtual
data centers. In a previous life it was a proprietary project. Now it's
open source and is succeeding in integrating different leading open source projects
into one console. With a pluggable architecture, there is more to come. I've
called it "cluster resource management," but it's really a platform to manage
your infrastructure.
Whether you are deploying Xen, Qemu, VMWare, or even just physical machines,
openQRM can help you manage your environment.
This article explains the different key concepts of openQRM
openQRM consists mainly of four components:
- A storage
server, such as iSCSI or NFS volumes, which can export volumes to your
clients.
- A filesystem image, captured by openQRM, created, or generated yourself.
- A boot image, from which the node boots, consisting of a kernel, its
initrd, and a small filesystem containing openQRM tools.
- A virtual environment, which is actually the combination of a boot image
and a filesystem.
About: OpenSMART is a monitoring (and reporting) environment for servers
and applications in a network. Its main features are a nice Web front end, monitored
servers requiring only a Perl installation,
XML configuration, and good documentation. It is easy to write more checks.
Supported platforms are Linux, HP/UX, Solaris, AIX, *BSD, and Windows (only
as a client).
Changes: New checks include mqconnect, which tests if a connection
to a WebSphere MQ QueueManager is possible; mysqlconnect, which tests if a connection
to a MySQL database is possible; readfile, which tests if a file in a (potentially
network-based) filesystem is readable; and db2lck, which tests if there are
critical lock situations on your DB2 database. Many bugs were fixed. A username
and password can be specified. Recursive include functionality was added for
osagent.conf.xml. Major performance improvements
were made.
[Feb 26, 2008]
dstat
freshmeat.net
dstat is a versatile replacement for vmstat, iostat, netstat, nfsstat, and
ifstat. It includes various counters (in separate plugins) and allows you to
select and view all of your system resources instantly; you can, for example,
compare disk usage in combination with interrupts from your IDE controller,
or compare the network bandwidth numbers directly with the disk throughput (in
the same interval).
Release focus: Major feature enhancements
Changes:
Various improvements were made to internal infrastructure. C plugins are now
possible too. New topcpu, topmem, topio/tiobio, and topoom process plugins were
added along with new innodb, mysql, and mysql5 application plugins. A new vmknic
VMware plugin was added. Various fixes and improvements were made to plugins
and output.
Author:
Dag Wieers
[Feb 20, 2008]
collectd 4.3.0
by Florian Forster
About: collectd is a small and modular daemon which collects system
information periodically and provides means to store the values. Included in
the distribution are numerous plug-ins for collecting CPU, disk, and memory
usage, network interface and DNS traffic, network latency, database statistics,
and much more. Custom statistics can easily be added in a number of ways, including
execution of arbitrary programs and plug-ins written in Perl. Advanced features
include a powerful network code to collect statistics for entire setups and
SNMP integration to query network equipment.
Changes: Simple threshold checking and notifications have been added
to the daemon. The hostname can now be set to the FQDN automatically. Inclusion
files have been made more flexible by allowing shell wildcards and including
entire directories. The new libvirt plugin is able to collect some statistics
about virtual guest systems without additional software on the guests themselves.
The perl plugin has been improved a lot. It can now handle multiple threads
and is no longer considered experimental. The csv plugin can now convert counter
values to rates.
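Converting counter values to rates, as mentioned in the changes above, is a general technique: take two samples of an ever-increasing counter and divide the difference by the elapsed time, allowing for the counter wrapping around. The Python sketch below illustrates the arithmetic only; it is not collectd's implementation, and the function name and the 32-bit default width are assumptions.

```python
def counter_rate(prev_value, prev_time, value, time, width=32):
    """Per-second rate between two samples of a monotonically increasing counter.

    If the new value is smaller than the old one, the counter is assumed
    to have wrapped exactly once at 2**width.
    """
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = value - prev_value
    if delta < 0:
        delta += 2 ** width
    return delta / elapsed
```

For example, an interface byte counter that goes from 1000 to 4000 in ten seconds yields a rate of 300 bytes per second.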
|
SSH can be controlled via tools like Expect
too.
About: SSH Factory is a set of Java based client components for communicating
with SSH and telnet servers. Including both SSH (Secure Shell) and telnet components,
developers will appreciate the easy-to-use API making it possible to communicate
with a remote server using just a few lines of code. In addition, SSH Factory
includes a full-featured scripting API and easy to use scripting language. This
allows developers to build and automate complex tasks with a minimum amount
of effort.
Changes: The SshTask and TelnetTask classes were updated so that when
the cancel() method is invoked, the underlying thread is stopped without delay.
Timeout support was improved in SSH and telnet related classes. The com.jscape.inet.ipclientssh.SshTunneler
class was added for use in creating local port forwarding SSH tunnels. Proxy
support was improved so that proxy data is no longer applied to the entire JVM.
HTTP proxy support was added.
About: The sysstat package contains the sar, sadf, iostat, mpstat,
and pidstat commands for Linux.
The sar command collects and reports system activity information. The statistics
reported by sar concern I/O transfer rates, paging activity, process-related
activities, interrupts, network activity, memory and swap space utilization,
CPU utilization, kernel activities, and TTY statistics, among others.
The sadf command may be used to display data collected by sar in various
formats. The iostat command reports CPU statistics and I/O statistics for tty
devices and disks.
The pidstat command reports statistics for Linux processes. The mpstat command
reports global and per-processor statistics.
Changes: This version takes account of all memory zone types when
calculating pgscank, pgscand, and pgsteal displayed by sar -B. An XML Schema
was added. NLS was updated, adding Dutch, Brazilian Portuguese, Vietnamese,
and Kirghiz translations.
|
sarvant analyzes files from the sysstat utility "sar" and produces graphs
of the collected data using gnuplot. It supports user-defined data source collection,
debugging, start and end times, interval counting, and output types (Postscript,
PDF, and PNG). It's also capable of using gnuplot's graph smoothing capability
to soften spiked line graphs. It can analyze performance data over both short
and long periods of time.
You will find here a tutorial describing a few use cases for some sysstat
commands. The first section below concerns the sar and sadf commands. The
second one concerns the pidstat command. Of course, you should really have a
look at the manual pages to know all the features and how these commands can
help you to monitor your system (follow the Documentation link above for that).
- Section 1: Using sar and sadf
- Section 2: Using pidstat
Right now, OpenESM has OpenESM for Monitoring v1.3. This release of the software
is a combination of Zabbix, Apache, Simple Event Correlation and MySQL. Out
of the box, we provide monitoring - warehousing of monitoring data - SLA reporting
- correlation and notification. We offer the source code, but we also have a
VMWARE based appliance.
Another Perl-based package. It concentrates on TCP/IP-based monitoring
of remote hosts.
First, thanks for writing something that seems to be clean and easy to extend.
I have been using Nagios @ work for some time and am anxious to replace it.
Richard F. Rebel - whenu.com
Very nice -- we're just starting to test Argus for a small monitoring job, and
so far it seems useful. Thanks for your contribution to the open source community.
Andre van Eyssen - gothconsultants.com
thanks great tool!!
Sorin Esanu - from.ro
I am really happy with your soft, it is probably one of the best i have never
found!
I own a hosting and this tool has been really cool for my business :)
Raul Mate Galan - economiza.com
Argus works excellently. We use it to log data about all traffic through our
router so that we can produce bandwidth usage statistics for customers.
Geoff Powell - lanrex.com.au
Conky is an advanced, highly configurable system monitor for X based on torsmo.
Conky is a powerful
desktop app that posts system monitoring info onto the root
window. It is hard to set up properly (has unlisted dependencies, special command
line compile options, and requires a mod to xorg.conf to stop it from flickering,
and the apt-get version doesn't work properly). Most people can't get it working
right, but it's an AWESOME app if it can be set up right.
[Jul 25, 2007]
monit
Dead-wood C-based application. It looks like it has an ad hoc language for describing
checks.
Samba (windows file/domain server)
Hint: For enhanced controllability of the service it is handy to split
up the samba init file into two pieces, one for smbd (the file service) and
one for nmbd (the name service).
check process smbd with pidfile /opt/samba2.2/var/locks/smbd.pid
    group samba
    start program = "/etc/init.d/smbd start"
    stop program = "/etc/init.d/smbd stop"
    if failed host 192.168.1.1 port 139 type TCP then restart
    if 5 restarts within 5 cycles then timeout
    depends on smbd_bin

check file smbd_bin with path /opt/samba2.2/sbin/smbd
    group samba
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor

check process nmbd with pidfile /opt/samba2.2/var/locks/nmbd.pid
    group samba
    start program = "/etc/init.d/nmbd start"
    stop program = "/etc/init.d/nmbd stop"
    if failed host 192.168.1.1 port 138 type UDP then restart
    if failed host 192.168.1.1 port 137 type UDP then restart
    if 5 restarts within 5 cycles then timeout
    depends on nmbd_bin

check file nmbd_bin with path /opt/samba2.2/sbin/nmbd
    group samba
    if failed checksum then unmonitor
    if failed permission 755 then unmonitor
    if failed uid root then unmonitor
    if failed gid root then unmonitor
Looks like dead wood: C-based application.
monit is a utility for managing and monitoring processes, files, directories
and devices on a UNIX system. Monit conducts automatic maintenance and repair and
can execute meaningful causal actions in error situations.
Monit Features
* Daemon mode - poll programs at a specified interval
* Monitoring modes - active, passive or manual
* Start, stop and restart of programs
* Group and manage groups of programs
* Process dependency definition
* Logging to syslog or own logfile
* Configuration - comprehensive controlfile
* Runtime and TCP/IP port checking (tcp and udp)
* SSL support for port checking
* Unix domain socket checking
* Process status and process timeout
* Process cpu usage
* Process memory usage
* Process zombie check
* Check the system's load average
* Check a file or directory timestamp
* Alert, stop or restart a process based on its characteristics
* MD5 checksum for programs started and stopped by monit
* Alert notification for program timeout, restart, checksum, stop resource and
timestamp error
* Flexible and customizable email alert messages
* Protocol verification. HTTP, FTP, SMTP, POP, IMAP, NNTP, SSH, DWP, LDAPv2 and
LDAPv3
* An http interface with optional SSL support to make monit accessible from
a web browser
Install Monit in Debian
#apt-get install monit
This will complete the installation with all the required
software.
Configuring Monit
The default configuration file is located at /etc/monit/monitrc;
edit this file to configure your options.
A sample configuration file follows; uncomment all
the following options.
## Start monit in background (run as daemon) and check the services at 2-minute
## intervals.
#
set daemon 120
## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. You can specify the
## path to the file for monit native logging.
#
set logfile syslog facility log_daemon
## Set list of mailservers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - it is
## possible to override it with the PORT option.
#
set mailserver localhost # primary mailserver
## Monit by default uses the following alert mail format:
##
## From: monit@$HOST # sender
## Subject: monit alert -- $EVENT $SERVICE # subject
##
## $EVENT Service $SERVICE
##
## Date: $DATE
## Action: $ACTION
## Host: $HOST # body
## Description: $DESCRIPTION
##
## Your faithful,
## monit
##
## You can override the alert message format or its parts such as subject
## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.
## are expanded on runtime. For example to override the sender:
#
set mail-format { from: monit@monitorserver.com }
## Monit has an embedded webserver, which can be used to view the
## configuration, actual services parameters or manage the services using the
## web interface.
#
set httpd port 2812 and
    use address localhost # only accept connection from localhost
    allow localhost # allow localhost to connect to the server and
    allow 172.29.5.0/255.255.255.0
    allow admin:monit # require user 'admin' with password 'monit'
# Monitoring the apache2 web service.
# Monit checks the apache2 process using the given pidfile.
# If the process name or pidfile path is wrong, monit will
# report a failure even though apache2 is running.
check process apache2 with pidfile /var/run/apache2.pid
    # Actions monit takes when the service gets stuck.
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    # The admin is notified by mail if any of the conditions below is met.
    if cpu is greater than 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 200.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 10 for 8 cycles then stop
    if 3 restarts within 5 cycles then timeout
    group server
# Monitoring the MySQL service
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed host 127.0.0.1 port 3306 then restart
    if 5 restarts within 5 cycles then timeout
# Monitoring the ssh service
check process sshd with pidfile /var/run/sshd.pid
    start program "/etc/init.d/ssh start"
    stop program "/etc/init.d/ssh stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout
You can also include other configuration files via include
directives:
include /etc/monit/default.monitrc
include /etc/monit/mysql.monitrc
This is only a sample configuration file. The configuration
file is pretty self-explanatory; if you are unsure about an option, take a look
at the monit documentation http://www.tildeslash.com/monit/doc/manual.php
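Conditions such as "if cpu > 80% for 5 cycles then restart" and "if 3 restarts within 5 cycles then timeout" in the sample configuration are easier to reason about when written out as code. The Python sketch below is one reading of those rules, assuming that "for N cycles" means N consecutive polling cycles and that "within N cycles" is a sliding window over the last N polls; monit's actual implementation may differ in detail, and both class names are hypothetical.

```python
from collections import deque


class ForCycles:
    """Fires once a condition has held for `cycles` consecutive polls."""

    def __init__(self, cycles):
        self.cycles = cycles
        self.streak = 0

    def update(self, failed):
        # Any healthy poll resets the streak.
        self.streak = self.streak + 1 if failed else 0
        return self.streak >= self.cycles


class RestartsWithin:
    """Fires when at least `limit` restarts occurred in the last `cycles` polls."""

    def __init__(self, limit, cycles):
        self.limit = limit
        self.window = deque(maxlen=cycles)

    def update(self, restarted):
        self.window.append(bool(restarted))
        return sum(self.window) >= self.limit
```

With `ForCycles(5)`, a single CPU reading below the threshold in the middle of a busy period postpones the restart by another five cycles, which is why short spikes do not trigger an action.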
After configuring your monit file you can check the configuration
file syntax using the following command
#monit -t
Once there are no syntax errors, enable
this service by changing the following in the file /etc/default/monit
# You must set this variable to for monit to start
startup=0
to
# You must set this variable to for monit to start
startup=1
Now you need to start the service using the following
command
#/etc/init.d/monit start
Monit Web interface
The Monit web interface runs on port 2812. If
you have a firewall in your
network setup, you need to open this port.
Now point your browser to http://yourserverip:2812/ (make
sure port 2812 isn't blocked by your
firewall) and log in with admin and monit. If you want a secure
login you can use https; check
here
Monitoring Different Services
Here are some real-world configuration examples for monit.
It can be helpful to look at the examples given here to see how a service is
running, where it puts its pidfile, how to call the start and stop methods for
a service, etc. Check
here for more examples.
Ortro is a Web-based system for scheduling and application
monitoring. It allows you to run existing scripts on remote hosts
in a secure way using ssh, create simple reports using queries from databases
(in HTML, text, CSV, or XLS) and email them, and send notifications of job results
using email, SMS, Tibco Rvd, Tivoli postemsg, or Jabber.
Release focus: Major feature enhancements
Changes:
Support for i18n was added, and English and Italian languages are now available.
More plugins were added, such as zfs scrub check, svc check, and zpool check
for Solaris. Session check and tablespace check for Oracle and Check Uri were
added. The mail, custom_query, ping, and www plugins were updated. There are
bugfixes and improvements for the GUI such as the "add" button in the toolbar.
The PEAR libraries were updated to the latest stable version.
"One of the big flaws of enterprise monitoring is monitoring
without context."
But wouldn't it be tough for IT managers to sell higher-ups on the virtues of
an open source monitoring tool? It might be worth the effort, said James Turnbull,
author of
Pro Nagios 2.0. Turnbull spoke recently with SearchOpenSource.com
Assistant Editor MiMi Yeh about how Nagios is different from its counterparts
in the commercial world and why IT shops should give it a chance.
What sets Nagios apart from other open source network monitoring tools
like Big Brother, OpenNMS, OpenView and SysMon?
James Turnbull: I think there are three key reasons why Nagios is
superior to many other products in this area -- ease of use, extensibility and
community. Getting a Nagios server up and running generally only takes a few
minutes. Nagios is also easily integrated and extended either by being able
to receive data from other applications or sending data to reporting engines
or other tools. Lastly, Nagios has excellent documentation backed up with a
great community of users who are helpful, friendly and knowledgeable. All these
factors make Nagios a good choice for enterprise management in small, medium
and even large enterprises.
... ... ...
What tips, best practices and gotchas can you offer to sys admins working
with Nagios?
Turnbull: I guess the best recommendation I can give is read the documentation.
The other thing is to ask for help from the community -- don't be afraid to
ask what you think are dumb questions on Wikis, Web sites, forums or mailing
lists. Just remember the golden rule of asking questions on the Internet --
provide all the information you can and carefully explain what you want to know.
Are there workarounds to address the complaint that Nagios has no auto-discovery,
so that individual IP addresses for each host and service must be defined?
Turnbull: I think a lot of the 'automated'
discovery tools are actually more of a hindrance than a help. One of the big
flaws of enterprise monitoring is monitoring without context. It's all well
and good to go out across the network and detect all your hosts and add them
to the monitoring environment, but what do all these devices do?
You need to understand exactly what you are monitoring and why. When something
you are monitoring fails, you not only know what that device is but what the
implications of that failure are. Nagios is not a business context/business
process tool. The fact that you have to think clearly about what you want to
monitor and how means that you are more aware of your environment and the components
that make up that environment.
Is there any advice you would give to users?
Turnbull: The key thing to say to new users is to try it out. All
you need is a spare server and a few hours and you can configure and experiment
with Nagios. Take a few problem areas you've had with monitoring and see if
you can solve them with Nagios. I think you'll be pleasantly surprised.
Systher is a small Perl tool that collects system information and
presents it as an XML document. The information is collected using standard
Unix tools, such as netstat, uptime and lsof.
Systher can be used in many ways:
- When invoked from the command line, Systher simply shows the state of
the system where it was invoked.
- Systher can be run as a stand-alone daemon, listening to an arbitrary
TCP port, so that callers can remotely obtain the system information.
- Systher can be run as a cgi-bin script, so that browsers can
connect to it.
In order to make the obtained information readable for humans, Systher is
equipped with an XSLT processing stylesheet to convert the XML information into
HTML. That way, the information can be made visible in a browser.
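The idea of gathering system facts and emitting them as an XML document can be sketched briefly in Python. This is not Systher (which is written in Perl and shells out to netstat, uptime and lsof); the element and attribute names below are invented for the illustration, and only the hostname and load average are collected.

```python
import os
import socket
import xml.etree.ElementTree as ET


def system_info_xml(hostname=None, loadavg=None):
    """Return a small XML document describing the local system.

    `hostname` and `loadavg` can be supplied for testing; by default
    they are read from the running (Unix) system.
    """
    root = ET.Element("system")  # hypothetical schema
    ET.SubElement(root, "hostname").text = hostname or socket.gethostname()
    one, five, fifteen = loadavg or os.getloadavg()
    load = ET.SubElement(root, "loadavg")
    load.set("min1", f"{one:.2f}")
    load.set("min5", f"{five:.2f}")
    load.set("min15", f"{fifteen:.2f}")
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be printed on the command line, served by a small TCP daemon, or returned from a CGI script, and an XSLT stylesheet can then turn it into HTML for a browser.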
About: ZABBIX is an enterprise-class distributed monitoring solution
for networks and applications. Native high-performance ZABBIX agents allow monitoring
of performance and availability data of all operating systems.
Changes: This release introduces support of centralized distributed
monitoring, flexible auto-discovery, advanced Web monitoring, and much more.
A collection of a dozen scripts, some in Perl.
Unix Server Monitoring Scripts is a suite that will monitor Unix disk space,
Web servers via HTTP, and the availability of SMTP servers via SMTP. It will
save a history of these events to diagnose and pinpoint problems. It also sends
a message via email if a Web server is down or if disk usage exceeds one of
two thresholds. Each script acts independently of the others.
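The two-threshold disk check described above is simple enough to sketch in Python. This is an illustration rather than the suite's own script: the 80% and 90% defaults, the level names and the `disk_status` function are assumptions, and the email step is left out.

```python
import shutil


def disk_status(path, warn_pct=80.0, crit_pct=90.0, usage=None):
    """Classify disk usage for `path` against a warning and a critical threshold.

    `usage` may be a (total, used, free) tuple for testing; otherwise the
    figures come from shutil.disk_usage(path).
    """
    total, used, _free = usage or shutil.disk_usage(path)
    pct = 100.0 * used / total
    if pct >= crit_pct:
        level = "CRITICAL"
    elif pct >= warn_pct:
        level = "WARNING"
    else:
        level = "OK"
    return level, pct
```

A cron job could call `disk_status("/")` for each mounted filesystem, append the result to a history file, and send mail only when the level is not "OK".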
Main Scripts
Support Scripts
Tarball of all files in the Suite
Zenoss is built on the Python-based Zope Application server. Zenoss uses
NetSNMP to collect data via SNMP, data is stored in MySQL, and data
is logged by RRDtool.
Feb 08, 2007 | SearchNetworking.com
Network monitoring and management applications can be costly and cumbersome,
but recently a host of companies have sprung forth offering an open source alternative
to IBM Tivoli, HP OpenView, CA and BMC -- and they're starting to gain traction.
The major commercial software vendors, known as the "big four,"
are frequently criticized for their high cost and complexity and, in some cases,
are chided for being too robust -- having too many features that some enterprise
users may find completely unnecessary.
Many of the open source alternatives are quick to admit that
their solutions aren't for everyone, but they bring to the table arguments in
their favor that networking pros can't ignore, namely low cost and ease of use.
"Open source is a huge phenomenon," Zenoss CEO and co-founder
Bill Karpovich said. "It's providing an alternative for end users."
Zenoss makes Core, an integrated IT monitoring product that
lets IT admins manage the status and health of their infrastructure through
a single Web-based console. The latest version of the free, open source software
features automated change tracking, automatic remediation, and expanded reports
and export capabilities.
According to Karpovich, Zenoss software monitors complete
networks, servers, applications, services, power and related environments. The
biggest benefit, however, is its openness, meaning that users can tailor it
to their systems any way they choose.
"It's complete enterprise IT monitoring," Karpovich said.
"It's network monitoring and management, application management, and server
management all through a single pane of glass."
Flexibility included
Some users have said the Tivolis and OpenViews of
the world are hard to customize and very inflexible, but open
source alternatives are often the opposite. They are known for their flexibility.
"You can use the product as you want," Karpovich said.
Nagios developer Ethan Galstad said flexibility is a major
influence on enterprises looking to move ahead with an open source monitoring
project. Nagios makes open source software that monitors network availability
and the states of devices and services.
"You have as an end user much more influence on the future
of the feature set," Galstad said, adding that through the open source community,
end users can request a feature they want, discuss the pros and cons and, in
many cases, implement that feature within a relatively short time.
And for things that Nagios and other open source monitoring
tools don't do, end users can tie the tools in with other solutions to create
the environment they want.
"There are a lot of hooks," Galstad said.
|
2006-07-28 (howtoforge.com)
OpenNMS is an open source enterprise network management tool. It helps network
administrators monitor critical services on remote machines and collects
information about remote nodes using SNMP. OpenNMS has a very active community,
where you can register to discuss your problems. OpenNMS installation
and configuration normally takes time, but I have tried to cover both
in a few steps.
OpenNMS provides the following features:
- ICMP auto discovery
- SNMP capability checking
- ICMP polling for interface availability
- HTTP, SMTP, DNS and FTP polling for service availability
- Fully distributed client-server architecture
- Java real-time console to allow moment-to-moment status of the network
- XML using XSL style web access and reporting
- Business View partitioning of the network using policies and rules
- Graphical rule builder to allow graphical drag/drop relationships to be built
- Java configuration panels
- Redundant and overlapping pollers and master station
- Repeating and one-time calendaring for scheduled downtime
The source code of OpenNMS is available for download from sourceforge.net in two
forms: a production release (stable) and a development release (unstable). I have used
the 1.2.7 stable release in this howto. I have tested this configuration with Redhat/Fedora,
Suse, Slackware and Debian, and it works smoothly. I am assuming that readers already
have a Linux background. You can use the following configuration for other distributions
too. Before you start the OpenNMS installation, you need to install the following packages:
jdk1.5*
tomcat 4.*
postgres 8.*
rrdtool1.2*
March 10, 2006 (howtoforge.com)
Zabbix has the capability to monitor just about any event on your network,
from network traffic to how much paper is left in your printer. It produces
really cool graphs.
In this howto we install software that has an agent and a
server side. The goal is to end up with a setup that has a nice web interface
that you can show off to your boss ;)
It's a great open source tool that lets you know what's out there.
This howto will not go into setting up the network, but I might rewrite it one
day, so I would really like your input on this. Much of what is covered here is in
the online documentation; however, if you are new to all this, like me, it might
be of some help to you.
|
GroundWork unifies leading open source projects like Nagios, Ganglia, RRDtool,
Nmap, Sendpage, and MySQL, and offers a wide range of support for operating
systems (Linux, Unix, Windows, and others), applications, and networked devices
for complete enterprise-class monitoring.
Release focus: Major feature enhancements
New features include:
- Incorporation of RRD data: enhancing GWMOS with other tools that use RRDs
should be much easier
- Performance graphing of historical data using the RRD data
- UI improvements to give you access to information of interest, with fewer
clicks, in a cleaner interface
In addition to the downloadable source tarball, the SVN repository is also
accessible.
GroundWork Monitor Open Source (GWMOS) 5.1-01 Bootable ISO now available:
this image should boot cleanly on any ix86-compatible computer, or in
a virtualized environment such as VMware or Xen. It's a
simple, super fast mechanism for evaluating GWMOS while setting up temporary
monitoring quickly at any site: just pop in the CD and boot!
The GroundWork Monitor Open Source Bootable ISO automatically boots, logs you
in, launches Firefox, and starts up GroundWork with all the associated services
such as apache, Nagios(R), MySQL, and RRDtool, etc. all loaded and running.
The ISO is set up with included profiles to monitor the host system and two
internet sites out-of-the-box, giving you some immediate data to observe without
setting up any additional devices. When booted from a physical CD, everything
runs in the computer's RAM: the hard drive of the host computer is never touched.
Have fun, and keep us posted on your experience at
http://www.groundworkopensource.com/community/
|
I have used BigBrother and Nagios for a long time
to troubleshoot network problems, and I was happy with them -- until
Zabbix came along. Zabbix
is an enterprise-class open source distributed monitoring solution for servers,
network services, and network devices. It's easier to use and provides more
functionality than Nagios or BigBrother.
Zabbix is a server-agent type of monitoring software, meaning you have a
Zabbix server where all gathered data is collected, and a Zabbix agent running
on each host.
All Zabbix data, including configuration and performance data, is stored
in a relational database -- MySQL, PostgreSQL, or Oracle -- on the server.
Zabbix server can run on all Unix/Linux distributions, and Zabbix agents
are available for Linux, Unix (AIX, HP-UX, Mac OS X, Solaris, FreeBSD), Netware,
Windows, and network devices running SNMP v1, v2, and v3.
I strongly doubt that this is FUD. It looks like a pretty realistic assessment of
the situation.
March 05, 2007
OpenNMS bests OpenView and Tivoli while Ipswitch spreads the FUD
Chalk up another victory for OSS over proprietary. OpenNMS beat out both
OpenView and Tivoli in the SearchNetworking Product Leadership Awards.
I wonder if that will shut up this ridiculous FUD from Ipswitch: "Don't
trust your network to open source."
I let Travis take the shots at this foolishness...wake up, Ipswitch, you are late to
the FUD train.
Javier...anything from you?
Myth #1 - Open Source is free - According to Greene, downloading
open source from the Internet and then customizing to your environment "often
is not a good use of your time." Greene adds that he'd "rather pay an upfront
fee for software that does what I need and doesn't have any high-cost labor
attached to it."
Hmmm ... what about the fact that proprietary software (and *especially* network
monitoring and management products) is often tremendously difficult to install
/ configure / maintain on an ongoing basis? How is being held hostage to a vendor for support
/ installation / configuration preferable? And how is being tied to a predetermined
feature set preferable to having the ability to customize an open source
solution to meet your environment's needs?
Myth #2 - Bug fixes are faster and less expensive in an open source
environment - the second "myth" that Greene exposes around open source is
the notion that there are thousands of developers sitting at home contributing
labor for free. Greene suggests that most of the contributing developers are
typically employed by large vendors -- and that "even when those individuals
generously offer their time for free, can you really afford to wait for
one to agree with you on the urgency of action if your network is down."
Hmmm ...so it's better NOT to have access to the source code when you have a
bug? It's preferable to have to open a help ticket with the vendor and wait
in line? It's better NOT to have general visibility into the bugs and issues
being reported by the members of the user community?
Myth #3 - Your IT staff can buy a 'raw' tool and shape it to their
needs - Greene's last point is that the industry
has moved away from the "classic open source" model where folks download
raw open source and customize to their needs - and to more of a commercial
open source model, where organizations are leveraging open source distribution
as a way to sell services.
Feedback:
Hi, Not a very valid comparison, as there are many products out there that
do a far better job than HP OpenView or OpenNMS or Tivoli.
If you are an OSS type supporter in terms of your business model it would
make financial sense to use OpenNMS, but in terms of best of breed this OSS product
does not come close. Some might argue that using OSS software will cost you
more, as there are very few people who know how to use it, and I mean use it:
not some Linux script kiddy but someone with enterprise management experience.
These days it's not about implementation, it's about
integration, and the comparison should be about how nicely it plays with the
rest of my environment.
I don't see EMC SMARTS in the comparison list.
I am all for OSS software as long as it is not chosen as the cheapest option
but rather as the best of breed option. As for NMS commercial software, I use
it day in and day out and would like to see a more open model in terms of functionality
and development.
Take a leaf out of Sun's book: Open Solaris has proven to be a good business
model for a commercial company and the benefits will be seen for years to come.
Posted by: James at March 8, 2007 04:34 AM
GOLD AWARD: OpenNMS
The network is the central nervous system of the modern enterprise
-- complex and indispensable. Keeping tabs on how that enterprise is functioning
requires a sophisticated "big picture" management system that can successfully
integrate with other network and IT products. Unfortunately, many products in
this category are just too expensive for any but the largest companies (with
the most generous IT budgets) to afford.
Enter OpenNMS,
the gold medal winner in our network and IT management platforms category. The
open source enterprise-grade network management system was designed as a replacement
for more expensive commercial products such as IBM Tivoli and HP OpenView. It
periodically checks that services are available, isolates problems, collects
performance information, and helps resolve outages. And it's free.
In our Product Leadership survey, readers praised OpenNMS for being easy
to customize, easy to integrate and -- of course -- free. These attributes are
all characteristic of any open source product. Because of its open source nature,
OpenNMS has a community of developers contributing to its code. The code is
open for anyone to view or adapt to suit individual needs.
Consequently, users can customize OpenNMS in ways that are limited only by
their abilities and imagination -- not by licensing restraints. One reader said,
"It is an open source product, so we can customize it easily." With traditional
proprietary products, it may be difficult to find one piece of software that
can manage the network effectively for every enterprise, but OpenNMS was designed
to allow users to add management features over time. Its intentional compatibility
with other open source (and proprietary) products provides seamless integration,
requiring less piecemeal coding to fit things together.
Users of OpenNMS can also take advantage of the user community accessible
through the OpenNMS Web site for answers to questions and help in troubleshooting
problems. While one survey respondent remarked that "open source is advancing
slowly to address some of the manageability issues," members of the OpenNMS
mailing list are quick to answer any request with a friendly, knowledgeable
response. For companies whose IT personnel are not afraid of an unconventional
approach, the open source community provides support that is just as reliable
as that of a commercial vendor -- and in many cases, more helpful.
But OpenNMS is not a "you get what you pay for" product, either. Readers
said it "works great" and "significantly helped our network's bandwidth and
packet management and controlled 'rogue' clients." Others found that it "works
fine for a small business network" and is an "outstanding option." Even those
whose experience was less positive found that any challenges were surmountable,
such as the reader who said, "Since it's free, it was worth the effort."
It is impossible to do systems administration without monitoring and alerting
tools. Basically, these tools are scripts, and writing such monitoring scripts
is an ancient part of systems administration that's often full of dangerous
mistakes and misconceptions.
The traditional way of putting systems together is very stochastic and erratic,
and that same method is often followed when developing monitoring tools. It
is really rare to find a system that's been properly planned and designed from
the start. The usual approach when something goes wrong is just to patch the
immediate problem. Often, there are strange results from people making mistakes
when they're in a hurry and under pressure.
Monitoring scripts are traditionally fired from root cron and send results
by email. These emails can accumulate over time, flooding people with strange
mails, creating problems on the monitored system, and causing other unexpected
situations. Such scenarios are often unavoidable, because few enterprises can
afford better measures than firefighting. In this article, I will mention a
few tips that can be helpful when developing monitoring scripts, and I will
provide three sample scripts.
What is a Unix Monitoring Script?
A monitoring tool or script is part of system management and to be really
efficient must be part of an enterprise-wide effort, not a standalone tool.
Its purpose is to detect problems and send alerts or, rarely, to try to correct
the problem. Basically, a monitoring/alerting tool consists of four different
parts:
1. Configuration -- Defines the environment and does initializations, sets
the defaults, etc.
2. Sensor -- Collects data from the system or fetches pre-stored data.
3. Conditions -- Decides whether events are fired.
4. Actions -- Takes action if events are fired.
If these elements are simply bundled into a script without thinking, the
script will be ineffective and un-adaptable. Good tools also include an abstraction
layer added to simplify things later, when modifications are done.
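The four parts described above can be sketched as separate functions. This is only a minimal illustration in Python, not one of the article's listings: the disk-usage check, the `Config` fields, the 90% limit and the `run_once` helper are all assumptions made for the example.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Config:
    """Configuration part: defaults and limits (all values are illustrative)."""
    path: str = "/"
    limit_percent: float = 90.0
    enabled: bool = True  # a real script might derive this from a control file


def sensor(config):
    """Sensor part: collect the current value, here percent of disk space used."""
    usage = shutil.disk_usage(config.path)
    return 100.0 * usage.used / usage.total


def condition(value, config):
    """Conditions part: decide whether the event fires."""
    return value > config.limit_percent


def action(value, config, notify=print):
    """Actions part: report the event; notify stands in for email or a trap."""
    notify(f"disk usage on {config.path} is {value:.1f}% "
           f"(limit {config.limit_percent}%)")


def run_once(config, read_value=sensor, notify=print):
    """Glue: run the parts in order and return True if the event fired."""
    if not config.enabled:      # monitoring switched off, e.g. during maintenance
        return False
    value = read_value(config)
    fired = condition(value, config)
    if fired:
        action(value, config, notify)
    return fired
```

Keeping the sensor and the notifier as replaceable arguments is one way to get the abstraction layer the text mentions: the same skeleton can be reused with a different data source or alert channel.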
To begin, we have to set some values, do some sanity checks, and even determine
whether monitoring is allowed. In some situations, it is good to stop monitoring
through the control file to avoid false notifications, during maintenance for
example. This is all done in the configuration part of the script.
The script collects values from the system -- from monitored processes or
the environment. This data collecting is done by the sensor part. This data
can be the output of an external command or can be fetched from previously stored
values, such as the current df output or previously stored df
values (see
Listing 1).
The conditions part of the script defines the events that are monitored.
Each condition detects whether an event has happened and whether this is the
start or the end of the event (arming or rearming). This process can compare
current values to predefined limits or to stored values, if we are interested
in rates instead of absolute values. Events can also be based on composite or
calculated values, such as "Average idle from sar for the last 5 minutes is
less than 10%" (see
Listing 2).
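As a rough illustration of composite and rate-based conditions (again a sketch in Python, not the article's Listing 2), the two hypothetical helpers below test an average over a window of samples and a growth rate between two stored samples. The function names and the choice not to fire on empty data are assumptions.

```python
def average_below(samples, limit):
    """True when the mean of the samples is below the limit,
    e.g. idle percentages collected from sar over the last 5 minutes."""
    if not samples:
        return False  # no data: do not fire (an assumption; some sites alert here)
    return sum(samples) / len(samples) < limit


def rate_above(previous, current, interval_seconds, limit_per_second):
    """True when the value grew faster than limit_per_second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current - previous) / interval_seconds > limit_per_second
```

For example, `average_below([8, 9, 12, 7, 10], 10)` is true because the mean is 9.2, while a single 12% sample in that window does not by itself clear the event.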
Results at this level are logical values usually presented as some kind of
empty/not-empty string, to be easily manipulated in later usage. The key is
to have some point in the code where the clear status of the event is defined,
so branching can be done simply and easily.
Actions consist of specific code that is executed in the context of a detected
event, such as storing new values, sending traps, sending email, or performing
some other automatically triggered action. It is good to put these into functions
or separate scripts, since you can have similar actions for many events. Usually
we want to send email to someone or send a trap. It is almost always the same
code in all scripts, so keeping it separate is a good idea.
It is important to add some state support. We are not just interested in
detecting limit violations; if that were the case, we would be flooded with
messages. Detecting state changes can reduce unwanted messaging. When we define
an event in which we are interested, we actually want to know when the event
happened and when it ended -- that is, when the monitored values passed limits
and when they returned. We are not interested in full-time notification that
the event is still occurring. Thus, we need to know the change of the event
state and value of the monitored variable.
State support is not necessary if there is some kind of console that can
correlate notifications. In the simplest implementations, like a plain monitoring
script, avoiding message flooding directly in the script itself is useful.
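One possible way to add such state support to a plain script is to remember the last known state of each event between runs and notify only on a change. The sketch below assumes a small JSON state file; the file format and the function names are illustrative, not taken from the article.

```python
import json
import os


def load_state(path):
    """Read the saved event states; a missing or corrupt file means no state."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except (FileNotFoundError, ValueError):
        return {}


def save_state(path, state):
    """Write the state via a temporary file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def transition(state, event_name, active):
    """Update the state in place and return "start", "end", or None.

    "start" means the event has just begun, "end" means it has just cleared,
    and None means nothing changed, so no notification is needed.
    """
    was_active = state.get(event_name, False)
    state[event_name] = active
    if active and not was_active:
        return "start"
    if was_active and not active:
        return "end"
    return None
```

A cron-driven script would call `load_state`, pass each condition result through `transition`, send a message only for "start" and "end", and then call `save_state`.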
Each event must have a unique name and severity level. Usually, three levels
of severity are enough, but sometimes five levels are used. It is best to start
with a simple model such as:
Info -- Just information that something has happened
Warning -- Warning of possible dangerous situation
Fatal -- Critical situation
|
IBM Redbooks
A Practical Guide for Resource Monitoring and Control (RMC), SG24-6615-00
-- http://www.redbooks.ibm.com/redbooks/SG246615.html
Managing AIX Server Farms, SG24-6606-00 Redbook -- http://www.redbooks.ibm.com/redbooks/SG246606.html
Books
Frisch, Æleen. Essential System Administration, 3rd Edition, August
2002. O'Reilly & Associates. ISBN: 0-596-00343-9.
Powers, Shelley, J. Peek, T. O'Reilly, and M. Loukides. Unix Power Tools,
3rd Edition, October 2002. O'Reilly & Associates. ISBN: 0-596-00330-7.
Blank-Edelman, David. Perl for System Administration, 1st Edition,
July 2000. O'Reilly & Associates. ISBN: 1-56592-609-9.
Links
Stokely Consulting -- http://www.stokely.com/unix.sysadm.resources/index.html
Big Brother Archive -- http://www.deadcat.net/browse.php
BigAdmin Scripts -- http://www.sun.com/bigadmin/scripts/
Shelldorado -- http://www.shelldorado.com
Damir Delija has been a Unix system engineer since 1991. He received a
Ph.D. in Electrical Engineering in 1998. His primary job is systems administration,
education, and other system-related activities.
All of the scripts listed in this article are meant to be run from cron
on a regular basis -- daily or hourly, depending on the routine in question
-- with the output going to either email or to the systems administrator's pager.
However, none of the things described in this article are foolproof. UNIX security
mechanisms are only relevant if the root account has not been compromised. For
example, scripts run through crontab can be easily disabled or modified
if the attacker has attained root access, and most log files can be manipulated
to cover tracks if the intruder has control over the root account.
I tested out OpenNMS but found Nagios to be easier to get running, plus
OpenNMS was very Linux-centric last I checked. That is annoying, since it
looks like it's just a Java application; there is no reason it couldn't be made to run
elsewhere.
Anyway, as far as I can tell Nagios does everything OpenNMS does and more. As
a network monitoring tool it's been great, I have it polling all of our SNMP
enabled devices and receiving traps. With the host and service dependencies
it becomes easier to see if the cause of an application failure is software,
hardware, or network based.
That being said I would still love to play with OpenNMS if anyone has a way to
get it to work under FreeBSD.
On Thursday 10 October 2002 04:52 pm, Alan Horn wrote:
> On 10 Oct 2002, Stephen L Johnson wrote:
> >If you are mainly monitoring networks, network monitoring tools are
> >better. The non-commercial tools that I have looked at are OpenNMS and
> >Nagios (NetSaint). These tools are designed mainly to monitor networks.
> >Systems monitoring can be added as well.
>
> Nagios is primarily for monitoring network _services_ in its default
> install (via the nagios plugins you get with the tool). Not for monitoring
> network devices (although it'll do that too). I just wanted to clarify
> that since I read this as 'nagios for monitoring cisco kit etc...' By
> network services I mean stuff like DNS, webservers, smtp, imap, etc... All
> the services that you probably want to monitor first of all when you set
> out to do this.
>
> Adding systems monitoring with nagios is very nice indeed, using the NRPE
> (Nagios Remote Plugin Executor) module, you can run whatever arbitrary
> code you desire on your system, and return results back to the monitor. I
> have it monitoring diskspace on critical fileservers, health of some
> custom applications etc...
>
> I've used nagios, nocol, and big brother (many many moons ago.. it's
> evolved since I used it). Nagios most recently. Nagios takes a bit of work
> to setup due to its flexibility, but I've found it to be the best for my
> needs in both a single and multi-site situation (we have branch offices
> located around the world via VPN which need to be monitored).
>
> And the knowledge of network topology is great too !
>
> Hope this helps.
>
> Cheers,
>
> Al
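The disk space monitoring mentioned in the message above relies on the Nagios plugin convention: a plugin prints one line of status text and reports its result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch of such a check in Python follows; the thresholds, the message wording and the function names are assumptions, and wiring the script into NRPE is left to the NRPE configuration.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to (exit code, one-line status message)."""
    if percent_used >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif percent_used >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"DISK {label} - {percent_used:.1f}% used"


def main(path="/"):
    """Check one filesystem, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = classify(100.0 * usage.used / usage.total)
    print(message)
    return code

# To use this as a plugin, end the file with:
#     import sys; sys.exit(main())
```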
David Nolan
Fri, 08 Sep 2006 05:49:55 -0700
On 9/3/06, Toddy Prawiraharjo <toddyp@...> wrote:
>
> Hello all,
>
> I am looking for alternative to Nagios (or should i stick with it? need
> opinions pls), and saw this Mon.
The choice between Mon and other OSS monitoring systems like Nagios,
Big Brother or any of the others is very much dependent upon your needs.
My best summary of Mon is that it's monitoring for sysadmins. It's not pretty
and it's not designed for management; it's designed to allow a sysadmin
to automate the performance monitoring that might otherwise
be done ad hoc or with cron jobs. It doesn't trivially provide the typical
statistics gathering that many bean-counters are looking for, but it's extensible
and scalable in amazing ways. (See recent posts on this list about one
company deploying a network of 2400 mon servers
and 1200 locations, and my mon site which runs 500K monitoring tests a day,
some of those on hostgroups with hundreds of hosts.)
> Btw, i need some auto-monitoring tools to monitor basic unix and windows
> based services, such as nfs, sendmail, smb, httpd, ftp, diskspace, etc.
> I love perl so much, but then its been long time since it's been updated.
> Is it still around and supported?
If you love perl, Mon may be perfect for you, because
if there is a feature you need you can always send us a patch. :)
It's definitely still around and supported. (I just posted a link
to a mon 1.2.0 release candidate.)
There haven't been a lot of updates to the system in the last couple of years,
but that's in part because the system is pretty stable as-is. There are
certainly some big-picture changes we would like to make, but none of the
current developers have had pressing reasons to work on the system.
Personally, most of my original patches were based on CMU's needs when
we did our Mon deployment, and since that time no major internal effort has
been spent on extending the system. A review process of
our monitoring systems is just starting now, and that may result in either more
programmer time being allocated to Mon or CMU moving away from Mon to some
other system. (Obviously I'd be unhappy with that result, but I would continue to
work with Mon both personally and in my consulting work.)
> Any good reference on the web interface? (the one from the site, mon.lycos.com
> is dead).
I believe the most commonly used interface is mon.cgi,
maintained by Ryan Clark, available at
http://moncgi.sourceforge.net/
An older version of mon.cgi is included in the mon distribution.
> And most importantly, where to
> start? (any good documentation as starting point on how
> to use this Mon)
>
Start by reading the documentation, looking at the sample config file, and experimenting.
A small installation can be set up in a matter of minutes. Once you've
done a proof-of-concept install you can decide if Mon is right for you.
-David
Mon, 27 Nov 2006 18:31:13 -0800
I'm looking for suggestions for any GPL/opensource system monitoring
tools that folks can recommend.
FYI we've been using Nagios for about 6 months now with mixed results.
While it works, we've had to do an awful lot of customization and
writing our own checks (mostly application-level stuff for our
proprietary software).
I think we would be a lot happier with something simpler and more
flexible than Nagios. Right now it's a choice between further hacking
of Nagios vs. "roll our own" (the latter, I think, will be much more
maintainable over the long run). But of course I'm looking to avoid
reinventing the wheel as much as possible.
Any feedback or pointers are much appreciated.
thanks,
JB
Re: [BBLISA] GPL system monitoring tools? (alternatives to nagios)
Jason Qualkenbush
Tue, 28 Nov 2006 06:35:56 -0800
I don't know about that. Nagios is really a roll-your-own solution. All it
really does is manage the polling intervals between checks. Just about
everything else is something most people are going to write custom to their
environments.
Just make sure you limit the active checks to simple things like ping, url,
and some port checking. The system health checks (like disks, cpu usage,
application checks) are really best done on the host itself. Just run a
cron (or whatever the windows equivalent is) job that checks the system and
submits the results to the nagios server via a passive check.
What customizations are you doing? The config files? What exactly is
Nagios failing to do?
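The passive-check approach suggested in the message above can be sketched as follows. Nagios accepts passive service results as external commands of the form `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output`, written to its external command file. The Python below only builds and writes such a line; the host and service names are made up, the location of the command file depends on the installation, and getting the result from a remote host to the Nagios server (which the message leaves open) is not covered.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    # A command is a single line, so newlines in the output are flattened.
    clean = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{clean}\n")


def submit(command_file, line):
    """Write the command to the Nagios external command file.

    In a typical install this file is a named pipe read by the Nagios daemon;
    its path is site-specific and must be supplied by the caller.
    """
    with open(command_file, "w") as fh:
        fh.write(line)
```

A cron job on the monitored host would run its local check, build the line with the check's exit code and message, and hand it to whatever transport delivers it to the server.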
Re: [BBLISA] GPL system monitoring tools? (alternatives to nagios)
John P. Rouillard
Tue, 28 Nov 2006 12:48:17 -0800
In message <[EMAIL PROTECTED]>,
"Scott Nixon" writes:
>We have been looking at OpenNMS (opennms.org). It is developed full time
>by the OpenNMS Group (opennms.com). It was designed from the ground up to
>be an Enterprise class monitoring solution. If you're interested, I'd
>suggest listening to this podcast with the *manager* of OpenNMS, Tarus
>Balog (http://www.twit.tv/floss15).
I talked with Mr. Balog at the 2004 LISA IIRC. The big thing that
made opennms a non-starter for me was the inability to create
dependencies between services. It's a pain to do in nagios but it's
there, and that is a critical tool for enterprise level operations.
A fast perusal of the OpenNMS docs doesn't show that feature.
Compared to nagios the OpenNMS docs seem weak.
Also at the time all service monitors had to be written in java. I
think there were plans to make a shell connector that would allow you
to run any program and feed its output back to OpenNMS. That means
all the nagios plugins could be used with a suitable shell wrapper.
OpenNMS had a much nicer web interface and better access control
IIRC. But at the time I don't think you could schedule downtime in the
web interface. Also I just looked at the demo and didn't see it (but
that may be because it's a demo).
On the nice side, having multiple operational problem levels (5/6
IIRC) rather than nagios's 3 (ok, warning, and critical) was something
I wished Nagios had.
Also the ability to annotate the events with more info than nagios
allows was a win, but something similar could be done in nagios.
I liked it; it just didn't provide the higher level functionality that
we needed.
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
Feb 20
Nagios is frankly not very good, but it's better than most of the alternatives
in my opinion. After all, you could spend buckets of cash on HP OpenView or
Tivoli and still be faced with the same amount of work to customize it into
a useful state....
Among the free alternatives, in my experience Big Brother is too unstable
to trust, which makes me loath to buy a license as required for commercial
use.
Mon is quite good at monitoring and alerting, but it has all the same problems
as Nagios plus a lack of sexy web GUI. I also don't like the way it handles service
restoration alerts or blocking outages (dependencies) or multiple concurrent
outages.
For an easy way to get started with Nagios, try
GroundWork Monitor Open Source: it unifies Nagios with lots of other open source
IT tools and is much easier to set up than vanilla Nagios.
Written in Java and JavaScript, licensed under GPL
About: Hyperic HQ is a distributed infrastructure management system
whose architecture assures scalability, while keeping the solution easy to deploy.
HQ's design is meant to deliver on the promise of a single integrated management
portal capable of managing unlimited types of technologies in environments that
range from small business IT departments to the operations groups of today's
largest financial and industrial organizations.
Changes: This release features significant new functionality, including
Operations Dashboard, a central view for real-time, general health of the entire
infrastructure managed.
More powerful alerting is provided with alert escalation, alert acknowledgment,
and RSS actions.
Event tracking and correlation provides historical and real-time information
from any log resource, configuration file, or security module that can
be correlated with availability, utilization, and performance.
|
The idea of using a gateway that provides encryption and all other "high-level"
features for communicating with the server is attractive for monitoring.
About: DeleGate is a multi-purpose application level gateway or proxy
server that mediates communication of various protocols, applying cache and
conversion for mediated data, controlling access from clients, and routing toward
servers. It translates protocols between clients and servers, converting between
IPv4 and IPv6, applying SSL (TLS) to arbitrary protocols, merging several servers
into a single server view with aliasing and filtering. It can be used as a simple
origin server for some protocols (HTTP, FTP, and NNTP).
Changes: This version supports "implanted configuration parameters"
in the executable file of DeleGate to restrict who can execute the executable
and which functions of it are available, or to tailor the executable adapting
to the environment in which it is used.
Conky is a lightweight
system monitor that provides essential information in an easy-to-understand,
highly customizable interface. The software is a fork of TORSMO, which is
no longer maintained. Conky monitors your CPU usage, running processes, memory,
and swap usage, and other system information, and displays the information as
text or as a graph.
Debian and Fedora users can use apt-get and yum respectively to install Conky.
A source tarball is also available.
Python-based. Product used by Mercy Hospital of Baltimore and Cablevision
of New York. Funding: $4.8 million in August 2006. A low-cost alternative to
monstrous enterprise applications that are affordable only to large companies.
Zenoss is an IT infrastructure monitoring product that allows
you to monitor your entire infrastructure within a single,
integrated software application.
Key features include:
- Monitors the entire stack
- networks, servers, applications, services, power, environment,
etc...
- Monitors across all perspectives
- discovery, configuration, availability, performance, events,
alerts, etc...
- Affordable and easy to use
- unlike the big suites offered by IBM, HP, BMC, CA, etc...
- unlike first generation open source tools...
- Complete open source package
- complete solution available as free, open source software
ZABBIX is a 24×7 monitoring solution without high cost.
ZABBIX is software that monitors numerous parameters of a network and the health
and integrity of servers. ZABBIX uses a flexible notification mechanism that
allows users to configure e-mail based alerts for virtually any event. This
allows a fast reaction to server problems. ZABBIX offers excellent reporting
and data visualization features based on the stored data. This makes ZABBIX
ideal for capacity planning.
ZABBIX supports both polling and trapping. All ZABBIX reports and statistics,
as well as configuration parameters, are accessed through a
web-based front end. A web-based
front end ensures that the status of your network and the health of your servers
can be assessed from any location. Properly configured, ZABBIX can play an important
role in monitoring IT infrastructure. This is equally
true for small organizations with a few servers and for large companies with
a multitude of servers.
To better understand Splunk Base, look no further than the online encyclopedia
Wikipedia.
Like Wikipedia, Splunk Base provides a global repository of user-regulated
information, but the similarities end there. Splunk Inc. will formally unveil
Splunk Base this week at the LinuxWorld 2006 Conference for all to see its free-of-charge
community stockpiled error messages and troubleshooting tips for IT professionals
from IT professionals -- for any system they can get their hands on.
At the head of this community effort is Splunk's chief community Splunker
Patrick McGovern, who picked up much of his community experience while working
with developers when he managed the open source project repository SourceForge.net.
Now at Splunk, McGovern manages Splunk Base, a global wiki of IT events that
grants IT workers access to information about specific events recorded by any
application, system or device.
Six of the leading open source systems management vendors are to announce
that they have created a new consortium to further the adoption of open source
systems management software and develop open standards.
The Open Management Consortium has been founded by a group of open source
systems management and monitoring players, including
- Qlusters Inc,
- Emu Software Inc,
- Zenoss Inc,
- Symbiot Inc,
- the Webmin project, and
- Ayamon LLC, the consultancy company of Nagios creator, Ethan Galstad.
Eyeing systems management as the next big market to "go open source," Zenoss,
Inc. is now trying to give mid-sized customers another alternative beyond the
two main choices available so far: massive suites from the "Big Four" giants
or a mishmash of specialized point solutions.
"We're focusing on the IT infrastructures of the 'mid-market.' These aren't
'Mom and Pops.' They're organizations with about 50 to 5,000 employees, or $50
million to $500 million in revenues," said Bill Karpovich, CEO of the software
firm.
Earlier in May, the Zenoss, Inc.-sponsored Zenoss Project joined hands with
Webmin, the Emu Software-sponsored NetDirector, and several other open source
projects to form the Open Management Consortium (OMC).
Right now, a lot of mid-sized companies and not-for-profits are still struggling
to string together effective systems management approaches with specialized
tools such as WhatsUp Gold and Ipswitch's software.
Historically, organizations in this bracket have been largely ignored
by the "Big Four"--IBM, Hewlett-Packard, BMC, and Computer Associates, according
to Karpovich.
"These companies have concentrated mainly on the Fortune 500, and their
suites are very heavy and expensive," Karpovich charged, during an interview
with LinuxPlanet.
But Karpovich anticipates that the Big Four could start to widen their scope
quite soon, spurred by analysts' projections of stellar growth in the systems
management space.
Mercy Hospital, a $400 million health care facility in Baltimore, is one
medium-sized organization that has already turned down overtures from a Big
Four vendor in favor of Zenoss.
"We'd been using a hodgepodge of tools from different vendors," according
to Jim Stalder, the hospital's CIO, who cited SolarWinds and Cisco as a couple
of examples.
But over the past few years, Mercy's mainly Windows-based IT infrastructure
has expanded precipitously, Stalder maintained, in another interview.
Mercy chose Zenoss over a Big Four alternative mostly on the basis of cost,
according to the hospital's CIO.
Zenoss doesn't charge for its software, which is offered under GPL
licensing, Karpovich said. Instead, its revenue model is built around
professional services--including customization, integration, staff training,
and best practices consulting--and support fees.
Alternatively, organizations can "use their own resources" or hire other
OMC partners or other third-party consultants for professional services.
Zenoss users can also customize the software code for integration or other
purposes.
"We used to have 100 servers, but now we have close to 200," Stalder said.
"Mercy has done a good job of embracing (advancements in) health care IT. But
sometimes your staffing budget doesn't grow as linearly as your infrastructure.
And it got difficult to keep tabs on all these servers with fewer (IT) people
on hand."
Also according to Karpovich, many organizations--particularly in the
midrange tier--don't need all of the features offered in the IBM/HP/BMC/CA suites.
As inspiration behind Zenoss' effort, he pointed to the success of JBoss
in the open source application server market, EnterpriseDB and Postgres among
databases, and SugarCRM in the CRM arena.
"All of these markets have been moving to open source one by one. And they've
all been turned on their heads by really strong vendors. We expect that systems
management will be the next place where open source has a big impact, and we
want to lead the charge," he told LinuxPlanet.
"We want to do something that's somewhere 'in the middle,' offering a very
rich solution with enterprise-grade monitoring at a price mid-sized organizations
can afford."
Karpovich maintained that, to step beyond "first-generation" open source
tools, Zenoss replaces the traditional ASCII interface with a template-enabled
GUI geared to easy systems configurability.
The system also provides autodiscovery and many other features also found
in pricier systems.
Zenoss revolves around four key modules: inventory configuration; availability
monitoring; performance monitoring; and event management.
The inventory configuration module contains its own autopopulated database.
"This is not just an ASCII. We've built a database that understands relationships.
For a server, for example, this means, 'What are patches?' There's a real industry
trend around ITIL, and we are doing that. A lot of commercial vendors are also
talking about CMDB, and we'll be pushing that back toward open source," according
to Karpovich.
The availability monitoring in Zenoss is designed to assure that applications
"are 'up' and responding," he told LinuxPlanet.
The performance monitoring module makes it possible to track metrics such
as disk space over time, and to generate user configurable threshold-based alerts.
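A threshold-based disk-space check of the kind described here can be sketched
in a few lines of Python. This is an illustration only, not Zenoss code: the
function names, the 80%/90% thresholds, and the severity labels are all
assumptions chosen for the example.

```python
import shutil

def classify_usage(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a severity label (hypothetical thresholds)."""
    if used_pct >= crit:
        return "CRITICAL"
    if used_pct >= warn:
        return "WARNING"
    return "OK"

def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (severity, used_pct) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total
    return classify_usage(used_pct, warn, crit), used_pct

if __name__ == "__main__":
    severity, pct = check_disk("/")
    print(f"{severity}: / is {pct:.1f}% full")
```

Run from cron and stored with a timestamp, the same percentage gives the
"disk space over time" trend; the severity label is what would drive an alert.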
The event management capability, on the other hand, offers a centralized
area for consolidating events. "Every Windows server has event logging. But
we let you bring together events (from multiple servers) and prioritize them,"
according to the Zenoss CEO.
For his part, Mercy Hospital's Stalder is mainly quite satisfied with Zenoss.
"So far, so good. This represented a major savings opportunity for us, and we
wouldn't have used a fraction of the features in a (Big Four) suite," he told
LinuxPlanet.
"We went live (with Zenoss) in early April, and got it up and running very
quickly. We've been able to turn off several other tools, as a result. And Zenoss
has shown us several (IT infrastructure) problems we weren't even aware existed,"
he said.
For example, in rolling up the logs of its SQL Server databases, Mercy found
out that several databases weren't being backed up properly.
The hospital did need to turn on SNMP on its servers to get autodiscovery
to work. "But this was only because we'd never turned it on before," he added.
Yet Stalder did point to a couple of features on his future wish list for
Zenoss. He'd like the software to include notification escalation--"so
that if Joe doesn't respond to his pager, you can reach him somewhere else"--as
well as a "synthetic transaction generator," to "emulate how the application
appears to a user logging on."
But Karpovich readily admits that there's room for more functionality in
the Zenoss environment. In fact, that's one of the main reasons behind the decision
to join other open source ISVs in founding the OMC, he suggested.
"With our partners, we're building an ecosystem around products and systems
integration," he told LinuxPlanet. "We haven't yet decided where all of
us will fit. But we want to provide (customers) with all that they need for
systems management. In areas where we don't have standards for integration,
we can collaborate on integration."
Other founding members of the Open Management Consortium include Nagios,
an open source project sponsored by Ayamon; openQRM, sponsored by Qlusters;
and openSIMS, sponsored by Symbiot.
The consortium also plans to create a "systems integration repository around
best practices for sharing instrumentation," Karpovich said.
"The business model is kind of like that of SugarCRM. Partners will build
their own businesses selling services. Then, if one of their customers wants
Zenoss, for example, the partner will get a commission," he elaborated.
But Zenoss will also do its best to avoid the bloatware phenomenon
associated with the Big Four suites, according to Karpovich.
"One of the things people don't like about the 'Big Four' is that if they
don't buy capabilities now, it will cost them more later. With Zenoss, you're
not under that kind of pressure," the CEO told LinuxPlanet.
BixData addresses the major areas of management and monitoring.
System Management
- Excels at the retrieval of important system information and the modification
of settings for critical services and installed software
Application monitoring
- Monitor critical aspects of applications and their performance. Through
support for WMI on the Windows platforms, full application monitoring of
any major Windows server application is supported, such as Exchange Server
mail boxes or SQL server connection pools. It also supports .NET application
monitoring
Network monitoring
- Supports monitoring of any device with SNMP
Performance monitoring
- Monitors critical operating system, hardware and software performance,
including memory, processor, network and storage and specific application
usage of resources
Hardware monitoring
- Native support for SMART hard disk information that includes monitoring
of ATA, Serial ATA and SCSI hard disks
Host Grapher is a very simple collection of Perl scripts that provide graphical
display of CPU, memory, process, disk, and network information for a system.
There are clients for Windows, Linux, FreeBSD, SunOS, AIX and Tru64. No socket
will be opened on the client, nor will SNMP be used for obtaining the data.
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 04/20/2006
In this article I will describe how to monitor your
server with munin and monit. munin produces nifty little graphics about nearly
every aspect of your server (load average, memory usage, CPU usage, MySQL throughput,
eth0 traffic, etc.) without much configuration, whereas monit checks the availability
of services like Apache, MySQL, Postfix and takes the appropriate action such
as a restart if it finds a service is not behaving as expected. The combination
of the two gives you full monitoring: graphics that let you recognize current
or upcoming problems (like "We need a bigger server soon, our load average is
increasing rapidly."), and a watchdog that ensures the availability of the monitored
services.
Among the network-management start-ups that received second rounds of funding:
| Company | Product/description | Latest funding |
| Cittio | WatchTower – enterprise monitoring and management software. | March 2006 – $8 million from JK&B Capital, Hummer Winblad Venture Partners. |
| GroundWork Open Source Solutions | GroundWork Monitor Professional – IT monitoring tool based on open source software. | March 2005 – $8.5 million from Mayfield, Canaan Partners. |
| LogLogic | LogLogic 3 – appliance that aggregates and stores log data. | September 2004 – $13 million from Sequoia Capital, Telesoft Partners and Worldview Technology Partners. |
| Splunk | Splunk – downloadable software to search logs generated by hardware and software. | January 2006 – $10 million from JK&B Capital |
[Apr 10, 2006]
moodss
Moodss is a modular monitoring application, which supports operating systems
(Linux, UNIX, Windows, etc.), databases (MySQL, Oracle, PostgreSQL, DB2, ODBC,
etc.), networking (SNMP, Apache, etc.), and any device or process for which
a module can be developed (in Tcl, Python, Perl, Java, and C). An intuitive
GUI with full drag'n'drop support allows the construction of dashboards with
graphs, pie charts, etc., while the thresholds functionality includes emails
and user defined scripts. Monitored data can be archived in an SQL database
by both the GUI and the companion daemon, so that complete history over time
can be made available from Web pages or common spreadsheet software. It can
even be used for future behavior prediction or capacity planning, from the included
predictor tool, based on powerful statistical methods and artificial neural
networks.
Big Sister is a Perl-based, SNMP-aware monitoring program consisting of a Web-based
server and a monitoring agent. It runs under various Unixes and Windows.
Monit is a utility for managing and monitoring processes, files, directories,
and devices on a Unix system. It conducts automatic maintenance and repair and
can execute meaningful causal actions in error situations. It can be used to
monitor files, directories, and devices for changes, such as timestamp changes,
checksum changes, or size changes. It is controlled via an easy-to-configure
control file based on a free-format, token-oriented syntax. It logs to syslog
or to its own log file and notifies users about error conditions via customizable
alert messages. It can perform various TCP/IP network checks and protocol checks,
and can use SSL for such checks. It provides an HTTP(S) interface for access.
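The file checks listed above (timestamp, checksum, and size changes) boil down
to comparing two snapshots of a file's attributes. The Python sketch below
illustrates that idea only; it is not Monit's implementation or its control
file syntax, and the function names `snapshot` and `changes` are invented for
the example.

```python
import hashlib
import os

def snapshot(path):
    """Record size, modification time and SHA-256 checksum of a file."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {"size": st.st_size, "mtime": st.st_mtime, "sha256": digest}

def changes(old, new):
    """Return the sorted list of attributes that differ between two snapshots."""
    return sorted(key for key in old if old[key] != new[key])
```

A watchdog would take a snapshot on each cycle, compare it with the previous
one, and raise an alert or run a repair action whenever `changes` returns a
non-empty list.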
[Dec 8, 2005]
Zabbix by Alexei Vladishev
- Not exactly Perl (written in PHP+C) but still an interesting product...
About: Zabbix
is software that monitors your servers and applications. Polling and trapping
techniques are both supported. It has a simple, yet very flexible notification
mechanism, and a Web interface that allows quick and easy administration. It
can be used for logging, monitoring, capacity planning, availability and performance
measurement, and providing the latest information to a helpdesk.
Changes: This
release introduces automatic refresh of unsupported items, support for SNMP
Counter64, new naming schema for ZABBIX agent's parameters, more flexible user-defined
parameters for UserParameters, double sided graphs, configurable refresh rate,
and other enhancements.
[»]
user comment on ZABBIX
by
LEM - Nov 17th 2004 05:07:23
Excellent product:
. easy to install and configure
. easy to customize
. easy to use
. very good functional level (multiple maps, availability, trigger/alert
dependencies, SLA calculation)
. uses very few resources
I've been using ZABBIX to monitor about 500 elements (servers, routers,
switches...) in a heterogeneous environment (Windows, Unices, SNMP-aware
equipment).
An excellent alternative to Nagios and MoM+Minautore.
[»]
Best network monitor I've seen
by
robertj - Feb 7th 2003 15:29:38
This is a GREAT project. Best monitor
I've seen. Puts the Big Brother monitoring to shame.
[Dec 1, 2005]
MoSSHe (MOnitoring with SSH Environment). Python-based.
MoSSHe (MOnitoring with SSH Environment) is a simple,
lightweight (both in size and system requirements) server monitoring package
designed for secure and in-depth monitoring of a number of typical Internet
systems.
It was developed to keep the impact on network and performance low, and to
use a safe, encrypted connection for in-depth inspection of the system checked.
It is not possible to remotely run (more or less arbitrary) commands via the
monitoring system, nor is unsafe cleartext SNMP messaging necessary (yet possible).
A read-only Web interface makes monitoring and status checks simple (and safe)
for admins and helpdesk.
Checking scripts are included for remote services (DNS, HTTP, IMAP2, IMAP3,
POP3, samba, SMTP, and SNMP) and local systems (disk space, load, CPU temperature,
fan speed, free memory, print queue size and activity, processes, RAID status,
and shells).
SEC is an open source and platform independent
event correlation tool that was designed to fill the gap between commercial
event correlation systems and homegrown solutions that usually comprise a few
simple shell scripts. SEC accepts input from regular files, named pipes, and
standard input, and can thus be employed as an event correlator for any application
that is able to write its output events to a file stream. The SEC configuration
is stored in text files as rules, each rule specifying an event matching
condition, an action list, and optionally a Boolean expression whose truth value
decides whether the rule can be applied at a given moment.
Regular expressions, Perl subroutines, etc. are
used for defining event matching conditions. SEC can produce output events by
executing user-specified shell scripts or programs (e.g., snmptrap or
mail), by writing messages to pipes or files, and by various other means.
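The rule structure described above (a matching condition, an action, and an
optional Boolean guard) can be illustrated with a small Python sketch. This is
not SEC's configuration syntax or its Perl implementation; the `Rule` class,
the `process` function, the sample log lines, and the `maintenance` flag are
all hypothetical.

```python
import re

class Rule:
    """One rule: a regex match condition, an action, and an optional guard."""
    def __init__(self, pattern, action, guard=None):
        self.pattern = re.compile(pattern)
        self.action = action
        # The guard plays the role of the Boolean expression that decides
        # whether the rule may fire at this moment; default is "always".
        self.guard = guard or (lambda state: True)

def process(lines, rules, state):
    """Feed input lines through the rules; return the list of action outputs."""
    outputs = []
    for line in lines:
        for rule in rules:
            match = rule.pattern.search(line)
            if match and rule.guard(state):
                outputs.append(rule.action(match, state))
    return outputs

# Hypothetical rule: report failed logins, but only outside a maintenance window.
rules = [
    Rule(r"Failed password for (\S+)",
         action=lambda m, state: f"ALERT: failed login for {m.group(1)}",
         guard=lambda state: not state.get("maintenance", False)),
]

events = ["sshd[42]: Failed password for root from 10.0.0.5",
          "sshd[42]: Accepted password for alice"]
print(process(events, rules, {"maintenance": False}))
```

In a real correlator the action would run a program or write to a pipe rather
than return a string, and the input would come from a file or named pipe
instead of a list.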
[Apr 10, 2004]
RRDutil
RRDutil is a tool to collect statistics (typically every
5 minutes) from multiple servers, store the values in RRD databases (using RRDtool),
and plot out pretty graphs to a Web server on demand. The graph types shown
include CPU, memory, disk (space and I/O), Apache, MySQL queries and query types,
email, Web hits, and more.
Faced with an increasing number
of deployed Linux servers and no budget for commercial monitoring tools, our
company looked into open-source solutions for gathering performance and security
information from our Unix environment. There are many open-source monitoring
packages to choose from, including
Big Sister and
Nagios
to name a few. Though some of these try to provide an all-in-one solution, I
knew we would probably end up combining a few tools to obtain the metrics we
were looking for. This article is meant to give a general overview of
the steps in building a monitoring solution. Take a look at the
demo here which is a scaled down model of our production monitoring portal.
Required Packages
I started out with a base Red Hat ES 3.0 installation
but any flavor of Linux will work. Depending on your distro, some of the above
required packages might be already installed, particularly libpng, zlib and
gd. You can check if any of these are installed by issuing the following from
the command line:
rpm -qa | grep packagename
I selected
MRTG
(Multi-Router Traffic Grapher) for the base statistics engine. This tool is
mainly used for tracking statistics on network devices but it can be easily
modified to track performance metrics on your Unix or Windows servers. The instructions
for installing MRTG on Unix can be found
here. The gd, libpng and zlib packages are required to be compiled and installed
before MRTG can be fired up. Even though you might have already installed them,
if you try to compile MRTG with the default package installations, it will probably
complain about various things including GD locations. For your sanity, you’ll
want to install these packages from scratch using the instructions from the
MRTG website since they require specific "--" options when compiled. If you're
feeling creative, you can also rebuild the SRPM's from source. Be sure to exclude
these packages in the Up2date or Yum configuration files since when updates
to these packages become available, the "update" application will overwrite
your custom RPM's.
RRDtool is used as a backend
database to store statistics gathered from MRTG. By default, MRTG stores data
in text files which it gathers through SNMP. This method is fine for a few servers
but when your environment starts growing, you’ll need a faster method of reading
and storing data. RRDtool (Round Robin Database tool) enables storage of server
statistics in a compact, fixed-size database. Future versions of MRTG are going
to use this format by default so you might as well start using it now.
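The "round robin" idea behind such a database is that storage has a fixed
number of slots and each new sample overwrites the oldest one, so the file
never grows. The Python sketch below illustrates only that concept; it is not
RRDtool's file format or API, and the class name `RoundRobinArchive` is an
assumption made for the example.

```python
from collections import deque

class RoundRobinArchive:
    """Fixed-size store: once full, each new sample overwrites the oldest."""
    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Append one (timestamp, value) sample, evicting the oldest if full."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the stored values, or None if nothing is stored yet."""
        values = [v for _, v in self.samples]
        return sum(values) / len(values) if values else None

rra = RoundRobinArchive(slots=3)
for t, v in enumerate([10, 20, 30, 40]):
    rra.update(t, v)
print(list(rra.samples))  # the first sample (0, 10) has been overwritten
```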
Angelfire is a great front-end
tool for monitoring servers via ICMP and services running over TCP. This Perl
program runs from cron and generates an HTML table which contains the status
of your devices. Color bars represent the status of the server (Green=GOOD;
Yellow=LATENCY >100ms; Red=UNREACHABLE).
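The color scheme above is easy to reproduce for TCP services. The Python sketch
below times a TCP connect and maps the result to green, yellow, or red using
the 100 ms threshold from the text. It is a stand-in illustration, not the
Perl tool itself: it covers TCP only (ICMP needs raw sockets and usually root
privileges), and the function names and the host list are hypothetical.

```python
import socket
import time

def classify(latency_ms, slow_ms=100.0):
    """Map a latency (None = no answer) to the color scheme described above."""
    if latency_ms is None:
        return "red"
    return "yellow" if latency_ms > slow_ms else "green"

def tcp_latency_ms(host, port, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

if __name__ == "__main__":
    # Hypothetical target list; replace with real hosts and ports.
    for host, port in [("localhost", 22), ("localhost", 80)]:
        print(host, port, classify(tcp_latency_ms(host, port)))
```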
For Apache, I used the default installation that
comes with Red Hat. No need to install a fresh copy plus it will be easier to
maintain for updates using RHN.
Proactive security checks are
a mandatory part of system administration these days. Nessus is a great vulnerability
scanner, and its HTML output options make incorporating it into the portal
very easy.
When using Linux in a business
environment, it's important to monitor resource utilization. System monitoring
helps with capacity planning, alerts you to performance problems, and generally
makes managers happy.
So, in this month's "Tech Support," let's install
Cacti, a resource monitoring application that utilizes RRDtool
as a back-end. RRDTool stores and displays time-series data, such as network
bandwidth, machine-room temperature, and server load average. With Cacti and
RRDtool, you can graph system performance in a way that will not only make it
more useful, it'll also impress your pointy-haired boss.
Start with RRDtool. Written by Tobi Oetiker (of
MRTG fame) and licensed under the GNU General Public License (GPL),
you can download RRDtool from
http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/download.html. Build
and install the software with:
$ ./configure; make
# make install; make site-perl-install
To ease upgrades, you should also link
/usr/local/rrdtool to the /usr/local/rrdtool-version directory
created by make install.
Now that you have RRDtool installed, you're ready
to install Cacti. Cacti is a complete front-end to RRDtool (based on PHP
and MySQL) that stores all of the information necessary to create and
populate performance graphs. Cacti utilizes templates, supports multiple graphing
hierarchies, and has its own user-based authentication system, which allows
administrators to create users and assign them different permissions to the
Cacti interface. Also licensed under the GPL, Cacti can be downloaded from
http://www.raxnet.net/products/cacti.
The first step to install Cacti is to unpack
its tarball into a directory accessible via your web server. Next, create a
MySQL database and user for Cacti (this article uses
cacti as the database name).
Optionally, you can also create a system account to run Cacti's cron
jobs.
Once the Cacti database is created, import
its contents by running mysql cacti < cacti.sql.
Depending on your MySQL setup, you may need to supply a username and password
for this step.
After you've imported the database, edit
include/config.php and specify your Cacti MySQL database information.
Also, if you plan to run Cacti as a user other than the one you're installing
it as, set the appropriate permissions on Cacti's directories for graph/log
generation. To do this, type chown cactiuser rra/ log/ in the Cacti directory.
You can now create the following cron
job...
*/5 * * * * /path/to/php /path/to/www/cacti > /dev/null 2>&1
... replacing
/path/to/php with the
full pathname to your command-line PHP binary and
/path/to/www/cacti with
the web accessible directory you unpacked the Cacti tarball into.
Now, point your web browser to
http://your-server/cacti/
and login with the default username and password of
admin and
admin. You must change the
administrator password immediately. Then, make sure you carefully fill in all
of the path variables on the next screen.
By default, Cacti only monitors a few items,
such as load average, memory usage, and number of processes. While Cacti comes
pre-configured with some additional data input methods and understands
SNMP if you have it installed, its power lies in the fact that you can graph
data created by an arbitrary script. You can find a list of contributed scripts
at
http://www.raxnet.net/products/cacti/additional_scripts.php, but you can
easily write a script for almost anything.
To create a new graph, click on the "Console"
tab and create a data input method to tell Cacti how to call the script and
what to expect from it. Next, create a data source to tell Cacti how
and where the data is stored, and create a graph to tell Cacti how to
display the data. Finally, add the new graph to the "Graph View" to see the
results.
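As a sketch of what such a script might look like, the Python example below
prints the system load averages on one line. It assumes the convention, as I
understand it from Cacti's documentation, that a script returning several
values prints them as space-separated `name:value` pairs; check the
documentation for your Cacti version before relying on this. The field names
`load1`, `load5`, and `load15` are invented for the example and would have to
match the output fields defined in the data input method.

```python
import os

def format_fields(fields):
    """Render {name: value} as 'name:value name:value' on one line."""
    return " ".join(f"{name}:{value:.2f}" for name, value in fields.items())

def load_averages():
    """Return the 1-, 5- and 15-minute load averages (Unix only)."""
    one, five, fifteen = os.getloadavg()
    return {"load1": one, "load5": five, "load15": fifteen}

if __name__ == "__main__":
    print(format_fields(load_averages()))
```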
While Cacti is a very powerful program, many
other applications also utilize the power of RRDtool, including Cricket,
FlowScan, OpenNMS, and SmokePing. Cricket is a high performance,
extremely flexible system for monitoring trends in time-series data. FlowScan
analyzes and reports on Internet Protocol (IP) flow data exported by routers
and produces graph images that provide a continuous, near real-time view of
network border traffic. OpenNMS is an open source project dedicated to the creation
of an enterprise grade network management platform. And SmokePing measures latency,
latency distribution, and packet loss in your network.
You can find a comprehensive list of front-ends
available for RRDtool at
http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/rrdworld. Using some
of these RRDtool-based applications in your environment will not only make your
life easier, it may even get you a raise!
What is Spong?
Spong is a simple systems and network monitoring
package. It does not compete with Tivoli, OpenView, UniCenter, or any other
commercial packages. It is not SNMP based; it communicates via simple TCP-based
messages. It is written in Perl. It can currently run on every major Unix
and Unix-like operating system.
Features
- client-based monitoring (CPU, disk, processes, logs, etc.)
- monitoring of network services (smtp, http, ping, pop, dns, etc.)
- grouping of hosts (routers, servers, workstations, PCs)
- rules-based messaging when problems occur
- configurable on a host-by-host basis
- results displayed via text or web-based interface
- history of problems
- verbose information to help diagnose problems
- modular programs that make it easy to add or replace check functions or features
- Big Brother BBSERVER emulation to allow Big Brother clients to be used
Sample Spong Setup
This is my development Spong setup
on my home network. It is Spong version 2.7. There are a lot of new features
that have been added since version 2.6f. But if you click on the "Hosts" link
in the top frame, you will get a good feel of how Spong 2.6f looks and works.
License
Spong is free software released under
the GNU General Public License or the Perl Artistic License. You may choose
whichever license is appropriate for your usage.
Documentation
Don't let the amount of documentation scare you,
I still think spong is simple to setup and use.
Documentation for Spong is included with every
release. For version 2.6f, the documentation is in HTML format located in the
www/docs/ directory and is self contained (the links will still work if you
move it), so you should be able to copy it to whatever location that you want.
An online copy of the documentation is available
here.
The documentation for Spong 2.7 is not complete.
It is undergoing a complete rewrite into POD format. This change will enable
the documentation to be converted into a multitude of different formats (i.e. HTML,
man, text, etc.).
Release Notes / Changes
The CHANGES file for each release functions as
the Release Notes and Change Log for each version of Spong. The
CHANGES file for Spong 2.6f is available here and the
CHANGES file for Spong 2.7 is available here.
Argus is a system and network monitoring application. It will
monitor nearly anything you ask it to monitor (TCP + UDP applications, IP connectivity,
SNMP OIDS, etc). It presents a clean, easy-to-view Web interface. It can send
alerts numerous ways (such as via pager) and can automatically escalate
if someone falls asleep.
In case of broken links
please try to use Google search. If you find the page please notify
us about new location
This is a far more comprehensive page than this one, but with a slightly
different focus, although host monitoring and network monitoring now by and large
overlap.
This is a list of tools used for network (both LAN and WAN)
monitoring and where to find out more about them. The audience is mainly
network administrators. You are welcome to provide links to this web page. Please
do not make a copy of this web page and place it at your web site since it will
quickly be out of date.
Big Sister
Big Sister is an SNMP-aware monitoring
program consisting of a Web-based server and a monitoring agent. It runs under
various Unixes and Windows. Big Sister can:
- monitor networked systems
- provide a simple view on the current network status
- notify you when your systems are becoming critical
- generate a history of status changes
- log and display a variety of system performance data
Sys Admin > Using Email to Perform UNIX System Monitoring and Control
[Sep 13, 2004]
moodss Added:
Fri, May 8th 1998 03:34 PDT ; Updated: Mon, 02:00 C, Perl, Python,
TCL
Moodss is a modular monitoring application,
which supports operating systems (Linux, UNIX, Windows, etc.), databases (MySQL,
Oracle, PostgreSQL, DB2, ODBC, etc.), networking (SNMP, Apache, etc.), and any
device or process for which a module can be developed (in Tcl, Python, Perl,
Java, and C).
An intuitive GUI with full drag'n'drop
support allows the construction of dashboards with graphs, pie charts, etc.,
while the thresholds functionality includes warning by emails and user defined
scripts. Any part of the visible data can be archived in an SQL database by
both the GUI and the companion daemon, so that complete history over time can
be made available from Web pages, common spreadsheet software, etc.
Homepage:
http://moodss.sourceforge.net/
MoSSHe (MOnitoring
with SSH Environment) is a simple, lightweight (both in size and system requirements)
server monitoring package designed for secure and in-depth monitoring
of a number of typical Internet systems. Written in Python
Commercial Monitoring Tools
Linux, UNIX system
Monitoring - Bash shell scripts directory
nPULSE is a Web-based network monitoring package for Unix-like
operating systems. It can quickly monitor up to thousands of sites/devices at
a time on multiple ports. nPULSE is written in Perl and comes with its own (SSL
optional) Web server for extra security.
Sentinel System Monitor
Sentinel System Monitor is a plugin-based, extendable remote
system monitoring utility that focuses on central management and flexibility
while still being fully-featured. Stubs are used to allow remote monitoring
of machines using probes. Monitoring can support multiple architectures because
the monitoring probes are filed by a library process that hands out probes based
on OS/arch/hostname. Execution of blocks can be triggered by either test failure
or success.
It uses XML for configuration and OO Perl for most programming.
Support for remote command execution via plugins allows reaction blocks to be
created that can try and repair possible problems immediately, or just notify
an administrator that there is a problem.
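The idea of reaction blocks triggered by test failure or success can be illustrated with a small Python sketch. Sentinel itself is described as using XML configuration and OO Perl, so this is not its API: `run_probe` and its `on_success`/`on_failure` parameters are hypothetical names chosen for the example.

```python
def run_probe(probe, on_success=None, on_failure=None):
    """Run `probe` (a callable returning True/False) and fire the matching reaction.

    Returns (ok, reaction_result). Hypothetical helper, not part of any real tool.
    """
    try:
        ok = bool(probe())
    except Exception:
        ok = False  # a probe that raises is treated as a failed test
    reaction = on_success if ok else on_failure
    result = reaction() if reaction is not None else None
    return ok, result


if __name__ == "__main__":
    import os

    # Probe: does a (hypothetical) pid file exist? Reaction: just notify.
    ok, message = run_probe(
        lambda: os.path.exists("/var/run/example.pid"),
        on_failure=lambda: "notify administrator: example daemon is down",
    )
    print(ok, message)
```

A repair reaction would follow the same shape, with the callable attempting a fix instead of returning a notification string.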
Open (Source|System) Monitoring and Reporting Tool
OpenSMART is a monitoring (and reporting) environment for servers
and applications in a network. Its main features are a nice Web front end, monitored
servers requiring only a Perl installation, XML configuration, and good documentation.
It is easy to write more checks. Supported platforms are Linux, HP-UX, Solaris,
*BSD, and Windows (only as a client).
InfoWatcher
InfoWatcher is a system and log monitoring program written in Perl. The major
components of InfoWatcher are SLM and SSM. SLM is a log monitoring and filter
daemon process which can monitor multiple logfiles simultaneously, and SSM is
a system/process monitoring utility that monitors general system health, process
status, disk usage, and others. Both programs are easily configurable and extensible.
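The filtering half of a log monitor like SLM can be sketched as a set of labeled regular expressions applied to incoming lines. InfoWatcher is written in Perl, so the Python below is only an illustration: `filter_log_lines`, the rule labels, and the sample patterns are all assumptions, and a real daemon would also have to follow several logfiles as they grow.

```python
import re


def filter_log_lines(lines, rules):
    """Yield (label, line) for each line matching a rule's regular expression.

    `rules` is a list of (label, pattern) pairs; the first matching rule wins.
    """
    compiled = [(label, re.compile(pattern)) for label, pattern in rules]
    for line in lines:
        for label, regex in compiled:
            if regex.search(line):
                yield label, line.rstrip("\n")
                break


if __name__ == "__main__":
    sample = [
        "Jan  1 00:00:01 host sshd: Failed password for root\n",
        "Jan  1 00:00:02 host cron: job finished\n",
    ]
    for label, line in filter_log_lines(sample, [("auth", r"Failed password")]):
        print(label, line)
```

Because the function accepts any iterable of lines, the same rules can be applied to an open file object or to lines merged from several files.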
Kane Secure Enterprise
(http://www.intrusion.com/products/technicalspec.asp?lngProdNmId=3)
should do everything you require. I also suggest you check out Andy's great IDS
site (www.networkintrusion.co.uk) (that's another fiver you owe me, Andy).
The best I can recommend is Medusa DS9. It's configurable and makes a machine
secure. The computer with Medusa was running an old BIND (ver 8) and an old sendmail (ver
8.10??) with no patches, on Linux 2.2.5. The machine was not rooted for nearly
two years...
Medusa homepage:
http://medusa.terminus.sk
http://medusa.fornax.sk
GMem 0.2
Gmem is a tool to monitor the memory usage of your system using GTK progress
bars and uptime using the proc filesystem. It's configurable and user friendly.
Benson Distributed Monitoring System
The goal of the Benson Distributed Monitoring System project is to make a
distributed monitoring system with the extensibility and flexibility of mod_perl.
The end goal is for system administrators to be able to script up their own
alerts and monitors into an extensible framework which hopefully lets them get
sleep at night. The communication layer uses standard sockets, and the scripting
language for the handlers is Perl. It includes command line utilities for sending,
listing, and acknowledging traps, and starting up the benson system. There is
also a Perl module interface to the benson network requests.
Network And Service Monitoring System
Network and Service Monitoring System is a tool for assisting network administrators
in managing and monitoring the activities of their network. It helps in getting
the status information of critical processes running at any machine in the network.
It can be used to monitor the bandwidth usage of individual machines in the
network. It also performs checks for IP-based network services like POP3, SMTP,
NNTP, FTP, etc., and can give you the status of the DNS server. The system uses
MySQL for storing the information, and the output is displayed via a Web interface.
Author: Sreehari Nair
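The simplest form of a check for IP-based services such as POP3, SMTP, NNTP, or FTP is a TCP connection attempt to the service port. The sketch below is an assumption about how such a check could look in Python, not code from this tool: `tcp_service_up` is a hypothetical name, and a successful connect only shows that the port accepts connections, not that the protocol behind it is healthy.

```python
import socket


def tcp_service_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Standard port numbers; the host name is a placeholder.
    for name, port in [("SMTP", 25), ("POP3", 110), ("FTP", 21)]:
        state = "up" if tcp_service_up("localhost", port, timeout=1.0) else "down"
        print(f"{name} ({port}): {state}")
```

A fuller check would read the service banner or issue a protocol command after connecting.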
monfarm
Monfarm is an alarm-enabled monitoring system for server farms. It produces
dynamically updated HTML status pages showing the availability of servers. Alarms
are generated if servers become unavailable.
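Producing an HTML status page plus a list of alarms from availability data can be sketched as follows. This is an illustrative Python example, not Monfarm's implementation: `render_status_page` and the shape of its input (a mapping from server name to an up/down flag) are assumptions.

```python
from html import escape


def render_status_page(availability):
    """Build an HTML status page from {server_name: is_up}.

    Returns (page, alarms), where `alarms` lists the servers that are down.
    """
    rows = []
    alarms = []
    for name in sorted(availability):
        up = availability[name]
        if not up:
            alarms.append(name)
        state = "UP" if up else "DOWN"
        rows.append(f"<tr><td>{escape(name)}</td><td>{state}</td></tr>")
    page = "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>"
    return page, alarms


if __name__ == "__main__":
    page, alarms = render_status_page({"web1": True, "db1": False})
    print(page)
    print("alarms:", alarms)
```

Regenerating the page on a timer and sending a notification for each name in `alarms` would give the "dynamically updated" behavior described above.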
Open (Source|System) Monitoring and Reporting Tool
A monitoring tool with few dependencies, nice frontend, and easy extensibility.
Demarc PureSecure
An all-inclusive network monitoring client/server program and Snort frontend.
Percival Network Monitoring System
AAFID2
Framework for distributed system and network monitoring
Copyright © 1996-2008 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Original materials copyright belong to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer:
- The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with.
- We do not warrant the correctness of the information provided or its
fitness for any purpose.
- This site is in no way associated with, nor does it endorse, cybersquatters
using the term "softpanorama" with other main or country domains (e.g. softpanorama.com) with
bad faith intent to profit from the goodwill belonging to
someone else.
Last updated:
May 06, 2009
5 comments:
We are planning to update to Nagios 3, which is available in Ubuntu 8.10.
There are some nice addons like http://www.nagvis.org/screenshots
The best asset of Nagios in our case is that it's very easy to develop new plugins. We complement this with some centralized administrative tools which allow us to deploy new plugins or change parameters: cfengine (for *nix) or SCCM 2007 for MS.
Yeah, I really like Nagios a lot. I developed the WebInject plugin for it to monitor websites. My plugin is pretty popular:
www.webinject.org
Still haven't tried Nagios 3 yet
http://www.slideshare.net/KrisBuytaert/opensource-monitoring-tool-an-overview?nocache=5601
Also, dude, the webinject forum isn't working: e.g.
http://www.webinject.org/cgi-bin/forums/YaBB.cgi?board=Users;action=display;num=1201702796