Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Tivoli TEC State Correlation Engine


The IBM Tivoli Enterprise Console (TEC) state correlation engine was introduced in TEC 3.8. It had two goals: to fix the low speed of the Prolog-based correlation in TEC, and to provide a more lightweight, simple filtering mechanism (TEC rules are overkill for simple tasks). Both goals were successfully met. The only problem is that this is a completely different engine. It is written in Java and uses XML configuration files.

One of the best descriptions is Tivoli Field Guide - TEC 3.9 State Correlation Engine: How to Prevent TEC from Becoming Flooded:

The purpose of this field guide is to describe the functionality of the IBM Tivoli Enterprise Console state correlation engine introduced in ITEC version 3.8. It also outlines the configuration and design aspects as well as gives hints for installing and troubleshooting. Some case studies from different customers are discussed at the end.

Tivoli Event Integration Facility User's Guide - Rules

Correlation is achieved with state-based and stateless rules. You specify these rules by using XML syntax, defined by the supplied DTD file, tecsce.dtd. The rules also have non-XML elements that define the associated rule predicates. The location of the default XML file is $BINDIR/TME/TEC/default_sm/tecroot.xml. This same directory also contains other samples of state correlation XML files. These files are only found on the system where you installed Tivoli Event Integration Facility or the Adapter Configuration Facility. They are not distributed with the default profile to other systems. For more information about the additional sample files, see the readme.txt file in the same location.

Note:
Rules in state correlation are not the same as IBM Tivoli Enterprise Console rules.

You define each rule in a state machine. The state machine gathers and summarizes information about a particular set of related knowledge. It is composed of states, transitions, summaries, and other characteristics, such as expiration timers and control flags.

State-based rules are the following: duplicates, threshold, and collector, all based on state machines. Each state machine looks for a trigger event to start it. Additionally, there is the matching rule, which is a stateless rule.

State-based rules rely on a history of events, whereas the stateless rules operate on a single, current event. Rules are specified by the following:

Predicates

A predicate in the predicate library consists of a boolean operator and zero or more arguments. Each argument can be a predicate returning the following:

Table 12. Predicate types and examples

Predicate type                 Example
Boolean value                  Equality
Function returning a value     Addition
Event attribute                &hostname
Constant                       The string foobar

See "Predicate Library" for more information.

Actions

The two actions for state correlation are the Discard action and the TECSummary action. These actions support a common, optional boolean attribute, named singleInstance. If this attribute is false, the action is not shared among different rules. Thus, one instance of the action is created for every rule that triggers it. This is the default behavior. If the attribute is true, a single instance of the action is created and shared among all rules that trigger it.

The Discard action explicitly discards an event when a state machine is triggered. Thus, the event is not forwarded. This action has no arguments. The following XML fragment shows an example with the Discard action:

<rule id="root.match_discard_tec_notice">
    <eventType>TEC_Notice</eventType>
    <match>
      <predicate>
        <![CDATA[
           # always succeeds
           true
           ]]>
      </predicate>
    </match>
    <triggerActions>
      <action function="Discard" singleInstance="true"/>
    </triggerActions>
</rule>

The summary action, TECSummary, compacts a list of correlated events into a summary event, which is then sent to the event server. The action packs all the events that match a specific rule into a single event. This action also has an optional msg parameter, which specifies the value of the msg attribute to be added to the TECSummary event. The msg attribute acts as an identifier for different types of TECSummary events, so you can use it to identify these events easily in the event console.

If the summary event is generated from more than one source event, the repeat_count attribute is added to it and contains the number of events that were originally processed. If the original events already had a repeat_count attribute, their values are preserved by adding them to the final repeat_count value of the summary. For example, suppose the following events are received:

EVENT;repeat_count=3;msg=event1;
EVENT;repeat_count=5;msg=event2;
EVENT;msg=event3;

The generated summary has a repeat_count of the following:

repeat_count = 3 + 5 + 1 = 9

The following XML fragment shows how to configure the TECSummary action:

<rule id="root.duplicate_tec_db">
    <eventType>TEC_DB</eventType>
    <duplicate timeInterval="10000">
      <cloneable attributeSet="sql_code"/>
      <predicate>
        <![CDATA[
           # If we reach this point then
           # the sql_code is already duplicated
           # because it is used as a cloneable
           # parameter.
           true
           ]]>
      </predicate>
    </duplicate>
    <triggerActions>
      <action function="TECSummary" singleInstance="false">
        <parameters>
          SET:msg=root.duplicate_tec_db.summary
        </parameters>
      </action>
    </triggerActions>
</rule>

Attributes Common to All Rules

The following are attributes common to all rules:

id
Specifies the identifier for each rule. It must be unique within the correlation engine where it is registered. Periods are treated as directories. For example, if you have the id test.threshold, you cannot have another rule with test.threshold.1 as the identifier.

 

eventType
Specifies the set of event classes that this rule applies to; restricting a rule to specific classes also improves performance. When you omit this parameter, state correlation applies the rule to all event classes.
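To illustrate the eventType behavior described above, here is a rule that omits eventType. Its id, test.match_any_class, is made up for this example. Because there is no eventType, state correlation would evaluate the predicate against events of every class, so the rule loses the performance benefit that eventType provides. The id also does not collide with a rule named test.match, because test.match_any_class is not nested under test.match in the period-separated hierarchy. This is a sketch built only from elements used elsewhere on this page.

```xml
<!--
Hypothetical example: no eventType element, so the rule
is applied to events of all classes. Any event whose
hostname is hostname1 matches.
-->
<rule id="test.match_any_class">
    <match>
      <predicate>
        <![CDATA[
           &hostname == "hostname1"
           ]]>
      </predicate>
    </match>
</rule>
```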

Matching Rules

Matching rules are stateless. They perform passive filtering on the attribute values of an incoming event. A matching rule consists of a single predicate; if the predicate evaluates to true, the trigger actions, which are specified in the rule, run. The following is an example of the rule:

<!--
Match all heartbeat events for my hostname
that have msg="please match me".
-->
<rule id="test.match" >
    <eventType>TEC_Heartbeat</eventType>
    <match>
      <predicate>
        <![CDATA[ 
           &msg == "please match me" &&
           &hostname == "hostname1"
           ]]>
      </predicate>
    </match>
</rule>
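The matching rule above defines no trigger actions. The sketch below attaches the Discard action from the Actions section to a matching rule, giving a rule that actually filters: heartbeat events from one host are dropped instead of being forwarded. The rule id is hypothetical. The structure mirrors the root.match_discard_tec_notice example and should be checked against tecsce.dtd before use.

```xml
<!--
Hypothetical example: discard heartbeat events from
hostname1 so they are not forwarded to the event server.
-->
<rule id="test.match_discard_heartbeat">
    <eventType>TEC_Heartbeat</eventType>
    <match>
      <predicate>
        <![CDATA[
           &hostname == "hostname1"
           ]]>
      </predicate>
    </match>
    <triggerActions>
      <action function="Discard" singleInstance="true"/>
    </triggerActions>
</rule>
```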

Duplicates Rules

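The duplicates rule blocks the forwarding of duplicate events within a time interval. It requires a predicate, which selects the events to treat as duplicates, and a time interval, set with the timeInterval attribute in milliseconds (timeInterval="10000" in the example below means 10 seconds).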

Figure 4 shows the state transitions for the duplicate rule:

Figure 4. State transitions for the duplicate rule


In Figure 4, state one is the initial state. Transition 1 occurs when there is a match on an incoming event. At that time, state correlation forwards the matching event, and the timer starts. While the timer is running, further matching events are not forwarded. Transition 2 occurs when the time interval expires, and the state machine resets. The following is an example of the rule:

<!--
Show me only the first error number 10
for my hostname in each 10-second
interval.
-->
<rule id="test.duplicate" >
    <eventType>TEC_Error</eventType>
    <duplicate timeInterval="10000">
      <predicate>
        <![CDATA[ 
           &msg == "internal error on my adapter" &&
           &hostname == "hostname1" &&
           &errno == 10
          ]]>
      </predicate>
    </duplicate>
</rule>
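The example above tracks a single host. To suppress duplicates separately for each host, you can probably reuse the cloneable element from the root.duplicate_tec_db example. According to the comment in that example, each distinct value of the listed attribute appears to get its own copy of the state machine. This sketch assumes that behavior carries over to the hostname attribute. The rule id is hypothetical.

```xml
<!--
Hypothetical example: forward only the first error number 10
from each hostname in every 10-second interval.
Assumes cloneable creates one state machine per distinct
hostname value, as the TEC_DB example suggests for sql_code.
-->
<rule id="test.duplicate_per_host">
    <eventType>TEC_Error</eventType>
    <duplicate timeInterval="10000">
      <cloneable attributeSet="hostname"/>
      <predicate>
        <![CDATA[
           &msg == "internal error on my adapter" &&
           &errno == 10
           ]]>
      </predicate>
    </duplicate>
</rule>
```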

Threshold Rules

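The threshold rule looks for n occurrences of an event within a time interval. When the threshold is reached, it sends events to the defined actions. In the examples below, it is configured with a predicate and with the thresholdCount (n), timeInterval (in milliseconds), timeIntervalMode (fixedWindow or slidingWindow), and triggerMode attributes.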

Figure 5 and Figure 6 show the operation of the threshold rule with timeIntervalMode=fixedWindow specified.

Figure 5. State transitions for the basic threshold rule


Figure 5 shows the state machine for the modes FIRST_EVENT, LAST_EVENT, and ALL_EVENTS. Transition 1 occurs when state correlation detects the trigger event (trigger predicate matches). Transition 2 takes place when an incoming event matches the second predicate. When the time interval expires, transition 3 occurs and the state machine resets. Transition 4 resets the state machine after the threshold is reached. When the state SN is reached, either the first event, the last event, or all n events are sent before resetting.

Figure 6. State transitions for the threshold rule using FORWARD_EVENTS


In FORWARD_EVENTS mode (Figure 6), the threshold rule operates as in the previous case, except that after the threshold is reached it sends every event matching the second predicate until the time interval expires.

When the state machine has timeIntervalMode=slidingWindow specified, the threshold rule operates as it does with the fixedWindow time interval, except that from each state SK there are also transitions back to states S1, S2, ..., SK-1. These transitions account for events that drop out of the sliding time window. The following is an example of the rule:

<!--
I'm only interested when at least 5 Node_Down
events for hostnames in my local subnet happen
within 1 minute.
-->
<rule id="test.threshold">
    <eventType>Node_Down</eventType>
    <threshold thresholdCount="5" timeInterval="60000"
    timeIntervalMode="slidingWindow" triggerMode="allEvents">
      <predicate>
         <![CDATA[
            (&msg == "node down") &&
            (isMemberOf(&hostname, [ 192.168./16 ]))
            ]]>
      </predicate>
    </threshold>
</rule> 
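The variant below uses timeIntervalMode="fixedWindow" and attaches a TECSummary action, following the pattern of the root.duplicate_tec_db example. When the threshold is reached, the events passed to the action should be packed into a single summary event. The msg attribute of that event identifies the rule, and its repeat_count reflects the packed events. The rule id and msg value are hypothetical. How triggerMode interacts with TECSummary is an assumption that should be verified on a test event server.

```xml
<!--
Hypothetical example: when 5 Node_Down events for the local
subnet arrive within a fixed 1-minute window, pass them to
TECSummary, which packs them into one summary event.
-->
<rule id="test.threshold_summary">
    <eventType>Node_Down</eventType>
    <threshold thresholdCount="5" timeInterval="60000"
    timeIntervalMode="fixedWindow" triggerMode="allEvents">
      <predicate>
         <![CDATA[
            (&msg == "node down") &&
            (isMemberOf(&hostname, [ 192.168./16 ]))
            ]]>
      </predicate>
    </threshold>
    <triggerActions>
      <action function="TECSummary" singleInstance="false">
        <parameters>
          SET:msg=test.threshold_summary.summary
        </parameters>
      </action>
    </triggerActions>
</rule>
```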

Threshold rules can also define complex aggregate values instead of a simple count of events. Use the aggregate configuration tag to define them. You construct an aggregate value much like a predicate. However, instead of producing a true or false result, it defines a progressive value, using the functions listed in Appendix D, Predicates and Functions for State Correlation. Threshold rules with aggregate values trigger only when the aggregate value is equal to or greater than the thresholdCount value. The following is an example of the rule:

<!--
If I receive a slot value with a relative percentage between
0 and 1, but I want to check my threshold using the normal
percentage value of 100%, I can define an aggregate of the
slot relative_percentage, by multiplying it by 100 and counting
all percentages until it reaches 100%.
-->
<rule id="test.aggregate_threshold">
    <eventType>Temperature_Variation</eventType>
    <threshold 
        thresholdCount="100" 
        timeInterval="2000"
        triggerMode="allEvents"
        timeIntervalMode="fixedWindow" >
      <aggregate>
         <![CDATA[
           &relative_percentage * 100
           ]]>
      </aggregate>
      <predicate>true</predicate>
    </threshold>
  </rule> 

Collector Rules

The collector rule gathers events that match the given predicate for a specified period of time. The rule triggers when the timer expires and sends all collected events to the defined actions. It requires a predicate and a time interval, set with the timeInterval attribute in milliseconds.

Figure 7 shows the state transitions for the collector rule:

Figure 7. State transitions for the collector rule


In Figure 7, S1 is the initial state. Transition 1 occurs when there is a match on an incoming event; the initial event is not sent but collected. A timer is set to the specified interval. Before the timer expires, all incoming and matching events are collected (transition 2). Transition 3 occurs when the time interval expires, and the state machine resets. At this time, all collected events are sent. The following is an example of the rule:

<!--
Collects 10 seconds of Server_Down
events for my database.
-->
<rule id="test.collector">
    <eventType>Server_Down</eventType>
    <collector timeInterval="10000" >
      <predicate>
        <![CDATA[ 
            &servername == "my_database"
            ]]>
      </predicate>
    </collector>
  </rule>
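Collector rules pair naturally with the TECSummary action. Instead of forwarding every collected Server_Down event when the timer expires, the batch can be compacted into one summary event. As described for TECSummary above, its repeat_count reflects the collected events. This sketch assumes that triggerActions attaches to a collector rule the same way it does in the duplicate TEC_DB example. The rule id and msg value are hypothetical.

```xml
<!--
Hypothetical example: collect 10 seconds of Server_Down
events for my database, then send them as one summary event.
-->
<rule id="test.collector_summary">
    <eventType>Server_Down</eventType>
    <collector timeInterval="10000">
      <predicate>
        <![CDATA[
            &servername == "my_database"
            ]]>
      </predicate>
    </collector>
    <triggerActions>
      <action function="TECSummary" singleInstance="false">
        <parameters>
          SET:msg=test.collector_summary.summary
        </parameters>
      </action>
    </triggerActions>
</rule>
```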

Old News

Gulfsoft Adding Custom Actions

Description: Creating Custom Actions for the TEC 3.9 State Correlation Engine (SCE) and Configuring the Windows Logfile Adapter to use the SCE
Added on: 04-May-2004

 

Recommended Links



State Based Correlation Engine (SBCE) Performance

Determining the length of time a single event enters and leaves the State Based Correlation Engine

Determining tec_gateway and state based correlation performance

In both large and small environments, it may be necessary to measure how much time it takes for the tec_gateway process to receive an event, pass it to state based correlation, get it back from state based correlation, and send it on to the TEC Server.

Tivoli Field Guide - TEC 3.9 State Correlation Engine: How to Prevent TEC from Becoming Flooded


Tivoli Field Guide - Event Processing Tools Available in IBM Tivoli Enterprise Console 3.8  

The purpose of this paper is to provide an overview of the event processing tools available in IBM Tivoli Enterprise Console version 3.8. It also ties these tools together so the customer can make informed decisions when planning event management strategies and implementations. Pre-filtering, filtering, state-based correlation, rules, gateways, and endpoints are all discussed.

Tivoli Field Guide - The Tivoli Management Enterprise Endpoint Gateway: A Technical Look at the Internals

An unstable endpoint manager or endpoint gateway can have a catastrophic ripple effect on the reliability and effectiveness of the Tivoli® Management Environment. This is primarily because of the number of management units that are affected by their instability. An unreliable gateway with 1,000 endpoints logged into it affects the ability of the administrator to effectively manage those 1,000 machines. In an environment of 10,000 endpoints, that represents a 10% failure rate...

 



Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

Last modified: August 15, 2009