Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Softpanorama Search

Finding Dominators in Directed Graphs

Old News See Also Recommended Books Recommended Links Algorithms Implementations Minimal spanning trees Humor Etc

Dominator - Wikipedia, the free encyclopedia

Dominance was first introduced by Prosser in a 1959 paper on analysis of flow diagrams. Prosser did not present an algorithm for computing dominance, which had to wait ten years for Lowry and Medlock. Ron Cytron rekindled in the interest in dominance in 1989 when he applied it to efficient computation of φ functions, which are used in static single assignment form.

Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contain some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.
Google Search
Open directory

Research Index

Old News ;-)

[PDF] Microsoft PowerPoint - dfa3

File Format: PDF/Adobe Acrobat - View as HTML
Dominators” needed to explain. reducibility. In reducible flow graphs, loops are well. defined, retreating edges are unique. (and called “back” edges). ...
suif.stanford.edu/~courses/cs243/lectures/dfa3.pdf - Similar pages

[PPT] M.Sc. of Software Engineering for the e-Economy CO7206 System ...

File Format: Microsoft Powerpoint - View as HTML
Dominators / Post-dominators; Control-dependence Analysis. Data Flow Analysis; Program and System Dependence Graphs; Slicing. Program Analysis Outline ...
www.cs.le.ac.uk/~mer14/week5-2.ppt - Similar pages
 

Doug Simon Structuring Assembly Programs

Doug's 1997 Honours thesis, under the supervision of Cristina Cifuentes, involved the creation of a new structuring algorithm to structure assembly-based program flow graphs.  This new algorithm was prompted as a way to simplify the expensive data structures and analysis required by the algorithms used in the dcc project.

Doug implemented the algorithms used by Cristina in the dcc project (see Structuring Decompiled Graphs or Chapter 6 of the PhD thesis Reverse Compilation Techniques) as well as the new, simplified, structuring algorithms we devised in 1997.  Both algorithms are compared in a tool called AST (assembly structuring tool) for SPARC assembly programs.

Abstract

An automated assembly to high level language (e.g. C) translator would be of great benefit to maintainers of legacy assembly programs. A component of such a translator would derive higher level constructs such as conditional, loops and sequences from the original source. This can be done by applying an algorithm to structure the control flow graph of the assembly source. This is commonly referred to as a structuring algorithm.

Most existing techniques for control flow structuring not easily adapted to structuring assembly code. During research into the feasibility of a binary decompiler, an algorithm more suited to the task of structuring assembly code was developed. This seminar focuses on a number of modifications to this algorithm that were developed as part of an honour's thesis.  In particular we discuss techniques for generating `better' high level code from unstructured assembly control flow graphs. Unstructured graphs are those that will require the use of gotos in the high level code. The quality of the code can be measured by the number of gotos generated.

In addition to our attempt to improve the functionality of the original algorithm, we also seek to improve the runtime speed of the translator by investigating less expensive data structures. Ideally, any improvement in speed will not involve a trade off with functionality.

We present the results of a prototype assembly structuring tool. The results were generated by running the tool on a number of well known, large benchmark programs used by those in the compiler community. From these results, we discuss what combination of improvements in speed and functionality are optimal in some well defined sense.

Code

The assembly structuring tool, AST, is distributed under a BSD license and it is (C) Doug Simon, 1997.

The code base was last compiled on a SPARC Solaris platform with gcc 3.1 on 1st March 2002. 

Two versions of the AST tool can be built: ast and ast_old.  The ast tool implements the new algorithms, based on dominators and post-dominators of a graph's underlying tree.  The ast_old tool implements the algorithms used in the dcc project, which are based on intervals and the derived sequence of graphs.  The Makefile provided in the distribution compiles the ast tool, to compile the ast_old tool, you need to uncomment some lines in the Makefile.

Code base: ast.tar.gz  (49 KB)

[PPT] A New Simpler Linear Time Dominators Algorithm

File Format: Microsoft Powerpoint 97 - View as HTML
Use augmented graphs of microtrees to summarize the arcs from G into the trees; determine internal immediate dominators (ie idoms residing inside a ...
www.cs.wustl.edu/~raa4/projects/bkrw/cs531bkrw-2.PPT - Similar pages

[PDF] Finding Dominators in Practice

File Format: PDF/Adobe Acrobat - View as HTML
dominators in the graph obtained from G by reversing all arc orientations. ... memory and compute dominators for each graph in sequence, measuring the total ...
www.daimi.au.dk/~loukas/dominators_esa04.pdf - Similar pages

[PDF] Finding Dominators Revisited

File Format: PDF/Adobe Acrobat - View as HTML
Algorithm for finding dominators in reducible graphs [1]. Lemma 5.1. For any vertex w = r, idom(w) is the. nearest common ancestor of sdom(w) and parent(w) ...
www.daimi.au.dk/~loukas/dominators_soda04.pdf - Similar pages
 

Dominators Dominators at AT&T Labs Research

In the summer of 1998, I worked with Anne Rogers and Adam Buchsbaum at AT&T Labs Research. We studied three major algorithms used to find important vertices called dominators in flowgraphs (directed, rooted graphs). A node d dominates n if every path from the root to n includes d. Dominator analysis is an important problem, with applications ranging from optimizing compilers to network flow analysis.

Programmers primarily use one of three algorithms to find dominators in flowgraphs; these are Purdom-Moore, Bit Vector, and Lengauer-Tarjan. On graphs with n vertices and m edges, the respective running times for each algorithm are O(m n), O(m n^2), and O(m alpha(m,n)) (where alpha is the inverse Ackermann function, which grows extremely slowly). When Lengauer and Tarjan introduced their algorithm in a 1979 paper, they used randomly generated flowgraphs as input to compare the actual running times for each of these algorithms. In their paper, they found the Lengauer-Tarjan algorithm showed better performance, even for small graphs, reporting that their algorithm out-performed Purdome-Moore for all test graphs larger than twelve nodes, and it outperformed Bitvector for all test graphs.

Over the summer, I explored whether modern machines with caches exhibit different results and whether using compiler-generated flowgraphs as input affects runtime. I implemented Purdom Moore, and Bit Vector and modified an existing Lengauer-Tarjan implementation. We collected running times for all three algorithms on two classes of graphs: random graphs and flowgraphs generated using the MACHINE SUIF compiler (from Harvard/Stanford) on programs from the SPEC benchmarks. In my analysis, I found that for both random and compiler generated graphs, Lengauer-Tarjan still proved fastest for large graphs. However, in the experiment, Purdom-Moore lost ground, moving the crossover point by Lengauer-Tarjan to seven nodes. We also found the bit vector algorithm does compete with Lengauer-Tarjan for small graphs. Therefore, the conclusions of the original paper still stand, even with modern machine caches and compiler generated graphs.

Although this work did not overturn existing ideas about dominator algorithms, I gained a great deal from the experience. On my return to school, I was able to leverage my experience with dominators in a class project where I used Visual C++ and MFC to create a Windows visualization tool for animating graph algorithms. The program allows users to create directed graphs and monitor the progress of various algorithms on them, including Purdom-Moore. My experience with dominators was more than just research; it was a unique opportunity for me to gain confidence in my research skills, to work with talented researchers, and to bring my summer experience to the classroom.

 

Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     

Dominator - Wikipedia, the free encyclopedia

[PDF] Finding Dominators in Practice

File Format: PDF/Adobe Acrobat - View as HTML
method. In Proceedings of the 25th International Symposium on Microarchitecture,. pages 260–263, 1992. 23. RE Tarjan. Finding dominators in directed graphs. ...
www.daimi.au.dk/~loukas/dominators_esa04.pdf - Similar pages

 

Stephen Alstrup, Dov Harel, Peter W. Lauridsen and Mikkel Thorup.
Dominators in linear time.
Vol. 28(6), 2117-2132,1999, Siam Journal on Computing.

A New, Simpler Linear-Time Dominators Algorithm - Buchsbaum (ResearchIndex)

Abstract: this article is organized as follows. Section 2 outlines Lengauer and Tarjan's approach. Section 3 gives a broad overview of our algorithm and differentiates it from previous work. Section 4 presents our algorithm in detail, and Section 5 analyzes its running time. Section 6 presents our new path-compression result, on which the analysis relies. Section 7 describes our implementation, and Section 8 reports experimental results. We conclude in Section 9. (Update)

Dominators in Linear Time - Alstrup, Harel, Lauridsen, Thorup (ResearchIndex) Dominators in Linear Time (1996)  (Make Corrections)  (14 citations)
Stephen Alstrup, Peter W. Lauridsen, Mikkel Thorup DIKU technical report

Dominators in Linear Time - Alstrup, Harel, Lauridsen, Thorup (1999)   (14 citations)  (Correct)

.... and program optimization [2, 3, 4, 5, 10, 15] The problem was first raised in 1969 by Lowry and Medlock [15] where an O(n 4 ) algorithm for the problem was proposed (as usual, n is the number of nodes and m the number of edges in a graph) The result has been improved several times (see e.g. [1, 2, 17, 20]) and in 1979 an O(mff(m; n) algorithm was found by Lengauer and Tarjan [14] Finally, at STOC 85, Dov Harel [11] announced a linear time algorithm. Based on Harel s result, linear time algorithms have been found for many other problems (see e.g. 4, 5, 10] Harel s description was, however, ....

R.E. Tarjan. Finding dominators in directed graphs. SIAM Journal on Computing, 3(1):62--89, 1974.

Notes on Graph Algorithms Used in Optimizing Compilers - Offner (ResearchIndex)

Notes on Graph Algorithms Used in Optimizing Compilers (1995)  (Make Corrections)  (2 citations)

A New, Simpler Linear-Time Dominators Algorithm - Adam Buchsbaum Haim (1998)   (1 citation)  (Correct)

....theory, with crucial applications in global flow analysis and program optimization [2, 7, 9, 15] Lorry and Medlock [15] introduced an O(n 4 ) time algorithm, where n = jV j and m = jAj, to find all the immediate dominators in a flowgraph. Successive improvements to this time bound were achieved [2, 20,22], culminating in Lengauer and Tarjan s [13] O(mff(m; n) time algorithm; ff is the standard functional inverse of the Ackermann function and is slightly superlinear [25] The Lengauer Tarjan (LT) algorithm is extremely efficient in practice. Reducing the asymptotic time complexity of finding ....

An O(|V|*|E|) Algorithm for Finding Immediate Multiple-Vertex Dominators (1996) - Alstrup, Clausen.. (1996)   (Correct)

Abstract: We present an O(jV j jEj) algorithm for finding immediate multiple-vertex dominators in a graph with vertices V and edges E. 1 Introduction. Finding dominators in a graph has been investigated in many papers [6, 7, 8, 9, 10] in connection with global flow analysis and program optimization. Recently Gupta extended the problem to find generalized dominators [4, 5], which can be used for e.g. propagating loop invariant statements out of the loop in cases, where no single vertex dominates the... (Update)

Generalized Dominators for Structured Programs - Alstrup, Lauridsen, Thorup (1996)   (Correct)

Abstract: . Recently it has been discovered that control flow graphs of structured programs have bounded treewidth. In this paper we show that this knowledge can be used to design fast algorithms for control flow analysis. We give a linear time algorithm for the problem of finding the immediate multiple-vertex dominator set for all nodes in a control flow graph. The problem was originally proposed by Gupta (Generalized dominators and post-dominators, ACM Symp. on Principles of Programming Languages,... (Update)

Algorithms Implementations

[patch] new dominance calculator

+/* Compute dominator relationships using new flow graph structures in + nearly linear time. */ void compute_flow_dominators (dominators, post_dominators) + ...
gcc.gnu.org/ml/gcc-patches/2000-06/msg00794.html - 8k - Cached - Similar pages
 

soot.toolkits.graph (Soot API)

DominatorsFinder, General interface for a dominators analysis. ExceptionalGraph, Defines the interface for navigating a control flow graph which ...
www.sable.mcgill.ca/soot/doc/ soot/toolkits/graph/package-summary.html - 21k - Cached - Similar pages

DominatorTree (Soot API)

protected DominatorsFinder · dominators. protected HashMap · godeToDode "gode" is a node in the original graph, "dode" is a node in the dominator tree. ...
www.sable.mcgill.ca/soot/doc/ soot/toolkits/graph/DominatorTree.html - 31k - Cached - Similar pages
 

[PDF] The Machine-SUIF Control Flow Analysis Library

File Format: PDF/Adobe Acrobat - View as HTML
Call find dominators to compute the dominator relation in the forward graph. Then access this infor-. mation through the following methods. immed dom(n) ...
www.eecs.harvard.edu/hube/software/nci/cfa.pdf - Similar pages

 

Dominators (joeq 1.0 API)

dom_nav. public static final jwutil.graphs.Navigator dom_nav ... public void calculateDominanceFrontier(Dominators.DominatorNode tree) ...
joeq.sourceforge.net/apidocs/ joeq/Compiler/Quad/Dominators.html - 29k -
Cached - Similar pages

Doug Simon Structuring Assembly Programs

Doug's 1997 Honours thesis, under the supervision of Cristina Cifuentes, involved the creation of a new structuring algorithm to structure assembly-based program flow graphs.  This new algorithm was prompted as a way to simplify the expensive data structures and analysis required by the algorithms used in the dcc project.

Doug implemented the algorithms used by Cristina in the dcc project (see Structuring Decompiled Graphs or Chapter 6 of the PhD thesis Reverse Compilation Techniques) as well as the new, simplified, structuring algorithms we devised in 1997.  Both algorithms are compared in a tool called AST (assembly structuring tool) for SPARC assembly programs.

Abstract

An automated assembly to high level language (e.g. C) translator would be of great benefit to maintainers of legacy assembly programs. A component of such a translator would derive higher level constructs such as conditional, loops and sequences from the original source. This can be done by applying an algorithm to structure the control flow graph of the assembly source. This is commonly referred to as a structuring algorithm.

Most existing techniques for control flow structuring not easily adapted to structuring assembly code. During research into the feasibility of a binary decompiler, an algorithm more suited to the task of structuring assembly code was developed. This seminar focuses on a number of modifications to this algorithm that were developed as part of an honour's thesis.  In particular we discuss techniques for generating `better' high level code from unstructured assembly control flow graphs. Unstructured graphs are those that will require the use of gotos in the high level code. The quality of the code can be measured by the number of gotos generated.

In addition to our attempt to improve the functionality of the original algorithm, we also seek to improve the runtime speed of the translator by investigating less expensive data structures. Ideally, any improvement in speed will not involve a trade off with functionality.

We present the results of a prototype assembly structuring tool. The results were generated by running the tool on a number of well known, large benchmark programs used by those in the compiler community. From these results, we discuss what combination of improvements in speed and functionality are optimal in some well defined sense.

Code

The assembly structuring tool, AST, is distributed under a BSD license and it is (C) Doug Simon, 1997.

The code base was last compiled on a SPARC Solaris platform with gcc 3.1 on 1st March 2002. 

Two versions of the AST tool can be built: ast and ast_old.  The ast tool implements the new algorithms, based on dominators and post-dominators of a graph's underlying tree.  The ast_old tool implements the algorithms used in the dcc project, which are based on intervals and the derived sequence of graphs.  The Makefile provided in the distribution compiles the ast tool, to compile the ast_old tool, you need to uncomment some lines in the Makefile.

Code base: ast.tar.gz  (49 KB)

[PDF] Enterprise Navigator: A System for Visualizing and Analyzing ...

File Format: PDF/Adobe Acrobat - View as HTML
2.4 Graph Dominators (Dominator). The Enterprise Navigator uses the Dominator tool to ... dominators of a graph. In a graph with a selected root node R, ...
www.research.att.com/resources/ trs/TRs/99/99.16/99.16.1.body.pdf - Similar pages

 

History

[PDF] A Fast Algorithm for Finding Dominators in a Flowgraph

File Format: PDF/Adobe Acrobat - View as HTML
TARJAN, R. Finding dominators in directed graphs. SIAM J. Comptng. 3. (1974), 62-89. ACM Transactions on Programming Languages and Systems, Vol. 1, No. ...
www.cs.princeton.edu/courses/ archive/spr03/cs423/download/dominators.pdf - Similar pages

[PDF] The Program Dependence Graph and Its Use in Optimization

File Format: PDF/Adobe Acrobat - View as HTML
and dominators [l, 31. The Appendix contains the basic graph theoretic termi- ... flow graph is equivalent to computing dominators [l] in the reverse ...
www.cs.utexas.edu/users/ less/reading/spring00/ferrante.pdf - Similar pages

 

Finding Dominators in Directed Graphs by Robert Tarjan

Abstract. This paper describes an algorithm for finding dominators in an arbitrary directed graph. The algorithm uses depth-first search and efficient algorithms for computing disjoint set unions and manipulating priority queues to achieve a time bound of $O(V\log V + E)$ if $V$ is the number of vertices and $E$ is the number of edges in the graph. This bound compares favorably with the $O(V(V + E))$ time bound of previously known algorithms for finding dominators in arbitrary directed graphs, and with the $O(V + E\log E)$ time bound of a known algorithm for finding dominators in reducible graphs. If $E \geqq V\log V$, the new algorithm requires $O(E)$ time and is optimal to within a constant factor.

Keywords. algorithm, binary tree, complexity, connectivity, depth-first search, directed graph, dominator, equivalence algorithm, graph, immediate dominator, priority queue, search, set union, stack, topological sorting, tree

Full Text (pdf)

 


Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Disclaimer:

  • The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with.
  • We do not warrant the correctness of the information provided or its fitness for any purpose
  • In no way this site is associated with or endorse  cybersquatters using the term "softpanorama" with other main or country domains (e.g. softpanorama.com) with bad faith intent to profit from the goodwill belonging to someone else.

Last modified: August 15, 2009

similar except that the transfer is to the statement following the end of the i'th enclosing loop.

The algorithm initially orders the nodes in the graph by constructing a spanning tree of the graph in a depth first manner. The details of the depth-first procedure are omitted here but can be found in any book dealing with algorithms and data structures such as [7].

When structuring loops, loop headers are determined to be those with an edge into them from a node lower in the graph. For each of these loop headers an extra node is introduced. This node does not introduce any extra code in the underlying program and is used when tagging all the nodes within a loop. These are determined to be all the nodes between the new loop header (the introduced node) and the loop tail. We do not detail how the tail is determined here as it is not relevant. Figure 2.5 shows the process of adding the extra node to a graph. In graph (a), B is determined to be a loop header and the results of adding the extra node (B

0

) for this header is shown in

graph (b). The next step would be to introduce a new node for A.

After adding the extra nodes, the algorithm the defines a predicate HEAD

2.3. SEMANTIC PRESERVING STRUCTURING 27

D A

B

C D

A

B'

B C

(a) (b) Figure 2.5: (a) Original CFG and (b) CFG with extra loop header node added.

for each node in the graph such that HEAD (q) is the head of the smallest loop enclosing q in the program. If q is not in a loop, then HEAD (q) is undefined.

Having added the extra nodes, we are guaranteed that for loop header p, HEAD (p) will be either the header of an enclosing loop or undefined. This means that nested loop are correctly determined. Loop nesting occurs when one loop is inside another. An example of this is shown by graph (a) in Figure 2.5. The loop induced by the back edge (D ; A) nests the loop induced by the other back edge (C ; B ).

For conditional structuring, a reach set is constructed for each node p where the members of this set are all the nodes entered by edges from p or from nodes nested within p (i.e. nodes within a loop headed by p or nodes within the clause of a conditional headed by p). The constraint on the set membership for p is that it cannot contain p or any node nested within p. Figure 2.6 gives an example of a CFG and a table showing the reach sets derived from it.

Using these sets, a node is structured as a conditional header if it has at least

28 CHAPTER 2. RELATED WORK

A B

D E

F

G

H

C

Node Reachset

A fBg

B fE,Fg C fE,Fg D fE,Fg

E fGg F fGg G fg H fg

Figure 2.6: A graph with the reach set shown for each node. two out edges, both of which eventually lead to one of the nodes in the reach set with at least one doing so indirectly. It does this by considering a subset of all reachable statements from each statement where the membership of the subset is well defined. For each conditional detected, the first statement of each branch is recorded as well as the follow node of the whole conditional (i.e. the first common statement reached by the branch of the conditional). The predicates of the branching statements are inverted if necessary to prevent generating if..then..else statements where the then clause is empty. As an example consider nodes B and C . Node B will be structured as a conditional as both of its out edges lead indirectly (via c and D ) to the nodes in its reach set. Node C on the other hand, will not be structured as a conditional as both of its out edges lead directly to the nodes of its reach set (i.e. e and F ).

The algorithm is extended to handle irreducible CFGs. The above algorithms ensures that unstructured jumps will only take the form of forward jumps into a clause of an if..then..else statement or into the body of a loop.

This algorithm gives us the useful concept of imposing an ordering between the nodes within a CFG and how such an ordering can be constructed (i.e. by a depth first search traversal of the graph from the entry node). The immediate dominator technique of structuring conditionals provides the basis for how we perform structuring of conditionals.

An adaptation of the technique of inserting nodes when structuring loops is

2.4. ASSEMBLY CONTROL FLOW GRAPH STRUCTURING 29 used by our structuring algorithms. The direct result is that we can structure the one node to be both a loop header and a conditional header.

Baker's algorithm does not take into account the occurrence of nodes which have more than two out edges (i.e. case and switch statements). She states that adapting the algorithm to handle these cases is straightforward but this is not the case. The reason being that when structures headed by one of the mulit-way nodes overlaps with a loop, there is a need for extra analysis to decide which structure will be generated in the HLL code and which will results in goto statement(s). The restriction of goto jumps being only forward is also unacceptable when attempting to minimise the number of goto statements generated. For example, in the case where there is a jump back into the subgraph of a switch statement, it is desirable to structure that backwards jump as a goto rather than as a loop with multiple forward goto jumps into it.

2.4 Assembly Control Flow Graph Structuring

The structuring work performed by Cifuentes [5] as a component of a binary decompiler

1

includes most of the algorithms upon which our work is based.

The algorithms used for structuring rely upon a number on control flow techniques available in the literature dealing with the analysis and manipulation of CFGs such as successors, predecessors, depth first search and more. The definitions of these concepts is postponed until Chapter 4. The control flow analysis is particularly relevant as it is performed on the control flow graphs of assembly-like programs.

Loop structuring is performed through the use of intervals and derived sequence of graphs which had previously been used in both compiler control flow and data flow analyses. Conditional structuring was performed by using the immediate dominator information for each node in the graph.

An algorithm is also given for structuring short-circuit evaluated graphs. These graphs result from the efficient evaluation of compound boolean conditions. For example, the graph shown on the left hand side of Figure 2.7

1

A binary decompiler is a tool that translates a binary program (executable) into a

high level language program.

30 CHAPTER 2. RELATED WORK representing conjunction in a HLL program, can be implemented by the CFG on the right hand side of the same figure. This type of low level implementation is even guaranteed by some high level languages (such as C) and, for example, allows a programmer to test a pointer before dereferencing it without needing to use a lengthier conditional statement.

else t f

t

t

f

f then

elsethen x or y x

y

Figure 2.7: Two possible CFG representations for a compound boolean expression. The short-circuited representation is given on the right hand side of the figure.

The algorithms used for structuring 2way and nway conditionals were given separately in this work. We can modify how this structuring is done so that the two can be performed together. The modification should not only give better performance but also improve the quality of the generated code.

As mentioned previously, the loop structuring algorithm presented by Cifuentes is a major focus of this thesis. We attempt to preserve the functionality of the algorithm presented in this research while improving its execution time by using an alternative to the two data structures she used.

We do not include the structuring of short-circuit evaluated graphs as to do so requires data flow analysis. This is quite feasible in the context of developing a tool (such as a decompiler) that already involves data flow analysis. However, as we are concentrating purely on control flow analysis, we have neglected this type of structuring.

Chapter 3 Assembly to High Level Language

The task of structuring assembly programs involves more than just designing some structuring algorithms. We need to deal with the issue of getting the program into its CFG representation which is required by the structuring algorithms. We also need to provide some means of generating the HLL code from the CFG after it has been augmented with the structuring information. This chapter describes these broader issues of assembly structuring. For the components that are not central to the translation process, a detailed discussion of their algorithmic aspects is given here instead of devoting a whole chapter to them.

The structure of an assembly to HLL translation is similar to standard compilation on a conceptual level (although significantly simpler) as it consists of a series of phases that transform the input (an assembly program) from one representation to another. The phases involved are shown in Figure 3.1. The parser performs both syntax analysis and a simple form of semantic analysis on the input assembly. Any implementation of this phase is architecture dependent because assembly languages are dependent on a particular machine. The CFG generation phase builds the CFG representation of the program. As a result of semantic analysis performed in the parser phase, this phase is both architecture and HLL target independent. The third phase is the structuring process. Both it and the last phase, HLL code generation, are architecture independent but are dependent on the HLL chosen as the target. These last two phases are the subject of the following two chapters

32 CHAPTER 3. ASSEMBLY TO HIGH LEVEL LANGUAGE

Assembly Program

Parser CFG Generation

Structuring HLL Generation

HLL Program

AST

Figure 3.1: The phases of an assembly to HLL translation. respectively and are the core of the structuring process.

3.1 Assembly Parsing An assembly language is in essence just a symbolic syntax for a given machine's binary instruction set. This explains why any given assembly language is machine dependent and programs written in it are not portable to another architecture without significant translation. Only pure binary programs, which are further restricted by binary file formats and operating systems, are less portable. The implications of this architecture dependence for any implementation of the translator is that it will only be able to process assembly programs from one architecture. However, given that the later phases are independent of the given assembly syntax, porting the implementation to a different architecture should only involve rewriting the first phase module.

The semantic analysis performed by the parser is the categorisation of each instruction parsed into one of two categories; transfer instructions and non 3.1. ASSEMBLY PARSING 33 transfer instructions. Knowing what category an instruction is assists in the next phase; control flow graph generation. We partition the instruction set of any architecture into the following two categories:

ffl Control Instructions (TI): this is the set of instructions that change the

value of the program counter. That is, they explicitly set the address of the instruction to be executed. For convenience, we refer to any instruction within this set as a control transfer instruction (CTI). These instructions are:

- Unconditional branches

1

: the flow of control is transferred to the

target branch address.

- Conditional branches: depending on the value of some condition

code(s) set by preceding instructions, the flow of control is transferred either to the target branch address or the next instruction in the sequence.

- Indexed jumps: the target of the jump is defined by the value of

a register and is therefore a range of values, one of which the flow of control is transferred to.

- Subroutine call: the flow of control is transferred to the routine

given in the appropriate operand of the instruction. For our purposes, we assume that the flow of control is transferred back to the next instruction in the sequence after successful completion of the subroutine and only this flow of control is represented in a control flow graph.

- Subroutine return: the flow of control is returned to the caller

subroutine that invoked the current subroutine.

ffl Non transfer instruction (NTI): the set of instructions that implicitly

transfer control to the next instruction in the sequence. That is, all the instructions that are not in TI.

Parsing the assembly input is little more than text file processing. Checking for the correctness of the assembly program is considered to be outside the domain of the AHT process. This phase only reads in the instructions and build a representation of the instruction sequence within memory. When

1

Different architectures use the terms `jump' and `branch' to mean the same thing. We

adhere mainly to the latter but the two are used interchangeably hereafter.

34 CHAPTER 3. ASSEMBLY TO HIGH LEVEL LANGUAGE parsing a control transfer instruction (CTI), the internal representation is tagged as being a CTI as well as being augmented with the location(s) of the transfer control. This simple semantic analysis is tightly coupled to the host architecture but the information it renders is architecture independent. The locations are stored as positions within the internal instruction sequence. Essentially, they are array indices. A parser for assembly in any architecture could be used here instead as long as it performs the architecture independent categorisation of the instructions as discussed above.

3.2 Control Flow Generation As a preliminary to discussing the generation of a control flow graph, we define the concept of a basic block. Working from this definition, we then proceed to formalise the concept of a control flow graph as well as introducing terminology used when discussing control flow graphs.

3.2.1 Basic Blocks This section formally defines the concept of basic blocks which are also the building blocks of a CFG. The definition is given in terms of the sequence of instructions that were parsed into memory by the parser phase.

Definition 3.1 A basic block b = [i

1

; : :; i

n\Gamma 1

; i

n

; i

n+1

], n ? 1 is an instruction sequence that satisfies the following conditions:

1. [i

1

; : :; i

n\Gamma 1

; i

n+1

] 2 NTI

2. 8 i 2 fi

2

: : i

n+1

g ffl i is not the target of a CTI

3. i

n

2 TI

OR 1. [i

1

; : :; i

n+1

] 2 NTI

2. i

n+2

is the target of a CTI

The first alternative in Definition 3.1 specifies that a basic block is a sequence of instructions such that all but the second last are from the NTI set. Additionally, only the first instruction may be the target of a CTI and the second

3.2. CONTROL FLOW GENERATION 35 last instruction must be a CTI. The second alternative defines a basic block that is not delimited by an instruction that explicitly transfers control. In this case, the last instruction has the property that its successor is the target of a CTI. All the instructions in this type of basic block will be from the NTI set.

The above definition is unfortunately dependent upon the underlying architecture. The one we have given is in terms of the SPARC architecture which includes the concept of delayed instructions. These are the instructions that immediately follow most CTIs. This instruction is executed, conditionally in some cases, before the transfer of control takes place. For simplicity, we therefore define these delayed instructions to be a member of the same basic as their preceding CTI. Chapter 6 deals further with the idiosyncrasies of these delayed instruction further when discussing how they were handled by an implementation of our structuring algorithms.

3.2.2 Graph Theory Before defining a control flow graph, we formalise the notion of a graph and its relevant properties. The definitions presented are those given in [5] and are similar to those found in any text dealing with graph theory.

Definition 3.2 A graph G is a tuple (V,E,h) where V is a set of nodes, E is a set of edges, and h is the root of the graph. An edge is a pair of nodes (v,w) such that v,w 2 V.

Definition 3.3 A directed graph G = (N,E,h) is a graph that has directed edges; i.e. each (n

i

; n

j

) 2 E has a direction which is from n

i

to n

j

.

Definition 3.4 A path from n

1

to n

m

in graph G = (N,E,h) , represented by

n ! n

m

, is a sequence of edges (n

1

; n

2

); (n

2

; n

3

); : :; (n

m\Gamma 1

; n

m

) 2 N ; m ? 1.

Definition 3.5 If G = (V,E,h) is a graph such that V = fhg and E = f g, then G is a trivial graph (i.e. a graph with only one node and no edges).

Definition 3.6 If G = (N,E,h) is a graph and 8 n 2 N ; h ! n, then G is a connected graph (i.e. every node in the graph is reachable from the head of the graph).

36 CHAPTER 3. ASSEMBLY TO HIGH LEVEL LANGUAGE 3.2.3 Control Flow Graphs The following definition formally shows that a control flow graph is constructed from the basic blocks of a program. There is one CFG for each subroutine in the program and these CFGs do not overlap

2

. However, the

CFGs do not necessarily partition the basic blocks of a program. Since the edges within a CFG represent the flow of control between the blocks, any unreachable blocks will not be included in any CFG.

Definition 3.7 A control flow graph G = (N,E,h) for a procedure P is a connected, directed graph such that:

1. 9 f : B 7ae N where B is the set of basic blocks for the procedure. 2. 8 e = (n

i

; n

j

) 2 E , e represents a flow of control from one basic block

to another and n

i

; n

j

2 N .

3. 9 b

i

2 B ffl h is the first instruction in b

i

.

Part 1 states that there is a partial one-to-one function mapping basic blocks to nodes in the graph. That is, for every node there is a corresponding unique basic block. However, due to the aforementioned unreachable basic blocks, the reverse is not necessarily true. Part 2 equates the control flow between two basic blocks with an edge in the graph. The third part of the definition defines the head of the graph to be the node containing an instruction that is an entry to a subroutine. This instruction will either be the first instruction in the program or the target of a subroutine call.

For certain types of analysis, it is important that a control flow graph has only one exit node. That is, there is a unique node in the graph such that it is the only one that has no successors. If there is more than one of these in the original graph, we conceptually add a new node and add an edge from each of the candidate exit nodes in the original graph to this new node. This new node does not introduce any code into the underlying program.

2

As we are only concerned with control flow analysis, we do not consider the transfer

of data between subroutines. For this reason, we do not explicitly show the transfer of control from one subroutine to another as an edge in a graph. It suffices for us to simply show the points at which a subroutine call is made.

3.2. CONTROL FLOW GENERATION 37 Definition 3.8 The exit node, e, of a graph is such that 9 e 2 N ffl @n 2 N ffl (e; n) 2 E

As with the instructions, the basic blocks are also classified into different types for the purpose of control flow analysis. The type is determined by the delimiting instruction of a block. Under the SPARC architecture (with its delayed instructions), the delimiting instruction of a basic block will be the second last one in the case where the block is defined by the first part of Definition 3.1. Otherwise it will be the last instruction in the block. The types are:

ffl Unconditional basic block: the second last instruction in the block is an

unconditional branch instruction. This block will have one out-edge.

ffl Conditional basic block: the second last instruction is a conditional

branch. This block will have two out-edges.

ffl Nway basic block: the second last instruction is a indexed jump. The

block will have n out-edges, one for each possible target of the jump.

ffl Call basic block: the second last instruction is a call to a subroutine.

This block will have only one out-edge which will be to the lexical successor of the block's last instruction.

ffl Return basic block: the second last instruction is a return instruction.

This block will have no out-edges.

ffl Fall through basic block: the block is not delimited by a CTI.

The nodes in the graph representing basic blocks

3

of the above types are

given the following names respectively: ucond, cond, nway, call, ret, fall.

To illustrate the process of basic block partitioning and CFG generation consider the example shown in Figure 3.2. The input to the example is an assembly program

4

. Instructions 9 and 12 are examples of unconditional branches

which delimit an unconditional basic block while instruction 6 demonstrates

3

Given this representational relationship between nodes and basic blocks, the two terms

are used interchangeably hereafter.

4

A number of non-transfer instructions have been removed from the code to cut down

the size of the example.

38 CHAPTER 3. ASSEMBLY TO HIGH LEVEL LANGUAGE an indexed jump

5

which delimits an nway basic block. The graph resulting

from these basic blocks and the flow of control between them is shown in Figure 3.3.

1 main: save %sp,-120,%sp 2 or %o0,%lo(.LLC0),%o0 3 call scanf,0 4 or %o0,%lo(.LL3),%o0 5 ld [%o1+%o0],%o0 6 jmp .LL3 .LL12 .LL4 7 nop 8 .LL3: ld [%fp-20],%o1 9 b .LL14 10 or %o0,%lo(.LLC1),%o0 11 .LL4: ld [%fp-20],%o1 12 b .LL14 13 or %o0,%lo(.LLC1),%o0 14 .LL12: sethi %hi(.LLC2),%o0 15 or %o0,%lo(.LLC2),%o0 16 .LL14: bne .LL15 17 nop 18 call printf,0 19 mov 0,%i0 20 .LL15 ret 21 restore

Label Type Instructions

A call 1 - 4

B nway 5 - 7 C ucond 8 - 10 D ucond 11 - 13

E fall 14 - 15 F cond 16 - 17 G call 18 - 19 H ret 20 - 21

Figure 3.2: A sample assembly program. The basic block partitioning is shown by the table on the right hand side.

3.2.4 Unreachable blocks One last refinement has to be performed to the CFGs before they are ready for structuring. We must remove the aforementioned unreachable basic blocks.

To remove the unreachable blocks, a depth first traversal if performed from each node representing the entry point to a subroutine, tagging each node as it is traversed. Upon completion of the traversals, it is simply a matter of removing the untagged nodes.

5

The range of the possible index values have been resolved by a preprocessor.

3.2. CONTROL FLOW GENERATION 39

call nway ucond ucond fall

cond call

ret

A B

E

H G

F

DC

Figure 3.3: The control flow graph for the program given in Figure 3.2

Chapter 4 Structuring Assembly Graphs A control flow graph a representation of the underlying assembly program which does not contain any information that cannot be read directly from the assembly code. To generate high level code from these control flow graphs requires further analysis. The use of structuring algorithms will give us the required information. As a result of these algorithms, the graphs will be annotated with the information describing what high level control constructs are represented in the graph. This chapter presents the control flow analysis performed by the structuring algorithms to attain the high level control flow constructs.

4.1 Notation This chapter includes a number of algorithms presented in pseudocode. While this style is mostly intuitive, the following conventions are used:

ffl attributes of objects are written (with a lower case first letter) as a

function call on the object. For example, iPDom(x) accesses the node x's immediate post-dominator and isBackEdge(v,w) returns whether or not (v ; w ) is an edge.

ffl procedure calls are written in their usual style with the distinguishing

feature of having an uppercase first letter. Return values are implemented as var (i.e. reference) parameters.

4.2. GRAPH STRUCTURING 41

ffl the nesting levels are represented through the use of indentation. ffl a test for equality uses `=' and assignment uses the `:=' symbol.

While it is presumed that the meaning of each node attribute is self-explanatory, a reference for all these attributes is provided in Appendix A.1. This reference relies upon the definitions given in the following sections. Section A.2 defines any predicates used while Section A.3 defines an attribute of a loop when referred to as an object.

4.2 Graph Structuring To structure a control flow graph, a set of high level constructs is required to give the structuring some context. For example, the structuring of control flow graphs could be performed very easily if we were only aiming to describe the flow of control using two-way conditionals (if..then..else) and goto statements. However, it is questionable whether this leaves us much better off than the original assembly program.

The set of high level constructs chosen for structuring needs to be general enough to cater for some of the more common high level languages. This allows any implementation of the assembly structuring process to be easily retargeted for any high level language that at least provides the set of constructs used to do the structuring. Further analysis could then be performed on the output to use additional constructs provided by the language in question. Alternatively, the algorithms could be modified slightly to support new constructs.

A comprehensive review of some common high level languages and the control flow mechanisms they provide is given on pages 138 - 140 in [5]. The constructs embodied in each of the languages is a subset of the constructs shown in Figure 4.1 along with their control flow graph representations. Note that some of the other common control structures such as for loops have not been shown. These omitted structures can usually be implemented using those already shown. For example, in the case of the for loop, it can be implemented by using a sequential statement followed by a while loop.

The set of structures used for our purposes include all those shown in Figure 4.1 except for the last two, multilevel exits and multilevel cycles. Instead,

42 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

Goto Statement

Repeat loop

Multilevel exit Multilevel cycle Multiexit loop

Multicycle loop

While loop Endless loopSequence

n-way branch conditionalConditionalSingle branch conditional

Figure 4.1: HLL control structures and examples of their control flow graph representations.

goto's will be used to emulate these structures should they appear as most languages provide the former but very few (e.g. Ada) provide the latter. However, the inclusion of gotos also excludes at least one common language, Modula2.

4.3 Control Flow Graph Terminology Before discussing how we gather information about the underlying structures present in a control flow graph, we first define the relevant terminology and definitions used. These definitions are taken directly from [5] (apart from

4.3. CONTROL FLOW GRAPH TERMINOLOGY 43 the post-dominator definitions which are a modification of the immediate dominator definitions given in the same piece of work) and are given in terms of a directed graph G = (N ; E ; h; e). This is an extension of the normal representation of a graph in that we now explicity mention the exit node (see Definition 3.8).

Definition 4.1 A closed path or cycle is a path n1 ! n

v

where n

1

= n

v

.

Definition 4.2 The successors of n

i

2 N are fn

j

2 N j n

i

! n

j

g. The

immediate successors of n

i

2 N are fn

j

2 N j (n

i

; n

j

) 2 E g.

Definition 4.3 The predecessors of n

j

2 N are fn

i

2 N j n

i

! n

j

g. The

immediate predecessors of n

j

2 N are fn

i

2 N j (n

i

; n

j

) 2 E g. A node

n

i

2 N post-dominates a node n

k

2 N if n

i

is on every path n

k

! e.

Definition 4.4 A node n

i

2 N immediately post-dominates n

k

2 N if

@n

j

2 N ffl n

j

post-dominates n

k

and n

i

post-dominates n

j

(i.e. n

i

is the

closest post-dominator of n

k

).

Definition 4.5 A strongly connected region (SCR) is a subgraph S = (N

S

; E

S

; h

S

) ffl 8 n

i

; n

j

2 N

S

ffl 9 n

i

! n

j

and n

j

! n

i

.

Definition 4.6 A strongly connected component of G is a subgraph S = (N

S

; E

S

; h

S

) ffl S is a strongly connected region and @S

2

2 SCR(G) ffl

S ae S

2

(i.e. it is the smallest connect region involving h

S

).

Definition 4.7 A depth first search (DFS) is a method of traversing a graph that selects edges to traverse from the most recently visited node that still has unvisited nodes on at least one of its edges. A depth first spanning tree (DFST) of a flow graph G is a directed, rooted, ordering spanning tree of G grown by a DFS algorithm.

As described on page 482 of [7], the DFST T of a graph partitions the edges of the graph into four categories; tree edges, forward edges, backward edges and crossedges. For the purpose of structuring CFGs, the differentiation between tree, forward and cross edges is uninteresting and so we collapse them all to the one category, forward edges. We end up with the following two categories:

44 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

1. Backward edges = f(v ; w ) j (v ; w ) 2 E ^ w ! v 2 T g. 2. Forward edges are all the remaining edges.

The backward edges (also referred to as back edges) are used when structuring loops. The forward edges are used for structuring all other constructs as well as loops.

4.3.1 Interval Theory and the Derived Sequence of

Graphs

For the purpose of the describing the structuring algorithm we aim to improve upon, we define the two data structures it uses. The first structure was defined by Cocke in [6] and the algorithm used to build it developed by Cifuentes in [4].

Definition 4.8 Given some node, n 2 N , the interval I (n) is the maximal, single-entry subgraph in which n is the only entry node (i.e. the only node with a predecessor outside the subgraph) and in which all closed paths contain n. The unique interval node n is called the interval head or simply the header node.

We can use intervals to partition the nodes of G into a unique set of disjoint intervals I = fI (h

1

); : :; I (h

n

)g for some N ? 1. The algorithm presented

in Figure 4.2 is used to derive the sequence of intervals for a graph. This algorithm is derived from work done by Allen [2] in the area of control flow analysis. Defining I as a sequence imposes an ordering between the intervals of a graph which is required when building the other data structure required for this algorithm, the derived sequence of graphs.

Interval theory alone is not sufficient for the complete structuring of loops. We use an additional concept called the derived sequence of graphs which was described by Allen [2] and is based upon the intervals of a graph. The construction of the graphs is an iterative method that collapses intervals into nodes. The nodes formed by this process are then the nodes of the next order graph. The edges for this next order graph are both the in edges to the header of the interval from which the new node was derived as well as the out edges from nodes within the same interval whose destination was not

4.3. CONTROL FLOW GRAPH TERMINOLOGY 45

procedure FindIntervals (G = (N ; E ; s); var I) * Pre: G is a graph * Post: the intervals of G are contained in the sequence I

I:= h i H := fhg for (all unprocessed n 2 H ) do

I(n) := fng repeat

I (n) := I (n) + fm 2 N j 8 p 2 pred(m) ffl p 2 I (n)g until (no more nodes can be added to I (n)) H := H + fm 2 N j m 62 H ^ m 62 I (n) ^ (9 p 2 pred(m) ffl p 2 I (n))g I:= I + I (n) end procedure

Figure 4.2: Interval building algorithm.

within the interval. The process of building the intervals is then performed on the new graph after which the whole process starts again. This continues until either the trivial graph is derived or the graph derived is the same as the graph from which it was derived. If the latter is the case, then the graph is said to be irreducible.

An adaptation of the algorithm given by Cifuentes in [4] is shown in Figure 4.3. The adaptation is a result of using an object oriented design. Under this paradigm, an interval is represented by a class derived from the class used to represent an ordinary graph node with a few additional attributes (see Appendix A.4). For readability, we do not explicitly show the necessary type conversion required when treating a sequence as a set. The nodes and edges of the derived graph G

i

are denoted by N

i

and E

i

respectively.

Examples of the derived sequences for two programs are shown in Figures 4.4 and 4.5. They demonstrate how the intervals are defined for each graph within the sequence. Figure 4.5 is an example of an irreducible graph. The final graph in its derived sequence is the canonical irreducible graph. All irreducible graphs will reduce to containing only or more instances of this graph. An irreducible graph will occur when there is a forward edge into a loop subgraph.

For convenience, we use the term DSG (derived sequence of graphs) when describing the loop structuring algorithm that uses these two data structures.

46 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure BuildDerivedSequences(G = (N ; E ; h); var G) * Pre: G is a graph * Post: the derived sequence of graphs G

1

; : :; G

n

; n ? 1 has

* been constructed and is returned in G

G

1

:= G

G:= hG

1

i

i := 2 repeat /* Construction of G

i

*/

FindIntervals(G

i\Gamma 1

,I)

N

i

:= I

h

i

:= I (1)

E

i

:= f g

if (#N

i\Gamma 1

6= # I) then

for (all I 2 I) do

for (all n 2 nodesOf(I )) do

for (all m 2 (succ(n) - nodesOf(I ))) do

E

i

:= E

i

+ (I ; inInterval(m))

G:= G + G

i

i := i + 1 until (G

i

is the trivial graph or #N

i\Gamma 1

= # I)

end procedure

Figure 4.3: Derived Sequence of Graphs building algorithm.

4.3.2 Parenthesis Theory The proposed alternative loop structuring algorithm does not require any additional data structures beyond the control flow graph itself. However, it requires an extra property to be given to each node within the graph. Given that this extra property can also be used to make another improvement through 2 way conditional restructuring to the structuring algorithms, we define it for the nodes in the graph irrespective of what loop structuring algorithm is used.

We define this property in terms of a graph G = (N ; E ; h). We also rely on a DFS traversal of the graph from h to increment a `time' counter (initialised to 1) after the first and last visit to each node during the traversal. The first of these properties was given by Cormen et al. on pages 477-481 of [7]. The other two are derived from this first one.

Definition 4.9 A parenthesis of n 2 N is denoted t

1

! t

2

and t

1

is the

4.3. CONTROL FLOW GRAPH TERMINOLOGY 47

I(1)

I(2)

G2

G3 1

2

3 4

5

6

G1

2-6

1 1-6

I(1)

I(1)

Figure 4.4: The Derived Sequence of a graph with the intervals of the initial graph and each derived graph shown by the dotted boxes.

4

1 2 1 2

3 3-4

I(2)

I(1) I(1)

I(2) 1(3)

G1

G2

Figure 4.5: An example of an irreducible graph. The final graph is the canonical irreducible graph.

time at which n was first visited during a DFS traversal from h and t

2

is the

time at which n was last visited during the same traversal.

Definition 4.10 A parenthesis n

1

! n

2

is enclosed by parenthesis m

1

!

48 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS m

2

iff both were defined during the same traversal of a graph and m

1

! n

1

!

n

2

! m

2

.

Definition 4.11 9 n

1

; n

2

2 N ffl (n

1

O/ n

2

j (9 P

1

; P

2

ffl P

1

is a parenthesis

of n

1

^ P

2

is a parenthesis of n

2

^ P

1

is enclosed by P

2

)). The O/ relationship

is pronounced "is in".

It is relevant to note that the above definitions imply that there is more than one possible DFS traversal of a graph from the same starting point. This is important in our case as will be shown later. It is sufficient to state now that the two (or more) different traversals are a result of the ordering defined between the out edges of a node. Recall that for a 2way (or cond) node, the edges represent the flow of control taken when the condition is true or false respectively. This gives us a means of performing two symmetrical DFS traversals of the graph, one traversing the true branches first, the other traversing the false branches first. For an nway node, the symmetrical case involves working through the out edges in reverse.

Figure 4.6 shows the recursive algorithm used to set the parenthesis of each node in a graph. Note that it has a boolean parameter, indicating which edges are to be traversed first. For an nway node, when doing the `true' traversal, we work through the sequence of out edges in ascending order and descending order for the `false' traversal. In the context of parenthesis theory, the nodes have a few extra attributes which are described in Appendix A.5.

procedure SetParenthesis(G = (N ; E ; h); brch;var time) * Pre: G is a graph * Post: The brch traversal parenthesis for h has been determined.

traversed(h) := true parenthesis

brch

(h)(1) := time

time := time + 1 for (n 2succ(h) ^ : traversed(n)) do

SetParenthesis(G; brch; time) parenthesis

brch

(h)(2) := time

time := time + 1 end procedure

Figure 4.6: Algorithm used to set the parentheses for each node.

For convenience, we use the term PT (Parenthesis theory) when describing the loop structuring algorithm that uses the parenthesis properties of nodes.

4.4. STRUCTUREDNESS 49 4.4 Structuredness Having defined the context for structuring, we can now classify control flow graphs as either structured or unstructured. Structured graphs are those in which the flow of control represented can be explained completely in terms of a set of structured control structures. All of the structures in our chosen set (see Section 4.2) apart from the goto are structured control structures. Therefore, any control flow graph we can reduce in terms of these structures is a structured graph. In the context of assembly programs, the control flow graphs will quite often be unstructured. We are not recognising subgraphs representing short circuit evaluation of compound conditions

1

, so the graphs

containing these subgraphs will appear to be unstructured.

To deal with cases of unstructuredness in the CFGs, it is important to first identify the types of unstructured cases that we may come across. In the following sections, we describe the categories of structures we want to detect and what information we aim to derive for instances within each of these categories. The type of unstructuredness that may arise in each category is mentioned.

4.4.1 Loops The structured forms of loops are subgraphs that display a single entry/single exit nature. This includes the while, repeat and do loops. We also include multiexit and multicycle loops in this case but this decision is qualified in the following section on unstructured loops.

Examples of structured loops are the pretested, postested, endless, multiexit and multicycle loops shown in Figure 4.1.

For any loop, we want to derive the following information:

header - the node at the entrance to the subgraph representing the loop. latch - the node in the loop whose edge to the header defines the loop. follow - the first node reached after termination of the loop (if any).

1

See Section 2.4 for an explanation of why we make this restriction.

50 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS loop nodes - the nodes that are within the subgraph representing the loop. loop type - is it a preTested (while), postTested (repeat) or endless (do)

loop?

We introduce the notation h \Phi l to represent a loop with the header node h and the latching node l .

Unstructured Loops Some of the categories defined by Williams [11] that were discussed in Chapter 2 are shown in Figure 4.7. We ommit his category of parallel loops as we can structure these as multicycle loops. This is influenced by the target language we have chosen, C. However, both break's and continue's within a loop are only detected when generating the high level code. This means that we consider them as forms of unstructuredness when doing the structuring. When modifying the assembly structuring process for another high level language that provides neither of these control structures, all that is required is a small modification to the code generation phase. The type of control flow transfer occurring in these two structures is often considered to be a restricted form of unstructuredness. That is, they can both be seen as gotos with a limitation on the destinations to which control can be transfered.

Multientry

Overlapping Multiexit Figure 4.7: Sample unstructured loops.

4.4. STRUCTUREDNESS 51 4.4.2 2way Conditionals The control flow graph representation of structured 2way conditionals will also display a single entry/single exit nature. For each 2way conditional header, there will be two branches (one for each out edge from the header). For a structured 2way conditional the nodes reachable along each of the branches will be disjoint from each other except for the one node upon which they must both converge.

Examples of structured 2way conditionals are the two middle graphs in the first row in Figure 4.1.

For any 2way conditional, we need to derive the following information:

header - the node at the entrance to the subgraph representing the conditional.

follow - the unique node at which all paths from each branch converge. conditional type - is it a single clause (if..then) or double clause (if..then..else)

2way conditional?

We introduce the notation h3f to represent a 2way conditional with the header node h and the follow node l .

Unstructured 2way conditional Unstructured 2way conditionals are those that have an abnormal entry or exit to or from a branch. Both of these cases are represented in the canonical abnormal selection path graph, shown in Figure 4.8. Graph (a) contains two potential if..then..else structures, headed by nodes 1 and 3. The former is shown in graph (c) and contains an abnormal exit from its right branch whereas the latter is shown in graph (b) which has an abnormal entry into its left branch. Both types of abnormal exit can also be along a back edge. In that case, the conditional will be overlapping a loop subgraph.

52 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

1 2 3

4 5

6

4

3

6

5 2

1

3 4

(c)(b)(a) Figure 4.8: Unstructured 2way conditional graphs.

4.4.3 Nway conditionals Similarly, an nway conditional is structured if the nodes reachable from each of the branches corresponding to the out edges from the nway header are all disjoint. Once again, there will also be a unique follow node on which all branches converge. The same set of properties is required for an nway conditional as is required for a 2way conditional plus one additional property. As with loops, we also want to know the set of nodes within the nway subgraph. This helps with unstructured cases.

We use the same notation to represent an nway conditional as a 2way conditional.

Unstructured Nway conditionals Being a generalisation of a 2way conditional, an unstructured nway conditional is similar to an unstructured 2way conditional. The three forms of unstructured nway conditionals we can encounter is shown in Figure 4.9. In graph (a) the nway conditional A3E has an unstructured entry from the back edge (F ; D ). Graph (b) has an unstructured exit from B 3F via the back edge (C ; A). The nway conditional B 3F in graph (c) has an unstructured entry via the edge (A; C ). We cannot have an unstructured exit from an nway conditional via a forward edge. If we did, then the node reached by that edge would be lower in the graph than the immediate post-dominator of the nway header. As this contradicts the definition of an immediate postdominator, such a case cannot occur.

4.4. STRUCTUREDNESS 53 Graphs (a) and (b) raise an interesting issue that must be resolved. The back edge in both of these can potentially be structured as the defining back edge for a loop. At the HLL level, overlapping nway conditionals (switch) and loops will result in a large number of gotos being used. To prevent this, we must make a choice between one or the other when they overlap. We choose to prioritise nways above loops as an nway can have a large number of clauses. If we were not to structure these as clause of an nway, then each one would involve generating a goto. Loops, on the otherhand, will only involve generating one goto for the backward edge.

A

B C D E

FF

A

DCB E

(a) (b) (c)

A

B C D E

F

Figure 4.9: Unstructured nway conditionals.

4.4.4 Multi-structure headers Unlike the original algorithm given by Cifuentes in [4], a node in a control flow graph is not restricted to being a header of only one type of control structure. However, the only case in which this can happen is when a node is the header of a repeat or do loop header and is also the header of a 2way conditional. This exception can occur in both structured and unstructured graphs as shown in Figure 4.10. Node 1 in graph (a) is both a do loop and a single-branch conditional header. This intuitively makes sense as the header of an endless loop does not actually control the loop iterations. It is merely the first node inside the loop. If this kind of multi-structuring was not enabled for a single node, a goto would have to be used instead. The same applies to graph (b). Node 1 here can be the header of both a repeat loop and a 2way conditional. The edge (2,5) will generate a goto in this case

54 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS but without the multi-structuring, another goto would also be required for the edge (4,1).

1

2 3

1 2 3

4

5 (b)(a) Figure 4.10: (a) A structured and (b), an unstructured graph in which node 1 is the header of two different control structures.

4.5 Structuring Algorithms This section presents the algorithms designed to determine the underlying control structures of an arbitrary control flow graph (i.e. structured or unstructured, reducible or irreducible). The nodes within the graph are augmented with the structuring information determined which is used during the following phase when generating the high level code.

As we are considering two variations on the algorithm used to structure loops, the treatment of both algorithms is given in separate sections. It also means that the phase diagram in Figure 4.11 describing the order in which the structuring algorithms are performed is not the usual simple sequence. Instead, there is a branch in the path followed by the structuring process when it uses different algorithms depending on how it does the loop structuring.

The determine ordering phase imposes an ordering between the nodes of the graphs. The second phase, determine reverse ordering, uses the concept of a reverse graph and imposes a similar ordering between the nodes in this graph. The set parenthesis phase prepares the nodes for loop structuring under the PT algorithm of Section 4.3.2. However as mentioned earlier, the parenthesis of a node can be used to make improvement to both quality of the generated code when 2way conditionals overlap with either loops or

4.5. STRUCTURING ALGORITHMS 55

Determine

Ordering

Determine Reverse

Ordering

Conditionals

Structure

Determine Immediate

Post-dominators

Build DerivedSequences

StructuringPhases Unstructured CFG

Restructure Strucured CFG

Structure Structure

ParenthesesSet

LoopsLoops DSG DSGPT Figure 4.11: The phases performed during structuring. Branches taken by PT algorithms are labelled `PT' and branches taken by DSG algorithms are labelled `DSG'.

nway conditionals and so this phase is common to both variations of the overall structuring algorithms. The build derived sequences phase, builds the data structures required for loop structuring under the DSG algorithm Section 4.3.1. The next phase, determine immediate post-dominators does exactly that for each node in the graph. The structure conditionals phase detects the potential conditionals represented within the CFG. The phases diverge again here depending on which loop structuring algorithm is being used. Each alternative, structure loops (PT or DSG), detects the loops represented within the CFG using one of the two loop structuring algorithms. Finally, the restructure phase gives the aforementioned improvement to the overall structuring process. It reanalyses the 2way conditionals detected in

56 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS the conditional structuring phase to see if they can be restructured when they overlap with an nway conditional or a loop. Each phase is next explained in detail.

4.5.1 Determine Node Ordering The first step in structuring a CFG is to define an ordering between the nodes within the graph. Apart from other uses, this provides an efficient means of traversing the graph in a linear but partially-ordered fashion. The ordering we define is the post-order of the nodes. The node that was last visited first during a DFS traversal will be first in the ordering and the start node will have an order equal to the number of nodes within the graph. That is, the nodes are ordered according to when they were last visited. A traversal of the nodes according to this ordering can be visualised as starting from the bottom of the graph and working to the top of the graph.

During the traversal used to determine this ordering, an ordering is also imposed on the in edges to the node from its predecessors. This is the ordering used when later algorithms include a traversal of the in edges in ascending order.

The algorithm designed to perform this traversal is given in Figure 4.12.

procedure DefineOrdering (G = (N ; E ; s); var ord) * Pre: G is a graph * Post: the reverse post ordering has been determined

traversed(s) := true for (n 2 succ(s))

append s to the end of the predecessor ordering for n if (: traversed(s))

DefineOrdering (G = (N ; E ; n); ord) order(s) := ord ord := ord + 1 end procedure

Figure 4.12: Algorithm to establish the reverse post-order within the nodes of the original CFG.

4.5. STRUCTURING ALGORITHMS 57 4.5.2 Determine Reverse Ordering The reverse graph G

rev

= (N

rev

; E

rev

; h

rev

) of the graph G = (N ; E ; h) can be

defined by the following set of equations (we assume that a unique eit node has been added if it did not already exist):

1:N

rev

= N

2:E

rev

= f(n; m) j (m; n) 2 E g

3: 9

1

h

rev

2 N ffl @n 2 N ffl (h

rev

; n) 2 E

The order between the predecessors established by the algorithm in Figure 4.12 gives us an ordering between the successors of each node in the reverse graph. The importance of having this relationship between a node's predecessors in the original graph and it successors in the reverse graph becomes apparent when determining the immediate post-dominator of each node. This process relies upon the reverse ordering of the nodes. There are multiple possible reverse orderings and so the aforementioned relationship enables us to constrain the reverse ordering to be directly related to the forward ordering.

The algorithm for determining the post-ordering of the nodes in the reverse graph in shown in Figure 4.13. Note that the input to this algorithm is the reverse graph. We never actually need to construct this graph as it is just the original graph, viewed in a different way (as shown by the above equations).

procedure DefineReverseOrdering (G

rev

= (N

rev

; E

rev

; h

rev

); var ord)

* Pre: G

rev

is the reverse of graph G and the ordering

* between the predecessors of each n 2 G has been established * Post: the reverse post ordering has been determined

traversed(s) := true for (n 2 succ(h

rev

) in ascending order)

if (: traversed(h

rev

))

DefineReverseOrdering (G = (N

rev

; E

rev

; n); ord)

revOrder(s

rev

) := ord

ord := ord + 1 end procedure

Figure 4.13: Algorithm to establish the reverse post-order for the nodes within the reverse CFG.

58 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

2 43 11 12

13

5

7 8

9

6

1

10

Node Post Order in

Reverse Graph

1 13 2 11 3 4 4 5 5 8 6 10 7 7 8 9 9 6 10 3 11 2 12 12 13 1

Figure 4.14: A control flow graph with the nodes identified by their post-order value. The table gives the post-order of each node in the reverse graph. The nodes in the `Node' column correspond with the nodes in the graph shown.

The control flow graph shown in Figure 4.14 shows how the post-ordering of the nodes enables a top to bottom (or bottom to top) traversal of the graph. The nodes are labeled by their post-order value. This convention of identifying nodes by their post-order value will be followed for the graphs presented hereafter. In this graph, the true edges for a 2way node are on the left. The accompanying table displays the post-order for each node in the reverse graph.

4.5.3 Immediate Post Dominators The intuitive interpretation of Definition 4.4 is that the immediate postdominator, ipd , of a node, n, is the closest node to n such that all paths from n to the graph's exit node pass through ipd . Hecht and Ullman in [12] give an algorithm for finding the dominator tree of a control flow graph. If we consider the graph to be the reverse of another graph, then this algo 4.5. STRUCTURING ALGORITHMS 59 rithm gives us the post-dominator tree. As we are only concerned with the immediate post-dominators, a simplification of their algorithm is fine for our purposes. This simplified version is given in Figure 4.15 and Figure 4.16.

procedure FindImmPostDominators(G = (N ; E ; s); RevG) * Pre: RevG is the reverse of G and the post-ordering of its nodes has been determined * Post: the immediate post-dominator for each node has been determined.

/* first pass */ for (all n 2 N in descending order in RevG)

iPDom(n) := undefined for (all c 2 pred

rev

(n))

if (revOrder(c) ? revOrder(n))

iPDom(n) := CommPostDom(iPDom(n),c)

/* second pass */ for (all n 2 N in ascending order in G ffl # succ(n) ? 1)

for (all c 2 succ(n))

iPDom(n) := CommPostDom(iPDom(n),c)

/* third and final pass */ for (all n 2 N in ascending order in G ffl # succ(n) ? 1)

for (all c 2 succ(n))

if (isBackEdge(n; c) ^ order(iPDom(c)) ! order(iPDom(n))

iPDom(n) := CommPostDom(iPDom(n),iPDom(c)) else

iPDom(n) := CommPostDom(iPDom(n),c) end procedure

Figure 4.15: Algorithm for calculating the immediate post-dominators.

The first algorithm, FindImmPostDominators, makes use of the ordering defined between the nodes of the reverse graph. To make sure that the immediate-post dominator for each node is correct with respect to the original CFG, we require three passes over the nodes in the graph.

The first pass traverses the nodes using the ordering defined within the reverse graph. Intuitively, we traverse the nodes in the reverse graph from top to bottom. For each node, we consider its predecessors that are higher in the reverse graph (i.e. they are not reached via a back edge). If there is only one such predecessor, then it is determined to be the immediate postdominator. Otherwise, the immediate post-dominator is initialised to be the common post-dominator of all the predecessors. Finding this common postdominator is performed by the CommPostDom algorithm. The result of this

60 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure CommPostDom (var ipd; s) * Pre: ipd is the current immediate post-dominator and s is a successor of the * node being considered and is lower in the graph * Post: ipd is the immediate post-dominator of the node under consideration.

if (ipd = undefined)

ipd := s else if (s = undefined)

ipd := ipd else

while (ipd and s are defined ^ ipd 6= s)

if (revOrder(ipd) ? revOrder(s))

s := iPDom(s) else

ipd := iPDom(ipd) end procedure

Figure 4.16: Common post-dominator algorithm.

first pass is that every node except for the exit node has now been given an immediate post-dominator.

The second pass is required because the immediate post-dominator information determined in the first pass is not necessarily correct with respect to the original control flow graph. This occurs only when the original graph includes loops, which is the case for most control flow graphs. The CFG shown in Figure 4.17 is an example where we cannot rely upon the immediate postdominator information gained from only one pass. The three graphs depict the state of each node after each pass. Each node is given an alphanumeric identifier. The parenthesised data for each node is its position in the reverse ordering of the graph and its immediate post-dominator after the relevant phase. Shaded nodes are those whose immediate post-dominator changed after the previous pass.

The immediate post-dominators determined for nodes B,C and E after pass 1 are wrong. For node B, there is a path (B,D,H) to the exit node (H) that does not pass through C. Therefore, C is not a post-dominator of B, let alone the immediate post-dominator. The reason for this error is that the reverse ordering algorithm is `confused' by the presence of loops. For instance, one would expect node D to have a higher reverse order than both A and B as it is `higher' than both of these in the reverse graph. The problem arises from the fact that it is impossible to include in the reverse ordering algorithm any

4.5. STRUCTURING ALGORITHMS 61

(a) (b)

(c)

H(8,-) G(7,H)(4,B)F (6,G)E

(3,E)C (1,H)D

B(5,H)

A(2,B)

H(8,-) G(7,H)(4,B)F (6,G)E

(3,E)C (1,H)D

B(5,C)

A(2,B)

H(8,-) G(7,H)(4,B)F (6,H)E

(3,H)C (1,H)D

B(5,H)

A(2,B)

Figure 4.17: A CFG that has different reverse graphs. capabilities for dealing with loops in the reverse graph as it does not know what are back edges until the reverse ordering has been completed.

The second pass attempts to correct the remaining errors by doing another traversal based on the original ordering of the CFG. This time, only nodes that have 2 or more out edges are considered as the immediate postdominators determined for single out edges in pass one will be correct (there is no other choice of immediate post-dominators for these nodes). In the example, only node B's immediate post-dominator is corrected. Nodes C and

62 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS E are left uncorrected as they were traversed in the second pass before B was and they both rely on B to be correct.

After the second pass, there is still a possibility (as shown in the example) that some nodes within a loop (including the latching node) will still have the incorrect immediate post-dominator. The third pass corrects these remaining nodes. Preliminary analysis seems to indicate that a third pass is only required when the graph contains overlapping loops such as the two loops in the example induced by the back edges (D,A) and (F,B). The table in Figure 4.18 gives the immediate post-dominators determined for the nodes in the graph of Figure 4.14.

Node 1 2 3 4 5 6 7 8 9 10 11 12 13 ImmPDom - 12 2 2 2 2 2 6 2 2 2 1 1

Figure 4.18: The immediate post-dominators for each node in the graph of Figure 4.14.

4.5.4 Structuring Conditionals When structuring conditionals, we are analysing nodes that have more than two out edges. For the subgraph headed by one of these nodes, there will be a unique node such that all paths from each of the out edges from the conditional header converge upon it. As mentioned in Sections 4.4.2 and 4.4.3, this is referred to as follow node. Defined in this way, the follow node corresponds with the conditional header's immediate post-dominator and is already determined once this immediate post-dominator is known.

The set of nodes on all paths starting from some successor of a conditional header is called a branch. This set is defined in terms of the conditional h3f and one of the succesors of h such that it is not f .

Definition 4.12 For 9 s 2 (succ(h)nff g), a branch of the conditional h3f is fn 2 N j n is on a path s ! f ^ n 62 fh; f gg.

Note that the above definition implies that a branch may be empty. If a subgraph represents a structured conditional h3f , then it will satisfy the following two properties:

4.5. STRUCTURING ALGORITHMS 63

1. 8 b

i

2 branches of h3f ffl

"

i

(b

i

) = f g

2. 8 b 2 branches of h3f ffl (@n 2 b ffl (m; n) 2 E ^ m 62 (fhg [ b))

The first property states that the membership of the branches must be mutually exclusive (i.e no node may belong to more than one branch of a given conditional). The second property says the predecessor of any node within a branch must be either another node within the same branch or the conditional header. An unstructured conditional will violate at least one of these properties.

Whenever, conditional subgraphs are nested within each other, it may be the case that they share the same follow node. This will mean that both the outer and inner conditional headers will share the same immediate post-dominator. The same will also be true for overlapping (unstructured) conditionals.

2 43 11 12

13

5

7 8

9

6

1

10

Figure 4.19: A control flow graph with the potential conditional nodes shaded.

64 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS Consider the graph in Figure 4.19. The shaded nodes indicate the potential conditional header nodes. Even though some of these may later be structured as loop headers or latching nodes, we still consider them when structuring conditionals. The reason for this is that a node may be both a conditional and a loop header (as discussed in Section 4.4.4).

Determining the type of a 2way conditional For a 2way conditional, we seek to determine exactly what type of 2way conditional it is. The possibilities are:

ffl if..then..else - neither successor of the header is the follow node or at

least one successor is reached by a back edge.

ffl if..then - the false successor is the follow node. ffl if..else - the true successor is the follow node.

Figure 4.19 gives an example of these 2way conditional types. The conditionals (1331) and (1231) are if..else type of 2way conditionals as their true branches (i.e. the left branches) lead directly to their follow nodes. When generating HLL code for this type of conditional, we have the option of either generating an if..then..else that has an empty then clause or we can negate the condition in the header node and generate an if..then statement. The latter is a more natural looking statement and so we choose it. The conditionals (732), (932) and (1132) are if..then..else's as neither of the successors of the respective header nodes are the follow node. The conditional (532) is an interesting case as one of its out edges is a back edge. This type of node will always be initially structured as an if..then..else header. As conditional structuring is performed before loop structuring, there is still a chance it may be detected as the latching node for a loop.

Determining the nodes within an nway conditional For an nway conditional, we want to tag all the nodes on a branch of the conditional with the header node of their most enclosing nway conditional. This information will be used later when performing loop structuring.

4.5. STRUCTURING ALGORITHMS 65 The algorithm used to tag each node within a given nway conditional h3f is essentially a modified depth first traversal from h. All nodes (except for the nway header node and its follow) visited during the traversal are tagged with h as their nway header unless it has already been tagged with another nway header node or it was reached via a back edge. The base case for the recursive call is when the follow node is reached. If h3f encloses another nway conditional, h

2

3f

2

(where f

2

may be f ), then all the nodes within h

2

3f

2

will already have been tagged. This is because we use the post-ordering of

the nodes to traverse the graph in such a way that inner nway conditional will be structured first. Therefore, when we come across the header of an inner (or nested) nway conditional, we can jump directly to the follow node of the inner conditional. This means that each node will only be traversed once during all the nway tagging traversals for the complete CFG. The only nway conditional in Figure 4.19 is (1032). The nway tagging traversal for this conditional will tag nodes 3 : : 9 as having node 10 as their most enclosing nway conditional header. The algorithm that performs this traversal is shown in Figure 4.20.

procedure TagNodesInCase (G = (N ; E ; n); h; f ) * Pre: (h3f ) is an nway conditional and n = h or is in a branch of (n3f ) * Post: all the nodes within the branches of (h3f ) * that are not already members of another nway conditional have been * tagged as belonging to this nway

if (n 6= h)

caseHead(n) := h if (nodeType(n) = nway ^ condFollow(n) 6= f )

TagNodesInCase(G = (N ; E ;condFollow(n)); h; f ) else

for (all c 2 succ(n) ^ caseHead(c) is undefined)

if (c 6= f ^ : isBackEdge(n; c)

TagNodesInCase(G = (N ; E ; c); h; f ) end procedure

Figure 4.20: Algorithm to Tag the Nodes within an Nway Conditional.

Complete Conditional Structuring Algorithm The complete algorithm for structuring both types of conditionals is given in Figure 4.21. The nodes of the graph are traversed by an ascending post 66 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS order traversal. This ensures that inner conditionals are structured before outer ones. Only nodes with at least two out edges are analysed during the traversal.

procedure StructConds (G = (N ; E ; h)) * Pre: The immediate post-dominator has been determined for each node * Post: The conditional headers have been tagged, their follows have been determined. * Each node has been tagged with the header of its most enclosing nway conditional * and the type of a 2way conditional has been determined.

for (all n 2 N in ascending order ffl #succ(n) ? 1)

structType(n) := Cond if (9 m ffl (n; m) 2 E ^ isBackEdge(n; m))

condType(n) := ifThenElse else

condFollow(n) := iPDom(n)

if (n is an nway node)

condType(n) := Case TagNodesInNway(n,condFollow(n)) elseif (succ(n,Then) = condFollow(n))

condType(n) := ifElse elseif (succ(n,Else) = condFollow(n))

condType(n) := ifThen else

condType(n) := ifThenElse end procedure

Figure 4.21: Conditional Structuring Algorithm

4.5.5 Loop Structuring using Derived Sequences of Graphs This section describes how loops are structured using the DSG theory. It is based upon the work on loop structuring given in [5]. Slight modifications have been made as a result of further development on the original algorithm.

The first step in structuring loops is to determine how a loop is represented within a CFG. A loop must involve a back edge from its latching node to its header node. Therefore, we want to know both of these nodes for any loop. We also want to know which nodes are within a given loop as well as a nesting order between the loops. Hecht in [8] made the point that the

4.5. STRUCTURING ALGORITHMS 67 representation of loops by cycles is too fine as not all loops are properly nested or disjoint. The use of strongly connect components is too coarse as there is no nesting order whereas strongly connected regions do not provide a unique coverage of the graph and does not cover the whole graph. The interval and derived sequence of graphs theory discussed in Section 4.3.1 provides us the information we require: one loop per interval and a nesting order given by the derived sequence of graphs.

2 43 11 12

13

5

7 8

9

6

10

I3={3,4,10..12}

I1={13}

I5={2} 1 I2={1}

I4={5..9}

Figure 4.22: The intervals of the Control Flow Graph of Figure 4.14 For some interval I (h) (where h is the interval header), there is a loop headed by h if there is an edge from some node l 2 I (h) to h. These edges will invariably be a back edge and l will be the potential latching node for the loop. Figure 4.22 shows the intervals determined for the graph of Figure 4.14. The interval I

4

is the only interval so far that has a loop in it. This loop,

9 \Phi 5 results from the back edge from 5 to 9.

To detect the loop 12 \Phi 2, we make use of the derived sequence of graphs. To find the next graph in the sequence, we collapse each interval into a single node and form the new graph from these nodes. Figure 4.23 shows the derived sequence of graphs for the original flow graph of Figure 4.22. In the

68 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS first derived graph, G

1

, the loop 12 \Phi 2 is now detectable by the back edge

from the node I

5

(which contains node 2) to the node I

3

(which contains node

12). The derived graph G

2

does not contain any back edges and therefore,

there are no more loops in the graph.

I1 I2 I3

I4 I5

I8={I3,I4,I5} I6={I1}I7={I2} I6

I7 I9={I6,I7,I8} I9

I8

G1

G2

G3

Figure 4.23: Derived Sequence of Graphs for Figure 4.22 The nesting of loop 9 \Phi 5 within loop 12 \Phi 2 is derived from the fact that the former is was detected earlier in the sequence of derived graphs.

We made a note earlier that a back edge from within an interval to the interval header is only a potential latching node. It may be the case that a more suitable latching node can be found. Consider the example in Figure 4.24. In the first graph, G

0

, the interval I

1

contains a back edge to the interval

header, node 1, from node 2 (which is within the same interval). This will result in the initial structuring of the loop 1 \Phi 2. When considering the next graph in the sequence, G

1

, we are presented another possible latching

node, node 4 (from interval I

2

) for a loop headed by node 1. We therefore

restructure the loop to be 1 \Phi 4 as it encloses more nodes. We generalise this decision and state that the latching node found latest in a sequence of derived graphs will always overide an alternative latching node for the same loop header found earlier in the sequence.

Given a control flow graph G = G

0

that has been annotated with its intervals,

its derived sequence of graphs G

0

: : G

n

and the set of intervals of these

graphs, I

0

: : I

n

, we can use the following algorithm to structure loops: each

header of an interval in G

0

is checked for having a back edge from a latching

node that belongs to the same interval and is enclosed by the same nway

4.5. STRUCTURING ALGORITHMS 69

1 2 3

I1={1,2} I3={I1,I2}

G1 I1 I2

4

5

I2={3,4,5}

G0 Figure 4.24: A loop header with two potential latching nodes.

conditional

2

. We choose the latching with the least post-order value if there

is more than one. If a latching node was found, then we check to see if the header has already been structured as a loop header (from a derived graph earlier in the sequence). If so, then this latching node is set back to what is was determined to be (an if..then..else conditional header) before being detected as a latch node and all the nodes within the now defunct loop have their loop header marker set back to undefined. We now set the latch node to be the most recently found latch node and together with the header node, a loop has been found. We now determine its type (preTested, postTested or endless) and mark all the nodes belonging to it. Next, the intervals of G

1

, I

1

are checked for loops and the process is repeated until the intervals in

I

n

have been processed. Whenever there is a potential loop (i.e. an interval

header that has a back edge into from a node within the same interval) such that the loop header and latch node are currently marked as belonging to different loops, the new loop is disregarded as it belongs to an unstructured loop. This back edge will always generate a goto during code generation. The complete algorithm is given in Figure 4.25. This algorithm correctly determines the appropriate nesting levels between the loops. Section A.3 describes the loopNodes attribute used in this and other loop structuring algorithms.

2

See Section 4.4.3 for an explanation of why this check is made.

70 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure StructLoops (G = (N ; E ; h)) * Pre: G

1

: : G

n

and I

1

: : I

m

have been constructed.

* Post: all nodes of G have been tagged with their most enclosing loop and all * loop headers have been given their loop type, loop follow (if any) and latching node

for (G

i

:= G

1

: : G

n

)

for (I

j

with header h

j

:= each of the intervals in G

i

)

if (9 x 2 nodesOf(I

j

) ffl

(x ; h

j

) 2 E ^ loopHead(x ) is undefined ^ caseHead(x ) = caseHead(h

j

) ^

@y 2 nodesOf(I

j

) ffl (y; h

j

) 2 E ^ order(y) ! order(x ))

if (h

j

is already a loop header with latch l

j

)

for (n 2 loopNodes(h

j

\Phi l

j

))

loopHead(n) := undefined reset condType(l

j

) if necessary

structType(h

j

) := loop

FindLoopNodes(h

j

; x ; I

j

)

DetermineLoopType(h

j

; x )

FindLoopFollow(h

j

; x )

end procedure

Figure 4.25: Loop Structuring Algorithm.

Determining the Nodes in a Loop The graph in Figure 4.14 contains two loops that will be detected under the DSG algorithm; 12 \Phi 2 and 9 \Phi 5. It can be seen that for both of these loops, the set of nodes given by the following equality is an intuitively valid membership for each loop:

loopNodes(h \Phi l ) = fn 2 N ffl order(l ) ? order(n) ! order(h)g That is, the loop membership is determined to be all the nodes that have a post-order value in between the post-order values of the header and latch node, excluding the header node itself. Consider the graph in Figure 4.26. The loop membership for 5 \Phi 1 given by the above condition will be f5,4,3,2,1g. However, we can see that nodes 3 and 4 are not really appropriate nodes to include in the loop. If we restrict the above condition to include a test on interval membership, we will get the desired loop membership:

loopNodes(h \Phi l ) = fn 2 N ffl

order(l ) 6 order(n) ! order(h) ^ n 2 I (h))g

That is, a candidate node for loop membership has to also be within the

4.5. STRUCTURING ALGORITHMS 71

1 2

3 4 5

6

I3=4,3} I2={5,2,1}

I1={6}

Figure 4.26: A loop that requires the use of intervals to derive the correct membership.

interval headed by the loop header. In the given example, only nodes f5,2,1g satisfy the new condition as they are all in the interval (I

2

) headed by the

loop header. They are also the only nodes we intuitively desire to determine as members of the loop.

The algorithm in Figure 4.27 is used to find all the nodes within a given loop. If a node found to be within a loop is not already part of another loop, then it is marked by setting its loopHead attribute to be the header of the loop. This ensures that all nodes are marked with the loop header of their most enclosing loop.

Determining the Type of a Loop We rely upon the header and latch nodes of a loop to determine its type. The header of a preTested loop will be a 2way node that controls the loop iterations and will have a latching node that has a solitary edge back to the loop header. One of the edges from a preTested loop header will lead to follow node for the header when it was previously structured as a 2way conditional. A postTested may have either a 2way or one edge header node and a 2way latching node. It will have a 2way header node if the first statement within

72 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure FindLoopNodes(G = (N ; E ; s); h; l; I ) * Pre: (h \Phi l) is a loop. * Post: the loop nodes are marked as being within the loop and if it is their most * enclosing loop, the corresponding loop loop header tag has been set.

for (all m ffl order(h) ? order(m) ? order(l) in descending order)

if (m 2 baseNodesOf(I ))

loopNodes(h \Phi l) := loopNodes(hLoopl) [ fmg if (loopHead(m) = undefined)

loopHead(m) := h end procedure

Figure 4.27: Algorithm to tag the nodes within a loop.

the loop is a 2way conditional. An endless loop will have either a 2way or one edge header node for the same reasons as a postTested loop but it will only have a one edge latching node. The only difference between an endless loop with a 2way header and a preTested loop is whether or not one of the successors of the header is also its conditional follow node.

Using these heuristics to determine the type of a loop means that we will only ever find structured preTested loops but we can find both structured and unstructured versions of the other two types.

The types of the two loops in Figure 4.14 are as follows: 12 \Phi 2 is a preTested loop as firstly, it has a 2way header node, a successor of which, is also the follow node found when the header was initially structured as a 2way conditional. Secondly, it has a one edge latching node. The other loop, 9 \Phi 5, will be an endless loop as it has a one edge latching node and a 2way header node. Neither successor of the header is its conditional follow.

Figure 4.28 gives the algorithm to determine the type of a loop based upon its header and latching node and the conditional follow found for a 2way loop header when it was structured as a conditional. A single node loop will always be determined as either a postTested (2way latch) or an endless (one edge latch) loop. For both a postTested and endless loop, the conditional structuring information for the header will be retained if it is a 2way node.

4.5. STRUCTURING ALGORITHMS 73

procedure DetermineLoopType(G = (N ; E ; s); h; l) * Pre: (h \Phi l) is a loop and set of nodes within the loop has been determined. * Post: the type of the loop has been determined.

if (nodeType(l) = 2way)

loopType(h) := postTested if (nodeType(h) = 2way ^ h 6= l)

structType(h) := loopCond else if (nodeType(h) = 2way)

if (9 x ffl x = succ(h) ^ x = condFollow(h))

loopType(h) := preTested else

loopType(h) := endless structType(h) := loopCond else

loopType(h) := endless end procedure

Figure 4.28: Algorithm for determining the type of a loop.

Determining the Loop Follow The loop follow node is the first node that is reached upon loop termination. For both structured preTested and postTested loops, there is only one candidate for the follow node but there may be multiple candidates in the unstructured forms of these two loops as well as for endless loops. When designing an algorithm to find the follow node, we aim to find the most appropriate one if there is more than one to choose from.

In a preTested loop, the follow node is determined to the non-conditional follow node of the loop header. It will always be the case that exactly one of the header's successors will be the non-conditional follow as we used this condition to structure the loop as a preTested loop in the first place. For postTested loop, the follow node will be the successor of the latch node that is not the loop header. For an endless loop, there may be zero or more jumps out of the loop. Since the follow node is the first node that is reached upon loop termination, we aim to find the node that is reached from the loop after an exit such that it is higher in the graph than all other nodes that are reached from the loop after an exit. This will be the candidate node with the highest post order value. This property of the chosen follow node means that all the other candidate nodes can be reached from it.

74 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure FindLoopFollow(G = (N ; E ; s); h; l) * Pre: (h \Phi l) is a loop and its nodes and type have been determined * Post: the loop's follow (if any) has been determined

if (loopType(h) = preTested)

if (succ(h; 1) = condFollow(h))

loopFollow(h) := succ(h; 1) else

loopFollow(h) := succ(h; 2) else if (loopType(h) = postTested)

if (succ(l; 1) = h)

loopFollow(h) := succ(h; 2) else

loopFollow(h) := succ(h; 1) else /* endless loop */

foll := undefined for (all m 2 loopNodes(h \Phi l) ffl nodeType(m) = 2way)

for (all x ffl x 2 succ(m) ^ x 6= h)

if (x 62 loopNodes(h \Phi l) ^ (order(x ) ? order(foll) .

foll = undefined)) foll := x if (foll 6= undefined)

loopFollow(h) := foll end procedure

Figure 4.29: Algorithm for determining the follow node of a loop.

In Figure 4.14, 12 \Phi 2 has node 1 as it follow node and 9 \Phi 5 has node 2 as its follow node. Figure 4.29 gives the algorithm to determine the follow node of a loop based on the type of the loop detremined in the algorithm of Figure 4.28 and the set of nodes within the loop determined in the algorithm of Figure 4.27.

4.5.6 Loop Structuring using Parenthesis Theory This section describes the loop structuring algorithms that have been developed as a more efficient alternative to those given in Section 4.5.5. Under the parenthesis theory (PT), we need to alter our definition of how loops are represented within a control flow graph. Without the use of the extra data structures defined for the DSG algorithm, we simply define a loop to potentially occur where there is a back edge. That is, we initially structure loops only by detecting cycles within the graph. We have to then rely of the

4.5. STRUCTURING ALGORITHMS 75 parentheses of the nodes to further constrain the loop structuring. The cycles are found during a traversal of the nodes in ascending order. For a graph with only structured loops, this will ensure that outer loops are discovered before inner loops. We retain this assumption and use extra analysis in the face of unstructuredness.

Having found a back edge (l ; h), we initially check the nodes l and h for meeting three conditions before structuring them as the latch and header of a new loop. Together, they must satisfy the following conditions:

1. Both h and l must currently belong to the same loop (i.e. have the

same loop header). This may be undefined if neither is enclosed by a loop. This ensures that we do not structure overlapping loops.

2. Both h and l must currently belong to the same nway conditional (i.e.

have the same nway header). This may be undefined if neither is enclosed by an nway conditional

3

.

3. There must be no other back edge (l

2

; h) such that l

2

is lower than l

in the graph. This ensures that the most enclosing loop (i.e. the loop with the largest membership) is found.

Having found a new loop, h \Phi l , we then determine its membership, its type and its follow node. The membership information is used when looking for the next loop. The algorithm shown in Figure 4.30 performs this loops structuring.

The graph in Figure 4.14 will have both the back edges, (2,12) and (5,9), analysed by this algorithm as potential loop inducing back edges. The edge (2,12) will be analysed first as node 2 is lower in the graph than node 5. This demonstrates that the outer loops will be structured first. This edge will successfully induce the new loop, 12 \Phi 2, as it meets the three necessary conditions; neither of them belong to a loop, neither belong to an nway conditional and node 2 is the only (and therefore lowest) node with a back edge to node 12. We delay discussion of whether or not the edge (5,9) induces a new loop until we have presented the method used to decide which nodes are in a loop.

3

See Section 4.4.3 for an explanation of why this check is made.

76 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

procedure StructLoops (G = (N ; E ; h)) * Pre: G is a graph and the parenthesis pair for each node have been determined. * Post: all nodes of G have been tagged with their most enclosing loop. All loop * headers have been given their loop type, loop follow (if any) and latching node.

for (all h 2 N in ascending order)

if (9 x 2 pred(h) ffl

loopHead(x ) = loopHead(h) ^ caseHead(x ) = caseHead(h) ^ isBackEdge(x ; h) ^ @y 2 pred(h) ffl order(y) ! order(x )) if (x 6= h)

structType(x) = seq structType(h) = loop FindLoopNodes(G; h; x ) DetermineLoopFollow(G; h; x ) end procedure

Figure 4.30: Loop Structuring Algorithm.

Determining the Nodes in a Loop Given the loop h \Phi l , we make use of the parenthesis theory introduced in Section 4.3.2 to determine which nodes are in the loop. We determine a node n to be within the loop if n O/ h and l O/ n. The latching node is also considered to be within the loop body but the header is not. Using both parentheses for the node ensures that we detect as many nodes as possible using this method.

Figure 4.31 gives a graphical representation of the true parenthesis in (a) and false parenthesis in (b) for each node in the graph of Figure 4.14. For the loop 12 \Phi 2, we detect the nodes 11 and 3 using the true parenthesis and nodes 10, 9, 8 and 6 using the false parenthesis to be nodes of the loop. This can be explained visually br referring to the example graph and its nodes' parentheses. We see that the line in Figure 4.31 representing the false parenthesis of node 2 (the latch node) lies within the line representing the false parenthesis of 10 which in turn lies within the line representing the false parenthesis of 12 (the header node). Therefore, node 10 is within the loop. The same relationship between the lines of the other nodes determined to be within the loop can also be observed. As with the DSG algorithm, we only need to consider the nodes that lie within the post-order range of the loop header and latch. The following equation defines the set of loop nodes that

4.5. STRUCTURING ALGORITHMS 77

13 11 10

9 8 7 6 5 4 3 2 1

12

(a)

(b)

13 11 10

9 8 7 6 5 4 3 2 1

12

ParenthesesNodes

Figure 4.31: The (a) true and (b) false parenthesis for each node in the graph of Figure 4.14.

will be found: loopNodes(h \Phi l ) = fn 2 N fflorder(l ) ?order(n) !order(h) ^ n 2 I (h))g The number of nodes detected to be within the loop is considerably less than was detected by the DSG algorithm. However it is the most that can be achieved using only information provided by a DFS tree perspective of the

78 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS graph (which is essentially what the parenthesis theory gives us). An immediate effect of having determined node 9 to be in the loop but not node 5 is that when we analyse the back edge (5,9) as a potential loop inducing back edge, the first condition for it to induce a loop will not be met. That is, node 5 and 9 will not both belong to the same loop.

Figure 4.32 give the algorithm for finding the nodes in a loop given the loop header and latch node as well as each node's pair of parenthesis.

procedure FindLoopNodes(G = (N ; E ; s); h; l; I ) * Pre: (h \Phi l) is a loop. * Post: the loop nodes are marked as being within h \Phi l and if it is their most * enclosing loop, the corresponding loop header tag has been set.

for (all m ffl order(h) ? order(m) ? order(l) in descending order)

if (m O/ h) ^ (l O/ m . m = l))

loopNodes(h \Phi l) := loopNodes(h \Phi l) [ fmg loopHead(m) := h end procedure

Figure 4.32: Algorithm to tag the nodes within a loop.

Determining the Type of a Loop As with the DSG loop type determining algorithm, we are given the header and latch nodes of a loop, from which we must determine the type of the loop. As we aim to preserve as much similarity between the functionality of the two variations of loop structuring algorithms as possible, we use the same algorithm for this component of loop structuring as was given for the DSG algorithm in Figure 4.28. The type determined for the loop 12 \Phi 2 in our example graph is therefore the same as if detected under the DSG algorithm; node 1.

Determining the Loop Follow As with the procedure followed to determine the type of a loop, we once again borrow heavily from the DSG algorithm when designing the algorithm to determine the follow node of a loop. The only change we have to make to

4.5. STRUCTURING ALGORITHMS 79 the algorithm shown in Figure 4.29 is to the subcomponent that determines the follow node for an endless loop.

If a conditional node, n, and its follow, f , are both detected as being in a loop, there is no guarantee (in contrast to the DSG algorithm where there is a guarantee) that both of h's successors will also be in the loop. For example, consider the endless loop 8 \Phi 1 in Figure 4.33. The shaded nodes indicate those determined to be within the loop. If we were to apply the same method for determining the loop follow of an endless as was used in the corresponding DSG algorithm, then node 4 will be considered as a potential loop follow. Ihis is obviously a bad choice as the only path from node 4 leads directly to a node within the loop. The modification we make to the algorithm to ensure that this type of node is not considered as a potential endless loop follow is as follows: the follow of an endless loop (if any) is determined to be the node with the highest post-order value such that it is not within the loop and it is the successor of a 2way conditional header that is in the loop but whose conditional follow is not in the loop.

The algorithm used to find the follow is given in Figure 4.34. If whilst searching for a 2way conditional that has a follow outside the loop, we come across one, h3f , that has its follow inside the loop, we do one of two things depending on whether f is lower in the graph than h (i.e. has a lesser postorder value):

1. If f is lower in the graph than h, we continue the traversal from it as all

the nodes whose post order value is in the range order(f ): : order(h) will be within the body of the conditional and will therefore also really be inside the loop (even though it may not be tagged as such). This will happen when analysing node 3 in Figure 4.33.

OR

2. If f is not lower in the graph than h then this the result of an unstructured backwards jump. This will hold for any conditionals further down the graph within the loop and so we just break out of the traversal of the nodes within the loop.

If after completing this traversal a suitable follow has been found, then it is set to be the follow of the loop.

80 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

1 2 3 4

5

7

8

6

Figure 4.33: An Endless loop. 4.6 Restructuring 2way Conditionals To improve the code generated when a 2way conditional overlaps with a loop or nway conditional, we make use of the nodes' nway header tag and their loop header tag. For a 2way conditional, h3f , that represent an unstructured jump into or out of a loop, h's loop header tag will be different to f 's loop header tag. Similarly for an unstructured jump into or out of an nway conditional -- h and f 's respective nway header tags will be different. We use this information to modify the type of a 2way conditional and its follow where doing so will make for generation of better HLL code.

We justify the need for restructuring through an example. The example includes some generated HLL code. There is no great need to understand how this code was generated, just to observe the overlook of it. Consider nodes 7 and 9 of the graph in Figure 4.14. Node 7 is the header of the 2way conditional 732 which has been determined as an ifThenElse type of conditional and 9 is the header of the conditional 932 (as well as being the header of the loop 9 \Phi 5) which also has the type ifThenElse. If we were to leave the 2way conditionals as they were originally structured and use a code generation algorithm similar to that given by Cifuentes in [5] then, the subgraph entered by 9 and exited at 2 would generate the code shown in Figure 4.35.

4.6. RESTRUCTURING 2WAY CONDITIONALS 81

procedure FindLoopFollow(G = (N ; E ; s); h; l) * Pre: (h \Phi l) is a loop and its nodes and type have been determined * Post: the loop's follow (if any) has been determined

if (loopType(h) = preTested)

if (succ(h; 1) = condFollow(h))

loopFollow(h) := succ(h; 1) else

loopFollow(h) := succ(h; 2) else if (loopType(h) = postTested)

if (succ(l; 1) = h)

loopFollow(h) := succ(h; 2) else

loopFollow(h) := succ(h; 1) else /* endless loop */

foll := undefined for (all m 2 loopNodes(h \Phi l) in descending order ffl

nodeType(m) = 2way ^ condFollow(m) is defined) if (condFollow(m) 2 loopNodes(h \Phi l))

if (order(m) ? order(condFollow(m)))

m := condFollow(m) else

break else

if (9 x 2 succ(m) ffl

x 62 loopNodes(h \Phi f ) ^ (foll = undefined . order(foll) ! order(x ))) foll := x if (foll 6= undefined)

loopFollow(h) = foll end procedure

Figure 4.34: Algorithm for determining the follow node of a loop.

We can improve on this by recognising that one of the branches of each of the two conditionals, 932 and 732, leads along a path that does not go through the latch node of the enclosing loop, 9 \Phi 5. Therefore, we can view both of these branches as leading to an unstructured exit of the loop and the other branches as staying within the loop. Consider the code shown in Figure 4.36 that we can generate using the code generation algorithms presented in Chapter 5 if we were to restructure both conditionals as ifElse's, resetting their follows to be the true successor of the header.

This is a marked improvement over the previously generated code as we

82 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS

"begin-verbatim""

do -

BB9; if (cond.BB9) -

BB7; if (cond.BB7) -

goto L5; "" else - L6: BB6;

goto L2; "" goto L2; "" else -

BB8; goto L6; "" L5: BB5;

"" while (cond.BB5); L2: BB2;

Figure 4.35: HLL code generated by an existing code generation algorithm.

have halved the number of gotos generated. We now explain how such restructuring is performed.

The restructuring relies upon the use of the nodes' parentheses. As mentioned in Cormen [7], the parenthesis can be used to determine ancestor relationships between the nodes in an underlying DFS tree of a graph. In terms of the definitions we gave in Section 4.3.2, a node n is ancestor on a node m if m O/ n. This also implies that there is a path from m to n in the graph.

As mentioned earlier, we attempt to restructure a 2way conditional header if its follow is not in the same loop of nway conditional as its header. The algorithm to do this is given in Figure 4.37. This algorithm looks for 2way headers with a branch that leads to an unstructured entry or exit to or from a loop or nway. We also restructure 2ways nodes that have a successor reached by a back edge.

4.6. RESTRUCTURING 2WAY CONDITIONALS 83

do -

BB9; if (!cond.BB9) -

BB8; L6: BB6;

goto L2; "" else BB7; if (!cond.BB7) -

goto L6; "" BB5; "" while (cond.BB5); L2: BB2;

Figure 4.36: HLL code generated as a result of 2way conditional restructuring.

To detect which branch leads to the `offending' entry or exit, we make use of the ancestor relationship defined by the nodes' parentheses. For example, when restructuring a 2way conditional, h3f , in which one branch makes an unstructured exit from a loop we will find it if both the following conditions are true:

1. h has a different loop header than f . 2. 9

1

s 2 succ(h) ffl there is a path from s to the latch node of the loop

enclosing h (s may be the latch node).

If both hold, then s is determined to be the new follow for the conditional headed by h. If h is the true successor of h, then the type of the conditional is changed to be ifElse otherwise it is set to ifThen. For example, node 7 in Figure 4.14 is a candidate for restructuring as it is inside the loop 9 \Phi 5 but its follow, 2, is outside the loop. One of the successors of 7, node 5, happens to be the latch node of the loop enclosing 7 and so it is set to be the new follow for the conditional headed by node 7. Also, the type of this conditional is changed from ifThenElse to ifElse as it is the else clause that leads to the unstructured loop exit.

A similar restructuring is performed for unstructured entries into an nway.

84 CHAPTER 4. STRUCTURING ASSEMBLY GRAPHS We can not have an unstructured forward exits from an nway as the any conditional within an nway will have the same follow as the nway.

Lastly, we restructure any 2way nodes that have a successor, s, reached by a back edge that was not structured as the latch node for a loop. We set the new follow of the 2way to be the successor not reached by the back edge. If s is the true successor of the 2way, it type is changed to be ifThen otherwise it is set to ifElse.

4.6. RESTRUCTURING 2WAY CONDITIONALS 85

procedure Restructure2ways (G = (N ; E ; s)) * Pre: loop and conditional structuring has been performed * Post: 2way conditionals may have had their condType and condFollow changed

for (all n 2 N ffl nodeType(n) = 2way ^ n is the header of n3f )

if (loopHead(n) 6= loopHead(condFollow(n)))

/* check for jump out of a loop */ h

n

\Phi l

n

:= the loop in which n is nested

h

f

\Phi l

f

:= the loop in which condFollow(n) is nested

if (9 x 2 succ(n) ffl x = l

n

. l

n

O/ x

if (x = succ(n,true))

condType(n) := ifElse condFollow(n) := succ(n,true) else

condType(n) := ifThen condFollow(n) := succ(n,false) /* check for a jump into a loop */ else if (9 x 2 succ(n) ffl x = h

f

. h

f

O/ x )

if (x = succ(n,true))

condType(n) := ifElse condFollow(n) := succ(n,true) else

condType(n) := ifThen condFollow(n) := succ(n,false) /* check for jump into an nway */ else if (n

f

6= condFollow(caseHead(n) ^

9 x 2 succ(n) ffl caseHead(n) 6= caseHead(x )) if (x = succ(n,true)

condType(n) := ifThen condFollow(n) := succ(n,false) else

condType(n) := ifElse condFollow(n) := succ(n,true) /* check for a back edge from a 2way header */ else if (condFollow(n) is undefined)

if (isBackEdge(n,succ(n,true)))

condType(n) := ifThen condFollow(n) := succ(n,false) else

condType(n) := ifElse condFollow(n) := succ(n,true)

Figure 4.37: Algorithm to Restructure 2way Conditionals

Chapter 5 Code Generation The information summarised for a control flow graph is used in code generation to generate high level language (HLL) control flow structures. To ensure that we generate code at the appropriate nesting levels (represented by indentation in the output), the task of generating code is tackled as follows: we generate code for the root of a graph, n and then recurse on the successors of n that are within the structure rooted at n. Each level of recursion also represents a successive nesting level and so the code generated within the recursive call is indented one more indent space (which can be arbritarily set) than the code for the node making the recursive call. Upon termination of the recursion, code is then generated for the follow node of the structure rooted at n. This follow node was determined to be the first node reached from a structure once execution of that structure had completed. A follow was only determined during the structuring phase for compound structures such as loops and conditionals. The follow for the other structures such as sequences and calls is simply the successor of these nodes as they are the only place to which they can transfer control. A return node does not transfer control anywhere and hence will generate code for itself and then simply terminate the sequence of recursive code generation call of which it is at the end.

The algorithms used to generate the code from a CFG are independent of the algorithms used to structure the CFG to a large extent. As long as both have the same context with respect to a set of HLL control structures, then it does not matter how these structures were determined. For this reason, we can use the algorithms presented in this chapter to generate HLL code for a control flow graph that was structured using either the derived sequence of graphs

87 (Section 4.3.1) or the parenthesis theory (Section 4.3.2) method to structure loops. However, the code generated will depend on what structures were determined by the respective algorithms. Figure 5.1 reproduces the CFG example used in Chapter 4 and shows the HLL structure determined for each node in the graph by the derived sequence of graphs (DSG) algorithms.

2 43 11 12

13

5

7 8

9

6

1

10

Node Struct Foll

1 seq 2 seq 12 3 seq 2 4 seq 2 5 seq 2 6 seq 2 7 ifThenElse 2 8 seq 6 9 postTest 2 10 nway 2 11 ifThenElse 2 12 preTest 1 13 ifElse 1

Figure 5.1: The HLL structure type determined for each node in the reproduced control flow graph of Figure 4.14 with respect to the DSG structuring algorithms. For 2way nodes, the true branch is the left one.

This chapter describes the different algorithms used to generate code for the respective structures and then demonstrates how these are combined to make the one recursive code generation procedure, CodeGen(). As we are only performing control flow analysis to the program, only HLL control flow statements will be generated. The rest will remain in assembly-like form.

The following sets are used to assist with code generation:

followSet - all the follow nodes of the control structures in which the current

node is nested

gotoSet - the nodes which the current node must not generated code for

88 CHAPTER 5. CODE GENERATION The semantics of these sets will be made clearer throughout the following sections.

5.1 Notation and Conventions Used The pseudocode used for the algorithms in this chapter uses the same format as that used in Chapter 4 with some additions for describing output. We use the procedure write for output and adopt C style format strings. For example:

write(""%s do "-"CNewline") outputs the following:

do - followed by a new line to the screen (or file). Several other minor functions will be used that we can define here:

ffl Indent(i ) - return a string of i tabstops. ffl AsmCond(n) - return the assembly opcode for the conditional branch

instruction that delimits the basic block represented by the node n.

ffl WriteBB(n; i ) - write out the sequence of non-transfer instructions in

the basic block represented by n, one per line. Each line will be indented by i tabstops.

ffl EmitGotoAndLabel(n; i ) - generates a goto jump to the label for n

(represented by Ln) and places the label at the position where the code has been or will be generated for n. As we have chosen C as our target language, this procedure actually does a little bit of analysis to see if it is possible to generate a continue or break instead. The former can be generated if n is the loop header of the most nested loop enclosing the node generating the goto and the latter if n if the follow node of the same loop.

5.2. GENERATING CODE FOR LOOPS 89 When generating code for any node in a structure that involves a conditional expression, we make use of the opcode of the underlying conditional assembly instruction. For example, when generating code for a while loop header node representing the following basic block:

... bge .LL3 nop

we will generate the expression while (cond BBn) ... where n is the node representing the above basic block. If necessary, we negate the condition by prepending an exclamation mark to the opcode as while (!cond BBn) ... . The opcode to be used will be retrieved using the asmCond() function define above. In the HLL code samples given, we use the above convention of denoting the assembly condition for some basic block n as cond BBn.

As we are only interested in demonstrating the HLL control structures generated, in our examples we represent the output of WriteBB for some node n as BBn.

5.2 Generating Code for Loops The subgraph for a loop will have its code generated when the loop header is traversed during code generation. This header node, h, will have the details of the loops body, type and follow node. The code for any type of loop has the same structure: the loop header, loop body and the loop trailer. The condition controlling the loop iterations can be either in the loop header as for a preTested loop or in the trailer as for a postTested loop. For an endless loop, depending on what high level constructs are available and the type of code we wish to generate, the condition could be in either the loop header or trailer.

Consider the loops 12 \Phi 2 and 9 \Phi 5 in the graph of Figure 5.1. For the loop 12 \Phi 2, the body of the loop is executed if the underlying conditional branch delimiting the basic block represented by node 12 evaluates to false. This is because it is the false edge from node 12 that leads into the loop body.

The algorithm in Figure 5.2 generates the code for a loop. The structure of the algorithm consists of four phases, one for generating code for each of

90 CHAPTER 5. CODE GENERATION the following; loop header, body, trailer and follow. We discuss these in the following sections. All explanations are given in terms of the loop h \Phi l .

5.2.1 Generating Code for a Loop Header Generating code for a header, h, of a preTested loop involves generating the code for the basic block represented h followed by a while statement with the appropriate loop condition. For a postTested loop, we need to output the header of a do loop followed by the code for the basic represented by h. This latter code needs to be generated at the next indentation level as it is within the loop body. An endless loop is a mixture of the two. It has a similar header to a preTested loop except that the condition is a constant -- true. Its represented basic block is generated at the next indentation level.

In the given example, this phase will generate the following loop header for node 12:

BB12 while (cond.BB12) -

5.2.2 Generating Code for a Loop Body When generating code for the body of the loop h \Phi l , we want to ensure two things:

1. The code for l 's basic block is generated at the same indentation level

as the first node within the loop.

2. Code is not generated for the loop follow node from a node within the

loop

To ensure both of these conditions, we update the parameters latch and followSet that are passed to the CodeGen procedure so that the former is now l and the latter includes the follow of the current loop.

The body of a loop is contained in the subgraph headed by one of h's successors or possible by h itself in the case where it is also a conditional header. It

5.2. GENERATING CODE FOR LOOPS 91 is at a nesting level one greater than the current nesting level and therefore its code is generated at an indentation level one greater than the current indentation level.

In the case of a preTested loop, the loop body is headed by the successor that is not the loop follow node. To generate code for the loop body in this case, we perform a call to the CodeGen procedure on the appropriate successor.

For a postTested or endless loop that is also a conditional header, we initially reset the necessary attributes for h so that code can be generated for the conditional it heads (which also generates the code for the body of the loop) and another call is made to CodeGen for h. Code will be generated for h's basic block during this call. If the loop is not a conditional header, then we generate code for h's basic block at the next indentation level as the first node of the loop body and then generate code for the rest of the body with a call to CodeGen on h's only successor.

For any type of loop, we make sure that the code for l 's basic block is generated if it hasn't already been generated.

5.2.3 Generating Code for a Loop Trailer Generating the loop trailer for an endless loop only involves generating the closing bracket. The same applies for a preTested loop except that we must firstly rewrite the code for h's basic block. The reason for this is that the assembly code that determines the value of the condition used by the conditional branch at the head of the loop comes before this instruction in the basic block. So, to ensure that the condition is set correctly on each iteration, we must repeat the condition setting instructions. This does not disrupt the functionality of the original code and typically only involves a minimal amount of code duplication.

The loop trailer for a postTested loop is similar to the loop header for a preTested loop. We need to generate the conditional expression controlling the loop iterations. In this case however, this while expression will come after the closing brace for the loop body.

92 CHAPTER 5. CODE GENERATION 5.2.4 Generating Code for a Loop Follow Continuing to generate code for the graph rooted at h involves generating code for the graph rooted at the loop's follow. This applies no matter what type of loop it is. In the case of an unstructured control flow graph, there is the possibility that the loop's follow node may have already been generated. If this is the case, we emit a goto label at the position where this code has been generated as well as generating a goto at the current position to jump to this label.

5.3 Generating Code for 2way Conditionals The task of generating code for a 2way conditional, h3f involves similar steps as used when generating code for loops. We generate code for the conditional header which will mean writing an if..then or if..then..else with the appropriate expression controlling which conditional clause is executed. We use an if..then and a negated expression to generate code for an ifThen type of conditional. We use the AsmCond() function to find the appropriate expression.

Following this, code is generated for each of the (non-empty) clauses headed by a successor of h along with the appropriate clause-delimiting structures (i.e. case labels or the else statement). We take into account the fact that a 2way may have been restructured. In this case, we update the gotoSet, if h was determined to be the header of an unstructured entry or exit to/from a loop, to include the original conditional follow, the latch node (if any) of the loop enclosing f (if any) and the loop header node (if any) of the loop enclosing h. This prevents the code for any of these nodes being generated from within this conditional as they will all be reached by some other more appropriate path.

Finally, the closing bracket and code for the conditional follow is generated if it has not already been generated. If it has, a goto is inserted at the correct indentation level and a label is placed at the destination of the

The algorithm for generating code for the graph rooted at a 2way conditional is given in Figure 5.4. goto.

5.3. GENERATING CODE FOR 2WAY CONDITIONALS 93

procedure WriteLoop (n; i; latch; followSet; gotoSet) * Pre: n is the header of the loop n \Phi l, i is the current indentation level, * latch is the latching node of the current enclosing loop (if any), followSet is * the set of follow nodes for enclosing structures and gotoSet is the set of nodes * for which code must not be generated by this call * Post: code for the graph rooted at n is generated

traversed(n) := true followSet := followSet + floopFollow(n)g

/* Write loop header */ switch (loopType(n))

preTested:

writeBB(n; i) if (succ(n,false) = loopFollow(n))

write("%s while (%s) fnn",Indent(i),asmCond(n)) else

write("%s while (!%s) fnn",Indent(i),asmCond(n)) postTested:

write ("%s do fnn",Indent(i)) endless:

write ("%s while (true) fnn",Indent(i))

/* Write loop body */ switch (loopType(n))

preTested:

if (succ(n,true) = loopFollow(n))

CodeGen(succ(n,false),i; l; followSet; gotoSet) else

CodeGen(succ(n,true),i; l; followSet; gotoSet) postTested . endless:

if (structType(n) = loopCond)

traversed(n) := false structType(n) := cond CodeGen(n,i + 1; l; followSet; gotoSet) else

WriteBB(n; i + 1) CodeGen(succ(n,1),i + 1; l; followSet; gotoSet) if (traversed(l) = false)

WriteBB(l; i + 1) (cont)

Figure 5.2: Algorithm to Generate Code for a Loop Header Rooted Graph

94 CHAPTER 5. CODE GENERATION

(cont)

/* Write loop trailer */ switch (loopType(n))

preTested:

WriteBB(n; i + 1) write ("%s gnn",Indent(i)) postTested:

if (succ(l,false) = loopFollow(n))

write("%s g while (%s) nn",Indent(i),asmCond(l)) else

write("%s g while (!%s) nn",Indent(i),asmCond(l)) endless:

write ("%s gnn",Indent(i))

/* Continue with loop follow */ followSet := followSet - floopFollow(f )g if (traversed(loopFollow(f )) = false)

CodeGen(loopFollow(n),i; latch; followSet; gotoSet) else

EmitGotoAndLabel(loopFollow(n),i) end procedure

Figure 5.3: Algorithm to Generate Code for a Loop Header Rooted Graph -- (cont)

5.4 Generating Code for Nway Conditionals The procedure for generating code for the subgraph rooted at an nway node is as follows: the code for the nway basic block is generated followed by a switch statement rpresenting the indexed jump. The expression used in the switch statement is simply the register upon which the jump was performed. This is retrieved from the basic block represented by the nway node by the function NwayReg(). Once again, data flow analysis is required to provide a HLL expression instead. Code is then generated for the subgraphs rooted at each succesor of the nway. The case label for each clause of the nway is simply the position of the edge to that clause within the sequence of out edges from the nway (given by the function Index()). Then the nway trailer is generated as is the code for the graph rooted at the conditional follow of the nway. A goto and the appropriate label is emmitted whenever a node, whose code has already been generated, is reached during any of the above phase of nway code generation.

5.4. GENERATING CODE FOR NWAY CONDITIONALS 95

procedure Write2way(n; i; latch; followSet; gotoSet) * Pre: n is the header of the 2way conditional, n3f , i is the current indentation level, * latch is the latching node of the current enclosing loop (if any), followSet is * the set of follow nodes for enclosing structures and gotoSet is the set of nodes * for which code must not be generated by this call * Post: code for the graph rooted at n is generated

if (n was not restructured)

followSet := followSet + ff g else

if (n heads a jump into/out of a loop)

gotoSet := gotoSet + fiPDom(n); latch;loopHead(f )g else if (n heads a jump into an nway)

followSet := followSet + ff g /* write conditional header */ WriteBB(n; i) if (condType(n) = ifElse)

write ("%s if (!%s) fnn",Indent(i),asmCond(n)) else

write ("%s if (%s) fnn",Indent(i),asmCond(n)) /* write body of `then' clause */ if (condType(n) = ifElse)

c := succ(n,false) else

c := succ(n,true) if (traversed(c) . c =loopFollow(loopHead(n)))

EmitGotoAndLabel(c; i + 1) else

CodeGen(c; i + 1; latch; followSet; gotoSet) /* write body of `else' clause */ if (nodeType(n) = ifThenElse)

write ("%s g else nn%sf",Indent(i),Indent(i)) if (traversed(succ(n,false)))

EmitGotoAndLabel(succ(n,false),i + 1) else

CodeGen(succ(n,false),i + 1; latch; followSet; gotoSet) /* write tailer and continue with follow */ write("%s gnn",Indent(i) remove all nodes added to followSet and gotoSet

that weren't in them on entry to this procedure if (f is defined)

if (traversed(f ))

EmitGotoAndLabel(f ,i) else

CodeGen(f ; i; latch; followSet; gotoSet) end procedure

Figure 5.4: Algorithm to Generate Code for for 2way Rooted Graph

96 CHAPTER 5. CODE GENERATION The algorithm in Figure 5.5 performs the code generation for a graph rooted at an nway node.

procedure WriteNway(n; i; latch; followSet; gotoSet) * Pre: n is the header of the nway conditional, n3f , i is the current indentation level, * latch is the latching node of the current enclosing loop (if any), followSet is * the set of follow nodes for enclosing structures and gotoSet is the set of nodes * for which code must not be generated by this call * Post: code for the graph rooted at n is generated

WriteBB(n; i) write ("%s switch (%s) f nn", Indent(i), NwayReg(n)) for (all s 2 succ(n))

write ("%s case %s: nn",Indent(i + 1),Index(s)) if (traversed(s) = false)

CodeGen(s; i + 2; latch; followSet; gotoSet) else

EmitGotoAndLabel(s; i + 2) write ("%s break; nn", Indent(i+2)) remove f from followSet if it was not in it upon entry to this procedure if (traversed(f ) = false)

CodeGen(f ; i; latch; followSet; gotoSet) else

EmitGotoAndLabel(f ; i) end procedure

Figure 5.5: Algorithm to Generate Code for a Nway Rooted Graph

5.5 Generating Code for Non-Compound Structures

The algorithm for generating the code of n, a one edge, fall through, return or call node, is as follows: the code for the basic block represented by n is generated by WriteBB. If n is a return node, then a return statement is generated after the call to WriteBB.

We generating code for the follow of n (i.e. its sole successor in the graph), we the use the loop header and nway header information to generate a goto if the successor is in a different structure or has already been generated. The only time we don't generate a goto under these conditions is when the successor is only reached from n. Neglecting this case would otherwise mean that the successor would never be generated.

5.6. THE COMPLETE ALGORITHM 97 The algorithm in Figure 5.6 performs the code generation for a graph rooted at an non-compound node.

procedure Write1way(n; i; latch; followSet; gotoSet) * Pre: n is a non-compound (i.e. 1way) node, i is the current indentation * level, latch is the latching node of the current enclosing loop (if any), followSet * is the set of follow nodes for enclosing structures and gotoSet is the set of nodes * for which code must not be generated by this call * Post: code has been generated for the graph rooted at n

WriteBB(n; i) if (nodeType(n) = ret)

write("%s return; nn",Indent(i)) return s := succ(n; 1) if (traversed(s) . (loopHead(s) 6= loopHead(n) ^

(9 p 2 pred(s) ffl traversed(p) = false . s 2 followSet)) . s = loopFollow(loopHead(latch)) . (caseHead(s) 6= caseHead(n) ^ s 6= condFollowcaseHead(n))) EmitGotoAndLabel(n; i) else

CodeGen(n; i; latch; followSet; gotoSet) end procedure

Figure 5.6: Algorithm to Generate Code for a Non-compound Rooted Graph

5.6 The Complete Algorithm We can now define the complete algorithm for generating the code for a control flow graph that has been annotated with structuring information. The procedure takes as arguments a pointer to a node/basic block n, the indentation level to be used, the latching node of any enclosing loop (will be the null pointer if there no enclosing loop), the set of follow nodes for any enclosing conditionals or loops and a set of nodes for which code must not be generated by the current call to this procedure. Initially, n is set to be the root of a control flow graph, the indentation level is 1, the latching node pointer is null and the two sets are empty. If n is within gotoSet we will generate a goto to it if items 1 and (2 or 3) from the following are met:

1. n is not a latch node. When a latch node is put into the gotoSet, it code

will be generated by its corresponding loop header instead to ensure

98 CHAPTER 5. CODE GENERATION

that it is generated at the same indentation level at the loop head. 2. it is the loop follow of the currently enclosing loop. Being in the gotoSet

means that code for this follow will be generated else where.

3. there exists some predecessor of n that has not yet been traversed

during code generation. This means a goto to n will not be generated if this is the last time it will be traversed.

If a goto has not been generated for n yet, then if it is in the followSet but not the most recently added node in this set, a goto will be generated otherwise the CodeGen procedure returns as code for the n will be generated by the structure that added the latest node to the followSet. The procedure also returns at this point if a call to CodeGen on n has been made previously and reached the point where its traversal flag was set.

Finally, a call is made to the appropriate code writing algorithm for n based on its type determined during structuring. The complete algorithm is given in Figure 5.7.

5.6. THE COMPLETE ALGORITHM 99

procedure CodeGen(n; i; latch; followSet; gotoSet) * Pre: n is a node in a CFG that has been structured, i is the current indentation * level, latch is the latching node of the current enclosing loop (if any), followSet * is the set of follow nodes for enclosing structures and gotoSet is the set of nodes * for which code must not be generated by this call * Post: code for the graph rooted at n is generated

recentFoll := the most recently added member of followSet if (n 2 gotoSet ^ isLatchNode(n) =false^

(n = loopFollow(loopHead(latch)) . 9 p 2 pred(n) ffl traversed(p) = false)) EmitGotoAndLabel(n; i) return else if (n 2 followSet)

if (n 6= recentFoll)

EmitGotoAndLabel(n; i) return if (traversed(n) = true)

return traversed(n) = true if (isLatchNode(n))

if (i = the indentation level at which loopHead(n) was generated)

WriteBB(n; i) return else traversed(n) := false

EmitGotoAndLabel(n; i) return switch (structType(n))

case loop: case loopCond:

WriteLoop(n; i; latch; followSet; gotoSet) case cond:

if (condType(n) = nway)

WriteNway(n; i; latch; followSet; gotoSet) else

Write2way(n; i; latch; followSet; gotoSet) case seq:

WriteSeq(n; i; latch; followSet; gotoSet) end procedure

Figure 5.7: Algorithm to Generate Code for a Control Flow Graph.

Chapter 6 Implementation and Results This chapter describes ast, a prototype implementation of the assembly structuring process presented in Chapters 3, 4 and 5. Two variations were implemented; one for the algorithms that make use of the parenthesis theory described in section 4.3.2 for loop structuring and one for the algorithms that make use of the derived sequence of graphs described in Section 4.3.1 for loop structuring. Both implementations were run on a set of benchmarks programs for the purpose of making comparisons between the two in terms of both functionality (the quality of of the high level code generated) and resource efficiency (memory usage and execution speed).

Section 6.1 describes the specific aspects of the given implementations such as limitations imposed upon the type of input it can handle. Following this, Section 6.2 presents the tests that were performed on the implementations and the conclusions drawn as a result of these tests.

6.1 ast 6.1.1 Implementation Parameters Before considering the coding of the ast, a number of other decisions had to be made. Namely, the target architecture had to be chosen as well as the language used for the implementation. The first decision came down to a choice between the Intel and SPARC architectures as machines implementing

6.1. AST 101 both were readily available. The SPARC was chosen for two reasons. Firstly, as part of related research performed by the author, a subset of the ast's functionality had already been implemented for the SPARC architecture. Secondly, being a RISC machine, the instruction set of the SPARC is smaller than that of a CISC machine such as the Intel family. This meant that less time would have to be spent of the architecture dependent parts of the implementation. Any means of minimising the time spent on these parts was desirable as the emphasis was on the machine independent structuring algorithms.

Given the potential for ast to be integrated with other reverse engineering tools, a criteria for the choice of implementation language was that it supported code reuse to some extent. Object oriented languages are proclaimed to enable better code reuse and so C++ was chosen, being one of the more popular object oriented languages currently available within the development environment.

As explained previously, ast works with SPARC8 assembly programs. However, it does not accept all the instructions defined for this architecture. The reason for this is twofold. Firstly, we restricted the domain of handled programs to be only applications programs. That is, only programs using unprivileged instructions

1

are accepted. Secondly, the scope has been narrowed further to realise development time resources available for the tool. To this end, we neglect all trap instructions. This is of little consequence in reality as the majority of programs never include these instructions.

6.1.2 Delayed Instructions The concept of a delayed instruction is a feature of SPARC's pipelined architecture which means that the next instruction to be executed is fetched during the execution of the current instruction. For just about all unconditional control transfer instructions (CTIs), the instruction in their delay slot (i.e. at the address one greater than the program counter) is always executed before the transfer of control takes place. For the conditional CTIs however, the decision of whether or not to execute the delayed instructions depends upon the value of an annul flag within the instruction. Pages 50-51 of [9] explain the semantics of the delayed instruction further. In our implementation, we have not strictly adhered to these semantics. We make the

1

Privileged instructions are those that can only be executed by the super user.

102 CHAPTER 6. IMPLEMENTATION AND RESULTS assumption that the instruction in the delay slot of a CTI, will always be executed before the transfer of control takes place. This is done once again in an attempt to place some constraint of the time required to develop ast. It also makes reasoning about the output HLL in terms of the input assembly significantly easier. Importantly, it does not influence the development of the structuring algorithms presented in Chapter 4.

The relaxing of our adherence to the semantics of the delayed instruction was covered earlier in Section 3.2.1. We briefly detail here the modification that would have to be made to ensure that the proper semantics are not violated when generating the control flow graph. The control flow graphs generated from the given definition of a basic block does not correctly capture the possible paths of control flow in all situations. An example is shown in Figure 6.1. The ,a appended to instruction 2 (bne,a .LL1) indicates that the annul flag has been set for the succeeding delayed instruction. This means that if the conditional branch is taken (the right branch in the graph) then the sequence of instruction execution will be 1,2,3,5. However, if the condition code set in instruction 1 evaluates to false the conditional branch will not be taken and the flow of control will follow the left branch in the graph. In this case the execution sequence will be 1,2,4. A tempting solution to this problem is to move the delayed instruction into the block C. However, this may conflict with the original flow of control if there is another edge into C. The execution path including this edge will not originally have included instruction 3. The solution is to put instruction 3 in its own block which would lie in between block A and C as demonstrated in Figure 6.2.

cmp %o2,,512 ld [%o0+4],%o0 .LL1:

sethi %hi(trans),%o0

or %o0,%g0,%o0

(1) (2) (3)

(4) (5)

........

.... A

B C

bne,a .LL1

Figure 6.1: CFG demonstrating the semantics of a delayed instruction in the context of the annul flag.

6.1. AST 103

sethi %hi(trans),%o0 (3) cmp %o2,,512 (1)

(2)

.... A

ld [%o0+4],%o0 .LL1: or %o0,%g0,%o0(4) (5)

........B C

A' bne,a .LL1

Figure 6.2: Solution to the delayed instruction dilemma. 6.1.3 Preprocessor Knowledge of the patterns generated by the compiler used to generate the test cases meant that certain preprocessing could be performed on the input that would otherwise involve data flow analysis. As ast does not attempt any data flow analysis, this preprocessing allows for a wider and more interesting range of inputs to be tested against. In particular, any program containing an indexed jump can be included within the valid input domain of ast as the preprocessing can use the compiler patterns to determine the actual targets of these CTIs. This type of instruction is most often the implementation of a case or switch statement. An example of the preprocessor functionality is shown in Figure 6.3. Both the input (shown in the left column) and the output (shown in the right column) are have their instructions numbered for easier reference. Lines 14 - 21 of the input give an example of the pattern generated by the Gnu compiler for an indexed jump. The value of the register %o0 will be the address of one of the lines 18 - 21. As can be seen in the output, the preprocessor have moved the label from each of these lines into the position previously occupied by the %o0 register.

Secondly, the preprocessor has also removed all lines that are not one of the following:

instruction - an assembly instruction. The regular expression used to

match this type of line is ^"t[a-z]. The line !tabstop?save %sp,-1136,%sp

104 CHAPTER 6. IMPLEMENTATION AND RESULTS

will be matched by this pattern for example. branch label - these are the labels that will also be the operand of a branch

instruction. The pattern used to match these lines is ^".LL[0-9]. The line .LL3: will be matched by this pattern.

routine label - these are the labels at the entrance to a routine. They will

be matched by the two regular expressions ^[a-zA-Z.] and :$. The line bf: will be detected as a procedure label.

1 .file "bf.c" 2 gcc2.compiled.: 3 .section ".text" 4 .align 4 5 .global bf 6 .type bf,#function 7 .proc 04 8 bf: 9 !#PROLOGUE# 0 10 save %sp,-1136,%sp 11 !#PROLOGUE# 1 12 sethi %hi(trans+4),%o0 13 ... 14 jmp %o0 15 nop 16 .align 4 17 .LL32: 18 .word .LL30 19 .word .LL4 20 .word .LL5 21 .word .LL6 22 .LL4: 23 ...

1 bf: 2 save %sp,-1136,%sp 3 sethi %hi(trans+4),%o0 4 ... 5 jmp .LL30, .LL4, .LL5, .LL6 6 nop 7 .LL4: 8 ...

Figure 6.3: Assembly program before (left) and after (right) preprocessing. Each instruction has been given a numeric label for easier reference.

The annul flag discussed in Section 6.1.2 also presents one more problem that can be fixed by the preprocessor. The unconditional branch instructions (e.g. ba) also have the option of setting the annul bit. If this bit is set, then the delayed instruction is never executed after the branch is taken. For this reason, it (the delayed instruction) is invariably the target of another CTI (otherwise it would be an unreachable instruction). In order to allow these unconditional branch instructions to be treated in the same manner

6.2. TESTING AND RESULTS 105 as the conditional branch instructions, a nop is inserted after them by the preprocessor. In context of the proposed treatment of delayed instructions, inserting the nop does not affect the semantics of the original program.

6.1.4 Data Representation An important issue in the implementation of ast is the efficiency of its resource usage. In this respect, a memory efficient representation of both the instructions and the CFGs is to be achieved. The use of reference mechanisms (e.g. pointers) available in languages such as C++ enables the desired memory efficiency to be attained. The instructions are stored as an array. The number of basic blocks that partition these instructions is unknown which suggests the use of a linked list when doing the partitioning. The graph edges are represented as pointers between the nodes. This is allows for optimal memory use efficiency as each block is only represented once in memory. The same applies for the sequence of instructions within each block. Pointers are used to index the subsection of the instruction array containing the instructions for the relevant block.

6.2 Testing and Results This section describes the tests performed as well as providing information on the test programs used as input to generate the results.

6.2.1 Test Inputs For the purpose of testing both variations of ast, the assembly input files used were generated automatically through use of the Gnu C compiler (version 2.7.2.1) with the -S and -O2 options which generates optimized assembly code. The C source files were taken from the SPEC group of benchmark programs

2

. These benchmarks are widely used in the compiler community.

Testing on this set of input programs ensures that ast was tested on nontrivial inputs.

2

See the url http://www.specbench.org/ for further details on the SPEC group.

106 CHAPTER 6. IMPLEMENTATION AND RESULTS The programs chosen for testing were from the CINT95 suite and are shown in Figure 6.4 along with the relevant details of each. The figures for each program are totals of all the assembly files for that program.

Program Assembly Graph

Ins. Nodes

gcc 411477 77668

go 108995 12934 ijpeg 59203 8570

li 16697 3703 m88ksim 43075 7032

perl 85473 16695 vortex 160761 25839

Figure 6.4: The programs used for testing along with the instruction count and graph node count.

6.2.2 Comparisons for Quality of Code Generated This section presents the results of tests that give some kind of indication as to the type of code generated by each variation of ast. For simplicity, which we refer to the version of ast implementing the derived sequence of graphs algorithm as ast

DSG

and the other version as ast

PT

. The metrics

used to judge the quality of the code produced by ast

DSG

and ast

PT

are

based on the number of HLL control structures and gotos generated. The other method that could be used is to just read the code produced. However, this is hampered by both the size of the code produced. However, we present the code generated for the control flow graph of Figure 5.1 by each algorithm as it is relatively small. It also gives some idea of how unstructuredness can produce a variation in the quality of code produced, depending on what algorithms are used to perform the structuring.

Figure 6.5 shows the code generated when using the parenthesis theory algorithms for loop structuring. The limitations of the parenthesis theory are well demonstrated here. Recall that the back edge (5,9) did not induce a loop as the two nodes, 5 and 9, were determined to be in two different loops. This is a direct result of relying upon a the parenthesis pair of each node which describes a relationship between the two in terms of the underlying DFS tree. As this example shows, extending the usage of the parenthesis to

6.2. TESTING AND RESULTS 107 imply a similar relationship but in terms of a graph does not always give us the result we desire. By not determining node 9 to be a loop header, the resulting code in this case involves twice as many gotos as the generated code shown in Figure 6.6. This latter code was given by the derived sequence of graphs based algorithm.

The DSG based algorithm does have one disadvantage however in comparison to the PT algorithm. This arises in the case where there is a forward jump into a loop. The forward edge representing this jump will cause the graph to be irreducible. What this means is that the two nodes participating in the back edge that would otherwise induce a loop, will never be in the same interval. Recall from Section 4.5.5 that this is one of the conditions necessary for the edge to induce a loop. The PT algorithm is not faced with this limitation.

Total number of HLL structures generated Having seen that the DSG algorithm is more likely to generate less gotos than the PT algorithm, we present some test results that support this. The bar graphs in Figures 6.7, 6.8 and 6.9 show the number of 2way conditionals, nway conditionals and loops respectively generated by each implementation. The shaded bars indicate the results generated by ast

DSG

and the clear

bars represent the results for ast

PT

. These graphs show that while the

DSG algorithm gives better results (i.e. more HLL structures) in each case, the difference is relatively small. As would be expected then, the ast

PT

implementation will generate more gotos than ast

DSG

and this is indeed the

case as shown in Figure 6.10 which shows the number of gotos generated as a percent of all control transfer instructions in the generated code. That is, as a percent of the total number of goto, while, if..then..else, if..then, switch, break and continue statements generated. This last figure also shows that the proportion of gotos generated is still quite large (as high as 32% in one case). This goes to show that if the input assembly is reasonably unstructured (as the optimized assembly was in our tests), the resulting code will still retain a significant amount of unstructuredness. However, we have managed to transform up to 85% of the control transfer instructions into high level control structures. This is a significant improvement for anyone having to maintain assembly programs.

108 CHAPTER 6. IMPLEMENTATION AND RESULTS

main() -

BB12; if (!bne) -

BB11; while (bne) -

BB10; if (bne) - L2:

BB2; goto L1; "" BB9; switch (Reg0) - case cond.0:

goto L2; break; case cond.1:

BB3; goto L1; break; case cond.2: L8:

BB8; if (bne) -

BB6; if (bne) -

BB4; if (bne) -

goto L8; "" "" else - L5:

BB5; "" "" else -

BB7; goto L5; "" break; "" L1:

BB1; BB11; "" "" BB0; return; ""

Figure 6.5: The C style high level code produced by ast

PT

6.2. TESTING AND RESULTS 109

main() -

BB12; if (!bne) -

BB11; while (bne) -

BB10; if (bne) - L2:

BB2; goto L1; "" BB9; switch (Reg0) - case cond.0:

goto L2; break; case cond.1:

BB3; break; case cond.2:

do -

BB8; if (!bne) -

BB7; L5:

BB5; break; "" BB6; if (!bne) -

goto L5; "" BB4; "" while (bne); break; "" L1:

BB1; BB11; "" "" BB0; return; ""

Figure 6.6: The C style high level code produced by ast

DSG

110 CHAPTER 6. IMPLEMENTATION AND RESULTS

0 5000 10000 15000 20000 25000 30000

2way Conditionals

Number of HLL 2way Conditionals Generated gcc go ijpeg li m88ksim perl vortex Figure 6.7: Number of HLL 2way conditionals generated.

0 50 100 150 200

Switch/Case Statements

Number of HLL Switch/Case Statements Generated gcc go ijpeg li m88ksim perl vortex Figure 6.8: Number of HLL nway conditionals generated.

6.2. TESTING AND RESULTS 111

0 200 400 600 800 1000 1200 1400 1600 1800 2000

Loops

Number of HLL Loops Generated gcc go ijpeg li m88ksim perl vortex Figure 6.9: Number of HLL loops generated.

0 10 20 30 40 50

Percentage

Gotos as a % of total HLL CTIs gcc go ijpeg li m88ksim perl vortex Figure 6.10: Number of gotos as a percentage of the total HLL control structures generated.

112 CHAPTER 6. IMPLEMENTATION AND RESULTS 6.2.3 Resource Usage The second groups of tests was performed only on the ast

DSG

implementation

to compare its overheads in terms of memory usage and execution efficiency with that of ast

PT

. The reason the data was only collected for ast

DSG

is

that apart from the construction and usage of the derived sequence of graphs, both implementations were identical. What we aim to determine with these tests is whether or not the parenthesis theory algorithm gives a significant enough improvement in runtime resource usage to offset its disadvantages with respect to the quality of code it generates.

The graph in Figures 6.11 shows the memory requirements of the derived sequence of graphs for each program. As can be seen, the maximum required by any program was about 2MB. This is to be expected as gcc was the largest test program used (411477 instructions). For any modern computer, this kind of memory usage does not impose a significant drain on the memory available, especially given that the program will not be used as frequently as other equally memory hungry programs. And, as Figure 6.12 shows, larger inputs (such as gcc) require relatively less memory for DSGs than smaller programs (such as ijpeg). Therefore, we can safely conclude that, in terms of memory requirements, the DSG algorithm does not perform badly enough to justify the use of ast

PT

instead.

The same is true for the speed overhead incurred by ast

DSG

. Figure 6.13

shows, the time taken to build the DSGs as a percent of the total time taken to build the DSGs, perform the structuring and generate the code. It does not take into account the time required to do the parsing or control flow graph generation. Even without taking into account the extra time required for these phases, the DSG building only takes up on average 25 percent of the runtime. The mostly costly phase (which is common to both ast

DSG

and

ast

PT

) is the code generation and parsing phases as they require intensive

disk reading and writing.

With regard to the DSG performance, the conclusion is that its extra resource requirements are not as significant a disadvantage the it advantage in generating better HLL code and should therefore remain the algorithm of choice when structuring assembly code.

6.2. TESTING AND RESULTS 113

0 500 1000 1500 2000

KBytes

Memory required for DSGs gcc go ijpeg li m88ksim perl vortex

Figure 6.11: Memory cost of DSGs

0 10 20 30 40 50 60 70

Bytes

Memory required by DSGs / # reachable nodes gcc go ijpeg li m88ksim perl vortex Figure 6.12: Memory cost of DSGs / number of reachable nodes.

114 CHAPTER 6. IMPLEMENTATION AND RESULTS

0 5 10 15 20 25 30 35 40

Secs

Time to build DSG as % of Total Time gcc go ijpeg li m88ksim perl vortex Figure 6.13: Time taken to build the DSGs as a percent of the total time to perform the structuring.

Chapter 7 Conclusion This thesis has presented some the issues involved when translating assembly programs to a program in a higher level language. In particular we concentrated on deriving high level fontrol flow constructs from the control flow graph representation of an assembly program. When presenting the algorithm that perform the analysis and tranformations involved, we investigated two variations on the algorithms particular to loop structuring. The first was based upon work by Cifuentes in [5]. A comparsion between the performance characteristics of each variation was a major motivation for this thesis. We also made a number of enhancements to some of the other algorithms presented by Cifuentes.

An assembly to high level language structuring process involves a number of phases; parsing, control flow graph generation, structuring and code generation. Parsing reads in the assembly source and performs a simple type of semantic analysis to categorise the instructions according to the control flow characteristics. The control flow graph generation phase does exactly that -- constructs the control flow graph representing the control flow between the basic blocks of the program. The structuring phase annotates the nodes with information summarising the role they play in a high level constrol structure (if any). This information is then used in the code generation phase to produce a semi high level language program. We cannot produce a complete high level program as we do not perform any data flow analysis.

When dealing with unstructured control flow graphs, we made use of a canonical set of unstructured forms derived from the work by Williams and Oulsnam (see Section 2.1) as a guideline for the types of unstructured code we

116 CHAPTER 7. CONCLUSION would have to handle. However, this was only a basis from which to start. The myriad of combinations of these forms that presented themselves when analysing real assembly programs meant that a considerable amount of extra effort was required to handle all unstructed cases. In addition, the presence of nway nodes was not accounted for Williams and Oulsnam. These type of nodes add extra complexity to the analysis as we have to prioritise which high level structures will be generated when their representations within the control flow graph overlap each other. The main heuristic guiding a decision in these cases was to minimise the potential number of gotos that would be generated in HLL code.

Two other extensions and/or modifications made to existing structuring algorithms. Firstly, the method used to determine the follow node for a conditional was changed to rely on the post-dominator properties of the nodes as opposed to the forward dominator. The result of this change was that detecting whether or not an nway subgraph had an unstructured entry or exit was made considerably easier and more reliable. Secondly additional analysis was applied to 2way conditional headers so that in case where one of the branches included an unstructured entry or exit to another structure, the 2way was restructured in such a way that fewer gotos were generated in the final HLL code.

The final part of this thesis presented the results of comparison tests between the implementations of the two variations on the structuring process. The first variation was when the process involved using a derived sequence of graphs (see Section 4.3.2) to do loop structuring where as the second was when parenthesis theory (see Section 4.3.1) was used for loop structuring instead. The results showed that while the latter was slightly more efficient in terms of memory usage and execution speed, it was enough of an improvement to offset the higher quality of code generated by the former.

This thesis provides the impetus for two types of future work in this area. Firstly, additional development of ast can be undertaken to remove some of the limiting assumptions under which it was developed that restrict its functionality, or even, as in the case of how it handles delayed instructions, compromises its strict correctness.

Secondly, and of more import, would be further research into two of the algorithms described within this thesis. The first is the algorithm used to mark the nodes within an nway subgraph. Currently this uses a depth first search which incurs the overhead of recursion. Secondly, the algorithm for deter 117 mining the immediate post-dominator of each node makes three passes over all the nodes in the graph. There is evidence to suggest that this can be done more efficiently by considering that the closely related concept of immediate foprward-dominators has a faster algorithm to determe this property for a node.

Appendix A

Attribute definitions

A.1 Base Node Attributes The following group of attributes are defined for the nodes in the control flow graphs. Additional attributes that are relevant under only one of the two variations of the structuring algorithm are defined in later sections.

nodeType - the type of the node. One of ucond, jcond, nway, call, ret,

fall. Occasionally, 2way is used instead of cond.

order - the post-ordering order of the node. This ordering is used when

considering the nodes of a graph in ascending or descending order.

revOrder - the post-ordering of the node in the reverse graph. pred - the immediate predecessors of the node. This can be used as either

a set or a sequence. If used as a sequence, a second parameter can be given to indicate which predecessor to return. For example, pred(n) denotes the set of predecessors of n where as pred(n; 1) denotes the first predecessor of n.

succ - the immediate successors of the node. The semantics of this attribute

are similar to pred with one addition. For a 2way node, we can indicate which successor to return by indexing with a boolean value. This enables us to access the true false successors which correspond with the first and second successors respectively.

A.1. BASE NODE ATTRIBUTES 119 pred

rev

- the immediate predecessors of the node in the reverse graph. The

membership of this set is identical to the succ set for the node.

iPDom - the immediate post-dominator of the node. caseHead - the case header node for the most nested nway in which this

node is a member. May be undefined.

loopHead - the loop header node for the most nested loop in which this

node is a member. May be undefined.

condFollow - the follow node of the conditional headed by this node. This

may be an nway or 2way conditional and the attribute is only defined for nodes that have been structured as conditional headers.

loopFollow - the follow node of the loop headed by this node. Only defined

for loop headers.

structType - the high level control structure this node is determined to be

the header of. Will be one of:

ffl seq - a sequential statement (default for all nodes before structuring)

ffl loop - node is a loop header ffl cond - node is a conditional header ffl loopCond - node is both a loop and conditional header

condType - the type of conditional this node heads. One of:

ffl ifThen - header of a 2way conditional where only the true branch

is non-empty.

ffl ifElse - header of a 2way conditional where only the false branch

is non-empty.

ffl ifThenElse - header of a 2way conditional where both clauses are

non-empty.

ffl case - header of an nway conditional header.

loopType - the type of loop this node heads. One of:

ffl preTested - header of a loop where the loop condition is tested

before the first iteration (e.g. while loops).

120 APPENDIX A. ATTRIBUTE DEFINITIONS

ffl postTested - header of a loop where the loop condition is tested

after the first iteration.

ffl endless - header of a loop with no loop condition.

traversed - a boolean flag used indicate whether or not a node has been

visited during a given graph traversal. The value of this flag is implicity set back to false after a complete traversal has been completed.

A.2 Predicates This section describes the predicates used. While they are implemented as attributes of a node, their functionality differs in that they only return a boolean value which depends on the other attributes and cannot be explicitly set.

isBackEdge - given a pair of nodes, returns true if the pair represents a

back edge in a graph.

isLatchNode - return true if the node is a latch node of a loop

A.3 Attributes for Other Objects Certain other concepts used in the algorithms are more easily reasoned about when treated as objects with their own attributes. We describe these attributes here.

loopNodes - this attribute is defined for a loop (h \Phi l ) and represents all

the nodes that either have h marked as their loop header or some other loop headed, h

2

, such that h

2

is nested within (h \Phi l ). The recursive

definition for a test of nesting is:

A node n is nested within (h \Phi l ) if

(loopHead(n) = h or loopHead(n) is nested within (h \Phi l ))

A.4. DERIVED SEQUENCE OF GRAPHS ATTRIBUTES 121 A.4 Derived Sequence of Graphs Attributes The following attribute is only defined for the nodes used in the derived graphs under the DSG algorithm. That is, it is defined for each node in G

i

where n ? 2. These nodes also inherit all of the above attributes.

nodesOf - the set of nodes from G

i\Gamma 1

in the interval this node represents.

baseNodesOf - the set of nodes from G

0

(i.e. the original CFG) in the

interval this node represents.

Under the DSG algorithm where the derived sequence is G

0

; : :; G

n

, the following attribute is defined for all nodes in graphs G

0

; : :; G

n\Gamma 1

(which may

be no graph if n = 0).

inInterval - the node in G

i+1

which represents the interval in which this

node is a member.

A.5 Parenthesis Theory Attributes The following attributes are defined for control flow graph nodes when using the PT algorithm. As both these attributes are a tuple type, their individual elements can be accessed using the standard indexing mechanism.

parenthesis

true

- the parenthesis for this node defined during a DFS traversal of the graph where the true edges were traversed first.

parenthesis

false

- the parenthesis for this node defined during a DFS traversal of the graph where the false edges were traversed first.

Bibliography

[1] Ole Agesen. Design and Implementation of Pep, a Java

TM

Just-In-Time

Translator. Theory and Practice of Object Systems, 3(2):127-155, 1997.

[2] F.E. Allen. Control flow analysis. SIGPLAN Notices, 5(7):1-19, July

1970.

[3] B.S.Baker. An algorithm for structuring flowgraphs. Journal of the

ACM, 24(1):98-120, January 1977.

[4] C. Cifuentes. Structuring decompiled graphs. In T. Gyim'othy, editor,

Proceedings of the International Conference on Compiler Construction, Lecture Notes in Computer Science 1060, pages 91-105, Link"oping, Sweden, 24-26 April 1996. Springer Verlag.

[5] Cristina N. Cifuentes. Reverse Compilation Techniques. PhD thesis,

Queensland University of Technology, 1994.

[6] J. Cocke. Global common subexpression elimination. SIGPLAN Notices,

5(7):20-25, July 1970.

[7] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms, chapter 18, pages 365-367. The MIT Press and McGraw-Hill Book Company, 1989.

[8] M.S. Hecht. Flow Analysis of Computer Programs. Elsevier NorthHolland, Inc, 52 Vanderbilt Ave, New York, New York 10017, 1977.

[9] Sparc International Inc. The SPARC Architecture Manual. PrenticeHall, 1992.

[10] Ulrike Lichtblau. Decompilation of control structures by means of graph

transformations. In Proceedings of the International Joint Conference on Theory and Practice of Software Development, pages 284-297, Berlin, 1985.

BIBLIOGRAPHY 123 [11] M.H.Williams. Generating structured flow diagrams: The nature of

unstructuredness. Computer Journal, 25:393-396, 1982.

[12] M.S.Hecht and J.Ullman. A simple algorithm for global data flow analysis problems. SIAM Jounal of Computing, 4(4):519-532, December 1975.

[13] G. Oulsam. Unravelling unstructured programs. Computer Journal,

25:379-387, 1982.

[14] Thomas Pittman and James Peters. The art of compiler design : theory

and practice. Prentice Hall, Englewood Cliffs,NJ, 1992.

[15] Andrew S. Tanenbaum. Structured Computer Organization, chapter 7.

fire_on_non_leaf, attribute_conditions)

Description

This predicate defines criteria used for determining the state of an event. Criteria are used where needed in any rule action by the check_event_criteria predicate, and are also used by the create_cache_search_criteria predicate.

This predicate should be run in a rule triggered by a TEC_Start event at event server start-up. This loads the criteria once, instead of every time it is needed.

The following checks are done at runtime to this predicate. Errors can be obtained with the log_error predicate, which is described on page log_error.

 

  • The event class is checked for existence.
  • All attribute conditions are checked thoroughly.

Arguments

attribute_conditions
Specifies the attribute conditions for the reference event. Each attribute condition is defined as a list with three elements. The attribute_conditions argument is a list of attribute conditions, which means it is a list of lists. For example, the following figure illustrates the format of this argument if two attribute conditions are defined:
[['attribute', operator, 'value'],
['attribute, operator, 'value']]
Notes:
  1. The attribute and operator must be compatible. For example, you cannot create a condition for an attribute defined as a STRING type with a greater_than operator. See the following table for attribute-operator compatibility.
  2. Attribute conditions can only be defined for attributes of a SINGLE complexity type.
  3. The matches operator requires the same Perl regular expression syntax as that supported by the Tivoli Management Framework.
  4. ENUM types are evaluated arithmetically based on their integer representation.
  5. Attributes of a LIST_OF complexity type are not supported in an attribute condition.
The following table lists the operators you can use for each SIMPLE type of attribute. The operators used in this predicate are similar to those used in an event filter. As such, you might find the information in Attribute conditions helpful.
Simple Type Operator
ENUM, INTEGER, REAL equals greater_than greater_than_equal less_than less_than_equal not_equals outside within
STRING equals not_equals matches outside within
class
The event classes for which the attribute conditions are defined. This argument is specified in list format; for example, ['NT_SNMP', 'NT_Server_Start'].

The order of event classes in the list is significant if you use the return_order argument with a value of order in the create_cache_search_criteria predicate when defining a search using this event criteria. See the create_cache_search_criteria for additional information.

criteria_name
The name that uniquely identifies the criteria.
fire_on_non_leaf
Specifies whether the criteria can be used for executing rules on the reference event if it is a superclass. The following values are valid:
no
The criteria can be used for executing rules on leaf classes only.
yes
The criteria can be used for executing rules on both leaf and non-leaf classes.

Examples

  1. The following example shows how to define event cache search criteria with the predicate. The name of this criteria is example_criteria. This is the name referred to from the create_cache_search_criteria predicate. This criteria is used for a TEC_DB event and it can only be used for a leaf class.

     

    Note:
    This example is not a realistic definition. It is simply intended to show the various ways you can define attribute conditions.
    create_event_criteria(example_criteria,
      'TEC_DB',
      no, 
      [['hostname', equals, 'chair' ],
       ['hostname', not_equals, 'chair1' ],
       ['hostname', matches, 'ch.*r' ],
    
       ['repeat_count', within, [5]],
       ['repeat_count', outside, [10,15]],
       ['repeat_count', equals, 5],
       ['repeat_count', not_equals, 6],
       ['repeat_count', greater_than, 4],
       ['repeat_count', greater_than_equal, 5],
       ['repeat_count', less_than, 6],
       ['repeat_count', less_than_equal, 5],
    
       ['severity', within, ['MINOR']],
       ['severity', outside, ['FATAL','HARMLESS']],
       ['severity', equals, 'MINOR'],
       ['severity', not_equals, 'FATAL'],
       ['severity', greater_than, 'HARMLESS'],
       ['severity', greater_than_equal, 'MINOR'],
       ['severity', less_than, 'CRITICAL'],
       ['severity', less_than_equal, 'CRITICAL']
      ]
    ),
  2. The following example creates event criteria named db_critical. This criteria applies to a TEC_DB event sent from a database server with a severity greater than or equal to CRITICAL. The severity attribute is defined as an ENUM type with a default value of 60. This example assumes all database server names begin with DB_SRV followed by other characters.
    create_event_criteria('db_critical',
                          'TEC_DB'
                           yes,
                          [['hostname',matches,
                            'DB_SRV*'],
    
                           ['severity',
                            greater_than_equal,
                            'CRITICAL']
                          ]
    )
  3. The following example creates event criteria named ups_problem. This criteria applies to any upsOnBattery, upsBatteryLow, or upsBatteryDischarged event from host homer.
    create_event_criteria('ups_problem',
                          ['upsOnBattery',
                           'upsBatteryLow',
                           'upsBatteryDischarged'],
                           yes,
                          [['hostname',equals,'homer']]
    )

See Also

create_event_sequence

Defines a sequence of events for correlation.

Synopsis

create_event_sequence([event_sequence], [attribute_conditions])

--OR--

create_event_sequence([event_sequence], [attribute_conditions], [event_details])

Description

This predicate is used to define a sequence of events, and any additional information pertaining to those events, that make up an event sequence. An event sequence is a series of events (generally causally related) that occur in a fixed order.

This information is loaded into the knowledge base of the rule engine at event server start-up and is used by rules that call correlation predicates. You can load it with a rule triggered by a TEC_Start event at event server start-up time.

Arguments

attribute_conditions
The list of conditions for attributes that must be met by both events in a sequence (the event under analysis and the event being searched for in the cache) to be eligible for correlation. There are two types of conditions, defined as follows:
absolute
A condition that can be placed upon an attribute, similar to an attribute condition in an event filter. For example, [severity,equals,'HARMLESS']. This type of condition applies to all events in the event sequence. Any absolute conditions that apply to only a subset of events in an event sequence must be specified with the attr_condition predicate, which is described on page attr_condition .

See the attribute_conditions argument description for the create_event_criteria for additional details about specifying attribute conditions for this argument, as the syntax is the same.

attribute-match
Names of attributes whose values must match between correlated events. You should always define at least one attribute-match condition to ensure correlation only between events of the same system. For example, [hostname]. This type of condition applies to both events that are being correlated when using the attr_exception predicate in this argument. The attr_exception predicate is described on page attr_exception .
event_details
The list of predicates that provides additional details about individual events in the event sequence, including identifying clearing events. The predicates that can be specified are shown in the following table.

Predicate
 
Description
attr_condition Defines absolute attribute conditions for a single event in an event sequence.
attr_exception Defines an attribute that must match a different attribute in other events in an event sequence.
attr_sequence Defines the values of an attribute that change due to a problem event's position in an event sequence.
clears Defines a clearing event for events in an event sequence.
event_sequence
The list of event class names in event-sequence order, from left to right. For example, ['upsOnBattery', 'lowBattery', 'upsDischarged'].

Examples

  1. The following example defines an event sequence with clearing events. The sequence contains events generated from two monitoring sources: APC uninterruptible power supply and IBM Tivoli Distributed Monitoring. The uninterruptible power supply event sequence is illustrated in the figure on page ***.

    The problem events are specified in event-sequence order, from left to right, the root cause being the upsOnBattery event. The last event in the sequence (universal_host) is generated by Tivoli Distributed Monitoring. Each of the problem events in the sequence has a related clearing event defined with the clears predicate. The uninterruptible power supply events are related if the hostname attributes have the same value.

    The Tivoli Distributed Monitoring universal_host event is handled a little differently because the value for the affected host is stored in the probe_arg attribute (rather than the hostname attribute like the uninterruptible power supply events). This requires mapping of the probe_arg attribute to the hostname attribute of the uninterruptible power supply events so correct comparisons can be made. A situation like this is referred to as an attribute exception and requires the use of the attr_exception predicate, which is called from the event_details argument of the create_event_sequence predicate.

    A universal_host event with a severity of FATAL is generated when a host is unavailable. When the host is available again, the same event is sent with a severity of HARMLESS. This situation means that two events of the same class are differentiated by the value of an attribute. This requires the use of the attr_condition predicate in the event_details argument of the create_event_sequence predicate to define that a universal_host event is only to be correlated with an uninterruptible power supply event if its severity is FATAL. Furthermore, a universal_host event is a clearing event only of its severity is HARMLESS.

    The attribute_conditions argument for the first three clears predicates in the example are empty lists because the clearing events are event classes and attribute conditions are not needed. Their hostname attributes must match because hostname is specified in the attribute_conditions argument for the create_event_sequence predicate.

    The attribute_conditions argument for the fourth clears predicate places a condition on the universal_host event's severity attribute for the value being equal to HARMLESS for defining a clearing event. The specification of the hostname attribute in the attribute_conditions argument for the create_event_sequence predicate and the attr_exception predicate called from the event_details argument of the create_event_sequence predicate ensure that the host names match between clearing and problem events.

    create_event_sequence(
      ['upsOnBattery',
       'lowBattery',
       'upsDischarged',
       'universal_host'],
    
      ['hostname', ['status','outside',['CLOSED']]
      [
       clears('powerRestored',[ ], ['upsOnBattery'],[ ]),
       clears('returnFromLowBattery',[ ],['lowBattery'],
              [ ]),
       clears('dischargeCleared',[ ],['upsDischarged'],
              [ ]),
       clears('universal_host', 
              [ ['severity', equals,'HARMLESS'] ]
              ['universal_host'], 
              [ ]),
       attr_condition('universal_host',
                      ['severity',equals,'FATAL']),
       attr_exception('hostname','universal_host',
                      'probe_arg')
      ]
    ),
  2. The following example defines an event sequence with clearing events. This sequence contains events generated from Compaq Insight Manager related to physical drive problems. This event sequence is illustrated in the figure on page ***.

    This monitor generates events of the same class, using the cpqTapePhyDrvCondition attribute to indicate status. This attribute with a value of OK designates a clearing event. To handle this type of event sequence, the attribute name and the sequence of values (in event-sequence order from left to right) it can assume must be defined with the attr_sequence predicate, which is called from the event_details argument of the create_event_sequence predicate.

    The event_sequence argument only defines one class because this one event comprises the entire event sequence. Status is dependent upon the cpqTapePhyDrvCondition attribute.

    In the clears predicate, a target_attribute_conditions argument is not needed because the conditions for the problem events are already defined with the attr_sequence predicate.

    create_event_sequence(
      ['cpqTape3PhyDrvStatusChange'],
      ['hostname', ['status','outside',['CLOSED']]]
      [
       attr_sequence(
         'cpqTape3PhyDrvStatusChange',
         'cpqTapePhyDrvCondition'=['Degraded','Failed']),
       clears(
         'cpqTape3PhyDrvStatusChange',
         [ ['cpqTapePhyDrvCondition',equals,'OK'] ],
         ['cpqTape3PhyDrvStatusChange'],
         [ ])
      ]
    ),
  3. This example is based upon the event sequence defined in the following figure. It defines an event sequence with sequences that branch. This means that two or more sequences merge or emerge from a single sequence. In this example, the cpqHe3ThermalSystemFan and the cpqHe3ThermalCpuFan type of events merge with the cpqHe3ThermalTemp event type sequence. This event sequence contains events generated from Compaq Insight Manager that are related to temperature problems. Root cause events are the darkest shade, clearing events are the next lightest, and effect events are the lightest.
    Branching Causal Chain

    The create_event_sequence predicate can be used to define this type of event sequence by specifying one complete sequence with subsequent sequences that include at least one of the events that all of the related sequences have in common.

    If a sub-sequence branches from the complete sequence and then reconnects later in the sequence, both the event where it branched and the event where it reconnected must be specified. You should specify the complete sequence first and then specify sub-sequences that branch from or connect to it; this approach will make it easier for you to write the predicates.

    The following two create_event_sequence predicates completely define the event sequence shown in the flowchart for this example. The cpqHe3ThermalTempDegraded event in the second predicate specifies that the cpqHe3ThermalCpuFanFailed event joins the sequence defined in the first predicate.

    create_event_sequence(
      ['cpqHe3ThermalSystemFanDegraded',
       'cpqHe3ThermalSystemFanFailed'
       'cpqHe3ThermalTempDegraded',
       'cpqHe3ThermalTempFailed'],
    
      [hostname, ['status', equals, 'OPEN']],
    
      [
       clears('cpqHe3ThermalSystemFanOK', 
              [ ],
              ['cpqHe3ThermalSystemFanDegraded], 
              [ ]),
    
       clears('cpqHe3ThermalTempOK', 
              [ ],
              ['cpqHe3ThermalTempDegraded], 
              [ ]),
    
       clears('cpqHe3ThermalConfirmation', 
              [ ],
              ['cpqHe3ThermalTempFailed], 
              [ ])
      ]
    ),
    
    create_event_sequence(
      ['cpqHe3ThermalCpuFanFailed',
       'cpqHe3ThermalTempDegraded'],
    
      [hostname, ['status', equals, 'OPEN']],
    
      [
        clears('cpqHe3ThermalCpuFanOK', 
               [ ],
               ['cpqHe3ThermalSystemFanFailed], 
               [ ])
      ]
    ),

See Also

None.

create_threshold

Defines a threshold.

Synopsis

create_threshold(threshold_criteria_name, cache_search_criteria_name, _window, _count, _max_report_frequency)

Description

This predicate defines the criteria for setting a threshold for events in the event cache. It is used in conjunction with the check_threshold predicate. It should be run in a rule triggered by a TEC_Start event at event server start-up time. This loads the criteria once.

Arguments

_count
Specifies the threshold. For example, a threshold of 5 means that when the sixth matching event is found, the threshold has been exceeded.
_max_report_frequency
Specifies the time, in seconds, that a threshold has to remain exceeded before the threshold is reported again as exceeded.
_window
Specifies the time window, in seconds, to count events that match the search criteria towards the threshold. The time window is based upon the reception time of the reference event, which is typically the event under analysis. The time window spans the number of seconds before and after the reference event. For example, a time window of 600 seconds (10 minutes) means that events matching the search criteria received 5 minutes before or 5 minutes after the reference event are counted towards the threshold.
cache_search_criteria_name
The name that uniquely identifies the search criteria for a query of the event cache. This criteria was created with the create_cache_search_criteria predicate. The search criteria must be defined before using the create_threshold predicate.
threshold_criteria_name
The name that uniquely identifies the threshold criteria. This name is referred to from the check_threshold predicate.

Examples

The following example shows how to define threshold criteria with the predicate. Its characteristics are:

 

  • The name of the threshold criteria is db_critical_threshold
  • The name of the event cache search criteria defined by the create_cache_search_criteria predicate is db_critical_search
  • The time window for the threshold is 600 seconds surrounding the reference event
  • The threshold is 3 occurrences of the event within the time window
  • The threshold must be exceeded for 300 seconds before it is reported again as exceeded.
create_threshold('db_critical_threshold',
                 'db_critical_search',
                 600,
                 3,
                 300)

See Also

decrement_slot

Subtracts a number from the value of the specified integer attribute.

Synopsis

decrement_slot(_event, _attribute_name, _by_value, _trigger)

Description

This predicate subtracts a number from the value of the specified integer attribute.
Note:
Generally, the term slot has been replaced by the term attribute, even though this command name has not been changed.

Arguments

_attribute_name
The attribute to change.
_by_value
The amount to subtract.
_event
A pointer to the event to change.
_trigger
Specifies whether change rules should be evaluated as a result of this attribute change. Valid values are: 'YES', yes, 'NO', or no.

Examples

The following example shows predicate usage:
decrement_slot(_event,host_down,1,no)

See Also

drop_change_request

Prevents a change request from being applied after change rules are run.

Synopsis

drop_change_request

Description

This predicate prevents a change request from being applied after change rules are run.

Arguments

None.

Examples

The following example shows that for a request to change the status of an event to ACK or CLOSED, if the requesting user is not Root-myHost-region, the change request is dropped and the msg attribute of the event is set to a denial message. The change request is not evaluated by any other change rules after being dropped.
change_rule: deny_state_change_of_TTs:(
       event: _event of_class _class,
       sender: _sender equals operator(_x),
       slot: status set_to _new_status within ['ACK', 'CLOSED'],
 action: (
  _sender \==  operator('Root_myHost-region'),
  bo_set_slotval(_event,'msg','modification denied !'),
  drop_change_request
 )
).

See Also

None.

drop_received_event

Discards an event after the rules are run.

Synopsis

drop_received_event

Description

This predicate causes the event under analysis to be discarded after the rules are evaluated with it.

Arguments

None.

Examples

The following example rule shows how to count the number of duplicate NFS_NOT_RESPONDING events that are received and then drop them so they're not stored in the event database. This results in one event kept with its repeat_count attribute updated each time a duplicate is received.
rule: dup_nfs_not_resp:(

  event: _event of_class 'NFS_NOT_RESPONDING',

  action: dup_and_drop_event:(
    first_duplicate(_event,event: _dup_nfs_ev
                    where [status: outside ['CLOSED'] ]
    ),

    add_to_repeat_count (_dup_nfs_ev, 1),

    drop_received_event
  )
).

See Also

None.

erase_globals

Removes all the global variables in a group from the knowledge base.

Synopsis

erase_globals(_group)

Description

This predicate removes all the global variables in a group from the knowledge base.

Arguments

_group
The group key whose variables to remove.

Examples

The following example removes all of the global variables in the Maintenance group from the knowledge base:
erase_globals('Maintenance')

See Also

None.

exec_program

Launches a program.

Synopsis

exec_program(_event, file_name, _format_string, _arg_list, watch_status)

Description

This predicate launches a program. The program's completion status can be monitored.
Note:
Null arguments to exec_program might crash the event server when this predicate is run. In addition, ensure that all attributes passed to exec_program are instantiated.

Arguments

_arg_list
A list of values (typically attributes of the event) to be supplied to the program in the form [1, 2, 3]. All of the attributes of the trigger event are also available to the program through environment variables; for example, the msg attribute value can be obtained from the $msg environment variable. See the IBM Tivoli Enterprise Console Command and Task Reference for additional information about environment variables available to running tasks and programs

For every format specification in the format string, there must be a corresponding element in the argument list. The data types in the format string must be compatible with their corresponding values in the argument list. If there are no format specifications in the format string, the argument list must be an empty list, written as [ ]. The length of a formatted command line is limited to 256 characters.

_event
A pointer to the event that triggers running of the program. All attributes of this event are available to the program as environment variables. See the IBM Tivoli Enterprise Console Command and Task Reference for additional information about environment variables available to running tasks and programs.
_format_string
The format string for formatting arguments to the command. %s (STRING), %d (INTEGER), and %ld (INT32) format specifications can be defined in the format string for use with the corresponding values in the argument list.
file_name
The path and file name of the program to run. Relative paths can be specified from the $BINDIR/TME/TEC directory.
watch_status
Specifies whether program execution should be monitored. The watch_status argument can be 'YES' or 'NO'. This argument must be enclosed in single quotation marks. If 'YES', the completion status command can be checked from the event console.

Examples

The following example shows a use of the predicate:
exec_program(_event,
            % Pass in the event pointer for access to
            % its environment variables.

            'scripts/send_notice',
            % Program path/name.

            '-m "%s" -s %s',
            % Format string.

            [_msg, _severity],
            % Argument list.

            'YES')
            % Watch status.
)

The %s format specifiers of the _format_string argument are bound to the msg and severity attributes of the event. The send_notice program is launched with four command line arguments, such as shown in following example:

send_notice -m "Su to root failed for Joe" -s CRITICAL

Note that double quotation marks are used to delimit the value of the msg attribute so it is presented as a single argument to the send_notice command.

See Also

exec_program_local

Launches a program on the local event server.

Synopsis

exec_program_local(_name, _event, file_name, format_string, _arg_list, watch_status)

Description

This predicate launches a program asynchronously on the local event server (local means the server where the rule engine is installed). When the program finishes, a TASK_COMPLETE event is generated if the watch_status argument is set to 'YES'. This event contains details about the program's execution. The TASK_COMPLETE event class is defined in the tec.baroc file. A description of its attributes are as follows:
command
The name of the command to launch the program.
end_time
The time when the program finished.
execution_msg
Output from the program. This attribute contains a list of strings, each string representing a line of output from the program or script. This list is limited to 512 lines.
exit_status
The exit status set by the operating system for the program.
start_time
The time when the program started.
task_name
The name assigned to the program. It was assigned with the _name argument of the predicate.
task_number
An identifier for the executing program. These identifiers start at 1 and are incremented by 1 for each launch of a program.
task_status
The completion status for the program.
trigger_event_id
The identifier of the event that triggered the launch of the exec_program_local predicate.

Usually a pair of rules are created when using this predicate. The first rule launches the program. The second rule evaluates the results of the program when it is done and might take some action depending on the results.

Arguments

_arg_list
A list of values (typically attributes of the event) to be supplied to the program in the form [1, 2, 3]. All of the attributes of the trigger event are also available to the program through environment variables; for example, the msg attribute value can be obtained from the $msg environment variable. See the IBM Tivoli Enterprise Console Command and Task Reference for additional information about environment variables available to running tasks and programs.

For every format specification in the format string, there must be a corresponding element in the argument list. The data types in the format string must be compatible with their corresponding values in the argument list. If there are no % format specifications in the format string, the argument list must be an empty list, written as [ ]. The length of a formatted command line is limited to 256 characters.

_event
A pointer to the event that triggers running of the program. All attributes of this event are available to the program as environment variables. See the Tivoli Management Framework Reference Manual for additional information about environment variables available to running tasks and programs.
_format_string
The format string for formatting arguments to the command. %s (STRING), %d (INTEGER), and %ld (INT32) format specifications can be specified in the format string for use with the corresponding values in the argument list. If a format string is not specified, an empty _format_string argument must be specified in the form '' (two single quotation marks). The example in exec_program shows the use of format strings.
_name
The name to assign the program. It is used to identify the program in a TASK_COMPLETE event.
file_name
The path and file name of the program to run. Relative paths can be specified from the $BINDIR/TME/TEC directory.
watch_status
Specifies whether a TASK_COMPLETE event is to be generated. Valid values are:
'NO'
Do not generate a TASK_COMPLETE event when the program finishes. This argument must be enclosed in single quotation marks.
'YES'
Generate a TASK_COMPLETE event when the program finishes. This argument must be enclosed in single quotation marks.

Examples

The following example shows:

 

  1. In the program_start rule, the ls (list) program is launched upon the reception of a TEC_DB event. The program is launched with the following characteristics:
    • The program is given the name of list_tmpdir.
    • There are no additional arguments for the program's command line.
    • A TASK_COMPLETE event is to be generated when the program finishes.
  2. The program_result rule is triggered by the reception of a TASK_COMPLETE event with the task_name attribute set to list_tmpdir, which is the name of the program invoked in the previous rule.
  3. The process_program_result action of the program_result rule does the following:
    1. Gets the value of the execution_msg attribute from the TASK_COMPLETE event and unifies that value with the _results variable. This attribute is a list of strings.
    2. _results is searched for an element with a value of OK.
    3. If an element with a value of OK is found in the list, the ok predicate is run. If it is not found, the not_ok predicate is run.
rule:  program_start: (

  event: _event of_class  'TEC_DB'
    where [  ],

  reception_action: start_it: (
    exec_program_local('lst_tmpdir',_event,'ls /tmp',
                       '',[ ],'YES')  
  )
).

rule:  program_result: (

  event: _event of_class  'TASK_COMPLETE'
    where [task_name: _task_name equals 'lst_tmpdir',
           % Test for program name. If passed, assign
           % value to variable.

           task_number: _task_num
           % Assign task_number attribute value to
           % variable.
    ],

  reception_action: process_program_result: (

    bo_get_slotval(_event,execution_msg,_results),
    % Get value of execution_msg attribute and assign to
    % variable. Attribute is a list type.

    member(_result_line,_results),
      (_result_line == 'OK'  ->
      % Test each element for OK value.

          ok
          % OK value found in list. Run ok predicate.

        ;     % Else.

          do_not_ok_thing
         % OK value not found in list. Run not_ok
         % predicate.)
  ) 
).

See Also

exec_task

Launches a task from a task library.

Synopsis

exec_task(_event, task_name, format_string, _arg_list, watch_status)

Description

This predicate launches a task from a task library. The task's completion status can be monitored. Tasks provided by the Tivoli Enterprise Console product are described in the IBM Tivoli Enterprise Console Command and Task Reference.
Note:
Null arguments to exec_task might crash the event server when this predicate is run. In addition, ensure that all attributes passed to exec_task are instantiated.

Arguments

_arg_list
A list of values (typically attributes of the event) to be supplied to the program in the form [1, 2, 3]. All of the attributes of the trigger event are also available to the program through environment variables; for example, the msg attribute value can be obtained from the $msg environment variable. See the IBM Tivoli Enterprise Console Command and Task Reference for additional information about environment variables available to running tasks and programs.

For every format specification in the format string, there must be a corresponding element in the argument list. The data types in the format string must be compatible with their corresponding values in the argument list. If there are no format specifications in the format string, the argument list must be an empty list, written as [ ]. The length of a formatted command line is limited to 256 characters.

_event
A pointer to the event that triggers running of the program. All attributes of this event are available to the program as environment variables. See the Tivoli Management Framework Reference Manual for additional information about environment variables available to running tasks and programs.
_format_string
The format string for formatting arguments to the task. %s (STRING), %d (INTEGER), and %ld (INT32) format specifications can be specified in the format string for use with the corresponding values in the argument list. The format string contains the name of the task library, the host name where the task will run, and any command line arguments to the task in the following form:
-l tasklibname -h hostname -a arg1 -a arg2...
The example in exec_program shows the use of format strings.
Notes:
  1. The -l, -h, and -a arguments in the format string are the same as those used in the Tivoli Management Framework wruntask command. See the Tivoli Management Framework Reference Manual for details.
  2. The task name (-t TaskName) and pass environment variables (-E) wruntask command arguments are provided internally by the exec_task predicate.
task_name
Specifies the name of the task to run.
watch_status
Specifies whether task execution should be monitored. The watch_status argument can be 'YES' or 'NO'. This argument must be enclosed in single quotation marks. If 'YES', the completion status command can be checked from the event console.

Examples

  1. The following example shows that the Send_Email task from the T/EC Tasks library is launched on host stumpy. Two arguments are passed to the task: the administrator's name to appear in the To field of the note, and the administrator's e-mail address. Task completion will not be monitored. The wruntask command example shows how the task is actually launched with the arguments resolved.
    exec_task(_event,
      'Send_Email',
      '-l "T/EC Tasks" -h "stumpy" -a "%s" -a "%s"',
      ['joe@company.com', 'joe@company.com'],
      'NO')
    wruntask -t Send_Email -l "T/EC Tasks" -h "stumpy" -E 
    -a "joe@company" -a "joe@company"
  2. The following example shows how to run a task based on an event sent from Tivoli Distributed Monitoring, which is monitoring an application instance of an MS SQL database. The filtering criteria for the rule is an event of class MSSQLDatabase_LogSpacePercentUsedDB with a severity of CRITICAL. The value for the collection attribute contains the resource type being monitored. The resource type and host name are instantiated in variables for use in the exec_task call.
    rule: plain_rule1_42: (
    description:'ADSM incremental backup task',
    
      event: _ev1 of_class within [
             'MSSQLDatabase_LogSpacePercentUsedDB']
        where [severity: _ev1_severity
               collection: _ev1_collection,
               hostname: _ev1_hostname
        ] ,
    
      reception_action: action0: (
        (exec_task(_ev1, 
                   'ADSMIncBackup', 
                   '-l MSSQLManagerTasks -h \'@%s:%s\'',
                   [_ev1_collection,_ev1_hostname], 
                    'YES'
                   )
        )
      )
    ).

    The exec_task call resolves to the following command when an event is received for an MSSQLDatabase collection on host master@holon@holon:

    wruntask -t ADSMIncBackup -l MSSQLManagerTasks \
    -h @MSSQLDatabase:master@holon@holon -E

See Also

exec_task_local

Launches a task from a task library on the local event server.

Synopsis

exec_task_local(_name, _event, file_name, format_string, _arg_list, watch_status)

Description

This predicate launches a task asynchronously from a task library on the local server (local means the event server where the rule engine is installed). Tasks provided by the Tivoli Enterprise Console product are described in the IBM Tivoli Enterprise Console Command and Task Reference.

 

Note:
This predicate can only be run on a managed node.

When the program finishes, a TASK_COMPLETE event is generated if the watch_status argument is set to 'YES'. This event contains details about the task's execution. The TASK_COMPLETE event class is defined in the root.baroc file. A description of its attributes are as follows:

command
The name of the command to launch the task.
end_time
The time when the task finished.
execution_msg
Output from the task. This attribute contains a list of strings, each string representing a line of output from the program or script. This list is limited to 512 lines.
exit_status
The exit status set by the operating system for the task.
start_time
The time when the task started.
task_name
The name assigned to the task. It was assigned with the _name argument of the predicate.
task_number
An identifier for the executing task. These identifiers start at 1 and are incremented by 1 for each launch of a task.
task_status
The completion status for the task. These values are defined in the root.baroc file as RUNNING, SUCCESS, FAILURE, and UNKNOWN.
trigger_event_id
The identifier of the event that triggered the launch of the exec_task_local predicate.

Usually a pair of rules are created when using this predicate. The first rule launches the task. The second rule evaluates the results of the task when it is done and might take some action depending on the results.

Arguments

_arg_list
A list values (typically attributes of the event) to be supplied to the program in the form [1, 2, 3]. All of the attributes of the trigger event are also available to the task through environment variables; for example, the msg attribute value can be obtained from the $msg environment variable. See the IBM Tivoli Enterprise Console Command and Task Reference for additional information about environment variables available to running tasks and programs.

For every format specification in the format string, there must be a corresponding element in the argument list. The data types in the format string must be compatible with their corresponding values in the argument list. If there are no format specifications in the format string, the argument list must be an empty list, written as [ ]. The length of a formatted command line is limited to 256 characters.

_event
The pointer to the event that triggers running of the task. All attributes of this event are available to the task as environment variables. See the Tivoli Management Framework Reference Manual for additional information about environment variables available to running tasks and programs.
_format_string
The format string for formatting arguments to the command. %s (STRING), %d (INTEGER), and %ld (INT32) format specifications can be defined in the format string for use with the corresponding values in the argument list. If a format string is not specified, an empty _format_string argument must be specified in the form '' (two single quotation marks). The format string contains the name of the task library, the host name where the task will run, and any command line arguments to the task in the following form:
-l tasklibname -h hostname -a arg1 -a arg2...

The example exec_program shows the use of format strings.

Notes:
  1. The -l, -h, and -a arguments in the format string are the same as those used in the Tivoli Management Framework wruntask command. See the Tivoli Management Framework Reference Manual for details.
  2. The task name (-t TaskName) and pass environment variables (-E) wruntask command arguments are provided internally by the exec_task_local predicate.
_name
The name to assign the task. It is used to identify the task in a TASK_COMPLETE event.
file_name
The path and file name of the task to run. Relative paths can be specified from the $BINDIR/TME/TEC directory.
watch_status
Specifies whether a TASK_COMPLETE event is to be generated. Valid values are:
'NO'
Do not generate a TASK_COMPLETE event when the task finishes. This argument must be enclosed in single quotation marks.
'YES'
Generate a TASK_COMPLETE event when the task finishes. This argument must be enclosed in single quotation marks.

Examples

The following example shows:

 

  1. In the task_start rule, the send_dbadmin task is launched upon the reception of a TEC_DB event. The program is launched with the following characteristics:
    • The task is given the name of send_dbadmin.
    • There are two arguments for the task's command line.
    • A TASK_COMPLETE event is to be generated when the task finishes
  2. The task_result rule is triggered by the reception of a TASK_COMPLETE event with the task_name attribute set to send_dbadmin, which is the name of the task launched in the previous rule.
  3. The process_task_result action of the task_result rule does the following:
    1. Gets the value of the execution_msg attribute from the TASK_COMPLETE event and unifies that value with the _results variable. This attribute is a list of strings.
    2. _results is searched for an element with a value of OK.
    3. If an element with a value of OK is found in the list, the ok predicate is run. If it is not found, the not_ok predicate is run.
rule:  task_start: (

  event: _event of_class  'TEC_DB'
    where [  ],

  reception_action: start_it: (
    exec_task_local(
      'send_dbadmin,
      _event,
      'Send_Email',
      '-l "T/EC Tasks" -h "stumpy" -a "%s"
      -a "%s"',
      ['joe@company.com','joe@company.com'],
      'YES'
    )  
  )
).

rule: task_result: (

  event: _event of_class  'TASK_COMPLETE'
    where [task_name: _task_name equals 'send_dbadmin',
           % Test task name. Assign task_name value to
           % variable if passed.

           task_number: _task_num
           % Assign task_number attribute value to
           % variable.
    ],

  reception_action: process_task_result: (

    bo_get_slotval(_event,execution_msg,_results),
    % Get value of execution_msg attribute and assign to
    % variable. Attribute is a list type.

    member(_result_line,_results),
      (_result_line == 'OK'  ->
      % Test each element for OK value.

          ok
          % OK value found in list. Run ok predicate.

        ;     % Else.

          do_not_ok_thing
         % OK value not found in list. Run not_ok
         % predicate.)
  ) 
).

See Also

first_causal_event

Searches the event cache for the root cause event related to an effect event.
Note:
Generally, the term causal event has been replaced by the term cause event, even though this command name has not been changed.

Synopsis

first_causal_event(_effect_event, _cause_event)

--OR--

first_causal_event(_effect_event, _cause_event, time_before, time_after)

Description

This predicate is used to search the event cache for the root cause event in an event sequence defined with the create_event_sequence predicate. The event must also meet the criteria defined with the create_event_sequence predicate. For example, if events A, B, C, and D are defined as an event sequence in that order, and event D is event under analysis, this predicate will return the most recent instance of event A if it exists and meets the defined criteria, otherwise return event B if it exists, otherwise return event C if it exists, otherwise it fails.

If the time_before and time_after arguments are not specified, the event cache search time window defaults to 2 years (1 year before and 1 year after). You should limit a time window to the smallest reasonable window whenever possible for better performance.

Arguments

_cause_event
A pointer to the root cause event found for the effect event. This argument must be free.
_effect_event
A pointer to the effect event whose cause event is being searched for. Typically the event under analysis.
time_after
The number of seconds after the effect event has been received. This argument is used to limit the event cache search to a time window.
time_before
The number of seconds before the effect event has been received. This argument is used to limit the event cache search to a time window.

Examples

The following example searches the event cache for a related cause event that has been previously received. If one is found, the effect event is acknowledged and linked to the cause event. This rule triggers on a superclass but it searches for a cause event. This design lets you create a single rule to process any number of events that are related.
rule: 'link_effect_to_cause':(

  event: _effect of_class 'EVENT',

  action: 'search_for_cause':(
    first_causal_event(_effect, _cause, 3600, 0),
    set_event_status(_effect, 'ACK'),
    link_effect_to_cause(_effect, _cause)
  )
).

See Also

first_duplicate

Succeeds once for the first (most recent) duplicate event in the event cache that satisfies the specified additional attribute and time window conditions.

Synopsis

first_duplicate(_event, event: _duplicate where attribute_conditions)

--OR--

first_duplicate(_event, event:_duplicate where attribute_conditions, _referenceEvent -time_before -time_after)

Description

No class specification is required, since the duplicate events are always of the same class. For additional information about duplicate events, see What is a duplicate event?.

If the -time_before and -time_after arguments are not specified, the event cache search time window defaults to 2 years (1 year before and 1 year after). You should limit a time window to the smallest reasonable window whenever possible for better performance.

Arguments

_event
A pointer to the event currently under analysis.
_referenceEvent
A pointer to the reference event for the time window, typically the event under analysis.
event:_duplicate where attribute_conditions
Specifies an event filter for querying the event cache. _duplicate is instantiated with a pointer to each duplicate event found. See Event filters for additional information.
-time_after
The number of seconds after the reference event.
-time_before
The number of seconds before the reference event.

Examples

The following example rule shows how to count the number of duplicate NFS_NOT_RESPONDING events that are received and then drop them so they're not stored in the event database. This results in one event kept with its repeat_count attribute updated each time a duplicate is received.

Note that this example doesn't specify a time window argument, thus defaulting to a 2 year window (1 year before and 1 year after). You should limit a time window to the smallest reasonable window whenever possible for better performance.

rule: dup_nfs_not_resp:(

  event: _event of_class 'NFS_NOT_RESPONDING',

  action: dup_and_drop_event:(
    first_duplicate(_event,event: _dup_nfs_ev
                    where [status: outside ['CLOSED'] ]
    ),

    add_to_repeat_count (_dup_nfs_ev, 1),

    drop_received_event
  )
).

See Also

first_effect_event

Searches the event cache for the logically earliest effect event related to a cause event.

Synopsis

first_effect_event(_cause_event, _effect_event)

--OR--

first_effect_event(_cause_event, _effect_event, time_before, time_after)

Description

This predicate is used to search the event cache for the effect event related to a cause event in an event sequence defined with the create_event_sequence predicate.

If the time_before and time_after arguments are not specified, the event cache search time window defaults to 2 years (1 year before and 1 year after). You should limit a time window to the smallest reasonable window whenever possible for better performance.

Arguments

_cause_event
A pointer to the cause event whose effect event is being searched for. Typically the event under analysis.
_effect_event
A pointer to the effect event found for the cause event. This argument must be free.
time_after
The number of seconds after the cause event has been received. This argument is used to limit the event cache search to a time window.
time_before
The number of seconds before the cause event has been received. This argument is used to limit the event cache search to a time window.

Examples

The following example searches the event cache for a related effect event that has been previously received. If one is found, the effect event is acknowledged and linked to its related cause event. This rule triggers on a superclass but it searches for a related effect event. This design lets you create a single rule to process any number of events that are related.
rule: 'link_cause_to_effect':(

  event: _cause of_class 'EVENT',

  action: 'search_for_effect':(
    first_effect_event(_cause, _effect, 3600, 0),
    set_event_status(_effect, 'ACK'),
    link_effect_to_cause(_effect, _cause)
  )
).

See Also

first_instance

Succeeds once for the first (most recent) event in the event cache that satisfies the specified class, attribute, and time window conditions.

Synopsis

first_instance(event: _event of_class class where attribute_conditions)

--OR--

first_instance(event: _event of_class class where attribute_conditions, _referenceEvent -time_before -time_after)

Description

Succeeds once for the first event that satisfies the specified class, attribute, and time window conditions.

Arguments

_referenceEvent
A pointer to the reference event for the time window, typically the event under analysis.
event:_event of_class class where attribute_conditions
Specifies an event filter. See Event filters for additional information.
-time_after
The number of seconds after the reference event.
-time_before
The number of seconds before the reference event.

Examples

The following example shows a rule that:
  1. Queries the event cache for the first instance of a universal_host event with the following additional conditions:
    • The status is not CLOSED.
    • The probe_arg attribute for the first instance of the event in the cache has the same value as the server attribute for the event under analysis, which is an NFS_No_Response event.
    • The event's severity is CRITICAL.
    • The time window for searching is 20 minutes surrounding the event under analysis.
  2. If an event meeting these conditions is found in the event cache, its severity is upgraded to FATAL.
rule: escalate: (
description: 'escalate host down events when causing NFS
              problems',

  event: _event of_class 'NFS_No_Response'
    where [ server: _server],

  action: 'increase_sev': (
    first_instance(event: _down_ev of_class
                          'universal_host'
                     where [status: outside ['CLOSED'],
                            probe_arg: equals _server,
                            severity: equals 'CRITICAL'],
                    _event - 600 - 600 ),

    set_event_severity(_down_ev, 'FATAL') 
  )
).

See Also

first_related_event

Searches the event cache for the logically earliest event related to a reference event.

Synopsis

first_related_event(_referenceEvent, _related_event, _relation)

--OR--

first_related_event(_referenceEvent, _related_event, _relation, time_before, time_after)

Description

This predicate is used to search the event cache for the logically earliest cause or effect event related to the reference event. Logically earliest means as defined from left-to-right in an event sequence, with the logically earliest event starting from the left. If the found event is a cause event, the _relation argument is instantiated with the value of c. If the found event is an effect event, the _relation argument is instantiated with a value of e. For example, if events A, B, C, and D are defined with the create_event_sequence predicate as an event sequence in that order and the first_related_event predicate is called with an instance of event C as the reference event, the first instance of event A would be returned with the _relation argument instantiated with a value of c if it exists, otherwise event B would be returned with _relation set to c if it exists, otherwise event D would be returned with _relation set to e if it exists, otherwise the predicate fails.

This predicate should be used whenever correlation is needed to find cause events in the event cache, and then find effect events if a cause event is not found. Because this predicate only performs one search, it is more efficient than using the first_causal_event predicate followed by the first_effect_event predicate sequence of calls.

If the time_before and time_after arguments are not specified, the event cache search time window defaults to 2 years (1 year before and 1 year after). You should limit a time window to the smallest reasonable window whenever possible for better performance.

Arguments

_referenceEvent
A pointer to the reference event whose logically earliest related event is being searched for.
_related_event
A pointer to the logically earliest related event found for the reference event. This argument must be free.
_relation
The relationship of the found event to the reference event. This argument must be free. Valid values are:
c
A cause event to the reference event.
e
An effect event to the reference event.
time_after
The number of seconds after the reference event has been received. This argument is used to limit the event cache search to a time window.
time_before
The number of seconds before the reference event has been received. This argument is used to limit the event cache search to a time window.

Examples

The following example searches the event cache for the logically earliest event related to the event under analysis. If one is found, the found event is acknowledged and linked to the reference event, either as a cause event or effect event, depending upon the returned value of the relation argument. This rule triggers on a superclass but it searches for a related event. This design lets you create a single rule to process any number of events that are related.
rule: 'link_effect_to_cause':(

  event: _ev of_class 'EVENT',

  action: 'search_for_cause_or_effect':(
    first_related_event(_ev, _related,_relation, 3600,0),
    (
      _relation == 'c',
      set_event_status(_ev, 'ACK'),
      link_effect_to_cause(_ev, _related)
    ;
      set_event_status(_related, 'ACK'),
      link_effect_to_cause(_related, _ev)
    )
  )
).

See Also

forward_event

Forwards an event to an event server.

Synopsis

forward_event(_event)

Description

This predicate forwards an event to an event server.

The predicate looks for the event server's location in the tec_forward.conf file (located in the rule_base_dir/TEC_RULES directory). You must edit the tec_forward.conf file and change the value for the ServerLocation option to the host name of the system for the forwarded event.

Note:
Because the forward_event predicate uses a non-Tivoli connection, you must specify the location of the server as an IP address or TCP/IP host name, regardless of whether there is a connection between the Tivoli regions.

The default setting in the tec_forward.conf file for the TestMode option is yes. This means that events are forwarded to a file. In order to actually forward an event to an event server, you must delete or comment out the TestMode option in the tec_forward.conf file.

Arguments

_event
A pointer to the event to forward, typically the event under analysis.

Examples

The following example forwards events with a severity of CRITICAL or FATAL to the event server specified in the tec_forward.conf file:
rule: escalate: (

  event: _evt of_class within [ 'EVENT' ] 
    where
      [severity: within ['CRITICAL', 'FATAL'],

  reception_action: action0:(
    forward_event(_evt)
  )
).

See Also

None.

generate_event

Generates an internal event.

Synopsis

generate_event(event_class, list_of_event_attributes)

Description

This predicate generates an event internally; that is, from within the event server instead of externally from a source such as an event adapter.

Arguments

event_class
The event class for the generated event.
list_of_event_attributes
The attributes for the generated event. The attributes must be specified in a list using the following format:

[attribute1=value1, attribute2=value2,...]

Examples

The following example generates an event of class TradingDBDown with 4 attributes:
action:(
  generate_event('TradingDBDown',
                 [source='SNMP',
                  origin=_origin,
                  hostname=_host,
                  msg='Trading DB host is down']
                )
)

See Also

None.

get_attributes

Retrieves event attribute values.

Synopsis

get_attributes(_event, [ attribute_name=_attribute_value, ...] )

Description

This predicate retrieves the values of event attributes and instantiates variables with those values. The second argument is in list format.

Arguments

_attribute_value
The variable to instantiate with the attribute value.
_event
The event from which to get the attribute values. Typically the event under analysis.
attribute_name
The name of the attribute whose value to retrieve.

Examples

The following example retrieves the hostname, severity, and status attribute values from the event under analysis and instantiates them in the _hostname, _severity and _status variables, respectively:
get_attributes(_event,[hostname=_hostname,
                       severity=_severity,
                       status=_status
                      ] 
)

See Also

None.

get_config_param

Gets a rule engine configuration setting.

Synopsis

get_config_param(_name, _variable, default)

Description

This predicate gets a rule engine configuration value defined in the $BINDIR/TME/TEC/.tec_config file and unifies it with a variable.

If the _name argument does not exist in the file, the default argument is unified with the _variable argument.

Arguments

_name
The name of the configuration setting.
_variable
The variable to unify with the value of the configuration setting.
default
The value to unify with _variable if _name does not exist as a configuration setting in the file.

Examples

The following example sets the _tec_rule_host variable to chair. If the tec_rule_host setting did not exist in the file, the variable would have been set to a value of not set. Some .tec_config file entries are shown first.
#.tec_config settings

#tec_rule_cache_size=10000
#tec_rule_cache_full_history=86400
#tec_rule_cache_non_closed_history=155520
#tec_rule_cache_clean_freq=3600
tec_rule_trace=YES
tec_rule_trace_file=/tmp/rules.trace
tec_rule_host=chair
tec_server_handle=5
get_config_param(tec_rule_host,_tec_rule_host,'not set')

See Also

None.

get_global_grp

Gets the value of all global variables in a group.

Synopsis

get_global_grp(_group, _key,_value)

Description

This predicate gets the value of all global variables in a group from the knowledge base. The predicate loops through the variables in the group and instantiates _value for each variable found. _value and _key must be free.

Arguments

_group
The group key for the variables.
_key
The key for the variables. Must be free.
_value
The value of the key. Must be free.

Examples

The following example gets all of the global variables for the Maintenance group.
get_global_grp('Maintenance', _key, _value),

See Also

get_global_var

Gets a value of a global variable.

Synopsis

get_global_var(_group, _key,_value,_default)

Description

This predicate gets the value of one global variable from the knowledge base and unifies it with _value. If the variable has no value, it is set to _default and _default is unified with _value. _value must be free.

Arguments

_default
The value for the variable if it currently has no value.
_group
The group key for the variable.
_key
The key for the variable.
_value
The value of the key. Must be free.

Examples

The following example rule:
  1. Gets the value for a global variable with a group key of Maintenance and a key value equal to the value of the origin attribute of the event under analysis. If there is no value for that key, it is initialized to a value of off.
  2. A check is performed on the value of the global variable to see if the host is in maintenance mode.
  3. If the check is true, the event under analysis is dropped and the rule set is exited.
rule:
check_maint_mode:
(
     event: _event of_class _event_class
          where [
               origin: _origin
          ],
     reception_action:
          (
          get_global_var('Maintenance', _origin, _maint_mode, 'off'),
          _maint_mode == 'on',
          drop_received_event,
          commit_rule
          )
).

See Also

get_globals

Gets all global variables.

Synopsis

get_globals(_group, _key,_value)

Description

This predicate returns the group key, key, and value for each global variable. The arguments must be free. The predicate loops through the global variables and instantiates the three arguments for each global variable found.

Arguments

_group
The group key.
_key
The key.
_value
The value of the key.

Examples

The following example shows predicate usage:
get_globals(_group,_key,_value)

See Also

get_gm_time

Gets the current time represented in Greenwich mean time (GMT).

Synopsis

get_gm_time(_time_gm_struct)

Description

This predicate gets the current time represented in GMT. _time_gm_struct must be free.

Arguments

_time_gm_struct
Represents a time structure in GMT. Do not confuse it with the data returned by the get_time predicate, in which the value for the _time_epoch argument is a number representing how many seconds have passed since an epoch.

Examples

The following example shows how to get the structure for the current time in GMT, convert the time to a string, and update the time_string attribute of the event with the string:
get_gm_time(_time_gm_struct),
convert_ascii_time(_time_gm_struct, _time_string),
bo_set_slotval(_event, time_string, _time_string)

See Also

get_local_time

Gets the current local system time.

Synopsis

get_local_time(_time_local_struct)

Description

This predicate gets the current local system time. _time_local_struct must be free.

Arguments

_time_local_struct
Represents a time structure in local system time. Do not confuse it with the data returned by the get_time predicate, in which the value for the _time_epoch argument is a number representing how many seconds have passed since an epoch.

Examples

The following example shows how to get the structure for the current local system time, convert the time to a string, and update the time_string attribute of the event with the string:
get_local_time(_time_local_struct),
convert_ascii_time(_time_local_struct, _time_string),
bo_set_slotval(_event, time_string, _time_string)

See Also

get_time

Gets the current time represented by an integer since the epoch, which is 00:00:00 Greenwich mean time (GMT) 01 Jan 1970 for most systems.

Synopsis

get_time(_time_epoch)

Description

This predicate gets the current time represented by an integer number of seconds since the epoch. _time_epoch must be free.

Arguments

_time_epoch
Represents an epoch time number. Do not confuse it with the data returned by the get_local_time predicate, in which the value for the _time_local_struct argument is a time structure.

Examples

The following example shows how to get the epoch time number and then update the time_epoch attribute of the event with the number.
get_time(_time_epoch),
bo_set_slotval(_event, time_epoch, _time_epoch)

See Also

global_exists

Checks the existence of a global variable.

Synopsis

global_exists(_group, _key)

Description

This predicate checks that the global variable in group key _group at key _key exists. If the variable exists, the predicate succeeds.

Arguments

_group
The group key for the variable to check.
_key
The key for the variable to check.

Examples

The following example checks for the global variable whose key is the value of the origin attribute of the event under analysis and belongs to the Maintenance group:
global_exists('Maintenance',_origin)

See Also

None.

increment_slot

Adds a number to the value of the specified integer attribute.

Synopsis

increment_slot(_event, _attribute_name, _by_value, _trigger)

Description

This predicate adds a number to the value of the specified integer attribute.
Note:
Generally, the term slot has been replaced by the term attribute, even though this command name has not been changed.

Arguments

_attribute_name
The attribute to change.
_by_value
The amount to add.
_event
A pointer to the event to change.
_trigger
Specifies whether change rules should be evaluated as a result of this attribute change. Valid values are: 'YES', yes, 'NO', or no.

Examples

The following example shows predicate usage:
increment_slot(_event,host_down,1,no)

See Also

init_count

Creates and initializes a counter.

Synopsis

init_count(_key1, _key2, _value)

Description

This predicate creates a counter identified by the values of _key1 and _key2. It also initializes the counter to the value of _value. Typically, the initial value is set to 0.

This predicate is used in conjunction with the check_and_increment_count predicate, which is used to increment a count and compare it to a threshold value.

Notes:
  1. If a counter is not created and initialized with the init_count predicate, it is created and initialized to 0 the first time the check_and_increment_count predicate is called to check the counter.
  2. Once initialized, a counter continues counting until explicitly reinitialized with a new starting value.
  3. You must reinitialize a counter that has reached its threshold if it is still needed for counting.

Counters are used to keep track of any arbitrary numeric value. The values of _key1 and _key2 can be set to easily identify the information being recorded. For example:

  • To keep track of the number of times a particular event occurs on each host, the keys could be named using an event_class,hostname scheme; thereby creating a counter for each event and each host. For example, perf_alert,orange.
  • To keep track of the number of times a particular failure occurs on a set of components, the keys could be named using a failure,component scheme; thereby creating a counter for each component and each failure. For example, paper_jam,flr4rm23.

If the event server stops, all counters are discarded.

Arguments

_key1
The primary key name for the counter. Must be instantiated.
_key2
The secondary key name for the counter. Must be instantiated.
_value
The value to initialize the counter. Must be instantiated.

Examples

The following example counts the number of paper jams on a set of printers, based on receiving an event class of Printer_Jam. Printer counters are identified using a failure,component scheme. Printer_Jam events identify each printer in the hostname attribute.

Each counter is created, initialized to 0, and incremented to 1 the first time the check_and_increment_count predicate is called for a particular printer. Each subsequent call for that printer increments the count and then compares the count to the threshold value.

An administrator is notified when the number of paper jams on a printer reaches 5, and then the counter for that printer is reset to 0 using the init_count predicate. The administrator notification and reset of a counter is done in an ELSE clause of a Prolog statement because the check_and_increment_count predicate behavior is to fail when the count matches the threshold value.

rule: printer_jam: (

  event : _ev of_class 'Printer_Jam'
    where [hostname: _hn within ['flr4rm23',
                                 'flr3rm12',
                                 'flr1rm11',
                                 'flr6rm9'
                                ],

  action: check_count: (
    (check_and_increment_count(printer_jam,_hn,5,_count)
    
    ;
    % ELSE clause follows

    exec_program(_ev,'scripts/notify.sh',
                 'Printer failure on %s', [_hn], no),

    init_count(paper_jam,_hn,0)
    )
  )
).

See Also

init_event_activity

Defines the reporting criteria for generating an event activity report.

Synopsis

init_event_activity(_file, _event_ exclusions, _attribute_criteria, _threshold)

Description

This predicate defines the file for the report and defines the criteria for generating the report. An event activity report contains summary counts of the events in the event cache. You can configure the report to exclude particular events, filter on event attributes, and exclude counts that fall below a threshold value.

This predicate should be run in a rule triggered by a TEC_Start event at event server start-up time. This loads the predicate once, instead of every time it is needed.

Arguments

_attribute_criteria
The attributes whose summary counts to include in the report. The argument must be in list format; for example, [source, hostname, severity].

A list element can be a single attribute or a nested list of multiple attributes; for example, [hostname, severity] can be one element. In the example of the [hostname, severity] nested list, a count of each severity is given for each host.

The class keyword can be used in a nested list to count by event class name. For example, the [class, hostname] nested list provides a count of each host for each event class.
_event_exclusions
The class names of the events whose information to exclude in the report. This argument must be in list format; for example, ['TEC_Heartbeat', 'TEC_Maintenance'].
_file
The path and file name where the report is written.
_threshold
Any specification in the _attribute_criteria argument whose count is less than this value is not be shown in the report.

Examples

  1. The following example shows how to use the predicate:
    _rep_freq is 20,
    init_event_activity(
      '/tmp/event_activity',  
      % Report file
    
      ['TEC_Heartbeat',     % Do not report these events
       'TEC_Maintenance'
      ],
    
      [source,              % Single attribute reporting
       hostname, 
       severity,
       status,
    
       [hostname,severity], % Multiple attribute reporting
    
       [class,hostname]     % Class reporting
      ],
      5                     % Do not report counts less
                            %than this 
    ),
  2. The initial timer for an event activity report must be started by a TEC_Tick event. The following example shows how to do this:
    rule: configure_event_activity: (
    
      event: _event of_class 'TEC_Tick'
        where [msg: _msg equals 'Event Activity Report',
               duration: _reporting_frequency],
    
      reception_action: start_timer: (
        set_timer(_event,_reporting_frequency,_msg),
        commit_rule
      )
    ).
  3. The following example shows a fragment of an event activity report:
                    Event Activity For Server tkennedy
    
    From: Thu Mar 02 14:14:02 2000.
    To  : Thu Mar 02 14:14:18 2000.
    
    Reporting Frequency: 0 Minutes.
    
    Total Events:        3332 
    
    Reporting Threshold: 5  
    
    =============================================================
                            Event Class Summary
    =============================================================
    
    Count                Class Name
    -------------------------------------------------------------
    849                TEC_Tick
    848                TEC_DB
    822                TEC_Notice
    812                TEC_Error
    
    
    =============================================================
                            Slot  Summary
    =============================================================
    
    Count                Slot Criteria
    -------------------------------------------------------------
    3332                status=OPEN
    590                severity=MINOR
    574                severity=WARNING
    564                severity=CRITICAL
    550                severity=UNKNOWN
    544                severity=HARMLESS
    510                severity=FATAL
    12                hostname=midnight.austin.lab.tivoli.com
    12                source=69.1.3.30
    11                hostname=dhcp12-235.austin.lab.tivoli.com
    11                source=69.1.12.235
    11                hostname=stingray.austin.lab.tivoli.com
    11                source=69.1.5.82
    10                hostname=austin.lab.tivoli.com
    10                source=69.1.1.6

See Also

ip_node_unreachable

Determines if the event was sent from an unreachable subnet.

Synopsis

ip_node_unreachable(_ipaddress, _event)

Description

This predicate tests to see if the given IP address is contained in the cache the event server maintains of unreachable IP addresses. If _ipaddress matches the subnet address and subnet mask of any of the subnets contained in the cache, the _event argument is set to the event handle for the corresponding TEC_ITS_SUBNET_STATUS event.

This event can then be used to correlate the current event to the corresponding TEC_ITS_SUBNET event by using the link_effect_to_cause predicate. To check the success of the correlation, test that the the _event argument is unified with an existing event, by using ground(_event). If it succeeds, there is a valid correlation.

This predicate requires that the NetView component be installed and that the netview.rls rule set is active.

Arguments

 
_ipaddress
The unreachable IP address.
_event
If the predicate is successful, this argument contains the event handle of the cached TEC_ITS_SUBNET_STATUS event that matches the IP address. If no event is found, this argument is unchanged.

is_clearing_event

Tests whether an event has been defined as a clearing event with the create_clearing_event or create_event_sequence predicate.

Synopsis

is_clearing_event(_event)

Description

This predicate is used to test whether a rule or rule action should be run. If the event has been defined as a clearing event with the create_clearing_event or create_event_sequence predicate, and meets all of the appropriate conditions of the definition, the is_clearing_event predicate succeeds.

Arguments

_event
A pointer to the event to test if it is a clearing event.

Examples

The following example rule fragment tests whether the event under analysis is a clearing event. If the test passes, processing would continue with the next statement in the action. This rule triggers on the base event, so every incoming leaf event is tested.
rule: 'process_clearing_events':(

  event: _ev of_class 'EVENT',

  reception_action: 'check_for_clear':(
    is_clearing_event(_ev),
    ...

See Also

link_effect_to_cause

Links an effect event to a cause event.

Synopsis

link_effect_to_cause(_effect_event, _cause_event)

Description

This predicate updates the cause_date_reception and cause_event_handle attributes of the effect event so that these attributes contain a reference to the cause event. The value of the date_reception attribute of the cause event is placed in cause_date_reception attribute and the value of event_handle attribute of the cause event is placed in the cause_event_handle attribute.

Arguments

_cause_event
The cause event.
_effect_event
The effect event.

Examples

The following example links a universal_oserv event to a universal_host event if they are related, determined by their probe_arg attribute values. If they are related, the status attribute for the universal_oserv event is set to ACK.
rule: link_oserv_to_host: (

  event: _event of_class 'universal_oserv'
    where [probe_arg: _probe_arg,
           severity: equals 'WARNING'
          ],

  action: 'link_host': (
    first_instance(event: _host_ev of_class 
                   'universal_host'

      where [severity: within ['CRITICAL','FATAL'],
             probe_arg: equals _probe_arg,
             status: outside ['CLOSED']
            ]),
    set_event_status(_event,'ACK'),
    link_effect_to_cause(_event, _host_ev) 
  )
).

See Also

load_globals

Loads global variables from a file into the knowledge base.

Synopsis

load_globals(_file)

Description

This predicate loads all the global variables from a file into the knowledge base.

Arguments

_file
The path and file name that contains the variable definitions to load.

Examples

The following example shows predicate usage:
load_globals('/tmp/globalvars.txt')

See Also

log_error

Generates error messages to assist in rule development.

Synopsis

log_error(format_string, variable_list, severity)

Description

This predicate should be run from within a predicate you've created to help you debug rules. The more recent rule language predicates provided by IBM have this predicate embedded within them to help you debug problems with rules.

This predicate provides error messages the following ways:

  • Written to a file
  • TEC_Error events sent to the event serve.

Before calling the log_error predicate, do the following setup tasks:

  • Define a rule that runs the tell_err built-in Prolog predicate. This predicate directs error messages to a specific file. You can call it from a rule that triggers on a TEC_Start event so it loads at event server start-up, or you can call it from within a rule that triggers on some other criteria. The tell_err predicate has the following format:
    tell_err('filename')
  • Define the source location within the rule, so you have a point of reference from a generated error message. See set_log_error_source for additional information.

Arguments

_format
The format specification for the output. The following format specifications are valid:
%c
Character.
%d
Integer printed in decimal notation.
%e
Real printed in exponential notation.
%f
Real printed in decimal notation.
%g
Real printed in its shortest form (decimal or exponential notation).
%o
Integer printed in octal notation, without sign and leading zero.
%s
String.
%u
Integer printed in unsigned decimal notation.
%x
Integer printed in hexadecimal notation, without sign and leading 0x.

You can supply more detailed conversion specifications between the % sign and the conversion character, as follows:

-
Left adjustment.
0
Zero padding to the left.
n
In cases of an integer or a string, n is the minimum length of the field.
n.m
In cases of a real, n is the minimum length of the field and m indicates the number of digits after the decimal point.
severity
Specifies the severity to assign the generated TEC_Error event.
variable_list
A list of variables whose values will appear in the message. There must be a matching format specification for each variable.

Examples

The following example shows a user-defined predicate named my_predicate that receives an argument (_data) that is actually passed in to be the argument of a user-defined predicate named check_data. The check_data predicate is the predicate to be debugged. The logic is as follows:
  1. Define a source location named my_predicate for a point of reference from an error message. This is done with the set_log_error_source predicate.
  2. Enable rule tracing of the check_data predicate with the trace_it predicate. To actually write the rule trace information to a file, the set_detailed_debugging predicate had to be run previously. The trace_it predicate just enables tracing.
  3. If the check_data predicate fails, the log_error predicate is run and the Bad Data message along with the my_predicate source identifier are written to the error file. The tell_err predicate had to be run previously to define the location and name of the error file.

    Additionally when the check_data predicate fails, a TEC_Error event is sent to the event server with a severity of CRITICAL, the Bad Data message, and the my_predicate source identifier. The message and the source identifier are assigned to the msg attribute of the event.

my_predicate(_data):-
  set_log_error_source(my_predicate),

  (
    trace_it(check_data), 
    process_data(_data)

  ;
    log_error('Bad Data %s',[_data],'CRITICAL')
  )

See Also

place_change_request

Requests a change to an attribute value.

Synopsis

place_change_request(_event, _attributename, _newattributevalue)

Description

Change rules are triggered in response to the requested change. If there are no change rules in the rule base, the bo_set_slotval predicate would be a more efficient choice to change an attribute value, because processing resources are not used to check the rule base for change rules.

Arguments

_attributename
The attribute to change.
_event
A pointer to the event containing the attribute to change.
_newattributevalue
The value to assign the updated attribute.

Examples

The following example requests to change the hostname attribute to a value of myhost:
place_change_request(_event, hostname, myhost)

See Also

print_cache

Writes the event cache to a file.

Synopsis

print_cache(file_name)

--OR--

print_cache(file_name, event:_event of_class class where attribute_conditions)

Description

This predicate writes the event cache to a file. Two forms are provided:
  • Without the event filter: writes the entire event cache
  • With an event filter: writes certain events in the cache

Arguments

event:_event of_class class where attribute_conditions
Specifies an event filter for identifying particular events to write to the file. Do not use the same for the _event and class variables as those used in the event filter for the rule.
file_name
The path and file name to which the event cache is written.

Examples

  1. The following example writes all of the events in the event cache to the /tmp/cache file.
    print_cache('/tmp/cache/')
  2. The following example writes all events of class TEC_Start in the event cache to the /tmp/cache file.
    rule: print_cache: (
          event: _event of_class _class,
          reception_action: (
    print_cache('/tmp/cache', event: _cached_event of_class 'TEC_Start')
           )
    ).
  3. The following example writes all events whose status attribute has a value of CLOSED in the event cache to the /tmp/cache file.
    rule: print_cache: (
          event: _event of_class _class,
          reception_action: (
                   print_cache('/tmp/cache',
                   event: _cached_event of_class _cached_class where 
                   [status: equals 'CLOSED'])
          )
    ).

See Also

print_class_tree

Formats and writes an event class hierarchy tree from the active rule base to a file.

Synopsis

print_class_tree(_file, _class)

Description

This predicate formats and writes an event class hierarchy tree from the active rule base to a file, starting from a specified class as the root and continuing down to the leaf classes. It also prints the maximum depth and the total width of the tree, which is a representation of the general size of the tree.

Arguments

_class
The name of the event class to start from in the class hierarchy.
_file
The path and file name to which the event class tree is written.

Examples

The following example formats and writes the entire event class tree of the active rule base (because the starting point is the base event class EVENT) to the /tmp/class_tree file.
print_class_tree('/tmp/class_tree', 'EVENT')

See Also

print_event_activity

Writes the event activity report defined with the init_event_activity predicate.

Synopsis

print_event_activity

Description

This predicate writes the event activity report using the criteria set in the init_event_activity predicate. It is usually run from within a timer rule.

The frequency of when to write the report is controlled with the _duration argument of the set_timer predicate.

Arguments

None.

Examples

  1. The initial timer for an event activity report must be started by a TEC_Tick event so it can run indefinitely. The following example shows how to do this:
    rule: configure_event_activity: (
    
      event: _event of_class 'TEC_Tick'
        where [msg: _msg equals 'Event Activity Report',
               duration: _rep_freq],
    
      reception_action: start_timer: (
        set_timer(_event,_rep_freq,_msg),
        commit_rule
      )
    ).
  2. The following example shows a use of the print_activity_report predicate. An example report is shown in init_event_activity.
    timer_rule: print_and_reset_event_activity: (
    
      event: _event of_class _class
        where [ ],
    
      timer_info: equals 'Event Activity Report',
      timer_duration: _rep_freq,
    
      action: print_and_reset_event_activity: ( 
        print_event_activity,
        reset_event_activity,     
        set_timer(_event,_rep_freq,
                  'Event Activity Report')
      )
    ). 

See Also

re_after_match

Searches for a match in a string using a named regular expression and returns the substring located after the match as a result.

Synopsis

re_after_match(_name, _string, _result)

Description

This predicate searches for a match in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if a match is found. The substring after the match is returned in _result.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name of a regular expression defined with the re_create predicate.
_result
The substring located after the match.
_string
The string to search for a match.

Examples

The following example shows how to use the predicate:
re_create(test,'a.*i')
% Create regular expression test.

re_after_match(test,'chair',_result)
% Search 'chair' using regular expression test. 
% Return the substring after the match in _result.
% Succeeds, 'r' returned in _result.

See Also

re_before_match

Searches for a match in a string using a named regular expression and returns the substring located before the match as a result.

Synopsis

re_before_match(_name, _string, _result)

Description

This predicate searches for a match in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if a match is found. The substring before the match is returned in _result.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name of a regular expression defined with the re_create predicate.
_result
The substring located before the match.
_string
The string to search for a match.

Examples

The following example shows how to use the predicate:
re_create(test,'a.*r')
% Create regular expression test.

re_before_match(test,'chair',_result)
% Search 'chair' using regular expression test. 
% Return the substring before the match in _result.
% Succeeds, 'ch' returned in _result.

See Also

re_create

Defines a regular expression for use with other regular expression predicates.

Synopsis

re_create(_name, _pattern)

Description

This predicate defines a named regular expression that can be referenced by other regular expression predicates. The predicate fails if the regular expression is not valid. It should be run in a rule triggered by a TEC_Start event at event server start-up time. This loads the predicate once, instead of every time it is needed.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name that uniquely identifies the regular expression.
_pattern
The regular expression.

Examples

The following example shows how to define a regular expression and reference it from another regular expression predicate:
re_create(test,'h.*i')
% Create regular expression test.

re_search_string(test,'chair')
% Compare 'chair' to regular expression test. 
% Succeeds, matches 'hai'.

See Also

None.

re_mark_as_modified

Updates information for an event in the event database.

Synopsis

re_mark_as_modified(_event, _)

Description

This predicate is typically run after using the bo_set_slotval predicate to update event consoles and event database with the latest attribute values for an event.

Arguments

_
Uninstantiated variable used internally by Prolog. Also referred to as the anonymous variable.
_event
A pointer to the event to update. This should be the same event pointed to by the _event argument in bo_set_slotval.

Examples

The following example shows how to update the data for the event pointed to by _oldevent:
re_mark_as_modified(_oldevent, _)

See Also

re_match

Searches for a match in a string using a named regular expression and returns a result.

Synopsis

re_match(_name, _string, _index, _result)

Description

This predicate searches for a match in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if a match is found. The matched substring is returned in _result. The _index argument is used to specify which part of the matched substring to return.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_index
The part of the match to return in _result. A value of 0 returns the entire matching substring, a value of 1 indexes into the matched substring one position and returns the result, a value of 2 indexes into the substring two positions and returns the result, and so forth.
_name
The name of a regular expression defined with the re_create predicate.
_result
The matched substring, subject to the _index specification.
_string
The string to search for a match.

Examples

The following example shows how to use the predicate:
re_create(test,'a.*r')
% Create regular expression test.

re_match(test,'chair',0,_result)
% Search 'chair' using regular expression test. 
% Return the entire result in _result.
% Succeeds, 'air' returned in _result.

See Also

re_search_string

Searches for a match in a string using a named regular expression.

Synopsis

re_search_string(_name, _string)

Description

This predicate searches for a match in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if a match is found.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name of a regular expression defined with the re_create predicate.
_string
The string to search for a match.

Examples

The following example shows how to use the predicate:
re_create(test,'h.*i')
% Create regular expression test.

re_search_string(test,'chair')
% Search 'chair' using regular expression test. 
% Succeeds, matches 'hai'.

See Also

re_send_event_conf

Sends an event to a remote event server.

Synopsis

re_send_event_conf(_conf_file, _event)

Description

This predicate sends an event to a remote event server defined in a configuration file. The configuration file must be located in the TEC_RULES subdirectory of the loaded rule base. The predicate references the configuration file by file name only, leaving off the .conf file name extension; for example, with a configuration file named host.conf, specify the value of host for the _conf_file argument. Each time a new configuration file is referenced by the predicate, its name is added to an internal configuration file table and its file handle is kept open. A maximum of 50 concurrent different configuration files are supported.

This predicate supports both connection and connectionless modes of operations. If events are to be forwarded to a remote event server frequently, ensure that the configuration file specifies connection_oriented for the ConnectionMode option. This communication prevents the event server from having to establish a communications channel each time an event is forwarded.

In addition, to ensure that the events intended for a remote event server are separated from events intended for another remote event server, the configuration file must specify the location and name of the cache file for events destined to a remote event server; for example, BufEvtPath=/etc/tivoli/orange.cache.

The following figure shows an example of a configuration file for use with the re_send_event_conf predicate. Configuration file options are described in the IBM Tivoli Enterprise Console Adapters Guide.

 

ServerLocation=orange.tivoli.com
TestMode=no
BufEvtPath=/etc/Tivoli/orange.cache
# ConnectionMode=connection_oriented

 

Note:
If the configuration file is used in a rule base target, it must be distributed with the rule base target. This can be done by using the -imptgtdata option of the wrb command. See the IBM Tivoli Enterprise Console Command and Task Reference IBM Tivoli Enterprise Console Command and Task Reference for complete details about the wrb command.

Arguments

_conf_file
The configuration file that defines remote event server information.
_event
The event to be sent.

Examples

The following example sends the event under analysis to the remote event server specified in the configuration file host.conf:
re_send_event_conf('host', _event)

See Also

re_split_event_id

Parses an element of the server_path event attribute.

Synopsis

re_split_event_id(_path_element, _host, _server_handle, _date_reception, _event_handle)

Description

This predicate receives a list element from the server_path attribute as input, parses the element, and unifies (assigns) the parsed values with variables provided as arguments to the predicate. The server_path attribute is a list of elements that provides information about each event server that an event has passed through. Each element contains the information about one event server. The information is for each element is in the format of an event ID, which is described in the section, Event cache. See the IBM Tivoli Enterprise Console Adapters Guide for additional information about the server_path attribute.

Arguments

_date_reception
The reception date of the event at the server is unified with this variable. This value was obtained from the date_reception attribute when the event was received at the server.
_event_handle
The handle of the event at the server is unified with this variable. This value was obtained from the event_handle attribute when the event was received at the server.
_host
The hostname of the server is unified with this variable. The value was obtained from the tec_rule_host configuration setting in the $BINDIR/TME/TEC/.tec_config file of the host where the server resides.
_path_element
The list element of the server_path attribute to parse. Each element represents a server that the event has passed through.
_server_handle
The handle of the server is unified with this variable. This value was obtained from the tec_server_handle configuration setting in the $BINDIR/TME/TEC/.tec_config file of the host where the server resides.

Examples

The following example iterates through each element in the server_path attribute and parses it:
bo_get_slotval(_event,server_path,_server_path),
% Get the list for the server_path attribute.

member(_item,_server_path),
% Get an element of the list.
% Because _item is free, the list will be traversed
% and each element will be returned in succession.

  re_split_event_id(_item,_host,_server_handle,
                    _date_reception,_event_handle)
  % Parse each element into variables.

See Also

re_substitute

Searches for a match in a string using a named regular expression, replaces the match, and returns the new string as a result.

Synopsis

re_substitute(_name, _string, _substitute, _result)

Description

This predicate searches for a match in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if a match is found. The value in _substitute replaces the match and the new string is returned in _result.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name of a regular expression defined with the re_create predicate.
_result
The new string after substitution has been done.
_string
The string to search for a match.
_substitute
The value to replace the match with.

Examples

The following example shows how to use the predicate:
re_create(test,'a.*w')
% Create regular expression test.

re_substitute(test,'hawk','oo',_result)
% Search 'hawk' using regular expression test. 
% Return the new string in _result.
% Succeeds, 'hook' returned in _result.

See Also

re_substitute_global

Searches for all matches in a string using a named regular expression, replaces them, and returns the new string as a result.

Synopsis

re_substitute_global(_name, _string, _substitute, _result)

Description

This predicate searches for all matches in _string using a named regular expression defined with the re_create predicate. The predicate succeeds if at least one match is found. The value in _substitute replaces all occurrences of a match and the new string is returned in _result.

Refer to Perl documentation for information about regular expression syntax and usage.

Arguments

_name
The name of a regular expression defined with the re_create predicate.
_result
The new string after substitution has been done.
_string
The string to search for a match.
_substitute
The value to replace matches with.

Examples

The following example shows how to use the predicate:
re_create(test,'a.*w')
% Create regular expression test.

re_substitute_global(test,'hawkhawkhawk','oo',_result)
% Search 'hawk' using regular expression test. 
% Return the new string in _result.
% Succeeds, 'hookhookhook' returned in _result.

See Also

redo_analysis

Requests a reanalysis for an event.

Synopsis

redo_analysis(_event)

Description

When correlating events, it might be necessary to re-evaluate the analysis of a previously received event. This predicate requests that the rule engine redo the analysis of regular rules for the specified event.
Note:
It is possible to make the rule engine loop by using the redo_analysis predicate. For example if in the analysis of event A you ask to redo analysis of event B and vice-versa, the rule engine might enter an infinite loop.

Arguments

_event
Specifies the event for which the analysis is to be redone.

Examples

The following example places a redo request on previously received INSTALLATION_FAILED events that might have been caused by the disk being full:
rule: disk_full_check_install_failed: (
  description: 'look for installationfailed events for this
                host',

  event: _event of_class 'DISK_FULL' 
    where [status: equals 'OPEN',
           hostname: _hostname ],

  action: (
    all_instances(event: _install_ev
                  of_class 'INSTALLATION_FAILED'

                    where [target_host: equals _hostname],
                           _event -600 -600 ),
    redo_analysis(_install_ev)
  )
).

See Also

None.

remove_bslashes

Converts back slashes to forward slashes in directory paths. In order to prevent backslashes from being interpreted as escape sequences, backslashes used as path separators must be specified as double backslashes.

Synopsis

remove_bslashes(_path1, _path2)

Description

This predicate converts the back slashes in the _path1 argument to forward slashes and unifies the new path with the _path2 argument.

Arguments

_path1
The directory path to convert.
_path2
The converted directory path.

Examples

The following example converts back slashes in a directory path to forward slashes:
% Assign value.
_path="\\tivoli\\data\\repository',

% Convert back slashes.
% _new_path is unified with /tivoli/data/repository.
remove_bslashes(_path,_new_path)

See Also

None.

reset_event_activity

Resets the counts for all event reporting criteria to 0.

Synopsis

reset_event_activity

Description

This predicate resets the counts for all event reporting criteria to 0. It is usually run from within a timer rule following the print_event_activity predicate.

Arguments

None.

Examples

The following example shows a use of the predicate:
timer_rule: reset_event_activity: (

  event: _event of_class _class
    where [ ],

  timer_info: equals 'Event Activity Report',
  timer_duration: _rep_freq,

  action: reset_activity: ( 
    print_event_activity,
    reset_event_activity,     
    set_timer(_event,_rep_freq,'Event Activity Report')
  )
). 

See Also

reset_global_grp

Resets the value of all global variables in a group.

Synopsis

reset_global_grp(_group,_value)

Description

This predicate changes the value of all global variables in a group in the knowledge base. The predicate loops through the variables in the group and for each variable found, changes its value to _value.

Arguments

_group
The group key for the variables.
_value
The new value for the variables.

Examples

The following example resets all of the global variables in the Maintenance group to off.
reset_global_grp('Maintenance', 'off'),

See Also

resolve_time

Retrieves the attributes of a time structure.

Synopsis

resolve_time(_time_structure, _seconds, _minutes, _hours, _day_of_month, _month, _year, _day_of_week, _day_of_year, _daylight_saving)

Description

This predicate retrieves attributes from a time structure represented by the _time_structure argument, and instantiates the remaining arguments to those attributes. _time_structure must be instantiated before calling resolve_time. The other arguments must be free. The values are in Greenwich mean time (GMT).

Arguments

_day_of_month
Instantiated to an integer in the range 1-31.
_day_of_week
Instantiated to an integer in the range 0-6.
_day_of_year
Instantiated to an integer in the range 0-364.
_daylight_saving
Instantiated to an integer as reflected by the DST_ macros in <sys/time.h>.

You can use this number to determine the type of daylight savings time style used on the current system. This could be useful if you need to manipulate time values returned by this predicate. The following example shows the values from a Solaris system:

#define  DST_NONE    0  /* not on dst */
#define  DST_USA     1  /* USA style dst */
#define  DST_AUST    2  /* Australian style dst */
#define  DST_WET     3  /* Western European dst */
#define  DST_MET     4  /* Middle European dst */
#define  DST_EET     5  /* Eastern European dst */
#define  DST_CAN     6  /* Canada */
#define  DST_GB      7  /* Great Britain and Eire */
#define  DST_RUM     8  /* Rumania */
#define  DST_TUR     9  /* Turkey */
#define  DST_AUSTALT 10 /* Australian style with
                           shift in 1986 */
_hours
Instantiated to an integer in the range 0-23.
_minutes
Instantiated to an integer in the range 0-59.
_month
Instantiated to an integer in the range 0-11.
_seconds
Instantiated to an integer in the range 0-59.
_time_structure
Represents a time structure. Do not confuse it with the data returned by the get_time predicate, in which the value for the _time_epoch argument is a number representing how many seconds have passed since an epoch.
_year
Instantiated to an integer in the range 00-99.

Examples

The following example shows how to get the structure for the current local system time, retrieve the attributes of the local system time structure, and update the month attribute of the event with the value of the _month argument.
get_local_time(_time_local_struct),
resolve_time(_time_local_struct, _seconds, _minutes,
             _hours,
             _day_of_month, _month, _year, _day_of_week,
             _day_of_year, _daylight_savings),
bo_set_slotval(_event, month, _month)

See Also

save_globals

Writes all global variables from a group to a file.

Synopsis

save_globals(_file, _group)

Description

This predicate writes all the global variables from a group to a file.

Arguments

_file
The path and file name to write the variables.
_group
The group key whose variables to write.

Examples

The following example shows how to write the global variables in the Maintenance group to a file:
save_globals('/tmp/globalvars.txt', 'Maintenance')

See Also

search_cache

Performs a query of the event cache based on a named search defined with the create_cache_search_criteria predicate.

Synopsis

search_cache(search_name, _referenceEvent, _maxEvents, _foundEvent)

--OR--

search_cache(search_name, _referenceEvent, _timeBefore, _timeAfter, _maxEvents, _foundEvent)

Description

This predicate performs a query of the event cache. It is used in conjunction with the create_cache_search_criteria predicate, which defines a named search.

The second form of the predicate lets you specify a time window with the _timeBefore and _timeAfter arguments.

Succeeds once for each event that satisfies the search criteria.

You can use the get_attributes predicate to get the values for each found event's attributes.

Arguments

_foundEvent
A pointer to a matching event.
_maxEvents
The maximum number of events to return that meet the search criteria.
_referenceEvent
A pointer to the reference event, typically the event under analysis.
search_name
The name of the search criteria to use in the query. This name and its associated search criteria are defined with the create_cache_search_criteria predicate.
-timeAfter
The number of seconds after the reference event.
-timeBefore
The number of seconds before the reference event.

Examples

The following example uses the search named db_critical_search to find matching events within a 20 minute time window of the reference event. The search has been instructed to return no more than five matching events.

 

search_cache('db_critical_search',
             _refevent,
             600,
             600,
             5,
              _found_event)

See Also

set_detailed_debugging

Writes rule trace information to the rule trace file for predicates.

Synopsis

set_detailed_debugging(on)

--OR--

set_detailed_debugging(off)

Description

This predicate toggles the writing of rule trace information for predicates that have rule tracing enabled with the trace_it predicate. You can run the set_detailed_debugging predicate from any rule.

Arguments

off
Specifies to not write rule trace information.
on
Specifies to write rule trace information.

Examples

The following example shows a user-defined predicate named my_predicate that receives an argument (_data) that is actually passed in as the argument to a user-defined predicate named check_data. The check_data predicate is the predicate to be debugged. The logic is as follows:
  1. Enable writing of the rule trace information to a file with the set_detailed_debugging predicate.
  2. Define a source location named my_predicate for a point of reference from an error message. This is done with the set_log_error_source predicate.
  3. Enable rule tracing of the check_data predicate with the trace_it predicate.
  4. If the check_data predicate fails, the log_error predicate is run and the Bad Data message along with the my_predicate source identifier are written to the error file. The tell_err predicate had to be run previously to define the location and name of the error file.

    Additionally when the check_data predicate fails, a TEC_Error event is sent to the event server with a severity of CRITICAL, the Bad Data message, and the my_predicate source identifier. The message and the source identifier are assigned to the msg attribute of the event.

Regardless of whether the predicate succeeds or fails, trace information is written to the rule trace file.

set_detailed_debugging(on),

my_predicate(_data):-
  set_log_error_source(my_predicate),

  (
    trace_it(check_data), 
    process_data(_data)

  ;
    log_error('Bad Data %s',[_data],'CRITICAL')
  )

See Also

None.

set_event_administrator

Sets the administrator for an event.

Synopsis

set_event_administrator(_event, new_administrator)

Description

This predicate sets the value for the administrator attribute for the specified event.
Note:
This predicate directly modifies the value of the attribute without issuing an internal change request that goes through the change rules. To trigger change rules, call the place_change_request predicate following the set_event_administrator call.

Arguments

_event
A pointer to the event for which the administrator is to be set.
new_administrator
Specifies the new administrator for the event.

Examples

The following example shows predicate usage:
set_event_administrator(_event, bjones)

See Also

set_event_message

Sets the msg attribute of an event.

Synopsis

set_event_message(_event, _format, [_value])

Description

The format specification is similar to that for the sprintf() function in the C programming language.
Note:
This predicate directly modifies the value of the attribute without issuing an internal change request that goes through the change rules. To trigger change rules, call the place_change_request predicate following the set_event_message call.

Arguments

_event
A pointer to the event containing the msg attribute to assign the value.
_format
The format specification for the msg attribute value. The following conversion specifications are valid:
%c
Character.
%d
Integer printed in decimal notation.
%e
Real printed in exponential notation.
%f
Real printed in decimal notation.
%g
Real printed in its shortest form (decimal or exponential notation).
%o
Integer printed in octal notation, without sign and leading zero.
%s
String.
%u
Integer printed in unsigned decimal notation.
%x
Integer printed in hexadecimal notation, without sign and leading 0x.

You can specify more detailed conversion specifications between the % sign and the conversion character, as follows:

-
Left adjustment.
0
Zero padding to the left.
n
In cases of an integer or a string, n is the minimum length of the field.
n.m
In cases of a real, n is the minimum length of the field and m indicates the number of digits after the decimal point.
_value
The text to be formatted for the msg attribute. This argument is in list format.

Examples

The following example shows various uses of the predicate:
_integer is 123,
_real is 12.3,
_string = 'Hello, World',
% Assign values.

set_event_message(_event, '%s', [_string]),
% msg attribute assigned 'Hello, World'.

set_event_message(_event, '%20s', [_string]),
% msg attribute assigned '        Hello, World'.

set_event_message(_event, '%-20s', [_string]),
% msg attribute assigned 'Hello, World        '.


set_event_message(_event, 'Integer in decimal notation:
 %d', [_integer]),
% msg attribute assigned 'Integer in decimal
% notation: 123'.

set_event_message(_event, 'Integer in decimal notation
 with field width: %10d', [_integer]),
% msg attribute assigned 'Integer in decimal
% notation with field width:        123'

set_event_message(_event, 'Integer in decimal notation
with leading zeros: %010d', [_integer]),
% msg attribute assigned 'Integer in decimal
% notation with leading zeros: 0000000123'.

set_event_message(_event, 'Integer in octal notation:
%o', [_integer]),
% msg attribute assigned 'Integer in octal
% notation: 173'.

set_event_message(_event, 'Integer in hexadecimal notation:
%x', [_integer]),
% msg attribute assigned 'Integer in hexadecimal
% notation: 7b'

set_event_message(_event, 'Real in decimal notation:
%f', [_real]),
% msg attribute assigned 'Real in decimal
% notation: 12.300000'.

set_event_message(_event, 'Real in decimal notation with
field width: %3.2f', [_real]),
% msg attribute assigned 'Real in decimal
% notation with field width: 12.30'.

set_event_message(_event, 'Real in real notation:
%f', [_real]),
% msg attribute assigned 'Real in real notation:
% 12.300000'.

set_event_message(_event, 'Real in exponential notation:
%e', [_real]),
% msg attribute assigned 'Real in exponential
% notation: 1.230000e+01'.

set_event_message(_event, 'Real in its shortest form:
%g', [_real])
% msg attribute assigned 'Real in its shortest form: 12.3'.

See Also

set_event_severity

Sets the severity of an event.

Synopsis

set_event_severity(_event, new_severity)

Description

This predicate sets the severity of the specified event.
Note:
This predicate directly modifies the value of the attribute without issuing an internal change request that goes through the change rules. To trigger change rules, call the place_change_request predicate following the set_event_severity call.

Arguments

_event
A pointer to the event for which the severity is to be set.
new_severity
The new event severity.

Examples

The following example shows predicate usage:
set_event_severity(_event, 'CRITICAL')

See Also

set_event_status

Sets the status of an event.

Synopsis

set_event_status(_event, new_status)

Description

This predicate sets the status attribute of the specified event.
Notes:
  1. This predicate directly modifies the value of the applicable event attribute without issuing an internal change request that goes through the change rules. To trigger change rules, call the place_change_request predicate following the set_event_status call.
  2. If the status attribute was set to a value of CLOSED by this predicate, the duration attribute is not modified to provide the age. The value remains at 0. See duration for additional information.

Arguments

_event
A pointer to the event for which the status is to be set.
new_status
The new value to assign the status attribute. Valid values for the status attribute are described on page ***.
Notes:
  1. You cannot change a status of CLOSED.
  2. A change from ACK to OPEN status is not valid for the new_status argument.

Examples

The following example shows how to set the status attribute of the event under analysis to ACK:
set_event_status(_event, 'ACK')

See Also

set_global_var

Sets the value of a global variable.

Synopsis

set_global_var(_group, _key,_value)

Description

This predicate sets the value of one global variable in the knowledge base. To set a variable to a list, use the [ ] notation.

Arguments

_group
The group key for the variable.
_key
The key for the variable.
_value
The value to set.

Examples

The following example shows various uses of the predicate:
set_global_var('My group key', _key, 'My value')
set_global_var('My group key', _key, ['a', 'b', 'c'])
set_global_var('Maintenance', _origin, 'on')

See Also

set_log_error_source

Defines a source identifier for a point of reference from an error message generated by the log_error predicate.

Synopsis

set_log_error_source('source_location')

Description

This predicate defines a source location within a rule action so you have a point of reference from a generated error message to help you debug the rule. For the argument, specify the name of a rule action or other significant identifier (for example, a predicate name).

Arguments

source_location
A string identifying a meaningful location in a rule.

Examples

The following example shows a user-defined predicate named my_predicate that receives an argument (_data) that is actually passed in to be the argument of a user-defined predicate named check_data. The check_data predicate is the predicate to be debugged. The logic is as follows:
  1. Define a source location named my_predicate for a point of reference from an error message. This is done with the set_log_error_source predicate.
  2. Enable rule tracing of the check_data predicate with the trace_it predicate. To actually write the rule trace information to a file, the set_detailed_debugging predicate had to be run previously. The trace_it predicate just enables tracing.
  3. If the check_data predicate fails, the log_error predicate is run and the Bad Data message along with the my_predicate source identifier are written to the error file. The tell_err predicate had to be run previously to define the location and name of the error file.

    Additionally when the check_data predicate fails, a TEC_Error event is sent to the event server with a severity of CRITICAL, the Bad Data message, and the my_predicate source identifier. The message and the source identifier are assigned to the msg attribute of the event.

my_predicate(_data):-
  set_log_error_source(my_predicate),

  (
    trace_it(check_data), 
    process_data(_data)

  ;
    log_error('Bad Data %s',[_data],'CRITICAL')
  )

    
)

See Also

set_timer

Sets a timer on an event.

Synopsis

set_timer(_event, timer_duration, timer_info)

Description

This predicate sets a timer on a received event. When the timer expires, the rule engine finishes processing the current event and then triggers timer rules for the specified event. In essence, the expiration of a timer is a transaction requesting the execution of timer rules on a specified event.

The maximum number of active timers that can be placed on events with this predicate is 1000.

An event of class TEC_Tick always exists in the event cache; that is, it is never aged out of the event cache. You can search for this event in the cache and use it to start a timer, knowing that it will always be there.

Arguments

_event
A pointer to the event on which the timer is being set.
timer_duration
The duration (in seconds) of the timer. It can also be used for event filtering in a timer rule.
timer_info
The timer information. This argument can be anything, such as an integer, a string, or a structured item. It can be used for event filtering in a timer rule.

Examples

The first rule in the following example initially sets the timer for event activity report generation. The second rule is evaluated whenever the timer expires for event activity reporting. The action in the second rule prints the event activity report, resets counters for the next event activity report, and sets a new timer for generation of the next event activity report.
rule: (

  event: _event of_class 'TEC_Start'
    where [ ],

  reception_action: (
    first_instance(event:_ev of_class 'TEC_Tick' where []),
   set_timer(_event, 600, 'Event Activity Report')  )
).

 

timer_rule: reset_print_activity: (
  event: _event of_class _class
    where [ ],

    timer_info: equals 'Event Activity Report',
    timer_duration: _rep_freq,

  action: reset_print_activity: ( 
    print_event_activity,
    reset_event_activity,     
    set_timer(_event,_rep_freq,'Event Activity Report'
  )
).

See Also

None.

trace_it

Enables tracing of user-defined predicates.

Synopsis

trace_it(predicate_name)

Description

This predicate designates which user-defined predicates to trace.
Note:
Do not use this predicate to enable tracing for anything but predicates you have created.

Arguments

predicate_name
The name of the predicate to trace.

Examples

The following example shows a user-defined predicate named my_predicate that receives an argument (_data) that is actually passed in to be the argument of a user-defined predicate named check_data. The check_data predicate is the predicate to be debugged. The logic is as follows:
  1. Define a source location named my_predicate for a point of reference from an error message. This is done with the set_log_error_source predicate.
  2. Enable tracing of the check_data predicate with the trace_it predicate. To actually write the rule trace information to a file, the set_detailed_debugging predicate had to be run previously. The trace_it predicate just enables tracing.
  3. If the check_data predicate fails, the log_error predicate is run and the Bad Data message along with the my_predicate source identifier are written to the error file. The tell_err predicate had to be run previously to define the location and name of the error file.

    Additionally when the check_data predicate fails, a TEC_Error event is sent to the event server with a severity of CRITICAL, the Bad Data message, and the my_predicate source identifier. The message and the source identifier are assigned to the msg attribute of the event.

my_predicate(_data):-
  set_log_error_source(my_predicate),

  (
    trace_it(check_data), 
    process_data(_data)

  ;
    log_error('Bad Data %s',[_data],'CRITICAL')
  )

    
)

See Also

unlink_from_cause

Unlinks an effect event from a cause event.

Synopsis

unlink_from_cause(_effect_event)

Description

This predicate updates the cause_date_reception and cause_event_handle attributes of the effect event to a value of 0, breaking the link between the two events.

Arguments

_effect_event
The event to be unlinked.

Examples

The following example shows predicate usage:
unlink_from_cause(_oserv_down_event)

See Also

update_event_activity

Captures event information for reporting by the print_event_activity_report predicate.

Synopsis

update_event_activity(_event)

Description

This predicate captures information from an event and stores it internally for later reporting. It is typically called in a rule that runs on every event class.

Arguments

_event
A pointer to the reference event, which is typically the event under analysis.

Examples

The following example shows a use of the predicate:
rule: update_event_activity: (

  event: _event of_class _class
    where [ ],

  reception_action: update_activity: ( 
    % recorded(event_activity,active),
    % Line above used with im.rls (intermediate mgr. rules)

    update_event_activity(_event)
  )
). 

See Also


  • -s to split a line after the last space at a position that is less than or equal to the specified width.

  • -w width to specify the line width.

Examples Let us assume that we have file1 containing one line of 129 characters which is shown below:

View Code

If you want to split the line at byte position 40, use the following command:

fold -w 40  file1 > file2; more file2
The fold command can be used on files wh
ich have line lengths more than 80 bytes
, it breaks the line into multiple 80 by
te lines

In the above example, the split happens in the middle of words. If you do not want to split words, use the -s flag as in the following command:

fold -w 40 -s  file1 > file2; more file2
The fold command can be used on files
which have line lengths more than 80
bytes, it breaks the line into multiple
80 byte lines

join

The join command can be used to merge two files (one can be standard input) to create a third file (can be standard output). Each line in the file is merged on the basis of a field that has the same value in both input files to create one line in the output file. The fields in each file are separated by either a space or tab character.

Following is a list of flags that can be used with the join command:

  • -1 field or -j1 field to specify that the join should be made on the basis of the field in the first file.
  • -2 field or -j2 field to specify that the join should be made on the basis of the field in the second file.

  • -e string to specify that blank fields in the output file be replaced by the specified string.

  • -o fileid.fieldnumber to specify that the output should consist of the specified fields. You can specify multiple fields separating them by commas.

  • -t character to modify the field separator character from the default value of space.

  • -a fileid to generate output line for each line in the file specified by fileid parameter for lines that cannot be matched to the lines in the other file using join field. The output lines are produced in addition to the default output.

  • -v fileid to generate output line for each line in the file specified by fileid parameter for lines that cannot be matched to the lines in the other file using join field. The default output is not produced.

Examples Let us assume we have two files, file1 and file2, whose contents are shown as follows:

more file1
computer1 16MB 1.2GB 17inch CDROM
computer2 8MB 840MB 14inch
computer3 12MB 1.6GB 17inch
computer4 4MB 270MB 14inch

more file2
computer1 1stfloor office5
computer3 2ndfloor office9A
computer4 1stfloor office2
computer5 3rdfloor office1

If you want to join the two files and display only the matching lines, execute the following command:

join file1 file2
computer1 16MB 1.2GB 17inch CDROM 1stfloor office5
computer3 12MB 1.6GB 17inch 2ndfloor office9A
computer4 4MB 270MB 14inch CDROM 1stfloor office2

If you want to join the two files and display the matching lines as well as the nonmatching lines from the specified file, use the -a flag in the following command:

join -a1 file1 file2
computer1 16MB 1.2GB 17inch CDROM 1stfloor office5
computer2 8MB 840MB 14inch
computer3 12MB 1.6GB 17inch 2ndfloor office9A
computer4 4MB 270MB 14inch CDROM 1stfloor office2

The above example displays the line with computer2 from file1 because it does not have a matching line in file2. If you want to display only the lines that do not match lines from the specified file, use the -v flag in the following command:

join -v2 file1 file2
computer5 3rdfloor office1

The above example displays the line with computer5 from file2 because it does not have a matching line in file1.

If you want to display only certain fields from the input files to the output file, use the -o flag as in the following command:

join -o 1.1 2.2 2.3 1.5 file1 file2
computer1 1stfloor office5 CDROM
computer3 2ndfloor office9A
computer4 1stfloor office2 CDROM

In the above example, the line with computer3 is displayed with one field short because that field is not present in the input file. You can insert a fixed legend in the empty field in the output by using the -e flag in the following command:

join -o 1.1 2.2 2.3 1.5 -e"NO CDROM" file1 file2
computer1 1stfloor office5 CDROM
computer3 2ndfloor office9A NO CDROM
computer4 1stfloor office2 NO CDROM

paste

The paste command can be used to paste lines from one or more files (one of them can be a standard input) to the standard output, which can be redirected to a file. The paste command concatenates the line from each input file to the output file separating them by the tab character (default).

Following is a list of flags that can be used with the paste command:

  • -dlist to specify characters that will be used to separate corresponding lines from the input files in the output file. You can specify multiple characters if you have multiple input files.
  • -s to merge subsequent lines from input file for each input file, one at a time, separated by the specified delimiter character.

Examples Let us assume that we have two files, file1 and file2, whose contents are shown below:

more file1
computer1 16MB 1.2GB 17inch CDROM
computer2 8MB 840MB 14inch
computer3 12MB 1.6GB 17inch
computer4 4MB 270MB 14inch

more file2
computer1 1stfloor office5
computer3 2ndfloor office9A
computer4 1stfloor office2
computer5 3rdfloor office1

If you want to merge file1 and file2, use the following command:

paste file1 file2
computer1 16MB 1.2GB 17inch CDROM       computer1 1stfloor office5
computer2 8MB 840MB 14inch      computer3 2ndfloor office9A
computer3 12MB 1.6GB 17inch     computer4 1stfloor office2
computer4 4MB 270MB 14inch      computer5 3rdfloor office1

The lines from file1 and file2 are separated by tab characters.

If you want to modify the default separator from the tab character to, say, / (slash), use the -d flag in the following command:

paste -d"/" file1 file2
computer1 16MB 1.2GB 17inch CDROM/computer1 1stfloor office5
computer2 8MB 840MB 14inch/computer3 2ndfloor office9A
computer3 12MB 1.6GB 17inch/computer4 1stfloor office2
computer4 4MB 270MB 14inch /computer5 3rdfloor office1

If you want to merge the lines from within each input file, use the -s flag in the following command:

View Code

sort

The sort command is used to sort one or more files in the specified order by the specified key. It can also be used to merge files that have already been sorted. When more than one file is used, the sort command concatenates these files before sorting according to specifications.

Following is a list of some of the flags that can be used with the sort command:

  • -kkey to specify the key on which to sort. The specification for the key includes the starting field and column position and end field and column position.
  • -A to specify that sorting be done according to ASCII collating sequence.

  • -c to check whether the specified files are sorted according to the specified key and order.

  • -d to sort according to dictionary order.

  • -f to change all letters to uppercase before the sort.

  • -i to ignore nondisplayable characters for comparison.

  • -m to merge pre-sorted input files.

  • -n to sort according to numeric value.

  • -ofile to redirect the output to the specified file instead of standard output.

  • -r to sort the output in the reverse order of the specified order.

  • -u to create only one line in the output for lines that sort identically.

Examples Let us assume that we have a file called file1 whose contents are shown below:

more file1
disk drive
memory
video memory
monitor
[tape drive]
CD-ROM
3.5inch diskette
modem
monitor
sound blaster

If you want to sort file1, use the following command:

sort file1
3.5inch diskette
CD-ROM
[tape drive]
disk drive
memory
modem
monitor
monitor
sound blaster
video memory

If you want to sort in the reverse order, use the -r flag in the following command:

sort -r file1
video memory
sound blaster
monitor
monitor
modem
memory
disk drive
[tape drive]
CD-ROM
3.5inch diskette

If you want to sort according to alphabetic order, use the -d flag in the following command:

sort -d file1
3.5inch diskette
CD-ROM
disk drive
memory
modem
monitor
monitor
sound blaster
[tape drive]
video memory

In the above example, the line [tape drive] is sorted as tape drive because the [ and ] are ignored due to the -d flag.

If you want only one line to be retained in case more than one line are sorted equally, use the -u flag in the following command:

sort -u file1
3.5inch diskette
CD-ROM
[tape drive]
disk drive
memory
modem
monitor
sound blaster
video memory

In the above example, the line monitor appears only once, although there are two such entries in the file, due to use of the -d flag.

If you want to sort file1 according to the uppercase letter sort order, use the -f flag as in the following command:

sort -f file1
3.5inch diskette
CD-ROM
disk drive
memory
modem
monitor
monitor
sound blaster
video memory
[tape drive]

tr

You can use the tr command to translate or delete characters from standard input to generate standard output. Following is the detail of the main functions of the tr command:

  • Translate characters specified input to a new specified character in the output.
  • Delete specified characters input from the input to generate the output.

  • To delete all but the first occurrence of the specified characters.

Following is a list of some of the flags that can be used with the tr command:

  • -c to translate all but the specified characters by the specified new character.
  • -d to delete the specified characters.

  • -s to delete all but the first occurrence of the specified characters.

You can specify the input and output sequence of characters in certain special ways as follows:

  • [character1-character2] to specify a range of characters including character1 and character2.
  • [character*number] to specify number occurrences of character.

  • [character*] to specify the use of as many as are needed occurrences of character so that the input string of characters to be translated matches the output characters to be translated to.

  • [:characterlist:] to specify a list of characters as input or output string. The characterlist can be upper, lower, alpha, space, digit, and so on.

Examples Let us assume that we have file1 whose contents are shown as follows:

more file1
"this       is a test file
for tr command"
"it has 4 lines
but should be 1 line"

If you want to change the double quotes to spaces use the following command:

tr '\"' ' ' < file1
 this       is a test file
for tr command
 it has 4 lines
but should be 1 line

If you want to change all lowercase letters to uppercase letter, use the following command:

tr [:lower:] [:upper:] < file1
"THIS       IS A TEST FILE
FOR TR COMMAND"
"IT HAS 4 LINES
BUT SHOULD BE 1 LINE"

If you want to delete all the newline characters from this file, use the -d flag in the following command:

View Code

If you want to delete all but first occurrence of a space and replace the space by a - (hyphen) use the -s flag in the following command:

tr -s ' ' '-' < file1
"this-is-a-test-file-
for-tr-command"
"it-has-4-lines-
but-should-be-1-line"

uniq

The uniq command can be used to eliminate duplicate adjacent lines from a file or from standard input to generate standard output or another file. This is the default operation. However, it is possible to compare only part of a line for comparison by using certain flags.

Following is a list of some of the flags that can be used with the uniq command:

  • -c to precede each line with a number while displaying the output (the number specifies the number of occurrence of the line in the input file).
  • -d to display only the lines that occur multiple times adjacent to each other in the input file.
  • -u to display only the lines that appear only once in the input file.

  • -s numberofcharacter or +numberofcharacters to specify the number of characters from the start of a line that will be ignored while comparing adjacent lines.

  • -numberoffields or -f numberoffields to specify the number of fields from the start of a line that will be ignored while comparing adjacent lines.

Examples Let us assume that we have file1 whose contents are displayed below:

more file1
This is line 1
This is line 1
This is line 2
This is line 3
THIS IS line 3
This is line 4

If you want to find out unique lines in file1, use the following command:

uniq file1
This is line 1
This is line 2
This is line 3
THIS IS line 3
This is line 4

In the above example, the first line has been dropped because it is identical to the second line. If you want to display only the duplicate lines use the -d flag in the following command:

uniq -d file1
This is line 1

If you want to display the lines that appear only once in file1, use the -u flag in the following command:

uniq -u file1
This is line 2
This is line 3
THIS IS line 3
This is line 4

In the above example, the first two lines have not been displayed because they are identical. If you want to skip the first two fields while comparing adjacent lines, use the -f flag in the following command:

uniq -f 2 file1
This is line 1
This is line 2
This is line 3
This is line 4

sed

You can use the sed command to edit a file using a script. In the script, you can specify commands to edit one or more lines according to rules specified as part of one or more commands.

Following is a list of some of the flags that can be used with the sed command:

  • -e command to use the specified sed command to edit the file.
  • -f filename to use the filename as the editing script to edit the file.

  • -n to suppress messages from sed.

The sed command uses two different areas while performing editing:

  • pattern area to hold selected lines for editing.
  • hold area to temporarily hold the lines.

The sed sub-commands can affect either all of the lines or only the specified lines.

Following is a list of some of the sub-commands that can be used with the sed command:

  • # to specify start of comments. Everything in a line following the # is treated as comments.
  • :label to specify an addressable label that can be used in the script.

  • [/pattern/]= to write to output the line number of each line that contains the specified pattern.

  • [address]a\textstring to append textstring to each lines specified by the address.

  • [address1][,address2]c\textstring to replace the lines in the specified address range with the textstring.
  • [address1][,address2]d to delete the lines in the specified address range.

  • [address]i\textstring to insert textstring before each specified line.

  • [address1][,address2]p to print the lines in the specified address range.

  • [address1][,address2]n to specify that the current line be displayed and the next line be made the current line.

  • [address1][,address]N to specify that the current line be appended to the contents of the pattern area separated by a newline character.

  • [address]q to exit when the specified address is encountered.

  • [address1][,address2]s/old pattern/new pattern/[flag] to specify that change the old pattern by new pattern in the specified range. The behavior of the replacement can be modified by specified flags.

  • [address1][,address2]w file to write the contents of the specified range to the specified file.

  • [address1][,address2]y/old character list/new character list/ to modify each character in the old character list by the corresponding character in the new character list.

The above sub-commands are the ones that affect the pattern area used by the sed command. Now let us look at some of the sub-commands that affect the hold area:

  • [address1][,address2]g to copy the contents of the hold area to the pattern area which then become the new content of the pattern area.
  • [address1][,address2]G to append the contents of the hold area to the pattern area following the specified address.

  • [address1][,address2]h to copy the contents of the pattern area to the hold area which then become the new contents of the hold area.

  • [address1][,address2]H to append the contents of the pattern area to the hold area following the specified address.

  • [address1][,address2]x to exchange the contents of pattern and hold area.

Examples Let us assume that we have file1 whose contents are displayed below:

more file1
This file is a test file for sed command
----------------------------------------
The sed command is used for stream editing files
------------------------------------------------
The sed command a number of sub-commands which may be used to do the
--------------------------------------------------------------------
editing in specified line
-------------------------

If you want to print the line numbers of the line in which a specified pattern is found, use the following command:

sed -e "/sed/=" file1
1
This file is a test file for sed command
----------------------------------------
3
The sed command is used for stream editing files
------------------------------------------------
5
The sed command a number of sub-commands which may be used to do the
--------------------------------------------------------------------
editing in specified line
-------------------------

In the above example, the line numbers are displayed for the lines containing the pattern sed. If you want to add a specified text after each specified line, use the following command:

sed -f sfile file1
This file is a test file for sed command
+++++++++++++++++++++++++++++++++
----------------------------------------
The sed command is used for stream editing files
+++++++++++++++++++++++++++++++++
------------------------------------------------
The sed command a number of sub-commands which may be used to do the
+++++++++++++++++++++++++++++++++
--------------------------------------------------------------------
editing in specified line
-------------------------

where the file sfile contains the following line:

/sed/a+++++++++++++++++++++++++++++++++

In the above example, a string of +s (plus signs) is printed after each line containing the string sed. If you want to delete lines containing a specified string, use the following command:

sed -f sfile file1
This file is a test file for sed command
The sed command is used for stream editing files
The sed command a number of sub-commands which may be used to do the
editing in specified line

where sfile contains the following:

/---/d

In the above example, all lines that contain the string --- will be deleted. If you want to change all occurrences of a particular string by another one, use the following command:

sed -f sfile file1
This file is a test file for sed command
++++++++++++++++++++++++++++++++++++++++
The sed command is used for stream editing files
------------------------------------------------
The sed command a number of sub-commands which may be used to do the
--------------------------------------------------------------------
editing in specified line
-------------------------

where sfile contains the following:

1,3s/----/++++/g

In the above example, all occurrences of ---- are replaced by ++++ for lines 1 through 3. If you want to insert a specified string before each line containing a specified string, use the following command:

sed -f sfile file1
++++
This file is a test file for sed command
----------------------------------------
++++
The sed command is used for stream editing files
------------------------------------------------
++++
The sed command a number of sub-commands which may be used to do the
--------------------------------------------------------------------
editing in specified line
-------------------------

where sfile contains the following:

/sed/i++++

In the above example, a string ++++ is printed before each line in which the string sed appears. If you want to change each occurrence of a character by another, use the following command:

sed -f sfile file1
This file is A test file for sed commAnd
++++++++++++++++++++++++++++++++++++++++
The sed commAnd is used for streAm editing files
------------------------------------------------
The sed command a number of sub-commands which may be used to do the
--------------------------------------------------------------------
editing in specified line
-------------------------

where sfile contains the following:

1,3s/-/+/g 

In the above example, each occurrence of a - (hyphen) is modified to a + (plus) and each occurrence of a is modified to A between lines 1 through 3, both inclusive. If you want to delete all lines but the ones in which the specified pattern occurs, use the following command:

sed -f sfile file1
This file is a test file for sed command
The sed command is used for stream editing files
The sed command a number of sub-commands which may be used to do the

where sfile contains the following:

/sed/!d

In the above example, the ! (exclamation mark) is used to denote that all lines except those which contain the string sed are to be processed.

Miscellaneous Commands

In this section we will discuss some of the commands available to do miscellaneous operations in UNIX.

banner

You can use the banner command to print one or more characters in large size.

Example If you want to print the word banner in large size on the standard output, use the following command:

banner banner

 #####     ##    #    #  #    #  ######  #####
 #    #   #  #   ##   #  ##   #  #       #    #
 #####   #    #  # #  #  # #  #  #####   #    #
 #    #  ######  #  # #  #  # #  #       #####
     #  #    #  #   ##  #   ##  #       #   #
 #####   #    #  #    #  #    #  ######  #    #

bc

If you want to perform simple arithmetic expression in UNIX, use the bc command. By default, all the numbers are assumed to be decimal numbers, but you can perform operations on octal or hexadecimal numbers. You can also scale the decimal numbers. The bc command accepts input first from the specified file followed by standard input. You can, however, use input redirection to accept input only from a file.

The arguments that can be used with the bc commands are as follows:

  • Variable name (one letter)
  • Variable array name (letter[expression])
  • A literal like scale

Some of the other operands that can be used are as follows:

  • + for adding
  • - for subtracting
  • / for division
  • * for multiplication
  • % for percentage
  • ++ for adding one to the preceding variable
  • -- for subtracting one from the preceding variable
  • = to assign a value
  • sqrt for square root computation
  • length for getting length of a number
  • scale for specifying the number of digits after the decimal

You can also use C program-like statements, expression, and functions. There are some special arithmetic functions you can use in bc. Some of these functions are:

  • s(x) for sine of x
  • c(x) for cosine of x
  • l(x) for log of x

Following is a list of flags that can be used with the bc command:

  • -c to compile the bc program parameters but not execute them
  • -l to include the library of math functions

Examples Let us assume that we have file1, which contains the following bc command parameters:

more file1
b=5
c=10
a=b+c
a

If you want to compile the contents of file1 without executing them, use the -c flag in the following command:

bc -c < file1
 5sb
 10sc
lblc+sa
laps.
q

If you want to execute the contents of file1, use the following command:

bc < file1
15

Let us assume that we have file1 whose contents are displayed below:

a=0
j=50
for (i=1; i<=j; i++) a=i+a;
a

If we execute the bc command with this file as input, this will add all numbers from 1 through 50 and display the total as follows:

bc < file1
1275

cal

You can use the cal command to display the calendar for one or more months on standard output. If you do not specify any arguments, cal displays the calendar for the current month. You can specify the month and year for which you want to display the calendar. If you specify only one argument, cal will display a calendar for all 12 months of the specified year.

Examples

If you want to display the calendar of the current month, execute the following command:

cal
       December 1996
Sun Mon Tue Wed Thu Fri Sat
 1   2   3   4   5   6   7
 8   9  10  11  12  13  14
15  16  17  18  19  20  21
22  23  24  25  26  27  28
29  30  31

If you want to display the calendar for January, 1995, use the following command:

cal 1 1995
       January 1995
Sun Mon Tue Wed Thu Fri Sat
 1   2   3   4   5   6   7
 8   9  10  11  12  13  14
15  16  17  18  19  20  21
22  23  24  25  26  27  28
29  30  31

If you want to obtain calendars for all 12 months of 1997, use the following command:

cal 1997
                                1997

          January                     February
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
             1   2   3   4                            1
 5   6   7   8   9  10  11    2   3   4   5   6   7   8
12  13  14  15  16  17  18    9  10  11  12  13  14  15
19  20  21  22  23  24  25   16  17  18  19  20  21  22
26  27  28  29  30  31       23  24  25  26  27  28

           March                        April
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
                         1            1   2   3   4   5
 2   3   4   5   6   7   8    6   7   8   9  10  11  12
 9  10  11  12  13  14  15   13  14  15  16  17  18  19
16  17  18  19  20  21  22   20  21  22  23  24  25  26
23  24  25  26  27  28  29   27  28  29  30
30  31
            May                         June
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
                 1   2   3    1   2   3   4   5   6   7
 4   5   6   7   8   9  10    8   9  10  11  12  13  14
11  12  13  14  15  16  17   15  16  17  18  19  20  21
18  19  20  21  22  23  24   22  23  24  25  26  27  28
25  26  27  28  29  30  31   29  30

           July                        August
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
         1   2   3   4   5                        1   2
 6   7   8   9  10  11  12    3   4   5   6   7   8   9
13  14  15  16  17  18  19   10  11  12  13  14  15  16
20  21  22  23  24  25  26   17  18  19  20  21  22  23
27  28  29  30  31           24  25  26  27  28  29  30
                             31
         September                     October
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
     1   2   3   4   5   6                1   2   3   4
 7   8   9  10  11  12  13    5   6   7   8   9  10  11
14  15  16  17  18  19  20   12  13  14  15  16  17  18
21  22  23  24  25  26  27   19  20  21  22  23  24  25
28  29  30                   26  27  28  29  30  31

         November                     December
Sun Mon Tue Wed Thu Fri Sat  Sun Mon Tue Wed Thu Fri Sat
                         1        1   2   3   4   5   6
 2   3   4   5   6   7   8    7   8   9  10  11  12  13
 9  10  11  12  13  14  15   14  15  16  17  18  19  20
16  17  18  19  20  21  22   21  22  23  24  25  26  27
23  24  25  26  27  28  29   28  29  30  31
30

calendar

You can use the calendar command to get reminders from messages stored in a special file named calendar in the current directory. The messages are stored in the following format:

  • date message
  • message date

where date can be in a variety of formats such as:

  • March 7
  • Mar 7
  • mar 7
  • march 7th
  • 3/7
  • */7 (7th of each month)

On a Friday, the calendar command will display the messages for four days--Friday, Saturday, Sunday, and Monday.

clear

You can use the clear command to clear the screen of your workstation. This command checks the terminal type to find out the terminal type to determine how to clear the screen.

Examples To clear the screen on your terminal, use the following command:

clear

time

You can use the time command to obtain the execution time of a script, command, or program. The execution time is displayed with the following three times:

  • real
  • user
  • system

Examples If you want to find out the execution time of a script sample, use the following command:

time sample
real    0m6.49s
user    0m0.02s
sys     0m0.03s

xargs

You can use the xargs command to group multiple arguments and input them to a command. xargs passes as many arguments to the command as necessary to ensure that the maximum size limit for command line arguments is not exceeded.

Following is a list of some of the flags that can be used with the xargs command:

-eendoffilecharacter to specify the character to be used to terminate the argument string. The default is _ (underline) character

-istring to use each line as a single parameter in place of the string variable specified as part of the command line. The default string is {}.

-lnumber to specify the number of non-empty lines to be used as arguments to the command for each invocation. The last invocation can use less than the specified number.

-nnumber to specify the number of arguments to be used in each invocation. The last invocation can use less than the number specified.

-p to ask for confirmation before executing the command.

-ssize to set the maximum size of the argument list for each invocation.

-t to echo the constructed command to the standard error.

Examples Let us assume that we have xfile whose contents are shown below:

more xfile
file1 file2 file3
file4 file5 file6
file7 file8 file9

If you want to pass only two arguments to the ls command at a time, use the -n flag in the following command:

xargs -n2  ls  < xfile
file1  file2
file3  file4
file5  file6
file7  file8
file9

If you want to pass two lines at a time to the ls command, use the -l flag in the following command:

xargs -l2  ls  < xfile
file1  file2  file3  file4  file5  file6
file7  file8  file9

If you want to confirm the command to be executed before actually executing the command, use the -p flag in the following command:

xargs -l2 -p ls  < xfile
sfile1file2 file3 file4 file5 file6 ?...y
file1  file2  file3  file4  file5  file6
ls file7 file8 file9 ?...y
file7  file8  file9

In the above example, you have to use the character y to confirm that the command should be executed. If you want to rename all the files that start with the name file (file1 through file9), use the -i flag following command:

ls file* | xargs -t -i cp {} {}.old
cp file1 file1.old
cp file2 file2.old
cp file3 file3.old
cp file4 file4.old
cp file5 file5.old
cp file6 file6.old
cp file7 file7.old
cp file8 file8.old
cp file9 file9.old

In the above example, the -t flag forces the display of the constructed command to the standard error.

Regular Expression

A regular expression in UNIX is a string of one or more characters and meta-characters. The commands that accept regular expressions, first have to expand it to get the specified pattern before matching it to the input. The matching is done character by character.


CAUTION: The regular expression looks similar to the file matching pattern used by some commands like the find command. But the regular expression is not same as file matching pattern.

A regular expression contains the following:

  • character set, which matches one or more characters at the specified position.
  • count, which specifies the number of the previous character to be repeated. This is an * (asterisk) to specify that zero or more of the previous character should be repeated.

  • position specifier, which is a set of special characters to indicate certain fixed positions like start of a line, end of a line, and so on.

  • meta characters to specify special meaning.

character set

A character set is a list of one or more specified characters. The character set can be specified as follows:

  • range of characters, which can be specified as two characters separated by a hyphen enclosed within square brackets. This matches one occurrence of a character within the specified range. If you specify a ^ (caret) in front of the range, the matching is reversed, that is, all characters except those in the specified range will be matched.
  • list of characters, which can be a list of individual characters enclosed within square brackets. This matches one occurrence of one of the characters in the list. You can specify a ^ (caret) in front of one or more characters to match all characters except those.

Position Specifier

UNIX allows use of a number of special characters to specify certain special positions in a line. Following is a list of these special characters:

  • ^ at the start of a regular expression to specify beginning of a line.
  • $ at the end of the regular expression to specify end of a line.

Meta Characters

A meta character is a character that, when used as part of a regular expression, has a special meaning. Following is a list of these meta-characters:

  • . to match all characters except newline.
  • * to match zero or more of the preceding characters or regular expressions.

  • ^ to match the regular expression following it at the beginning of the line (for this to work, you must specify ^ as the first character of the regular expression).

  • $ to match the regular expression preceding it at the end of the line (for this to work you must specify $ as the last character of the regular expression).

  • [ ] to match exactly one of the enclosed character. The characters enclosed can be a range or a list of individual characters.

  • \{n1,n2\} to match a minimum of n1 and a maximum of n2 occurrences of the preceding character or regular expression.

  • \ to interpret the following character as a regular character rather than a meta-character.

  • \( \) to save the enclosed regular expression for later use. These can then be reused by using \1 through \9.

  • \< to match the following regular expression at the beginning of a word.

  • \> to match the preceding regular expression at the end of a word.

  • ? to match zero or one instance of the preceding regular expression.

  • + to match one or more instance of the preceding regular expression.

Examples Let us assume that we have a file called file1 whose contents are shown below:

more file1
This is a test
THIS IS A TEST
This is really a test
Beleive it, this is really a test
This is a test, better believe it

In its simple form, you can specify a string of characters as a regular expression. If you want to find the string really in file1 use the following command:

grep really file1
This is really a test
Believe it, this is really a test

If you want to find the string THIS at the beginning of a line, use ^(caret) at the beginning of a regular expression in the following command:

grep ^THIS file1
THIS IS A TEST

If you want to find the string it at the end of a line, use $ (dollar sign) at the end of a regular expression in the following command:

grep it$ file1
This is a test, better believe it

If you want to find both believe and Believe, use the following command:

grep [Bb]elieve file1
Believe it, this is really a test
This is a test, better believe it

In the above example, [Bb] matches the single characters B or b. If you want to find characters other than the specified one, use the following command:

grep [T][^h] file1
THIS IS A TEST

In the above example, the [^h] matches anything other than h, hence the [T][^h] matches anything that starts with T and followed by any character other than h. If you want to match any six-character string preceded and followed by a space, use the following command:

grep " ...... " file1
This is really a test
Believe it, this is really a test
This is a test, better believe it

In the above example, the " ......" will match a string such as really or better preceded and followed by a space. If you want to modify all strings that start with a t and have a t at the end and have any two characters in the middle, use the following command:

sed "s/\(t\)..\1/----/g" file1
This is a ----
THIS IS A TEST
This is really a ----
Believe i----his is really a ----
This is a ----, better believe it

In the above example, \(t\) saves the character t and \1 uses the t at the specified position. If you want to find one or more instances of a regular expression, use the following command:

egrep it+ file1
Believe it, this is really a test
This is a test, better believe it

In the above example, it+ tells egrep to find one or more instances of the string it. If you want to find whether a regular expression is repeated a specified number of times, use the following command:

egrep tt\{1,4\} file1
This is a test, better believe it

In the above example, at least one, and a maximum of four, repetitions of the expression it are matched. If you wan to modify all characters other than letters, use the following command:

sed "s/[^a-zA-Z ]/:/g" file1
This is a test
THIS IS A TEST
This is really a test
Believe it: this is really a test
This is a test: better believe it

In the above example, all characters other than a through z, A through Z and spaces will be replaces by a : (colon).

Executing Commands

There are several ways to execute the commands you have learned so far. In this section, we will learn about some of the ways in which a command can be executed in isolation and in conjunction with other commands.

UNIX, by default, accepts input from standard input, which is the keyboard, and displays output on standard output, which is the terminal. You can, however, use the UNIX redirection facility to redirect the input from a file or output to a file.

You can execute a command in the foreground or in the background. When you invoke a command, by default it executes in the foreground. You can force a command in the background by using the & (ampersand) sign. You can start a command in the foreground, then force it into the background. To achieve this, you use Ctrl-z to suspend it and then use the bg command to put it in the background.

Because all UNIX commands accept input from standard input and generate output to standard output, there is a convenient way of passing output of one command to the next using the | (pipe) character. You can have a string of commands, each connected to the next using a pipe.

Summary

In this chapter you have learned about various UNIX commands. Most of these commands should work on different UNIX systems as described, but you may find that some of the commands or flags behave differently. Following is a list of some of the activities you can do using various UNIX commands:

  • Log in to related activities using commands like login, rlogin, and passwd.
  • Creating, renaming, deleting, and copying files and directories using commands like cp, rm, rmdir, and mkdir.

  • Searching for text in files using commands like grep.

  • Granting access to files and directories for users using commands like chmod and chgrp.

  • Modify contents of files using commands like vi and sed.

  • Displaying contents of files using commands like more, tail, head, and pg.

   

UNIX Unleashed: System Administrator's Edition

Buy This Product Add To MyInformIT

< Back Contents Next >

 

Featured Author

Mark Tripod Mark Tripod

Mark Tripod is a senior backbone engineer in the Network Architecture group at Exodus Communications.

Featured Book

Sams Teach Yourself Samba in 24 Hours Sams Teach Yourself Samba in 24 Hours

Offers a step-by-step, easy-to-follow guide to using Samba, teaching the fundamentals and then showing how to apply that knowledge to solve your problems.

 
 
 

Contact Us

|

Copyright, Terms & Conditions

|

Privacy Policy

|

Advertising

 

© Copyright 1999-2000 Macmillan USA. All rights reserved.
InformIT is a trademark of Macmillan USA.