|
List of all articles for this month |
| Newsgroups: |
comp.compilers |
| From: |
cpdas@groper.jcu.edu.au (David Spuler) |
| Keywords: |
errors, analysis, summary, bibliography |
| Organization: |
Compilers Central |
| References: |
91-11-087 |
| Date: |
Thu, 5 Dec 91 20:04:26 +1100 |
I received quite a large number of replies to my request
for information
about compiler-like static analysis tools. This article is a
summary of:
1) my work/ideas and
2) responses I received.
Hope it's useful.
David Spuler
MY OWN WORK
===========
Here is a summary of my own bug lists and references on the
use of static
analyzers for detecting program errors. Myself, I implemented
a full C
checker (better than Lint) as my 4th year Honours project -
hence there is an
Honours thesis. Naturally, the bug lists and references have
a C bias.
C Bugs
======
Expression Errors
-----------------
assignment instead of equality (if(x=y) versus if(x==y) )
confusing &, | and &&,|| operators
null effect statements (x<<1; instead of x<<=1)
order of evaluation errors (a[i++]=i)
side effects to 2nd/3rd operand of ||, && and ?:
Lexical Errors
==============
Nested comments
Multibyte character constants (novices use 'abc' instead of
"abc")
Flow of Control Errors
======================
unreachable statements
use of uninitialized local variables
dead assignments
function return anomalies - return statements with and without
expression
function has no return statement on some paths
constants in conditional tests (if(1>0)...)
accidental empty loops (bad semicolon) for(...); { }
missing break in switch statement
Library Usage Errors
====================
bad formats (and argument type) to printf/scanf/etc
Preprocessor Errors
===================
missing braces #define swap(i,j) temp =i; i=j; j=temp;
precedence - need brackets around whole expresion
precedence - need brackets around macro formal parameters
side effects in macro arguments
Declarations
=============
unused local variables
inner scope redeclarations hiding outer name
global problems - inconsistent use of fns etc - Lint does all
this stuff
REFERENCES (papers from my Thesis bibliography)
(some aren't highly relevant to static checking, but the titles
tell all)
=============================================================================
Spuler, D.A. "Check: A Better Checker for C", Honours Thesis,
Dept of Computer
Science, James Cook University of North Queensland, 4811. AUSTRALIA
W. R. Adrion, M. A. Branstad and J. C. Cherniavsky (1990),
"Validation, Verification, and Testing of Computer Software",
ACM Computing Surveys, Vol. 14, No. 2, pp. 159-192.
R. E. Berry and B. A. E. Meekings (1985),
"A Style Analysis of C Programs",
Communications of the ACM, Vol. 28, No. 1, pp. 80-88.
R. E. Fairley (1978),
"Static Analysis and Dynamic Testing of Computer Software",
IEEE Computer, Vol. 11, No. 4, pp. 14-23.
L. D. Fosdick and L. J. Osterweil (1976),
"Data Flow Analysis In Software Reliability",
ACM Computing Surveys, Vol. 8, No. 3, pp. 305-330.
M. T. Harandi and J. Q. Ninq (1990), "Knowledge-Based Program
Analysis",
IEEE Software, January 1990, pp. 74-81.
W. Harrison (1977),
"Compiler Analysis of Value Ranges for Variables",
IEEE Transactions on Software Engineering, Vol. 3, No. 3, pp.
243-250.
W. S. Hecht and J. D. Ullman (1975),
"A Simple Algorithm for Global Data Flow Analysis Problems",
SIAM Journal of Computing, Vol. 4, pp. 528-532.
T. R. Hopkins (1980), "PBASIC - A Verifier for BASIC",
Software Practice and Experience, Vol. 10, No. 3, pp. 175-181.
W. E. Howden (1982), "Validation of Scientific Programs",
ACM Computing Surveys, Vol. 14, No. 2, pp. 193-227.
W. E. Howden (1990), "Comments Analysis and Programming Errors",
IEEE Transactions on Software Engineering, Vol. 16, No. 1, pp.
72-81.
S. C. Johnson (1978), "Lint, a C Program Checker", Bell Laboratories,
Murray Hill, NJ, Computer Science Technical Report, July 1978
(also
appearing as part of the documentation for many versions/variants
of the
UNIX operating system, as in: Ultrix Programmers Manual - Supplementary
Documents, 1983).
W. L. Johnson (1986), Intention-Based Diagnosis of Novice
Programming
Errors, Pitman.
J. Katzenelson and A. Strominger (1987), "Debugging Programs
that use
Macro-Oriented Data Abstractions", Software Practice and Experience,
Vol.
17, No. 2, pp. 79-103.
J. C. King (1976), "Symbolic Execution and Program Testing",
Communications of the ACM, Vol. 19, No. 7, pp. 385-394.
D. E. Knuth (1971), "An Empirical Study of FORTRAN Programs",
Software Practice and Experience, Vol. 1, pp 105-133.
S. Lauesen (1979), "Debugging Techniques", Software Practice
and Experience,
Vol. 9, pp. 51-63.
T. E. Lindquist and J. R. Jenkins (1988), "Test-Case Generation
with IOGen",
IEEE Software, January 1988, pp. 72-79.
P. G. Moulton and M. E. Muller (1967), "DITRAN - A Compiler
Emphasizing
Diagnostics", Communications of the ACM, Vol. 10, No. 1, pp.
45-52.
S. S. Muchnick and N. D. Jones (1981),
Program Flow Analysis: Theory and Applications, Prentice-Hall.
L. J. Osterweil and L. D. Fosdick (1976), "DAVE - A Validation,
Error
Detection and Documentation System for Fortran Programs", Software
Practice and Experience, Vol. 6, No. 4, pp. 473-486.
K. A. Redish and W. F. Smyth (1986), "Program Style Analysis:
A Natural
By-Product of Program Compilation", Communications of the ACM,
Vol. 29,
No. 2, pp. 126-133.
B. G. Ryder (1974), "The PFORT Verifier",
Software Practice and Experience, Vol. 4, No. 4, pp. 359-377.
R. E. Seviora (1987), "Knowledge-Based Program Debugging
Systems", IEEE
Software, Vol. 4, No. 3, pp. 20-32.
N. Suzuki and K. Ishihata (1977), "Implementation of an Array
Bound
Checker", Fourth ACM Symposium on Principles of Programming
Languages, pp.
132-143.
C. Wilson and L. J. Osterweil (1985),
"Omega - A Data Flow Analysis Tool for the C Programming Language",
IEEE Transactions on Software Engineering, Vol. 11, No. 9, pp.
832-838.
============================
RESPONSES FROM OTHERS
============================
>From @yonge.csri.toronto.edu:hugh@redvax Thu Dec 5 15:06:51
1991
I too am interested in static analysis tools. I have produced
a C
compiler (intended to be commercial) that does extensive lint-like
checking.
Over a decade ago Fosdick and Osterweil (sp?) at Colorado
produced a
program to detect "data-flow anomalies" in FORTRAN programs.
I thought
that work was very interesting. At the time, I prototyped a
similar
system (theirs was so slow that they recommended it be used
only once in
the life of a program; mine was fast enough that it would not
have
noticeably slowed a compiler that had the checks added).
Do you know of the PFORT Verifier? I think that it is available
for free
from Bell Labs. I think that it is sort of a Lint for FORTRAN,
emphasizing checking for FORTRAN 77 standard conformance.
Again, a decade ago, there was some work at eliminating runtime
range
checks from Pascal programs. Clearly, this could be turned around
into
compile-time warnings, perhaps without being annoying. I think
Welsh of
Queens University of Belfast wrote one of the better papers
(dim
recollection).
Anyway, I would be interested in your bug lists and references.
Hugh Redelmeier
{utcsri, yunexus, uunet!attcan, utzoo, scocan}!redvax!hugh
When all else fails: hugh@csri.toronto.edu
+1 416 482-8253
===================================================
>From jackson@larch.lcs.mit.edu Tue Nov 26 03:23:07 1991
David,
Raymie Stata, a fellow graduate in my research group passed
me your
request from the net.
I am completing a thesis on a bug detection scheme I have
invented called
Aspect. I am attaching some blurb from a paper I have just written.
I'm
including the bibliography too, which may give you some useful
references.
If you're interested, I'd be happy to tell you more. There
is one paper
published that describes the state of Aspect about 1 year ago;
it's
`Aspect: an Economical Bug Detector', International Conf. On
Software
Engineering, May 1991.
I'd also be very interested in any references or ideas that
you have.
Regards,
--Daniel Jackson
About the Aspect Bug Detector:
Aspect is an annotation language for detecting bugs in imperative
programs. The programmer annotates a procedure with simple assertions
that relate abstract components (called `aspects') of the pre-
and
post-states. A checker has been implemented that can determine
efficiently whether the code satisfies an assertion. If it does
not,
there is a bug in the code (or the assertion is wrong) and an
error
message is displayed. Although not all bugs can be detected,
no
spurious bugs are reported....
The purpose of a compiler is not just to make it easier to
write good
programs but also to make it harder to write bad ones. Catching
errors
during compilation saves testing. It also spares the greater
cost of
discovering the error later when it is harder to fix.
Programming errors can be divided into two classes. {\em
Anomalies} are
flaws that are apparent even to someone who has no idea what
the
program is supposed to do: uninitialized variables, dead code,
infinite loops, etc. {\em Bugs}, on the other hand, are faults
only with
respect to some intent. An anomaly detector can at best determine
that a program does something right; a bug detector is needed
to tell
whether it does the right thing.
Aspect detects bugs with a novel kind of dataflow annotation.
Annotating the code is extra work for the programmer, but
it is mitigated by two factors. First, some sort of redundancy
is
inevitable if bugs, rather than just anomalies, are to be caught.
Moreover, Aspect assertions may be useful documentation: they
are
generally much shorter and more abstract than the code they
accompany.
Second, no mimimal annotation is demanded; the programmer can
choose
to annotate more or less according to the complexity of the
code or
the importance of checking it.
A procedure's annotation relates abstract components of objects
called
`aspects'. The division of an object into aspects is a kind
of data
abstraction; the aspects are not fixed by the object's representation
but are chosen by the programmer.
Each assertion of the annotation states that an aspect of
the
post-state is obtained from some aspects of the pre-state. The
checker examines the code to see if such dependencies are plausible.
If there is no path in the code that could give the required
dependencies, there must be an error: the result aspect was
computed
without adequate information. An error message is generated
saying
which abstract dependency is missing.
...
Many compilers, of course, perform some kind of anomaly analysis,
and
a variety of clever techniques have been invented (see, e.g.
\cite{carre}). Anomaly detection has the great advantage that
it
comes free to the programmer. Aspect might enhance existing
methods
with a more precise analysis that would catch more anomalies
(using
annotations of the built-in procedures alone). But there will
always
be a fundamental limitation: most errors are bugs and not anomalies.
The Cesar/Cecil system \cite{cesar}, like Aspect, uses annotations
to
detect bugs. Its assertions are path expressions that constrain
the
order of operations. Errors like failing to open a file before
reading it can be detected in this way. Flavor analysis \cite{flavor}
is a related technique whose assertions make claims about how
an
object is used: that an integer is a sum in one place in the
code and
a mean in another, for instance. Both techniques, however, report
spurious bugs in some cases: an error may be signalled for a
path that
cannot occur. Aspect, on the other hand, is sound: if an error
is
reported, there is a bug (or the assertion is wrong).
Type checking may also be viewed as bug detection when there
is name
equality or data abstraction. Aspect is more powerful for two
reasons. First, since procedures with the same type signature
usually
have different annotations, Aspect can often tell that the wrong
procedure has been called even when there is no type mismatch.
Second, type systems are usually immune to changes of state
and so, in
particular, cannot catch errors of omission. Even models that
classify side-effects, such as FX \cite{FX}, do not constrain
the
order of operations like Aspect.
The version of Aspect described here advances previous work
\cite{icse} by incorporating alias analysis. It can handle multi-level
pointers, whose precise analysis is known to be intractable
\cite{Landi}. The alias scheme adopted is most similar to
\cite{larus}, but it is less precise and cannot handle cyclic
and
recursive structures.
...
\bibitem[BC85]{carre}
Jean-Francois Bergeretti \& Bernard A. Carre,
`Information-Flow and Data-Flow Analysis of while-Programs',
ACM Trans. Programming Languages and Systems, Vol. 7, No. 1,
January 1985.
\bibitem[CWZ90]{chase}
David R. Chase, Mark Wegman \& F. Kenneth Zadeck,
`Analysis of Pointers and Structures',
Proc. SIGPLAN '88 Conf. Prog. Lang. Design \& Implementation,
1990.
\bibitem[GL88]{FX}
David K. Gifford \& John M. Lucassen,
`Polymorphic Effect Systems',
ACM Conf. Principles of Programming Languages, 1988.
\bibitem[How89]{flavor}
William E. Howden,
`Validating Programs Without Specifications',
Proc. 3d ACM Symposium on Software Testing, Analysis and Verification
(TAV3),
Key West, Florida, Dec. 1989.
\bibitem[Jac91]{icse}
Daniel Jackson,
`Aspect: An Economical Bug-Detector',
Proc 13th Int. Conf. on Software Engineering,
Austin, Texas, May 1991.
\bibitem[Lan91]{Landi}
William Landi and Barbara G. Ryder,
`Pointer-induced Aliasing: A Problem Classification',
ACM Conf. Principles of Programming Languages, 1991.
\bibitem[LH88]{larus}
James R. Larus and Paul N. Hilfinger,
`Detecting Conflicts Between Structure Accesses'
ACM Conf. Programming Language Design and Implementation,
June 1988.
\bibitem[OO89]{cesar}
Kurt M. Olender \& Leon J. Osterweil,
`Cesar: A Static Sequencing Constraint Analyzer',
Proc. 3d ACM Symposium on Software Testing, Analysis and Verification
(TAV3),
Key West, Florida, Dec. 1989.
===================================================
>From adv5@shakti.ernet.in Tue Nov 26 03:17:02 1991
I am a member of a quality control team in Citicorp Overseas
Software Ltd.
I do a lot of desk work, to test code validity.
Till now I have used manual methods for static analysis of
code, using
tables of states of variables, and basically sweating it out.
I wish to
know if you have more information on known bugs or pitfalls
in various
constructs of a language.
I will dig out some information on static analysers, and
mail them to you
Pankaj Fichadia.
===================================================
>From @bay.csri.toronto.edu:norvell@csri.toronto.edu Tue
Nov 26 02:04:23 1991
I'd be very interested in your list of references. Unfortunately
I can't
find my own list of references to give you in return. Perhaps
I didn't
type it in yet. I _can_ send you an unpublished paper (complete
except
for bibliography) on detecting dataflow anomalies in procedural
languages.
There is also an experimental program that implements the ideas
of the
paper. The paper can be sent in postscript or dvi and the program
in in
Turing -- a Pascal type language.
Theo Norvell norvell@csri.toronto.edu
U of Toronto norvell@csri.utoronto.ca
===================================================
>From ae2@cunixa.cc.columbia.edu Sun Nov 24 21:40:04 1991
Greetings!
I have read your note about error detections in compilers.
I have a great
interest in this particular field as my final project has to
do with the
implemetation of expetional handlin in "Small C", a compiler
designed on
the IBM 8088 and it's sole interest is educational, something
equivalent
to minix. I would greatly appreciate if you could help me in
finding
sources that dwell on this subject, anything that would be related
to
errors and how one might deal with them would be relavant.
Many Thanks In Advance
--Amiran
ae2@cunixa.cc.columbia.edu
===================================================
>From paco@cs.rice.edu Sun Nov 24 04:42:03 1991
The Convex Application Compiler (TM?) apparently does a pretty
good job.
Any interprocedural analyzer has to catch a lot of errors just
to avoid
crashing. They also do pointer tracking, array section analysis,
and
generally just all-out analysis and optimization. Bob Metzger
<metzger@convex.com> et al. have a paper on the pointer tracking
in the
proceedings of Supercomputing '91, which was held last week.
It is alleged that Convex has sold machines, or at least
copies of their
compiler, just for use in static error checking.
Paul Havlak
Graduate Student
===================================================
>From jvitek@csr.UVic.CA Sat Nov 23 11:54:49 1991
You might be interested in looking at abstract interpretation.
It is a
technique related to data-flow analysis (you can express data-flow
anlyses
as abstract interpreation problems) and is quite popular in
among the
functional programming crowd.
There is a number of paper (even books!) on the subject,
if you are
interested I can provide references.
Regards, Jan Vitek/ Graduate Student/ University of Victoria/
BC/ Canada
===================================================
>From twinsun!twinsun.com!eggert@CS.UCLA.EDU Fri Nov 22
07:17:14 1991
Here's a few references that I've written:
Paul Eggert,
Toward special-purpose program verification,
.i "Proc. ACM SIGSOFT International Workshop on Formal Methods
in Software Development",
Napa, CA (9-11 May 1990);
.i "Software engineering notes"
.b 15 ,
4 (September 1990), 25-29.
Paul Eggert,
An Example of Detecting Software Errors Before Execution,
.i "Proc. workshop on effectiveness of testing and proving methods,"
Avalon, CA
(11-13 May 1982), 1-8.
Paul Eggert,
Detecting Software Errors Before Execution,
UCLA Computer Science Department report CSD-810402 (April 1981).
The last reference contains an extensive survey of the pre-1980
literature.
I'd be interested in seeing your list of references as well,
when you have the time.
===================================================
>From beck@cs.cornell.edu Fri Nov 22 05:31:04 1991
I'm interested in hearing your list of bugs. I have given
some thought to
the detection and propagation of error information in a program
using
dependence information. Propagation of errors uses the idea
that any
computation on a path which leads only to an error condition
is dead
unless a side-effect intervenes, and any code after the error
is
unreachable. Thus one can actually integrate program errors
into control
flow information as an unstructured jump to the end of the program.
In an
optimizing compiler, one might be tempted to say that any statement
which
can be scheduled after the last side effect before the error
is dead.
Thus, in this fragment:
1: x := 2
2: print "hi there"
3: y := z/0
statement 1 is actually dead, but statement 2 is not.
Micah Beck
--
|