Journal:    Data Based Advisor  April 1993 v11 n4 p83(9)
* Full Text COPYRIGHT Data Based Solutions Inc. 1993.
-------------------------------------------------------------------------
Title:     Programs up to the test.
             (software testing tools)(includes related bibliography)
Author:    Farley, Kevin J.


Abstract:  Software testing is possible with two types of tools:
           regression and coverage.  Regression tools are used to
           selectively retest software to detect faults introduced during
           modification.  Coverage testing is used to test an application
           and report which lines of code or paths through the code did
           or did not get exercised.  An effective regression testing
           toolkit should support modular event capture files, help
           manage all event-capture files, support documentation for test
           files, provide screen comparison, provide a programming
           language, support data driven testing and help manage
           exception events.  Detailed descriptions are provided for Dr.
           Taylor's Test from Vermont Creative Software Inc, Evaluator
           from Eastern Systems Inc, Ferret from Tiburon Systems Inc,
           Gate and On Stage from InfoMentor Software, KeyLog from
           SofTech Microsystems, Software TestWorks from Software
           Research Inc, and TestRunner from Mercury Interactive Corp.
-------------------------------------------------------------------------
Full Text:

Two primary categories dominate software testing systems today:
regression tools and coverage tools.  Coverage testing is the process of
checking an application and reporting what lines of code or paths through
the code did or didn't get exercised.  The Coverage Analyzer included in
WordTech's Arago was an example of this type of tool.  Since coverage
test tools are language-specific and have yet to cover Xbase dialects
(they're primarily for C and Ada), this comparison will focus on
regression tools.

Let's begin with a definition of regression testing.  According to the
IEEE Standard Glossary of Software Engineering Terminology, "Regression
Testing is the selective retesting [of software] to detect faults
introduced during modification of a system or system component, to verify
that modifications have not caused unintended adverse effects, or to
verify that a modified system or system component still meets specified
requirements." It means running the same inputs through the system over
and over until everything works.  Note that this becomes more important
(and effective) the more revisions a product goes through.  A new release
should handle everything the old release handled that hasn't explicitly
changed.  This makes for fast, easy retesting of a new release.

A regression test tool is intended to simulate a user's keystrokes or
mouse movements and to monitor the screen (at a minimum) to compare
expected behavior with actual results.  Input from the user--whether
keystrokes or mouse movements--is referred to as an "event-capture
file" once captured or recorded by the regression test tool.  This capture
process can be handled in numerous ways, and each product goes about it
in its fashion.  Based on the companion article's discussion of the
general theory and practice of testing methods, let me propose the
following features as comprising an effective regression test toolkit.

Features to look for

In addition to the simple feature of recording and playing back input
events and monitoring the screen, the product should:

1.  Support modular event-capture files to promote modular and
maintainable test suites.  In other words, each menu option or discrete
function should have an associated event-capture/playback file.

2.  Help manage all of the event-capture files (and associated data
comparison files) since these can number in the hundreds, if not
thousands.  In my company, we have over 200 options in one application.
Each option needs not only a separate event file, but an expected results
file (results file).  So, the product should help manage these 600 files
for us.

3.  Support some level of documentation for the files.  Every test file
has a specific purpose, an intended context, and a means to determine
whether the test passed or failed.  This must be documented simply for
the tester to be able to do his or her job.  A good product will support
documentation of each test.

4.  Provide a screen-comparison tool with masking features to indicate
what parts of the screen are irrelevant.  Screen comparisons are often a
simple test to determine whether or not an operation has passed or
failed.  Yet portions of the screen are subject to change (date, time,
release number, and user I.D.  at a minimum).  A screen tool must let the
user specify what portions of the screen to ignore for the sake of a
pass/fail test.

Even better, the product should:

5.  Provide a programming language to allow WHILE loops, IFs, CASEs, file
I/O, and parameter passing.  Try writing an application with none of
these constructs; you can't do much.  So it is with test applications:
you don't get far without language support.  With it, testing becomes
exciting, powerful, and profitable.

6.  Support for data-driven testing so that testers can concentrate more
on creating test transaction databases or sequences and less on simply
capturing and playing back recorded input events.  Although you might
think this is covered under the previous feature, it's actually more
advanced.  Just because a language theoretically supports this type of
construct doesn't make it practical to perform data-driven testing.

7.  Help manage exception events so unattended testing can proceed more
reliably and effectively.  Printer errors, assertion violations, memory
errors, etc., cause big headaches during systematic regression testing.
The ability to specify a global exception and an associated resolution of
the event (restart the test, reboot the PC and skip to the next test, or
beep until paper is added to the printer) is an important feature when
doing serious regression testing.
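The masking idea in item 4 is worth making concrete.  Here's a minimal sketch in present-day Python (for illustration only--none of the reviewed products works this way internally): screens are modeled as lists of strings and masks as (row, col-start, col-end) regions whose contents the comparison ignores.

```python
def screens_match(expected, actual, masks=()):
    """Compare two text screens cell by cell, skipping masked regions.

    expected, actual: lists of equal-length strings (rows of the screen).
    masks: iterable of (row, col_start, col_end) half-open column ranges
           whose contents (date, time, user ID, ...) are ignored.
    """
    masked = {(r, c) for (r, c0, c1) in masks for c in range(c0, c1)}
    for r, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for c, (e, a) in enumerate(zip(exp_row, act_row)):
            if (r, c) not in masked and e != a:
                return False
    return True

baseline = ["Date: 04/01/93   Acme Ledger", "Balance:   100.00"]
result   = ["Date: 04/02/93   Acme Ledger", "Balance:   100.00"]
# Mask the date field on row 0 (columns 6-13) so the run date is ignored.
print(screens_match(baseline, result, masks=[(0, 6, 14)]))   # True
print(screens_match(baseline, result))                       # False
```

The same run date that fails a naive comparison passes once the volatile field is masked--exactly the pass/fail behavior criterion 4 asks for.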

With these sets of features in mind, let me break up the products to be
reviewed into three categories.  The first category represents the
economical, relatively simple products that take care of the basics of
capture and playback.  The second group consists of mid-range products
that add value to test management.  The last category includes higher-end
products that take a more sophisticated approach to the whole process.

Category 1: Just the basics

This category of product provides the ability to capture a keystroke
sequence to a file, capture screens, play the keystroke sequence back,
and compare screens.  These products don't support mouse events.

Dr. Taylor's Test

If you've heard of any testing product, this is probably the one, since
it's been around for several years--until recently it's been known as
Ghost.  Dr. Taylor's Test is a stable product providing the basics and
offering responsive support and good documentation.

The product installs easily, takes up a minimum of memory, and has an
easy-to-use menu for you to specify whether you're in capture, playback,
or idle mode.  You can probably master this product in 30 minutes.

Strengths: It does what it purports to do, clearly and effectively.  It
captures keystrokes and plays them back, supports embedded comments,
supports rudimentary result reports, and offers a screen capture and
comparison tool.

Weaknesses: It doesn't play back more than one capture file at a time,
forcing you to create "spaghetti" macros to run even modest test suites.
The way around this problem, according to the vendor, is to merge modular
files before executing the test suite, but this is quite tedious and a
bit unrealistic if the process of testing is to be efficient and cost
effective.  Last, the screen capture/comparison tool supports only 80 x
25 screens.  My shop does most of its work in 94 x 36 or 80 x 50.

Suitability to task: This product meets almost none of the proposed
criteria for a testing product.  The only feature it purports to provide
is screen comparison, but, again, this is limited to 80 x 25 screens.

System requirements: 15K memory, PC running DOS.  Not language specific.

KeyLog

A recent entry in this market, KeyLog is creator Ted Means' nifty way
for CA-Clipper programmers to link capture/playback capabilities right
into their Clipper applications.  As with any function library, you
compile your source code and link in the library, invoking its
capabilities through function calls.  This is a different approach to
the testing process but can be extremely powerful in the right hands.
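The link-it-in approach can be modeled in a few lines.  This is a language-neutral sketch (Python, since Clipper won't compile here) of my reading of the idea, not KeyLog's actual API: the application routes all input through a wrapper that either records keys to a capture log or replays them from one.

```python
class InputRecorder:
    """Route application input through record/playback, KeyLog-style.

    mode: "live" passes input straight through, "record" also logs it,
    "play" replays a previously captured sequence.  Illustrative only.
    """
    def __init__(self, mode="live", capture=None):
        self.mode = mode
        self.log = []                      # keys recorded this session
        self.queue = list(capture or [])   # keys to replay

    def get_key(self, read_live=input):
        if self.mode == "play":
            return self.queue.pop(0)       # feed the app a captured key
        key = read_live()
        if self.mode == "record":
            self.log.append(key)           # remember for later replay
        return key

# Record a session (an iterator stands in for the live user)...
user = iter(["N", "Smith", "ENTER"])
rec = InputRecorder(mode="record")
session = [rec.get_key(lambda: next(user)) for _ in range(3)]
# ...then replay the capture to reproduce exactly the same input.
replay = InputRecorder(mode="play", capture=rec.log)
print([replay.get_key() for _ in range(3)])   # ['N', 'Smith', 'ENTER']
```

Because the recorder lives inside the application, a shipped program can turn on "record" mode in the field--which is what makes the bug-report trick described below possible.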

Strengths: Since KeyLog is a Clipper add-on library, it has the advantage
of allowing very complex test applications to be developed using all of
the power of the Clipper language.  It handles playback of multiple
record files and virtually any kind of reporting or data-driven testing.
However, since you have to write the code yourself, its usefulness
depends on your programming prowess.  One interesting aspect: Clipper
applications can let users report bugs by sending in a keystroke-capture
file that demonstrates the bug.  This makes replicating the bug--the
most important phase of bug eradication--a non-issue.  It also makes for
an efficient means of growing the test-point database to use for future
releases.

Weaknesses: Naturally, KeyLog is only useful to Clipper programmers.  It
comes with no screen comparison tool.  Since it's a library, it lacks an
interface.  Also, documentation comes only in the form of a Norton Guide
data file.

Suitability to task: This product addresses criteria 1 and 5 explicitly
and, with the help of some advanced programming, could actually address
all of the specified criteria for Clipper applications.  It has a lot of
potential but would need additional features to provide a true test
environment to the Clipper community.

Requirements: 2K additional loadsize in application, PC running DOS,
CA-Clipper 5.x.

Category 2: The basics-plus

This group provides the features of Category 1, with the addition of
mouse-event capture/playback, test-script language support, and a test
management system.  The management feature is able to control individual
capture/playback files and pass/fail results for specified test sessions
running numerous individual capture scripts.  Mouse playback accuracy is
a touchy area.  Different mice and BIOS versions complicate this.  I've
formed the impression that high-end tools (the next category) pay much
more attention to precise mouse tracking.  For the purpose of this
review, I didn't closely examine mouse accuracy.

Software TestWorks

Software Research makes a variety of tools for non-DOS operating
systems.  The company also sponsors a Software Quality Week conference,
which has some very impressive participants.  The DOS version of its
product line incorporates capture/playback (CapBak) with test
integration (Smarts), screen comparison (ExDiff), and test data
generation (TDGen).  The capture/playback tool is effective but will
work on multiple files only in conjunction with its companion, Smarts.
These products are ported to DOS from UNIX and X Windows--and it
definitely shows.

Menu navigation is primitive to the point of working like '70s
technology.  You can't use the integrated product without learning an
arcane, format-intensive script language.  The documentation comes from a
copy machine in a three-ring binder.  It's poorly organized and
bewildering in its repetition of non-essential material and complete
omission of other more crucial information.  To top it off, the product
requires at least 250K of free memory to run in DOS--making it virtually
unusable with any application I want to test.

Strengths: Smarts provides some rudimentary control structures and the
ability to execute tests based on previous pass/fail performance.  It can
also feed CapBak any capture files to play.  The package supports screen
masking and any size of screen comparison.

Weaknesses: This product needs the attention of somebody PC-literate.
The documentation and user interface are inane and obtuse.  The test
language is a mish-mash of C-like function calls and format-intensive
templates to specify capture/playback scripts and output logs.  All of
this should be integrated into one common language.  Finally, I consider
the memory requirements unacceptable.

Suitability to task: Smarts meets criteria 1, 4, and 5 (given that you
can meet memory requirements).  It fails on all other counts.  It is
neither an easy nor intelligent product to use.

Requirements: CapBak: 57K memory, Smarts with CapBak: 240K memory, PC
running DOS (available in UNIX X-Windows as well).  Not language
specific.

Gate with On Stage

This software was a pleasure to use.  It offers excellent documentation
and an elegant user interface.  On Stage is the test-suite manager and
Gate is the event record/playback tool.  On Stage adds a great deal of
functionality to the test process by integrating event record/playback
features with test documentation support and comparison data file
management.  This means you don't have to invent incredibly complex
naming conventions or directory structures to support the file-intensive
test process necessitated by modular test scripts, baseline, and result
files.  This is a must for any non-trivial testing.

Strengths: Gate processes unlimited playback files, which are listed in a
Logfile and processed sequentially.  On Stage provides a great way to
graphically organize test libraries and maintain documentation and
references to which test file should be compared to which result file.
On Stage allows creation of a log that can be run by Gate.  The user then
exits On Stage, leaving upwards of 536K (under DOS 5 with DOS=HIGH) to
run your application.  This package offers impressive price/performance
features not available in any other product regardless of price.

Weaknesses: It lacks a test language to support control structures and
parameter passing.  This means you have to spend more time recording
macros than is desirable.  Testers also will have less flexibility
setting up interesting test suites and handling various error conditions.

Suitability to task: Gate and On Stage meet criteria 1 through 4 and are
exceptionally well engineered.  The pair is easy to use--and
the only one to truly incorporate test file management and test
documentation.  This is my choice to meet entry-level to intermediate
test requirements.  Gate can be purchased alone as the first stage of
recording and playing back input events, and On Stage can be added if you
want an integrated test environment.

Requirements: Gate: 92K, PC running DOS.  On Stage with Gate: 92K
dedicated RAM, PC running DOS.  Neither product is language specific.

Category 3: the high-end

These products take a different approach to the testing process.  All of
them are non-intrusive (to varying degrees).  This means they run on a PC
other than the application under test.  This allows the application to
run under more lifelike conditions and provides a way to apply more
horsepower to the test software.  It also implies platform independence,
since the operating system of the test software is irrelevant to the
operating system of the application software to a large degree.  Thus,
all of the following products can test non-DOS applications as well.  All
incorporate the features of category 1, as well as some features of
category 2.

Evaluator

The most moderately priced product of this class, Evaluator ships in a
box that contains cards and cables to install in the PC running the
application to be tested and in the PC that runs the test software.
It's easy to install and operate.  You replace the VGA card in the PC
running the application under test with a card that connects to the PC
running the test software.  The printer and keyboard of the system under
test attach to a box that splits this I/O into both PCs.  In this way
the test product gets all of the I/O to the application while letting
the application run in its native environment with no competing
instructions in the CPU.

Strengths: The product supports a good programming language with control
structures and parameter passing.  (This means that one capture macro can
be used for all kinds of situations instead of having to record every
possible permutation of a feature to test it).  It includes informative
documentation. It's also the only product I evaluated that allows capture
of the printer output for comparison and review.  Evaluator supports
multiple screen masks and windows (areas of interest to the testing
software).

Weaknesses: The user interface to the product is primitive.  It isn't
clear what all the options mean or why they're organized the way they
are.  Menus and inputs are all text-based and cryptic.

Suitability to task: This product meets criteria 1, 4, and 5.  I found it
useful, intelligently engineered, and relatively easy to use.  It's far
less pricey than the other products reviewed in this category although it
lacks some key features that they provide.

Requirements: Additional 386 with VGA to run Test.  VGA adapter on system
under test.

Ferret

Ferret was developed in-house by an established defense contractor (doing
projects like cruise-missile inertial guidance systems and other
not-so-trivial applications) that needed serious testing tools.
Ferret consists of a separate CPU and graphics box combined with software
working under DOS and Windows 3.1 to run the test CPU.  It's the only
product that uses only a parallel feed from the CRT signal instead of
replacing a VGA card or slaving off of it.  This means it's completely
portable for any CRT regardless of hardware; it doesn't care whether it's
attached to a 3090, RISC processor, AS/400, SUN, VAX, or PC.  In fact,
one unit can simultaneously be used on eight different processors.

Also of interest: its holistic approach to testing.  The package comes
with a tool to help determine acceptable bug limits and testing
guidelines to assure a desired reliability rating of the application
under test.  This is an important principle from quality assurance (QA)
that can be helpful in the PC development world.  Test engineers know
that all software has bugs regardless of revision or user base.  Working
from a projected number of bugs per 1,000 lines of code (typically from
three to 10), the test phase proceeds until the probability of a bug
remaining is acceptable to the product management team (based on how many
bugs are found per unit of time).

Ferret helps you explicitly manage this process and create management
reports on how reliable your software is predicted to be based on the
current "bug extraction" rate.  This is an important concept that I'm
sure we'll hear more about in coming years.
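The arithmetic behind this QA principle is simple enough to sketch.  Assuming, as the article does, a projected density of three to 10 bugs per 1,000 lines, a hypothetical stopping rule might look like the following; the threshold and numbers are illustrative, not Ferret's actual reliability model.

```python
def bugs_remaining(kloc, density_per_kloc, bugs_found):
    """Naive estimate: projected bugs minus bugs extracted so far."""
    projected = kloc * density_per_kloc
    return max(projected - bugs_found, 0)

def keep_testing(kloc, density_per_kloc, bugs_found, acceptable=5):
    """Continue the test phase while the estimate exceeds management's limit."""
    return bugs_remaining(kloc, density_per_kloc, bugs_found) > acceptable

# A 50-KLOC application at 6 bugs/KLOC projects 300 defects.
print(bugs_remaining(50, 6, 260))   # 40
print(keep_testing(50, 6, 260))     # True: 40 estimated remaining, keep going
print(keep_testing(50, 6, 298))     # False: an estimated 2 remain
```

A real model would also weight the *rate* at which bugs are being found per unit of time, as the article notes; this sketch shows only the projected-versus-extracted bookkeeping.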

Strengths: The true portability of this product is unmatched.  Additional
features--bug management, highly configurable screen-capture events,
meticulous attention to mouse accuracy, and an integrated Windows
environment--make this a strong tool for serious testers.  This product
also supports limited event synchronization (discussed in more detail in
the TestRunner review).

Weaknesses: The test language for Ferret is still under-developed and
architecturally separates program statements in "Suites" from capture
events in "Scripts." This is the same approach that Software TestWorks
took (which I disliked so much), but in Ferret it isn't nearly as
painful.  The language doesn't allow a program to call another program,
placing undue limitations on serious testers.

Even with an emphasis on recording and playback as the primary means of
testing a product (which I'll discuss in the context of TestRunner),
editing a recorded script isn't as easy as it could be.  Instead of
starting the playback of a recorded script, stopping at the desired edit
point, starting a record session to insert some new events (such as a new
field to edit), and then allowing the rest of the playback session to
continue, the user has to record a new script or manually insert a new
script into the existing one.

The bottom line: This sophisticated hardware product has an
unsophisticated software environment.  (The vendor says it's aware of
these limitations and is currently working on improvements).

Suitability to task: This product supports criteria 1, 3, 4, and 5 and
has made a strong effort at integrating the test environment into one
platform.  A Ferret installation comes with intensive training and a
Windows environment--including Word for Windows and Excel.  Its Bug
Management Facility is outstanding.  Weak software architecture and some
under-developed features make this a product to watch for future
developments.

Requirements: None (system includes hardware and software).

TestRunner

This product is the creme de la creme of all the products I reviewed--and
probably belongs in a category of its own.  TestRunner not only does
non-intrusive testing, it's a splendid example of how automation can
bring simplicity and power to a tedious and labor-intensive process.

A TestRunner installation comes with two days of on-site training by a
test engineer.  The product runs under OS/2 and has an associated
graphics accelerator processor to handle the video-intensive task of
"output synchronization." Output synchronization addresses a classic
problem with capture/playback scripts.  In the course of development, the
application may not always have exactly the same response time based on
processor, disk I/O, memory constraints, and other variables, which can
be a significant problem if you've spent many hours testing a product.
TestRunner handles this transparently by allowing the user to specify
what input events (mouse or keyboard) are important and then saving the
before- and after-screen-shot once a significant key is pressed.  When
the session is played back, the processor automatically checks to see if
the appropriate screen has appeared prior to sending the next input event
to the application under test.
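Output synchronization boils down to a wait-before-send loop.  Here's a toy model in Python--TestRunner does this in dedicated hardware, and the `screen()`/`send()` interface below is invented for illustration, so treat this only as a model of the idea.

```python
def play_back(events, app, max_polls=100):
    """Send each input only after the screen the recorder saw reappears.

    events: list of (expected_screen_before, key) pairs captured earlier.
    app: object with .screen() and .send(key) methods -- a stand-in for
    the system under test (a hypothetical interface, not TestRunner's).
    """
    for expected_before, key in events:
        polls = 0
        while app.screen() != expected_before:   # wait out slow disk/CPU
            polls += 1
            if polls > max_polls:
                raise TimeoutError(f"screen never settled before {key!r}")
        app.send(key)

class FakeApp:
    """Toy application whose screen lags a few polls behind each key."""
    def __init__(self):
        self._screens = ["LOGIN", "MENU", "REPORT"]
        self._i = 0
        self._lag = 0
    def screen(self):
        if self._lag:                  # simulate variable response time
            self._lag -= 1
            return "...painting..."
        return self._screens[self._i]
    def send(self, key):
        self._i += 1
        self._lag = 3                  # next screen takes a few polls

app = FakeApp()
play_back([("LOGIN", "user ENTER"), ("MENU", "R")], app)
while app.screen() != "REPORT":        # drain the final repaint
    pass
print("final screen:", app.screen())   # final screen: REPORT
```

Because each keystroke waits for its "before" screen, the same script survives a faster or slower machine--the point of output synchronization.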

Next, TestRunner has an OCR capability that will recognize any language,
font, or icon so that GUIs are as easy to test as traditional text
interfaces.  This means the test script doesn't need to use an explicit
screen location to replay mouse events or read the screen.  It simply
scans the screen for key text or icons and locates the matching item
before executing an event or evaluating the output.  This makes GUI
testing a breeze.
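Location-independent replay can be modeled for text screens with a simple search: the script asks for a label, not a coordinate.  This hypothetical sketch cheats by scanning a character grid; TestRunner's OCR operates on pixels.

```python
def find_on_screen(screen, label):
    """Return (row, col) of the first occurrence of label, or None.

    screen: list of strings (rows).  A stand-in for OCR over a pixel
    buffer: the test script asks for "[Save]", not for row 1, column 19.
    """
    for r, row in enumerate(screen):
        c = row.find(label)
        if c != -1:
            return (r, c)
    return None

screen = [
    "  Customer Maintenance        ",
    "  Name: ________   [Save]     ",
]
print(find_on_screen(screen, "[Save]"))   # (1, 19)
```

If the dialog moves or is redesigned, the script still finds its target--whereas a recorded (x, y) click would miss.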

Another feature of note is its exception-handling capability.  TestRunner
allows the user to specify any screen event to be a global exception.  No
matter what record macro is being played back, if any of these events are
detected, they'll be handled in the specified exception process.  This is
valuable to detect system error messages, device errors, and assertion
errors embedded in the test application.  Once these exceptions are
detected, you have the option of skipping or restarting the test or even
rebooting or powering down the application hardware.
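A global exception table like the one described can be sketched as pattern-to-action pairs checked on every screen update.  The patterns and action names below are invented for illustration, not TestRunner's vocabulary.

```python
# Global exceptions: if any pattern appears anywhere on screen during any
# playback, run its resolution instead of the recorded script.
# (Hypothetical actions; real tools can also reboot or power-cycle
# the hardware under test.)
EXCEPTIONS = {
    "Printer not ready": "beep_until_paper",
    "Internal error":    "restart_test",
    "Assertion failed":  "skip_to_next_test",
}

def check_exceptions(screen_text, table=EXCEPTIONS):
    """Return the resolution for the first matching exception, else None."""
    for pattern, action in table.items():
        if pattern in screen_text:
            return action
    return None

print(check_exceptions("ERROR: Printer not ready on LPT1"))  # beep_until_paper
print(check_exceptions("Main Menu"))                         # None
```

The key property is that the table is global: no individual record macro needs to anticipate a printer error for the suite to survive one overnight.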

Strengths: An excellent user interface and application design make this
sophisticated product easy to use.  The use of OS/2 and Windows on the
test CPU makes the product even more powerful, since the test workstation
can run tests and reports while the tester is designing or documenting
other tests.  This environment is definitely the most effective.

A robust programming language combined with a unique code-generating
recording methodology make this the easiest product to create data-driven
test applications for.  It took me 30 minutes to write a routine that
read an input file of customers and amounts and had my POS application
cash checks, print receipts, compare the results, and write them to a
report.  This was the only product that could completely perform
data-driven testing in the sense that the test routines read files as the
sole basis for performing transactions and comparing results.  This is a
far more powerful means of testing than simply recording and playing back
hundreds (if not thousands) of canned scripts.  Testing becomes far more
meaningful when it escapes the constraint of simply replaying a
prerecorded sequence of events.  There are simply too many test cases
to pre-record and maintain over a product's life cycle.
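The shape of that 30-minute routine can be sketched generically: the test reads transactions from a data file and compares computed results against expected ones, so a new test case is a new data row, not a new recording.  The CSV layout and the check-cashing fee rule here are invented for illustration.

```python
import csv, io

# Transaction database: each row is one test case -- no recording needed.
CASES = """customer,amount,expected_fee
Smith,100.00,1.00
Jones,250.00,2.50
Diaz,40.00,1.00
"""

def cash_check(amount):
    """Hypothetical POS rule under test: 1% fee with a $1.00 minimum."""
    return round(max(amount * 0.01, 1.00), 2)

def run_suite(case_file):
    """Drive the routine from data and report pass/fail per transaction."""
    results = []
    for row in csv.DictReader(io.StringIO(case_file)):
        got = cash_check(float(row["amount"]))
        ok = got == float(row["expected_fee"])
        results.append((row["customer"], ok))
    return results

print(run_suite(CASES))  # [('Smith', True), ('Jones', True), ('Diaz', True)]
```

Swapping in a day's real transaction log for CASES is what turns this from a canned script into data-driven testing.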

Combine these features with output synchronization, exception handling,
and excellent OCR capabilities, and you get a top-of-the-line test tool.

Weaknesses: It can't capture printer or serial port events.

Suitability to task: This product meets all of the criteria for a test
tool, although criteria 2 and 3--addressing test file management--aren't
as rigorous or intuitive as the On Stage approach.  I consider this a
minor issue compared to the wealth of features and power that this
product offers.  It costs in the five figures, but there's no question in
my mind that the cost/benefit supports it.

Requirements: The system under test must be running on an x86 processor
with standard video output.

Which one should you try?

Software testing will, no doubt, be a hot topic for this decade.
Automated tools will become standard in any shop committed to quality.
In this review I've focused on DOS-based testing tools; a review of
tools specifically for Windows programmers will be forthcoming soon.

Of the tools I tested, three stand out.  For the economy solution with
some luxury features, I choose InfoMentor's Gate/On Stage combination.
The vendor has created an elegant and powerful method of organizing test
scripts, specifications, and results.  Its capture/playback tool is
clean and well architected.

In the mid-range category, you can't go wrong with the Evaluator unit.
The non-intrusive approach (to the CPU, not the video) combined with a
good test-language architecture makes this a solid product at a
reasonable price.  It doesn't have all the features of the
top-of-the-line, but it comes in at a fifth of the price.

In the category of "Best of Show," there's no question that TestRunner
rules.  It defines the meaning of "Computer-Aided Software Testing," a
term applied to many of the tools on the market.  TestRunner is the only
product that actually made my life easier and enabled me to do the kind
of testing our company has only dreamed of before.  Our goal--to find a
tool that could read a transaction database from any of our stores on any
given day and perform the same transactions for any new software
release--could only be met by TestRunner.  If you want the best, this is
it.

SOFTWARE TESTING

Art of Software Testing, The; Glenford J. Myers; John Wiley & Sons

Building Quality Software; Robert L. Glass; Prentice-Hall

Complete Guide to Software Testing, The; William Hetzel; QED Information
Sciences

Object-Oriented Software Construction; Bertrand Meyer; Prentice-Hall

Personal Computer Quality; Boris Beizer; Van Nostrand Reinhold

Software Configuration Management; H. Ronald Berlack; John Wiley & Sons

Software Engineering: A Practitioner's Approach; Roger S. Pressman;
McGraw-Hill

Software System Testing and Quality Assurance; Boris Beizer; Van
Nostrand Reinhold

Software Testing Techniques, 2nd Ed.; Boris Beizer; Van Nostrand
Reinhold

Testing Computer Software; C. Kaner; TAB Books

Tutorial: Software Testing & Validation Techniques; Miller & Howden,
eds.

Zero-Defect Software; G. Gordon Schulmeyer; McGraw-Hill

5th International Software Quality Week 1992, Proceedings; Software
Research Institute; IEEE Computer Society

Related article: Products and Companies

Dr.  Taylor's Test $249 Vermont Creative Software, Inc. Pinnacle Meadows
Richford, VT 05476 (802) 848-7731, 800-242-1114 fax (802) 848-3502

Evaluator, pricing based on graphics card: VGA, $5,895
(backward-compatible); Super VGA, $6,595 (supports 1024 x 768, 256
colors); Microchannel, $5,995; C interface, $695 (option); C-based GUI
Toolkit, $695.  Eastern Systems, Inc./Elverex P.O. Box 310 Hopkinton, MA
01748 (508) 435-2151 fax (508) 435-2517

Ferret, $34,949 for single PC platform with VGA (standard) resolution
Tiburon Systems, Inc. 1290 Parkmoor Ave. San Jose, CA 95126 (408)
293-9098 fax (408) 293-9090

Gate, $395 On Stage, $595 InfoMentor Software 20380 Town Center Lane,
Suite 230 Cupertino, CA 95014-3212 (408) 253-8080 fax (408) 253-8096

KeyLog $49 SofTech Microsystems P.O. Box 12582 Wichita, KS 67277
Phone/fax (316) 729-9315 CompuServe 73067, 3332

Software TestWorks, Regression tools for DOS, $2,100 per CPU (Code
coverage tools are also available for $3,400 per language; a full system
including both groups of tools for DOS is $4,800.) Software Research,
Inc. 625 Third St. San Francisco, CA 94107-1997 (415) 957-1441,
800-942-7638 fax (415) 957-0730

TestRunner, Initial unit, $30,000 (includes 2 days of training); second
license, $20,000 Mercury Interactive Corp. 3333 Octavius Dr. Santa Clara,
CA 95054 (408) 987-0100 fax (408) 982-0149
-------------------------------------------------------------------------
Type:      buyers guide
Company:   Vermont Creative Software
           Eastern Systems Inc.
           Tiburon Systems Inc.
           InfoMentor Software
           SofTech Microsystems Inc.
           Software Research Inc.
           Mercury Interactive Corp.
Topic:     Software Validation
           Testing
           Directories
           Software Packages

