
# Software renovation: Refactoring vs Restructuring


In 1976, Belady and Lehman formulated their ‘Laws of Program Evolution Dynamics’.

• First, a software system that is used will undergo continuous modification.
• Second, the unstructuredness (entropy) of a system increases with time, unless specific work is done to improve the system’s structure.

This activity of improving legacy software systems is called system renovation. It aims at making existing systems more comprehensible, extensible, robust, and reusable.

Legacy systems are software systems that resist change. Making modifications is costly. For pricing purposes, such projects usually fall into four categories:

1. With a codebase of any single isolated software component of less than 1K lines of code. You can usually provide a reasonable estimate of the cost and timeframe. Recoding in a new language often represents a viable approach (PL/1 to Java, Perl to Python, etc.).
2. With a codebase of less than 10K lines of code. You usually underestimate the cost and time by a factor of two. If this is an OO codebase, double the estimate again ;-) Recoding in a new language still represents a viable option, but increasingly less so, and it tends to dramatically increase costs.
I had a pile of C++ dropped in my lap 2 years ago. (Richard Steiner, Slashdot comment, Friday January 18)

My main tool for figuring it all out was to use exuberant ctags [sourceforge.net] to create a tags file, and Nedit [nedit.org] to navigate through the source under Solaris, with a little grep thrown in. I also used gdb with the DDD [gnu.org] front-end to do a little real-time snooping.

I've since added both cscope [sourceforge.net] and freescope [sourceforge.net], as well as the old Red Hat Source Navigator [sourceforge.net] for good measure.

I used to work at a company with a lot of Pascal and C code... It was extremely common (as in, all but a few) for programs to be written entirely in one code file. These files would go on for 20,000 lines or more. So many lines, in fact, that after the compiler had imported the header files at the top of the file, they would be over 65,000 lines long, and the debugger would crap out because it had exceeded the int it used for line number counting.

3. With a codebase of up to 100K lines of code. Those projects can succeed if the team is strong and highly dedicated, and there is no red tape involved. If not, they are doomed.

A few years back I had to maintain a large module written in C#. It had about 200K lines of code, 50 classes, zero documentation, zero comments, zero error-logging support, and I was expected to find and fix bugs and add functionality the day after the module was handed over.

So if you were never in this position, just STFU. Yeah, the code is there, but what is this flag for? Is this part really used, or is it obsolete? What are the side effects of using that method? And so on...

Eventually, I learned it, especially after some intensive debugging sessions, but it was frustrating to say the least. I would have loved to have some aiding tools.

4. With a codebase of over 100K lines of code. This is the level of code complexity typical for modern interpreters of languages like Python and Perl, with their huge libraries. Success is not guaranteed, no matter what you try and what approach you choose.

The key part of the work is investigative: understanding the thought processes of the original developers. This involves tracing code in the debugger, which is usually more productive than static analysis. I've always found that stepping through the code in the debugger at runtime is a decent way to start making sense of a large code base. Just set a breakpoint at a point of interest, fire up the application, and use it as a starting point for exploring the codebase. This is typically how malware is analyzed. After a couple of weeks or a month, the technical architect of the project can usually start to create a map of the key components, including a subset of those which can be called "the most scary components/sections of the code." In the case of an OO codebase (which is a very small portion of legacy code, but still happens), it can be harder to reverse-engineer the logic because it is distributed among various classes. A debugger that lets you step through running code is almost essential in this case.

I would suggest a slight variation on the theme. Fire up the application, start it on one of its typical tasks, and then interrupt it in the debugger to catch it. While the process is stopped mid-flight, take note of the call stack to see which classes and methods are being used. Maybe step through a few calls, then let the program run some more.

By doing this repeatedly, you will quickly get a sense for which parts of the code see the most action, and would provide the most obvious places to start studying the code base, and provide the best bang-for-buck return on your time.

NOTE: This approach doesn't work for a code base with over 100K lines of atrociously written code; you will just drown in it. "Large" projects might have 250 source code files, thousands of functions or classes, and likely a dozen or so interacting executable programs. Printouts of the source code alone would fill several bookcase shelves.

In large projects, tests are a very good means of understanding the code base. Writing a test framework with a battery of tests for each subsystem that interests you helps you understand what particular parts of the code actually do. And then, when you need to change something, you can be more confident that you didn't break anything.

Mitch Rosenberg of InfoVentions.net, Inc. claims that the following law exists (he calls it the law of code or data archaeology):

Everything that is there is there for a reason, and there are 3 possible reasons:

1. It used to need to be there but no longer does
2. It never needed to be there and the person that wrote the code had no clue
3. It STILL needs to be there and YOU have no clue

The corollary to this "law" is that, until you know which was the reason, you should NOT modify the code (or data).

In other words, a large part of the work is simply software architecture recovery: the extraction of architectural information from lower-level representations of a software system.

Architecture recovery becomes even more necessary because such systems rarely have reliable architectural documentation, and when some documentation exists, it typically reflects a version many revisions behind the current one. Tools like Doxygen should definitely be used at this phase. If the code is in a language that Doxygen can understand, that's the tool I would HIGHLY recommend. You can configure Doxygen to extract the code structure from undocumented source files, which is very useful for quickly finding your way in large source distributions.

Along with dynamic analysis via the debugger, static analysis also needs to be performed. The call graph is an important element here (for C and C++, Doxygen generates caller and callee graphs for all functions). But even such a simple task as creating cross-reference tables of the variables used, combined with the slow and painful work of understanding their meaning and interactions, can greatly help.

Slicing and slicing tools are also very important elements of static analysis of the code. See the old but still good The Wisconsin Program-Slicing Tool, Version 1.0.1.

But dynamic analysis via the debugger is the key, especially in the mess that an OO system whose key developers left many years ago typically represents. In this case dynamic analysis becomes an essential technique for comprehending system behavior and object interactions, and hence for reconstructing the architecture. In this work, the criteria used to determine how source code entities should be clustered into architectural elements are based mainly on dynamic analysis of the system, taking into account the occurrences of interaction patterns and of the types (classes and interfaces) used in use-case realizations.


## Old News ;-)

#### [Oct 15, 2019] Learning doxygen for source code documentation by Arpan Sen

###### Jul 29, 2008 | developer.ibm.com
Maintaining and adding new features to legacy systems developed using C/C++ is a daunting task. There are several facets to the problem: understanding the existing class hierarchy and global variables, the different user-defined types, and function call graph analysis, to name a few. This article discusses several features of doxygen, with examples in the context of projects using C/C++.

However, doxygen is flexible enough to be used for software projects developed using the Python, Java, PHP, and other languages, as well. The primary motivation of this article is to help extract information from C/C++ sources, but it also briefly describes how to document code using doxygen-defined tags.

Installing doxygen

You have two choices for acquiring doxygen. You can download it as a pre-compiled executable file, or you can check out sources from the SVN repository and build it. Listing 1 shows the latter process.

##### Listing 1. Install and build doxygen sources

bash-2.05$ svn co https://doxygen.svn.sourceforge.net/svnroot/doxygen/trunk doxygen-svn
bash-2.05$ cd doxygen-svn
bash-2.05$ ./configure --prefix=/home/user1/bin
bash-2.05$ make
bash-2.05$ make install

Note that the configure script is tailored to dump the compiled sources in /home/user1/bin (add this directory to the PATH variable after the build), as not every UNIX® user has permission to write to the /usr folder. Also, you need the svn utility to check out sources.

Generating documentation using doxygen

To use doxygen to generate documentation of the sources, you perform three steps.

Generate the configuration file

At a shell prompt, type the command doxygen -g . This command generates a text-editable configuration file called Doxyfile in the current directory. You can choose to override this file name, in which case the invocation should be doxygen -g <user-specified file name> , as shown in Listing 2.

##### Listing 2. Generate the default configuration file

bash-2.05b$ doxygen -g
Configuration file 'Doxyfile' created.
Now edit the configuration file and enter
doxygen Doxyfile
to generate the documentation for your project
bash-2.05b$ ls Doxyfile
Doxyfile

Edit the configuration file

The configuration file is structured as <TAGNAME> = <VALUE> , similar to the Makefile format. Here are the most important tags:
• <OUTPUT_DIRECTORY> : You must provide a directory name here -- for example, /home/user1/documentation -- for the directory in which the generated documentation files will reside. If you provide a nonexistent directory name, doxygen creates the directory subject to proper user permissions.
• <INPUT> : This tag creates a space-separated list of all the directories in which the C/C++ source and header files reside whose documentation is to be generated. For example, consider the following snippet:
INPUT = /home/user1/project/kernel /home/user1/project/memory

In this case, doxygen would read in the C/C++ sources from these two directories. If your project has a single source root directory with multiple sub-directories, specify that folder and set the <RECURSIVE> tag to Yes .
• <FILE_PATTERNS> : By default, doxygen searches for files with typical C/C++ extensions such as .c, .cc, .cpp, .h, and .hpp. This happens when the <FILE_PATTERNS> tag has no value associated with it. If the sources use different naming conventions, update this tag accordingly. For example, if a project convention is to use .c86 as a C file extension, add this to the <FILE_PATTERNS> tag.
• <RECURSIVE> : Set this tag to Yes if the source hierarchy is nested and you need to generate documentation for C/C++ files at all hierarchy levels. For example, consider the root-level source hierarchy /home/user1/project/kernel, which has multiple sub-directories such as /home/user1/project/kernel/vmm and /home/user1/project/kernel/asm. If this tag is set to Yes , doxygen recursively traverses the hierarchy, extracting information.
• <EXTRACT_ALL> : This tag is an indicator to doxygen to extract documentation even when the individual classes or functions are undocumented. You must set this tag to Yes .
• <EXTRACT_PRIVATE> : Set this tag to Yes . Otherwise, private data members of a class would not be included in the documentation.
• <EXTRACT_STATIC> : Set this tag to Yes . Otherwise, static members of a file (both functions and variables) would not be included in the documentation.
Listing 3 shows an example of a Doxyfile.
##### Listing 3. Sample doxyfile with user-provided tag values

OUTPUT_DIRECTORY = /home/user1/docs
EXTRACT_ALL = yes
EXTRACT_PRIVATE = yes
EXTRACT_STATIC = yes
INPUT = /home/user1/project/kernel
#Do not add anything here unless you need to. Doxygen already covers all
#common formats like .c/.cc/.cxx/.c++/.cpp/.inl/.h/.hpp
FILE_PATTERNS =
RECURSIVE = yes

Run doxygen

Run doxygen at the shell prompt as doxygen Doxyfile (or with whatever file name you've chosen for the configuration file). Doxygen issues several messages before it finally produces the documentation in Hypertext Markup Language (HTML) and Latex formats (the default). In the folder that the <OUTPUT_DIRECTORY> tag specifies, two sub-folders named html and latex are created as part of the documentation-generation process. Listing 4 shows a sample doxygen run log.
##### Listing 4. Sample log output from doxygen

Searching for include files...
Searching for example files...
Searching for images...
Searching for dot files...
Searching for files to exclude
Preprocessing /home/user1/project/kernel/kernel.h

Parsing input...
Parsing file /project/user1/project/kernel/epico.cxx

Freeing input...
Building group list...
..
Generating docs for compound MemoryManager::ProcessSpec

Generating docs for namespace std
Generating group index...
Generating example index...
Generating file member index...
Generating namespace member index...
Generating page index...
Generating graph info page...
Generating search index...
Generating style sheet...

Documentation output formats

Doxygen can generate documentation in several output formats other than HTML. You can configure doxygen to produce documentation in the following formats:
• UNIX man pages: Set the <GENERATE_MAN> tag to Yes . By default, a sub-folder named man is created within the directory provided using <OUTPUT_DIRECTORY> , and the documentation is generated inside the folder. You must add this folder to the MANPATH environment variable.
• Rich Text Format (RTF): Set the <GENERATE_RTF> tag to Yes . Set the <RTF_OUTPUT> to wherever you want the .rtf files to be generated -- by default, the documentation is within a sub-folder named rtf within the OUTPUT_DIRECTORY. For browsing across documents, set the <RTF_HYPERLINKS> tag to Yes . If set, the generated .rtf files contain links for cross-browsing.
• Latex: By default, doxygen generates documentation in Latex and HTML formats. The <GENERATE_LATEX> tag is set to Yes in the default Doxyfile. Also, the <LATEX_OUTPUT> tag is set to Latex, which implies that a folder named latex would be generated inside OUTPUT_DIRECTORY, where the Latex files would reside.
• Microsoft® Compiled HTML Help (CHM) format: Set the <GENERATE_HTMLHELP> tag to Yes . Because this format is not supported on UNIX platforms, doxygen would only generate a file named index.hhp in the same folder in which it keeps the HTML files. You must feed this file to the HTML help compiler for actual generation of the .chm file.
• Extensible Markup Language (XML) format: Set the <GENERATE_XML> tag to Yes . (Note that the XML output is still a work in progress for the doxygen team.)
Listing 5 provides an example of a Doxyfile that generates documentation in all the formats discussed.
##### Listing 5. Doxyfile with tags for generating documentation in several formats

#for HTML
GENERATE_HTML = YES
HTML_FILE_EXTENSION = .htm

#for CHM files
GENERATE_HTMLHELP = YES

#for Latex output
GENERATE_LATEX = YES
LATEX_OUTPUT = latex

#for RTF
GENERATE_RTF = YES
RTF_OUTPUT = rtf

#for MAN pages
GENERATE_MAN = YES
MAN_OUTPUT = man
#for XML
GENERATE_XML = YES

Special tags in doxygen

Doxygen contains a couple of special tags.

Preprocessing C/C++ code

First, doxygen must preprocess C/C++ code to extract information. By default, however, it does only partial preprocessing: conditional compilation statements ( #if #endif ) are evaluated, but macro expansions are not performed. Consider the code in Listing 6.
##### Listing 6. Sample C code that makes use of macros

#include <cstring>
#include <rope>

#define USE_ROPE

#ifdef USE_ROPE
#define STRING std::rope
#else
#define STRING std::string
#endif

static STRING name;

With USE_ROPE defined in the sources, the documentation generated by doxygen looks like this:
Defines
#define USE_ROPE
#define STRING std::rope

Variables
static STRING name

Here, you see that doxygen has performed a conditional compilation but has not done a macro expansion of STRING . The <ENABLE_PREPROCESSING> tag in the Doxyfile is set by default to Yes . To allow for macro expansions, also set the <MACRO_EXPANSION> tag to Yes . Doing so produces this output from doxygen:
Defines
#define USE_ROPE
#define STRING std::rope

Variables
static std::rope name

If you set the <ENABLE_PREPROCESSING> tag to No , the output from doxygen for the earlier sources looks like this:
Variables
static STRING name

Note that the documentation now has no definitions, and it is not possible to deduce the type of STRING . It thus makes sense always to set the <ENABLE_PREPROCESSING> tag to Yes .

As part of the documentation, it might be desirable to expand only specific macros. For such purposes, along with setting <ENABLE_PREPROCESSING> and <MACRO_EXPANSION> to Yes , you must set the <EXPAND_ONLY_PREDEF> tag to Yes (this tag is set to No by default) and provide the macro details as part of the <PREDEFINED> or <EXPAND_AS_DEFINED> tag. Consider the code in Listing 7 , where only the macro CONTAINER would be expanded.
##### Listing 7. C source with multiple macros

#ifdef USE_ROPE
#define STRING std::rope
#else
#define STRING std::string
#endif

#if ALLOW_RANDOM_ACCESS == 1
#define CONTAINER std::vector
#else
#define CONTAINER std::list
#endif

static STRING name;
static CONTAINER gList;

##### Listing 8. Doxyfile set to allow select macro expansions

ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
EXPAND_AS_DEFINED = CONTAINER


Here's the doxygen output with only CONTAINER expanded:
Defines
#define STRING   std::string
#define CONTAINER   std::list

Variables
static STRING name
static std::list gList

Notice that only the CONTAINER macro has been expanded. Subject to <MACRO_EXPANSION> and <EXPAND_ONLY_PREDEF> both being Yes , the <EXPAND_AS_DEFINED> tag selectively expands only those macros listed on the right-hand side of the equality operator.

As part of preprocessing, the final tag to note is <PREDEFINED> . Much like the way you use the -D switch to pass preprocessor definitions to the G++ compiler, you use this tag to define macros. Consider the Doxyfile in Listing 9.
##### Listing 9. Doxyfile with macro expansion tags defined

ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
EXPAND_AS_DEFINED =
PREDEFINED = USE_ROPE= \
ALLOW_RANDOM_ACCESS=1

Here's the doxygen-generated output:
Defines
#define USE_ROPE
#define STRING   std::rope
#define CONTAINER   std::vector

Variables
static std::rope name
static std::vector gList

When used with the <PREDEFINED> tag, macros should be defined as <macro name>=<value> . If no value is provided, as in the case of a simple #define , just using <macro name>=<spaces> suffices. Separate multiple macro definitions by spaces or a backslash ( \ ).

Excluding specific files or directories from the documentation process

In the <EXCLUDE> tag in the Doxyfile, add the names of the files and directories for which documentation should not be generated, separated by spaces. This comes in handy when the root of the source hierarchy is provided and some sub-directories must be skipped. For example, if the root of the hierarchy is src_root and you want to skip the examples/ and test/memoryleaks folders from the documentation process, the Doxyfile should look like Listing 10.
##### Listing 10. Using the EXCLUDE tag as part of the Doxyfile

INPUT = /home/user1/src_root
EXCLUDE = /home/user1/src_root/examples /home/user1/src_root/test/memoryleaks


Generating graphs and diagrams

By default, the Doxyfile has the <CLASS_DIAGRAMS> tag set to Yes . This tag is used for generation of class hierarchy diagrams. The following tags in the Doxyfile deal with generating diagrams:
• <CLASS_DIAGRAMS> : The default tag is set to Yes in the Doxyfile. If the tag is set to No , diagrams for inheritance hierarchy would not be generated.
• <HAVE_DOT> : If this tag is set to Yes , doxygen uses the dot tool to generate more powerful graphs, such as collaboration diagrams that help you understand individual class members and their data structures. Note that if this tag is set to Yes , the effect of the <CLASS_DIAGRAMS> tag is nullified.
• <CLASS_GRAPH> : If the <HAVE_DOT> tag is set to Yes along with this tag, the inheritance hierarchy diagrams are generated using the dot tool and have a richer look and feel than what you'd get by using only  <CLASS_DIAGRAMS> .
• <COLLABORATION_GRAPH> : If the <HAVE_DOT> tag is set to Yes along with this tag, doxygen generates a collaboration diagram (apart from an inheritance diagram) that shows the individual class members (that is, containment) and their inheritance hierarchy.
Listing 11 provides an example using a few data structures. Note that the <HAVE_DOT> , <CLASS_GRAPH> , and <COLLABORATION_GRAPH> tags are all set to Yes in the configuration file.
##### Listing 11. Interacting C classes and structures

struct D {
int d;
};

class A {
int a;
};

class B : public A {
int b;
};

class C : public B {
int c;
D d;
};

##### Figure 1. The Class inheritance graph and collaboration graph generated using the dot tool
Code documentation style

So far, you've used doxygen to extract information from code that is otherwise undocumented. However, doxygen also advocates a documentation style and syntax, which helps it generate more detailed documentation. This section discusses some of the more common tags doxygen advocates using as part of C/C++ code. For further details, see resources on the right.

Every code item has two kinds of descriptions: one brief and one detailed. Brief descriptions are typically single lines. Functions and class methods have a third kind of description known as the in-body description, which is a concatenation of all comment blocks found within the function body. Some of the more common doxygen tags and styles of commenting are:
• Brief description: Use a single-line C++ comment, or use the <\brief> tag.
• Detailed description: Use JavaDoc-style commenting /** text */ (note the two asterisks [ * ] at the beginning) or the Qt-style /*! text */ .
• In-body description: Individual C++ elements like classes, structures, unions, and namespaces have their own tags, such as <\class> , <\struct> , <\union> , and <\namespace> .
To document global functions, variables, and enum types, the corresponding file must first be documented using the <\file> tag. Listing 12 provides an example with a function tag ( <\fn> ), a function argument tag ( <\param> ), a variable name tag ( <\var> ), a tag for #define ( <\def> ), and a tag to indicate specific issues related to a code snippet ( <\warning> ).
##### Listing 12. Typical doxygen tags and their use

/*! \file globaldecls.h
\brief Place to look for global variables, enums, functions
and macro definitions
*/

/** \var const int fileSize
\brief Default size of the file on disk
*/
const int fileSize = 1048576;

/** \def SHIFT(value, length)
\brief Left shift value by length in bits
*/
#define SHIFT(value, length) ((value) << (length))

/** \fn bool check_for_io_errors(FILE* fp)
\brief Checks if a file is corrupted or not
\param fp Pointer to an already opened file
*/
bool check_for_io_errors(FILE* fp);


Here's how the generated documentation looks:

Defines
#define SHIFT(value, length)   ((value) << (length))
Left shift value by length in bits.

Functions
bool check_for_io_errors (FILE *fp)
Checks if a file is corrupted or not.

Variables
const int fileSize = 1048576;
Function Documentation
bool check_for_io_errors (FILE *fp)
Checks if a file is corrupted or not.

Parameters
fp: Pointer to an already opened file

Warning


This article discusses how doxygen can extract a lot of relevant information from legacy C/C++ code. If the code is documented using doxygen tags, doxygen generates output in an easy-to-read format. Put to good use, doxygen is a worthy addition to any developer's arsenal for maintaining and managing legacy systems.

#### Structural Epochs in the Complexity of Software over Time (IEEE Xplore)

A case study using a new complexity measurement framework called Structure 101 tracked the structural complexity of three open source software products through their different releases. The analysis found that, as these software products evolved, a large proportion of structural complexity in early releases at the application-code level progressively migrated to higher-level design and architectural elements in subsequent releases, or vice versa. This pattern repeated itself throughout the evolution of the software product. Refactoring efforts successfully reduced complexity at lower levels but shifted the complexity to higher levels in the design hierarchy. Conversely, design restructuring at higher levels shifted complexity to lower levels. If this trend holds true for other software products, then mere code refactoring might not be enough to effectively manage structural complexity. Periodic major restructuring of software applications at the design or architectural level could be necessary.

#### Software Renovation by Arie van Deursen

In 1976, Belady and Lehman formulated their 'Laws of Program Evolution Dynamics'. First, a software system that is used will undergo continuous modification. Second, the unstructuredness (entropy) of a system increases with time, unless specific work is done to improve the system's structure. This activity of improving legacy software systems is called system renovation. It aims at making existing systems more comprehensible, extensible, robust and reusable.

Due to the fact that a typical industrial or governmental organization has millions of lines of legacy code in continuous maintenance, well-applied software renovation can lead to significant information technology budget savings. For that reason, in 1996 Dutch bank ABN AMRO and Dutch software house Roccade commissioned a renovation research project. The research was carried out by CWI, the University of Amsterdam, and ID Research. The goals of the project included the development of a generic renovation architecture, as well as application of this architecture to actual renovation problems.

Of the various facets of software renovation - such as visualization, database analysis, domain knowledge, and so on - an enabling factor is the analysis and transformation of legacy sources. Since such source code analysis has much in common with compilation (in which sources are analyzed with the purpose of translating them into assembly code), many results from the area of programming language technology could be reused. Of great significance for software renovation are, for example, lexical source code analysis, parsing, dataflow analysis, type inference, etc.

Program Transformations

Software renovation at the source code level includes automated program transformations for the purpose of step-by-step code improvement. In this project, we successfully applied transformations to COBOL programs, dealing with goto elimination, dialect migration (between COBOL-85 and COBOL-74) and modifications in the conventions for calling library utilities.

To make this possible, we developed a COBOL grammar, instantiated the ASF+SDF Meta-Environment with this grammar to obtain a COBOL parser and pretty printer, and designed term rewriting rules describing the desired transformations. The resulting system is capable of automatically performing the desired transformations on hundreds of thousands of lines of code, yielding a fully automatic transformation factory.
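The rewrite-rule style of such a transformation factory can be sketched with a toy goto-elimination pass. This is only an illustration: the AST node shapes (tuples like `("label", ...)`, `("if_goto", ...)`) and the single rule shown are invented here, whereas the actual factory applied ASF+SDF term rewriting over a full COBOL grammar.

```python
def eliminate_backward_goto(stmts):
    """One rewrite step: a block of the form
         ("label", L), body..., ("if_goto", cond, L)
       becomes a single structured ("do_while", cond, body) node."""
    for i, s in enumerate(stmts):
        if s[0] != "label":
            continue
        for j in range(i + 1, len(stmts)):
            t = stmts[j]
            if t[0] == "if_goto" and t[2] == s[1]:   # backward jump to label
                body = stmts[i + 1:j]
                return (stmts[:i]
                        + [("do_while", t[1], body)]
                        + stmts[j + 1:])
    return stmts  # no redex left: normal form reached

prog = [("move", "0", "I"),
        ("label", "LOOP"),
        ("add", "1", "I"),
        ("if_goto", "I < 10", "LOOP")]

print(eliminate_backward_goto(prog))
```

Applied repeatedly until a fixed point, rules of this kind turn an unstructured control graph into structured loops, which is what makes fully automatic processing of hundreds of thousands of lines feasible.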

#### Origin Tracking and Software Renovation

Legacy systems are software systems that resist change. Software renovation is an activity that aims at improving legacy systems so that they become more adaptable, or at actually carrying out required mass modifications. A typical renovation task is the year-2000 problem. Tools for carrying out year-2000 conversions look for initial date infections (seeds, such as suspicious keywords), propagate these through MOVEs and CALLs, try to reduce the number of infections found, and then (semi-)automatically modify the code using a widening or windowing approach.
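The two remediation strategies named above can be sketched in a few lines. The pivot year 50 and the record field names below are assumptions for illustration, not values from any particular conversion tool:

```python
PIVOT = 50  # assumed pivot: two-digit years below 50 mean 20YY, else 19YY

def window_year(yy: int) -> int:
    """Windowing: keep the stored 2-digit year, fix only its interpretation."""
    return (2000 if yy < PIVOT else 1900) + yy

def widen_record(record: dict) -> dict:
    """Widening: a one-time data conversion that expands the field itself."""
    rec = dict(record)
    rec["YEAR"] = window_year(rec.pop("YY"))
    return rec

print(window_year(3), window_year(99))        # 2003 1999
print(widen_record({"NAME": "A", "YY": 68}))  # {'NAME': 'A', 'YEAR': 1968}
```

Windowing leaves files and databases untouched but breaks down once dates span more than a century; widening fixes the data for good but forces every program and file layout that touches the field to be converted at once.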

Of great importance for year-2000 conversions and software renovation is an accurate data flow analysis tool that can easily be connected to all source languages used in the system to be renovated. In the context of the ASF+SDF formalism, DHAL (Data Flow High Abstraction Language) has been proposed. Languages are easily mapped to DHAL, and on top of DHAL several elementary data flow operations, such as goto elimination and alias propagation, have been defined.

Origin tracking is a general technique concerned with linking analysis results, obtained for example from DHAL operations, back to the original source code. For transformations expressed in a functional style (using, e.g., term rewriting), origin information can be maintained automatically. For each reduction, origin annotations in the reduct are constructed in a way that depends on the form of the rewrite rule applied. We discuss several approaches (syntax-directed, common subterms, collapse-variables, any-to-top, non-linear rules), as well as their use in typical specifications occurring in a renovation setting.
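A minimal sketch of the idea: every term carries the source position it came from, and a rewrite step hands the matched subterm's origin on to the reduct, so that whatever an analysis reports can be traced back to a line in the legacy source. The node shape and the single rule here are invented for illustration:

```python
class Node:
    """A term annotated with the source position it originated from."""
    def __init__(self, op, args=(), origin=None):
        self.op, self.args, self.origin = op, tuple(args), origin

def simplify(node):
    """Rewrite rule  add(x, 0) -> x.  The reduct is the matched subterm
    itself, so its origin annotation travels along automatically."""
    if (node.op == "add" and len(node.args) == 2
            and node.args[1].op == "0"):
        return node.args[0]
    return node  # no match: term unchanged

t = Node("add", [Node("var_i", origin="line 12, col 4"),
                 Node("0",    origin="line 12, col 10")],
         origin="line 12")
r = simplify(t)
print(r.op, "<-", r.origin)   # var_i <- line 12, col 4
```

The harder cases the paper classifies (collapse variables, non-linear rules, subterms created from nothing) are exactly those where the reduct is not simply a matched subterm and the origin must be constructed rather than inherited.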

#### Scaffolding for Software Renovation

University of Amsterdam

We discuss an approach that explores the use of scaffolding of source code to facilitate its renovation, and show that scaffolding is a useful paradigm for software renovation. We designed a syntax and semantics for scaffolding that enable all relevant applications of it.

The automatic generation of extensions to a normal grammar, so that the resulting extended grammar can parse code with scaffolding, is discussed. We used the scaffolding paradigm itself to implement the generation process, thereby showing that our approach to scaffolding is also useful in software development. Finally, we discuss real-world applications of scaffolding for software renovation, both in our own work and in work by practitioners in the reengineering industry.

Keywords: Reengineering, System renovation, Software renovation factories, Language description development, Grammar reengineering, Scaffolding, Computer aided language engineering (CALE)

Proceedings of the Conference on Software Maintenance and Reengineering
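The core of the scaffolding idea is that annotations are woven into the source in a recognizable syntax, so that an extended parser can separate code from scaffolding and hand plain code to ordinary tools. The toy below uses a regex instead of a generated extension grammar, and the `{# ... #}` marker syntax is invented here, not taken from the paper:

```python
import re

SCAFFOLD = re.compile(r"\{#\s*(.*?)\s*#\}")  # invented marker syntax

def strip_scaffolding(src):
    """Return (plain_source, [(line_no, annotation), ...])."""
    notes, plain = [], []
    for n, line in enumerate(src.splitlines(), 1):
        for m in SCAFFOLD.finditer(line):
            notes.append((n, m.group(1)))       # record the annotation
        plain.append(SCAFFOLD.sub("", line).rstrip())  # and remove it
    return "\n".join(plain), notes

code = "MOVE YY TO WS-YEAR. {# infected: date seed #}\nADD 1 TO I."
plain, notes = strip_scaffolding(code)
print(plain)
print(notes)   # [(1, 'infected: date seed')]
```

In a real renovation factory the extension grammar is generated from the base grammar, so scaffolded code is parsed as a first-class language rather than scraped lexically as here.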

#### cc2e.com/2436


Myth: a well-managed software project conducts methodical requirements development and defines a stable list of the program's responsibilities. Design follows requirements, and it is done carefully so that coding can proceed linearly, from start to finish, implying that most of the code can be written once, tested, and forgotten. According to the myth, the only time that the code is significantly modified is during the software-maintenance phase, something that happens only after the initial version of a system has been delivered.

All successful software gets changed.

-Fred Brooks

Reality: code evolves substantially during its initial development. Many of the changes seen during initial coding are at least as dramatic as those seen during maintenance. Coding, debugging, and unit testing consume between 30 and 65 percent of the effort on a typical project, depending on the project's size. (See Chapter 27, "How Program Size Affects Construction," for details.) If coding and unit testing were straightforward processes, they would consume no more than 20-30 percent of the total effort on a project. Even on well-managed projects, however, requirements change by about one to four percent per month (Jones 2000). Requirements changes invariably cause corresponding code changes, sometimes substantial ones.
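Compounding the one-to-four-percent-per-month figure cited above makes the point concrete. The 18-month schedule is an assumed example duration, and the model assumes changes hit requirements independently each month:

```python
def churn(rate_per_month, months):
    """Fraction of requirements touched at least once, compounded monthly."""
    return 1 - (1 - rate_per_month) ** months

for r in (0.01, 0.02, 0.04):
    print(f"{r:.0%}/month over 18 months -> {churn(r, 18):.0%} changed")
```

Even at the low end of the Jones range, roughly a sixth of the requirements change before an 18-month project ships; at the high end, about half do, which is why "write once, test, and forget" is a myth.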

Another reality: modern development practices increase the potential for code changes during construction. In older life cycles, the focus (successful or not) was on avoiding code changes. More modern approaches move away from coding predictability. Current approaches are more code-centered, and over the life of a project, you can expect code to evolve more than ever.

### Sites

Learning doxygen for source code documentation – IBM Developer, by Arpan Sen; July 29, 2008

Code ported from one to another language - licensing - Stack Overflow

Porting - Wikipedia


Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...

You can use PayPal to buy a cup of coffee for the authors of this site.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.