"Without wanting to be
elitist, the thing that will prevent literate programming from
becoming a mainstream method is that it requires thought and
discipline. The mainstream is established by people
who want fast results while using roughly the same methods that
everyone else seems to be using, and literate programming is
never going to have that kind of appeal. This doesn't take away
from its usefulness as an approach."
Patrick TJ McPhee
Literate programming is, in essence, content management applied
to program sources. Donald Knuth proposed it in his 1984 article
"Literate Programming," published in The Computer Journal (a British
Computer Society publication).
While the term gained some traction, the idea itself unfortunately cannot
be counted among Donald Knuth's unqualified successes. Unlike TAOCP or
TeX, it never caught on and had a rather cool initial reception. That
does not mean the idea was or is without merit, and some of its components
later became a standard part of every decent IDE.
The idea of literate programming has three key components.
The first is essentially the discovery of the benefits of a hypertext
representation of a program and its documentation for software writing.
As originally conceived by Don Knuth, literate programming involves
prettyprinting code: displaying it using several fonts with
proper nesting and systematic line breaks. It was probably inspired
by the "publication language" of Algol 60. Now it is easy to convert
any program text to HTML, as converters exist for almost every language.
For this purpose HTML is a better markup language than TeX, but
we need to remember that HTML did not exist when Donald Knuth wrote his
article (Computer Journal, 1984). TeX was created using this approach
and simultaneously used in a bootstrap fashion, but without several key
notions connected with the idea of hypertext.
The philosophy behind WEB is that an experienced system programmer,
who wants to provide the best possible documentation of his or her
software products, needs two things simultaneously: a language like
TeX for formatting, and a language like C for programming. Neither
type of language can provide the best documentation by itself; but
when both are appropriately combined, we obtain a system that is
much more useful than either language separately.
The structure of a software program
may be thought of as a web that is made up of many interconnected
pieces. To document such a program we want to explain
each individual part of the web and how it relates to its neighbors.
The typographic tools provided by TeX give us an opportunity to
explain the local structure of each part by making that structure
visible, and the programming tools provided by languages such as
C or Fortran make it possible for us to specify the algorithms formally
and unambiguously. By combining the two,
we can develop a style of programming that maximizes our ability
to perceive the structure of a complex piece of software,
and at the same time the documented programs can be mechanically
translated into a working software system that matches the documentation.
Now it is simpler both to discuss and to implement the ideas of literate
programming in the HTML context, as HTML is now the dominant markup
language. It is actually a historical accident that the markup language
for the Web was based on SGML and not on
TeX. But right now HTML rules, and a Web server can be the cornerstone
of an implementation of a literate programming platform. Most
utilities and WWW browsers will convert HTML back to plain text, for
example the Line Mode Browser or Lynx.
Netscape and Internet Explorer can save any Web page as plain
text.
The second important idea is the view of program writing as a
new type of literary work. The key finding is that writing
documentation and notes along with the program improves the quality of
both, often dramatically. An ideal program, Knuth used to say, can be read by the
fireside, like good prose. This idea was field-tested by Knuth himself
while writing TeX and was first described in his 1984 Computer Journal
paper. Knuth essentially reiterated the old maxim that
the very act of communicating one's work
clearly to other people will improve the work itself.
The key idea is that the relationship between program and documentation
is more symmetric than is usually assumed, and that such classic features
as folding and outlining
are very useful in working with program code. Attempts to view a program
as a book were not new, and isolated components of Knuth's vision had been
refined long before TeX. For example, the whole
XPL language compiler
was documented in the book A Compiler Generator by McKeeman,
Horning and Wortman, published by Prentice-Hall, 1970, ISBN 13-155077-2.
See also Orthodox Editors Page.
What was new was the idea of tools that make this method of writing
programs smoother and more efficient.
The third idea is that additional representations of a program, such as a
cross-reference table, greatly reduce the initial number
of errors in the program and thus make debugging less labor-intensive.
Knuth did not recognize the usefulness of slicing, hypertext language
references and code fragment libraries, but those are natural extensions
of his approach.
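As a rough illustration of what a cross-reference table is, here is a minimal sketch that lists every identifier in a small Python fragment together with the lines where it occurs. It is not Knuth's WEB index, only an approximation built on Python's standard `tokenize` module; the names `build_xref`, `format_xref` and `SAMPLE` are invented for this example.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def build_xref(source):
    """Map each identifier in Python source text to the sorted lines where it occurs."""
    xref = defaultdict(set)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both identifiers and keywords; keep identifiers only.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            xref[tok.string].add(tok.start[0])
    return {name: sorted(lines) for name, lines in sorted(xref.items())}


def format_xref(xref):
    """Render the table as one 'name  line, line, ...' row per identifier."""
    return "\n".join(
        f"{name:<12} {', '.join(map(str, lines))}" for name, lines in xref.items()
    )


# Hypothetical program fragment used only to demonstrate the table.
SAMPLE = """\
total = 0
for item in [1, 2, 3]:
    total = total + item
print(total)
"""

if __name__ == "__main__":
    print(format_xref(build_xref(SAMPLE)))
```

Even on four lines the table shows at a glance where `total` is touched, which is the kind of second view of a program that the third idea is about.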
Instead of writing code containing documentation, literate programming
suggests writing documentation containing code. Knuth indicated that
he chose the name "literate programming" in part to contrast with "structured
programming", which was the fashion of the time and which he apparently
felt pointed programmers in completely the wrong direction (and he was 100%
right on this; now nobody even remembers all those fundamentalist ramblings,
and only positive things like enhanced control structures survived the test
of time from all the structured programming blah-blah-blah). ;-)
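To make "documentation containing code" concrete, here is a minimal sketch of the extraction step (usually called tangling). It assumes a noweb-style notation, not Knuth's original WEB syntax: a chunk starts with `<<name>>=`, ends at a line beginning with `@`, and may refer to other chunks by `<<name>>`. The function names and the sample document are invented for this example, and there is no protection against chunks that refer to each other in a cycle.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")       # line that opens a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")   # line that refers to another chunk


def parse_chunks(document):
    """Collect named code chunks; everything outside them is documentation."""
    chunks = {}
    current = None
    for line in document.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line.startswith("@"):
            current = None            # back to documentation
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand a chunk recursively, carrying over the indentation of each reference."""
    out = []
    for line in chunks[name]:
        match = CHUNK_REF.match(line)
        if match:
            out.extend(tangle(chunks, match.group(2), indent + match.group(1)))
        else:
            out.append(indent + line if line else line)
    return out


# Hypothetical literate document: prose first, code in the order best for the reader.
DOC = """\
This program greets the reader. The overall shape comes first.

<<main>>=
def main():
    <<build the greeting>>
    print(greeting)

main()
@

The greeting itself is explained only after the structure is clear.

<<build the greeting>>=
greeting = "Hello, literate world"
@
"""

if __name__ == "__main__":
    print("\n".join(tangle(parse_chunks(DOC), "main")))
```

The point of the design is that the order of exposition in the document is independent of the order the compiler or interpreter needs; the tool restores the latter mechanically.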
The very act of communicating
one's work clearly to other people will improve the work itself
In his later book on the topic [p. 99], Knuth stressed
the importance of writing programs and documentation
as a single interrelated process rather than as two separate processes.
I believe that the time is ripe for significantly
better documentation of programs, and that we can best achieve this
by considering programs to be works of literature. Hence,
my title: "Literate Programming."
Let us change our traditional attitude to the construction
of programs: Instead of imagining that our main task is to instruct
a computer what to do, let us concentrate rather on explaining to human
beings what we want a computer to do.
The practitioner of literate programming can
be regarded as an essayist, whose main concern is with exposition and
excellence of style. Such an author, with thesaurus in hand,
chooses the names of variables carefully and explains what each variable
means. He or she strives for a program that is comprehensible
because its concepts have been introduced in an order that is best for
human understanding, using a mixture of formal and informal methods
that reinforce each other.
Now, after so many years, with the Web and HTML firmly entrenched,
we can reformulate the idea of literate programming in WWW terms. The
use of TeX, while a tremendous step forward, is not optimal for literate
programming and actually hurt the subsequent acceptance of "literate
programming" as a technology. In Web terms we can view literate programming
as a specialized wiki framework with several distinct features:
Sections of the wiki that represent code are automatically converted
into a "neat" format using pretty printing and syntax highlighting for
program source (this is already old hat: typographical niceties
that have become pretty much standard in any programming environment
GUI).
Documentation sections of the program can hyperlink to code
sections and to the cross-reference table.
Automatic code extraction, with or without documentation sections,
and submission of the resulting text file to a compiler or interpreter.
This should be completely automatic (BTW, that is achievable in many
modern HTML editors, including FrontPage, Dreamweaver, etc). Web servers
provide server-side includes, which can be used to include program fragments
in a composite document.
XREF tables as an important part of the programming environment
(currently the best way to generate them is to use an editor with
pipe execution capabilities like SlickEdit or vim, or to generate them
into a separate window in the browser). Various class browsers were
developed for partial symbol table generation.
Incorporation of outlining and slicing into the programming environment
(extraction of documentation or code from a mixed document is a special
case of outlining).
Availability of blog-type sections that can document the progress
of the work.
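The automatic code extraction item in the list above can be sketched with nothing more than the Python standard library. The sketch assumes a purely hypothetical page convention in which program text lives in `<pre class="code">` elements and everything else is documentation; the class name `CodeExtractor`, the function `extract_code` and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class CodeExtractor(HTMLParser):
    """Collect the text of every <pre class="code"> element, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &lt; back into <
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "code" in classes:
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._inside = False

    def handle_data(self, data):
        # Markup nested inside the block (e.g. highlighting tags) is dropped;
        # only the text is kept.
        if self._inside:
            self.blocks[-1] += data


def extract_code(html_text):
    """Return the concatenated program text of a mixed HTML document."""
    parser = CodeExtractor()
    parser.feed(html_text)
    parser.close()
    return "\n".join(block.strip("\n") for block in parser.blocks) + "\n"


# Hypothetical literate page mixing documentation and code.
PAGE = """\
<h1>Summing a list</h1>
<p>We keep a running total, starting from zero.</p>
<pre class="code">total = 0
for item in [1, 2, 3]:
    total += item
</pre>
<p>The comparison below uses <code>&lt;</code>, escaped in the HTML.</p>
<pre class="code">small = total &lt; 10
</pre>
<pre>This block is plain documentation, not code.</pre>
"""

if __name__ == "__main__":
    source = extract_code(PAGE)
    namespace = {}
    # "Submitting to the interpreter": here simply compile and run in-process.
    exec(compile(source, "<extracted>", "exec"), namespace)
    print(namespace["total"], namespace["small"])
```

In a real environment the extracted text would be written to a file and handed to whatever compiler or interpreter the project uses; running it in-process is only the shortest way to show the round trip.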
All the sub-technologies gathered under the umbrella of literate programming
are well known and have been used by programmers for a long time. But
nobody before Knuth managed to link them together into a coherent
meta-technology and style of programming. For example, cross-reference tools
have been part of any good programmer's toolset since the early 1960s. Pretty
printing of programs also dates from the early 1960s. Syntax highlighting in
pretty printing is almost as old as pretty printing itself and reappeared in
editors with the introduction of color displays. I also know that
many programmers have used document editors like MS Word, with its outstanding
outlining capabilities, instead of a programming editor, with considerable
success. Orthodox editors
like Kedit and SlickEdit are close to this approach
due to their support of programmable folding.
But at the same time, while serving as an integration point for previously
isolated technologies, literate programming creates a qualitatively
new paradigm of program development. It has also allowed further
development of each of the underlying technologies in new directions. For
example, software visualization is a much broader and much more complex
subject than just TeX-based program representation. What is
really important is viewing the program as a literary work, a book or an
article, not just the ability to manipulate the program in various ways that
improve its understanding.
XREF tools, syntax highlighting and outlining are just three facets of
this complex problem. The "missing link" (integration of all three
into a wiki-style environment) can probably explain the rather cool reception
of the idea. Also, HTML is preferable to TeX. TeX
proved to be not a very flexible way to develop a complex system, and as
such it does not provide any significant advantages over a well-developed
IDE with an elaborate code browser and project management tools, like Visual
Studio or Eclipse.
Now, with HTML widely used and wiki technologies available, it might
be a good time to take a second look at the initial ideas and re-implement
the soundest concepts in a new way, with much better integration and flexibility
than TeX permits. The great advantage of using HTML is its simplicity
and the availability of excellent HTML editors like
FrontPage. The real question is
how to integrate cross-references, indices, outlining and "syntax highlighted"
sources in an attractive, flexible system.
Key Problems with Literate Programming
Literate programming is not without problems, and this might explain the low
level of adoption of this technology.
There is no silver bullet in program understanding or program writing,
and like any technology, "book-style" representation is better for some
purposes and worse for others.
The key problem with literate programming is that it is a static
representation. Understanding (and writing) a program requires a
flexible, dynamic representation. Also, generating program text from
the markup representation creates the classic problem of two texts, although
it is less severe than the problems that arise in macro substitution
and can be mitigated by using a wiki-style environment where access to
the underlying representation is available only for editing.
Also omitted from the concept of literate programming is the idea of
folding and outlining, two powerful tools that simplify the writing
of complex documents and programs.
Generating XREF tables is only one approach to the visibility of
variables in the program. There are many others. Again, the programmer
can benefit from many views of his variables, not just a single table.
Typical SQL-style queries might be useful.
Most published examples of literate programming are pretty dull and
do more to discredit the technology than to attract people to it.
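The remark above about SQL-style queries over variables can be illustrated with a small sketch. It is only one possible reading of the idea: every variable occurrence in a Python fragment is loaded into an in-memory SQLite table, which can then be queried from many angles instead of being read as a single fixed table. The table layout (`occurrences` with `name`, `line`, `kind`), the function names and the sample fragment are assumptions made for this example.

```python
import ast
import sqlite3

# How each ast context is labelled in the table.
KINDS = {ast.Store: "store", ast.Load: "load", ast.Del: "del"}


def load_occurrences(source):
    """Put every variable occurrence of a Python source into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE occurrences (name TEXT, line INTEGER, kind TEXT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            db.execute(
                "INSERT INTO occurrences VALUES (?, ?, ?)",
                (node.id, node.lineno, KINDS[type(node.ctx)]),
            )
    return db


def never_read(db):
    """Variables that are assigned somewhere but never read."""
    rows = db.execute(
        "SELECT DISTINCT name FROM occurrences WHERE kind = 'store' "
        "AND name NOT IN (SELECT name FROM occurrences WHERE kind = 'load') "
        "ORDER BY name"
    )
    return [name for (name,) in rows]


def lines_reading(db, name):
    """Lines on which the given variable is read."""
    rows = db.execute(
        "SELECT DISTINCT line FROM occurrences "
        "WHERE name = ? AND kind = 'load' ORDER BY line",
        (name,),
    )
    return [line for (line,) in rows]


# Hypothetical fragment used only to demonstrate the queries.
SAMPLE = """\
total = 0
unused = 42
for item in [1, 2, 3]:
    total = total + item
print(total)
"""

if __name__ == "__main__":
    db = load_occurrences(SAMPLE)
    print(never_read(db))
    print(lines_reading(db, "total"))
```

Each new question about the variables becomes one more query over the same table, which is the flexibility a static cross-reference listing lacks.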
To provide an insight into the quality of software that is available,
we have compiled a list of 6 advanced Linux documentation generators.
Hopefully, there will be something of interest here for anyone who wants
to generate documentation.
Now, let's explore the 6 documentation generators at hand. For each
title we have compiled its own portal page, a full description with
an in-depth analysis of its features, together with links to relevant
resources and reviews.
About: GNU Source-highlight produces a document with syntax
highlighting when given a source file. It handles many languages, e.g.,
Java, C/C++, Prolog, Perl, PHP3, Python, Flex, HTML, and other formats,
e.g., ChangeLog and log files, as source languages and HTML, XHTML,
DocBook, ANSI color escapes, LaTeX, and Texinfo as output formats. Input
and output formats can be specified with a regular expression-oriented
syntax.
Changes: New language definitions were added for autoconf,
LDAP, and glsl files. Anchors and references are correctly formatted.
Several language definitions were improved.
Highlight is a universal converter from source code to HTML, XHTML,
RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style
Sheets. It supports more than 100 programming languages, and includes
40 highlighting color themes. It's possible to easily enhance the parsing
database. The converter includes some features to provide a consistent
layout of the input code.
Andrew Binstock and Donald Knuth converse on the success of open
source, the problem with multicore architecture, the disappointing lack
of interest in literate programming, the menace of reusable code, and
that urban legend about winning a programming contest with a single
compilation.
Andrew Binstock: You are one of the fathers of the open-source
revolution, even if you aren’t widely heralded as such. You previously
have stated that you released
TeX as open source because of the problem of proprietary
implementations at the time, and to invite corrections to the code—both
of which are key drivers for open-source projects today. Have you been
surprised by the success of open source since that time?
Donald Knuth: The success of open source code is perhaps the
only thing in the computer field that hasn’t surprised me during
the past several decades. But it still hasn’t reached its full potential;
I believe that open-source programs will
begin to be completely dominant as the economy moves more and more from
products towards services, and as more and more volunteers arise to
improve the code.
For example, open-source code can produce
thousands of binaries, tuned perfectly to the configurations of individual
users, whereas commercial software usually will exist in only a few
versions. A generic binary executable file must include
things like inefficient "sync" instructions that are totally inappropriate
for many installations; such wastage goes away when the source code
is highly configurable. This should be a huge win for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always
be superior to competitors like the Gimp—for some reason, I really don’t
know why! I’m quite willing to pay good
money for really good software,
if I believe that it has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly
suspect, since I’m just an educator and scientist. I understand almost
nothing about the marketplace.
Andrew: A story states that you once entered a programming
contest at Stanford (I believe) and you submitted the winning entry,
which worked correctly after a single compilation. Is this
story true? In that vein, today’s developers frequently build programs
writing small code increments followed by immediate compilation and
the creation and running of unit tests. What are your thoughts on this
approach to software development?
Donald: The story you heard is typical of legends that are
based on only a small kernel of truth. Here’s what actually happened:
John McCarthy decided in 1971 to have a Memorial Day Programming
Race. All of the contestants except me worked at his AI Lab up in the
hills above Stanford, using the WAITS time-sharing system; I was down
on the main campus, where the only computer available to me was a mainframe
for which I had to punch cards and submit them for processing in batch
mode. I used Wirth’s
ALGOL W system (the predecessor of Pascal). My program didn’t
work the first time, but fortunately I could use Ed Satterthwaite’s
excellent offline debugging system for ALGOL W, so I needed only two
runs. Meanwhile, the folks using WAITS couldn’t get enough machine cycles
because their machine was so overloaded. (I think that the second-place
finisher, using that "modern" approach, came in about an hour after
I had submitted the winning entry with old-fangled methods.) It wasn’t
a fair contest.
As to your real question, the idea of immediate compilation and "unit
tests" appeals to me only rarely, when I’m feeling my way in a totally
unknown environment and need feedback about what works and what doesn’t.
Otherwise, lots of time is wasted on activities
that I simply never need to perform or even think about. Nothing needs
to be "mocked up."
Andrew: One of the emerging problems for developers, especially
client-side developers, is changing their thinking to write programs
in terms of threads. This concern, driven by the advent of inexpensive
multicore PCs, surely will require that many algorithms be recast for
multithreading, or at least to be thread-safe. So far, much of the work
you’ve published for Volume 4 of
The Art of Computer Programming (TAOCP) doesn’t
seem to touch on this dimension. Do you expect to enter into problems
of concurrency and parallel programming in upcoming work, especially
since it would seem to be a natural fit with the combinatorial topics
you’re currently working on?
Donald: The field of combinatorial algorithms is so vast that
I’ll be lucky to pack its sequential aspects into three or
four physical volumes, and I don’t think the sequential methods are
ever going to be unimportant. Conversely, the half-life of parallel
techniques is very short, because hardware changes rapidly and each
new machine needs a somewhat different approach. So I decided long ago
to stick to what I know best. Other people understand parallel machines
much better than I do; programmers should listen to them, not me, for
guidance on how to deal with simultaneity.
Andrew: Vendors of multicore processors have expressed frustration
at the difficulty of moving developers to this model. As a former professor,
what thoughts do you have on this transition and how to make it happen?
Is it a question of proper tools, such as better native support for
concurrency in languages, or of execution frameworks? Or are there other
solutions?
Donald: I don’t want to duck your question entirely. I might
as well flame a bit about my personal unhappiness
with the current trend toward multicore architecture.
To me, it looks more or less like the hardware designers have
run out of ideas, and that they’re trying
to pass the blame for the future demise of Moore’s Law to the software
writers by giving us machines that work faster only on a few key benchmarks!
I won’t be surprised at all if the whole multithreading idea turns out
to be a flop, worse than the "Itanium"
approach that was supposed to be so terrific—until
it turned out that the wished-for compilers were basically impossible
to write.
Let me put it this way: During the past 50 years, I’ve written well
over a thousand programs, many of which have substantial size. I can’t
think of even five of those programs that would have been enhanced
noticeably by parallelism or multithreading. Surely, for example, multiple
processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these
promised machines of the future? I hear
almost nothing but grief from software people, although the hardware
folks in our department assure me that I’m wrong.
I know that important applications for parallelism exist—rendering
graphics, breaking codes, scanning images, simulating physical and biological
processes, etc. But all these applications require dedicated code and
special-purpose techniques, which will need to be changed substantially
every few years.
Even if I knew enough about such methods to write about them in
TAOCP, my time would be largely wasted, because soon there
would be little reason for anybody to read those parts. (Similarly,
when I prepare the third edition of
Volume 3 I plan
to rip out much of the material about how to sort on magnetic tapes.
That stuff was once one of the hottest topics in the whole software
field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both
only when I’m running two independent jobs at the same time; that’s
nice, but it happens only a few minutes every week. If I had four processors,
or eight, or more, I still wouldn’t be any better off, considering the
kind of work I do—even though I’m using my computer almost every day
during most of the day. So why should I be so happy about the future
that hardware vendors promise? They think a magic bullet will come along
to make multicores speed up my kind of work; I think it’s a pipe dream.
(No—that’s the wrong metaphor! "Pipelines" actually work for me, but
threads don’t. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably
will get better with multicores. I’ve been talking about my technical
work, however, not recreation. I also admit that I haven’t got many
bright ideas about what I wish hardware designers would provide instead
of multicores, now that they’ve begun to hit a wall with respect to
sequential computation. (But my
MMIX
design contains several ideas that would substantially improve the current
performance of the kinds of programs that concern me most—at the cost
of incompatibility with legacy x86 programs.)
Andrew: One of the few projects of yours that hasn’t been
embraced by a widespread community is literate programming.
What are your thoughts about why literate programming didn’t catch on?
And is there anything you’d have done differently in retrospect regarding
literate programming?
Donald: Literate programming is a very personal thing. I think
it’s terrific, but that might well be because I’m a very strange person.
It has tens of thousands of fans, but not millions.
In my experience, software created with
literate programming has turned out to be significantly better than
software developed in more traditional ways. Yet ordinary
software is usually okay—I’d give it a grade of C (or maybe C++), but
not F; hence, the traditional methods stay with us. Since they’re understood
by a vast community of programmers, most people have no big incentive
to change, just as I’m not motivated to learn Esperanto even though
it might be preferable to English and German and French and Russian
(if everybody switched).
Jon Bentley
probably hit the nail on the head when he once was asked why literate
programming hasn’t taken the whole world by storm.
He observed that a small percentage of the
world’s population is good at programming, and a small percentage is
good at writing; apparently I am asking everybody to be in both subsets.
Yet to me, literate programming is certainly the most important thing
that came out of the TeX project. Not only has it enabled me to write
and maintain programs faster and more reliably than ever before, and
been one of my greatest sources of joy since the 1980s—it has actually
been indispensable at times. Some of my major programs, such
as the MMIX meta-simulator, could not have been written with any other
methodology that I’ve ever heard of. The complexity was simply too daunting
for my limited brain to handle; without literate programming, the whole
enterprise would have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded
machines, I would expect the discovery to come from people who routinely
use literate programming. Literate programming
is what you need to rise above the ordinary level of achievement.
But I don’t believe in forcing ideas on anybody. If literate
programming isn’t your style, please forget it and do what you like.
If nobody likes it but me, let it die.
On a positive note, I’ve been pleased to discover that the conventions
of CWEB are already standard equipment within preinstalled software
such as Makefiles, when I get off-the-shelf Linux these days.
Andrew: In
Fascicle 1 of Volume 1, you reintroduced the MMIX computer,
which is the 64-bit upgrade to the venerable MIX machine comp-sci students
have come to know over many years. You previously described MMIX in
great detail in MMIXware.
I’ve read portions of both books, but can’t tell whether the Fascicle
updates or changes anything that appeared in MMIXware, or whether it’s
a pure synopsis. Could you clarify?
Donald: Volume 1 Fascicle 1 is a programmer’s introduction,
which includes instructive exercises and such things. The MMIXware book
is a detailed reference manual, somewhat terse and dry, plus a bunch
of literate programs that describe prototype software for people to
build upon. Both books define the same computer (once the errata to
MMIXware are incorporated from my website). For most readers of
TAOCP, the first fascicle contains everything about MMIX that they’ll
ever need or want to know.
I should point out, however, that MMIX isn’t a single machine; it’s
an architecture with almost unlimited varieties of implementations,
depending on different choices of functional units, different pipeline
configurations, different approaches to multiple-instruction-issue,
different ways to do branch prediction, different cache sizes, different
strategies for cache replacement, different bus speeds, etc. Some instructions
and/or registers can be emulated with software on "cheaper" versions
of the hardware. And so on. It’s a test bed, all simulatable with my
meta-simulator, even though advanced versions would be impossible to
build effectively until another five years go by (and then we could
ask for even further advances just by advancing the meta-simulator specs
another notch).
Suppose you want to know if five separate multiplier units and/or
three-way instruction issuing would speed up a given MMIX program. Or
maybe the instruction and/or data cache could be made larger or smaller
or more associative. Just fire up the meta-simulator and see what happens.
Andrew: As I suspect you don’t use unit testing with MMIXAL,
could you step me through how you go about making sure that your code
works correctly under a wide variety of conditions and inputs? If you
have a specific work routine around verification, could you describe
it?
Donald: Most examples of machine language code in TAOCP
appear in Volumes 1-3; by the time we get to Volume 4, such low-level
detail is largely unnecessary and we can work safely at a higher level
of abstraction. Thus, I’ve needed to write only a dozen or so MMIX programs
while preparing the opening parts of Volume 4, and they’re all pretty
much toy programs—nothing substantial. For little things like that,
I just use informal verification methods, based on the theory that I’ve
written up for the book, together with the MMIXAL assembler and MMIX
simulator that are readily available on the Net (and described in full
detail in the MMIXware book).
That simulator includes debugging features like the ones I found
so useful in Ed Satterthwaite’s system for ALGOL W, mentioned earlier.
I always feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still
thriving, primarily as the foundation for LaTeX. While
TeX has been effectively frozen at your request, are there features
that you would want to change or add to it, if you had the time and
bandwidth? If so, what are the major items you add/change?
Donald: I believe changes to TeX would cause much more harm
than good. Other people who want other features are creating their own
systems, and I’ve always encouraged further development—except that
nobody should give their program the same name as mine. I want to take
permanent responsibility for TeX and
Metafont, and for
all the nitty-gritty things that affect existing documents that rely
on my work, such as the precise dimensions of characters in the Computer
Modern fonts.
Andrew: One of the little-discussed aspects of software development
is how to do design work on software in a completely new domain. You
were faced with this issue when you undertook TeX: No prior art was
available to you as source code, and it was a domain in which you weren’t
an expert. How did you approach the design, and how long did it take
before you were comfortable entering into the coding portion?
Donald: That’s another good question! I’ve discussed the answer
in great detail in Chapter 10 of my book
Literate Programming, together with Chapters 1 and 2 of my book
Digital Typography. I think that anybody who is really interested
in this topic will enjoy reading those chapters. (See also Digital
Typography Chapters 24 and 25 for the complete first and second
drafts of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear
concern for limiting memory usage—an important problem for systems of
that era. Today, the concern for memory usage in programs has more to
do with cache sizes. As someone who has designed a processor in software,
the issues of cache-aware and cache-oblivious
algorithms surely must have crossed your radar screen. Is
the role of processor caches on algorithm design something that you
expect to cover, even if indirectly, in your upcoming work?
Donald: I mentioned earlier that MMIX provides a test bed
for many varieties of cache. And it’s a software-implemented machine,
so we can perform experiments that will be repeatable even a hundred
years from now. Certainly the next editions of Volumes 1-3 will discuss
the behavior of various basic algorithms with respect to different cache
parameters.
In Volume 4 so far, I count about a dozen references to cache memory
and cache-friendly approaches (not to mention a "memo cache," which
is a different but related idea in software).
Andrew: What set of tools do you use today for writing
TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what
do you use for the coding?
Donald: My general working style is to write everything first
with pencil and paper, sitting beside a big wastebasket. Then I use
Emacs to enter the text into my machine, using the conventions of TeX.
I use tex, dvips, and gv to see the results, which appear on my screen
almost instantaneously these days. I check my math with Mathematica.
I program every algorithm that’s discussed (so that I can thoroughly
understand it) using CWEB, which works splendidly with the GDB debugger.
I make the illustrations with
MetaPost (or, in
rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some
homemade tools, like my own spell-checker for TeX and CWEB within Emacs.
I designed my own bitmap font for use with Emacs, because I hate the
way the ASCII apostrophe and the left open quote have morphed into independent
symbols that no longer match each other visually. I have special Emacs
modes to help me classify all the tens of thousands of papers and notes
in my files, and special Emacs keyboard shortcuts that make bookwriting
a little bit like playing an organ. I prefer
rxvt to xterm for terminal
input. Since last December, I’ve been using a file backup system called
backupfs, which
meets my need beautifully to archive the daily state of every file.
According to the current directories on my machine, I’ve written
68 different CWEB programs so far this year. There were about 100 in
2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has
an extremely convenient "change file" mechanism, with which I can rapidly
create multiple versions and variations on a theme; so far in 2008 I’ve
made 73 variations on those 68 themes. (Some of the variations are quite
short, only a few bytes; others are 5KB or more. Some of the CWEB programs
are quite substantial, like the 55-page BDD package that I completed
in January.) Thus, you can see how important literate programming is
in my life.
I currently use Ubuntu Linux,
on a standalone laptop—it has no Internet connection. I occasionally
carry flash memory drives between this machine and the Macs that I use
for network surfing and graphics; but I trust my family jewels only
to Linux. Incidentally, with Linux I much prefer the keyboard focus
that I can get with classic
FVWM to the GNOME and
KDE environments that other people seem to like better. To each his
own.
Andrew: You state in the preface of
Fascicle 0 of Volume 4 of TAOCP that Volume 4 surely
will comprise three volumes and possibly more. It’s clear from the text
that you’re really enjoying writing on this topic. Given that, what
is your confidence in the note posted on the TAOCP website
that Volume 5 will see light of day by 2015?
Donald: If you check the Wayback Machine for previous incarnations
of that web page, you will see that the number 2015 has not been constant.
You’re certainly correct that I’m having a ball writing up this material,
because I keep running into fascinating facts that simply can’t be left
out—even though more than half of my notes don’t make the final cut.
Precise time estimates are impossible, because I can’t tell until
getting deep into each section how much of the stuff in my files is
going to be really fundamental and how much of it is going to be irrelevant
to my book or too advanced. A lot of the recent literature is academic
one-upmanship of limited interest to me; authors these days often introduce
arcane methods that outperform the simpler techniques only when the
problem size exceeds the number of protons in the universe. Such algorithms
could never be important in a real computer application. I read hundreds
of such papers to see if they might contain nuggets for programmers,
but most of them wind up getting short shrift.
From a scheduling standpoint, all I know at present is that I must
someday digest a huge amount of material that I’ve been collecting and
filing for 45 years. I gain important time by working in batch mode:
I don’t read a paper in depth until I can deal with dozens of others
on the same topic during the same week. When I finally am ready to read
what has been collected about a topic, I might find out that I can zoom
ahead because most of it is eminently forgettable for my purposes. On
the other hand, I might discover that it’s fundamental and deserves
weeks of study; then I’d have to edit my website and push that number
2015 closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer.
How is your health today?
Donald: Naturally, the cancer will be a serious concern. I have superb
doctors. At the moment I feel as healthy as ever, modulo being 70 years
old. Words flow freely as I write TAOCP and as I write the
literate programs that precede drafts of TAOCP. I wake up in
the morning with ideas that please me, and some of those ideas actually
please me also later in the day when I’ve entered them into my computer.
On the other hand, I willingly put myself in God’s hands with respect
to how much more I’ll be able to do before cancer or heart disease or
senility or whatever strikes. If I should unexpectedly die tomorrow,
I’ll have no reason to complain, because my life has been incredibly
blessed. Conversely, as long as I’m able to write about computer science,
I intend to do my best to organize and expound upon the tens of thousands
of technical papers that I’ve collected and made notes on since 1962.
Andrew: On your website, you mention that the Peoples Archive
recently made a series of videos in which you reflect on your past life.
In segment 93, "Advice to Young People," you advise that
people shouldn’t do something simply because it’s trendy.
As we know all too well, software development is as subject to fads
as any other discipline. Can you give some examples that are currently
in vogue, which developers shouldn’t adopt simply because they’re currently
popular or because that’s the way they’re currently done? Would you
care to identify important examples of this outside of software development?
Donald: Hmm. That question is almost contradictory, because
I’m basically advising young people to listen to themselves rather than
to others, and I’m one of the others. Almost
every biography of every person whom you would like to emulate will
say that he or she did many things against the "conventional wisdom"
of the day.
Still, I hate to duck your questions even though I also hate to offend
other people’s sensibilities—given that
software methodology has always been akin to religion.
With the caveat that there’s no reason anybody should care about
the opinions of a computer scientist/mathematician like me regarding
software development, let me just say that almost
everything I’ve ever heard associated with the
term "extreme
programming" sounds like exactly the
wrong way to go...with one exception. The exception is
the idea of working in teams and reading each other’s code. That idea
is crucial, and it might even mask out all the terrible aspects of extreme
programming that alarm me.
I also must confess to a strong bias
against the fashion for reusable code. To me, "re-editable code" is
much, much better than an untouchable black box or toolkit.
I could go on and on about this. If you’re totally convinced
that reusable code is wonderful, I probably won’t be able to sway you
anyway, but you’ll never convince me that reusable code isn’t mostly
a menace.
Here’s a question that you may well have meant to ask: Why is the
new book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1?
The answer is that computer programmers will understand that I wasn’t
ready to begin writing Volume 4 of TAOCP at its true beginning
point, because we know that the initialization of a program can’t be
written until the program itself takes shape. So I started in 2005 with
Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of
Star Wars, which began with Episode 4.)
About: Highlight is a universal converter from source code
to HTML, XHTML, RTF, TeX, LaTeX, and XML. (X)HTML output is formatted
by Cascading Style Sheets. It supports more than 100 programming languages,
and includes 40 highlighting color themes. It's possible to easily enhance
the parsing database. The converter includes some features to provide
a consistent layout of the input code.
Changes: Embedded output instructions specific to the output
document format were added. Support for Arc and Lilypond was added.
Linux Cross Referencing, or LXR, is a very versatile
tool for generating cross-referenced HTML files from source code in
C (and C++, I think). For example, you can browse through the Linux
source code, as indicated here.
Literate programming systems have the following properties:
1. Code and extended, detailed comments are intermingled.
2. The code sections can be written in whatever order is best for
   people to understand, and are re-ordered automatically when the
   computer needs to run the program.
3. The program and its documentation can be handsomely typeset
   into a single article that explains the program and how it works.
   Indices and cross-references are generated automatically.
POD only does task 1, but the other tasks are much more important.
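The reordering described above (code sections written in the order that suits readers, then rearranged for the computer) is usually called "tangling". The following is only a minimal sketch of that step, not the algorithm of WEB, CWEB, or any other real tool. The chunk markup is an assumption borrowed from noweb-style systems: `<<name>>=` opens a chunk, a line containing only `@` closes it, and `<<name>>` inside a chunk refers to another chunk. The function names `parse_chunks` and `tangle` are hypothetical.

```python
import re

# Assumed noweb-style markup (an illustration, not taken from the text above):
#   "<<name>>=" on its own line opens a code chunk,
#   "@" on its own line closes it,
#   "<<name>>" on its own line inside a chunk refers to another chunk.
CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def parse_chunks(source):
    """Collect the code chunks of a literate source, keyed by chunk name."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in prose
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk `name` recursively, in the order the computer needs.

    There is no cycle detection: a chunk that refers to itself would
    recurse until Python raises RecursionError.
    """
    lines = []
    for line in chunks[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            extra_indent, target = reference.groups()
            lines.extend(tangle(chunks, target, indent + extra_indent))
        else:
            lines.append(indent + line)
    return lines


LITERATE_SOURCE = """\
The outline comes first, because that is what a reader wants to see.
<<main>>=
<<define greet>>
<<call greet>>
@
The call is explained before the definition it depends on.
<<call greet>>=
print(greet("world"))
@
Only now do we say how a greeting is built.
<<define greet>>=
def greet(name):
    return "hello, " + name
@
"""

program = "\n".join(tangle(parse_chunks(LITERATE_SOURCE), "main"))
print(program)
```

In the sample source the call is explained before the definition, yet the tangled program puts the definition first, so it runs. The prose lines between chunks are simply dropped here; a real system would also "weave" them into the typeset document.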
Literate programming is an interesting idea, and worth looking into,
but if we think that we already know all about it, we won't bother.
Let's bother. For an introduction,
see Knuth's
original paper which has a short but complete example. For a slightly
longer example, here's
a library I wrote in literate style that manages 2-3 trees in C.
Andrew Johnson's new book
Elements of Programming
with Perl uses literate programming techniques extensively,
and shows the source code for a literate programming system written
in Perl.
Finally, the Literate
Programming web site has links to many other resources, including
literate programming environments that you can try out yourself.
Whatever the origins of literate programming, there's no doubt that
its fame and/or infame [6] comes from the great King Knuth.
For 'twas he, the noble El Don, who first propounded his version of
the "literate" approach to coding in that spooky, fatidic year 1984. [7]
His ideas were later amplified and published in 1992 (Literate Programming.
Lecture notes, Center for the Study of Language and Information, Stanford).
Those who in-joke about the publicational time gap between volumes 3
and 4 of Knuth's magnum opus, TAOCP (The Art of Computer Programming),
should remember this and all the other fine work that has distracted
him, especially his Herculean efforts in typesetting and typographical
computing.
Barton's plea under the mantras, "Code isn't just for computers"
and "Reading programs for pleasure," is to promote code that humans
can enjoy reading for the sheer fun of it, in the same way, for example,
that they can enjoy curling up in bed with their favorite Trollope (an
author carefully chosen for a cheap thrill unworthy of this august journal).
We note first a possible confusion or overlap between literary and literate
programming. Dijkstra tends to stress literacy in the sense of fluent
command of one's working/publishing tongue (and that really means English
for most practical purposes), so that all text not directly compilable,
such as comments and explanations, would be written crisply and free
from ambiguities. Barton seems to be seeking a literary flair in the
code itself.
... ... ...
Back comes the cry: "But debugging and
maintenance demand code legibility." Here follows a bifurcation
in the literate programming route. Ray Giguette sees a helpful literary
role right at the start of the project, using literary analogies to
shape our approach to software design. [12] Robert McGrath dismisses
this too brusquely, I believe, while admitting that even weak analogies
may help to improve understanding between humans involved in design
and coding. [13]
“When software became merchandise, the opportunity vanished of teaching
software development as a craft and as artistry”.
2005-08-05 (freesoftwaremagazine.com)
Diomidis Spinellis, author of Code Reading: The Open Source Perspective,
is one of the first of what we will come to know as the literary critics
of code. His book is unlike any other programming book that came before
it and for a very exciting reason. What makes it unique is that Spinellis
is teaching us how to read source code instead of merely how
to write it. Spinellis hopes that after reading his book, “You may read
code purely for your own pleasure, as literature” (2). What I want to
emphasize here is that word pleasure. As long as we merely
view code as something practical, as a means designed, for
better or worse, to reach certain practical ends, we will
never see the flourishing of the literature that Spinellis describes.
What must happen first is the cultivation of a new audience for code.
We desire a readership that derives a different sort of pleasure from
reading magnificent code than those who have come before them. Whereas,
generally speaking, most readers of code today judge code based on the
familiar criteria of precision, concision, efficiency,
and correctness, these future readers will speak of the
beauty of code and the artistry of a well-wrought script.
We will, perhaps, print out the programs of our favorite coders and
read them in the bathtub. Furthermore, we will do so for no other reason
than that we will enjoy doing so; we will as eagerly await
the next Miguel de Icaza as we would the novels of our favorite author
or the films of our favorite director. Even now, the first rays of this
new art are shooting across the horizon; tomorrow, we will shield our
eyes against its brilliance.
Richard P. Gabriel and Ron Goldman’s fabulous essay
Mob Software: The
Erotic Life of Code makes many of the points that I will attempt
to explicate here. One of their theses is that
“When software became merchandise, the opportunity vanished of
teaching software development as a craft and as artistry”.
For Gabriel and Goldman, faceless corporations have reduced coding
to a lowly craft; code is just another disposable product that is only
useful for furthering some corporate agenda. Such base motives have
prevented coding from flourishing as a literature. Gabriel and Goldman
describe the pitfalls of proprietary software development and ask a
rather compelling question:
It’s as if all writers had their own private companies and only
people in the Melville company could read Moby-Dick and
only those in Hemingway’s could read The Sun Also Rises.
Can you imagine developing a rich literature under these circumstances?
... ... ...
Author of the classic Art of Computer Programming books,
Knuth firmly believes that programming can reach literary proportions.
As early as 1974, Knuth was arguing that computer programming is more
artistic than most people realize. “When I speak about computer programming
as an art,” writes Knuth, “I am thinking primarily of it as an art
form, in an aesthetic sense. The chief goal of my work is to
help people learn how to write beautiful programs” (670). Knuth’s
passion and zeal for artistic coding is revealed in such lines as “it
is possible to write grand programs, noble programs,
truly magnificent ones!” (670). For Knuth, this means that
programmers must think of far more than how effectively their code will
compile.
... ... ...
The fine art of coding
In a 1983 article entitled “Literate Programming,” Knuth argues that
“the time is ripe for significantly better
documentation of programs, and that we can best achieve this by considering
programs to be works of literature” (1). Knuth’s project
at that time was literate programming, which is a combination
of a document formatting language and a programming language. The idea
was to greatly extend what can be done with embedded comments; in short,
to make source code as readable as documentation that might accompany
it. The goal was not to necessarily make code that would run more efficiently
on a computer; the point was to make code more interesting and enlightening
to human beings. The result of Knuth’s efforts was WEB, a combination
of PASCAL and TeX, and the newer CWEB, which offers C, C++, or JAVA
instead of PASCAL. WEB and CWEB allow programmers like Knuth to write
“essays” on coding that resemble Pope’s essay on poetry.
One of Knuth’s projects was to take the Will Crowther masterpiece
ADVENTURE and rewrite it with CWEB. The results are marvellous.
It is a joy to read this code. The best way I can describe the pleasure
I derive from reading it is to compare it to listening to really good
director’s commentary on a special-edition DVD. It’s like having a wizened
and witty old friend reading along with me as I study the code. How
many source code files have you read with comments like this:
Now here I am, 21 years later, returning to the great Adventure
after having indeed had many exciting adventures in Computer Science.
I believe people who have played this game will be able to extend
their fun by reading its once-secret program. Of course I urge everybody
to play the game first, at least ten times, before reading on. But
you cannot fully appreciate the astonishing brilliance of its design
until you have seen all of the surprises that have been built in.
Knuth has something here. Knuth’s CWEB “commentary” of Adventure
isn’t the heavily abbreviated, arcane gibberish that passes for comments
in most source code, nor is it slavishly didactic and only concerned
with teaching. It is in many ways comparable to Pope’s essay; we have
a coder representing in code what is magnificent about code
and how one ought to judge it. It is something we will likely be
studying fifty years from now with the same reverence with which we
approach “The Essay on Criticism” today.
Jef Raskin, author of The Humane Interface, recently presented
us with an essay entitled “Comments are More Important Than Code.” He
refers to Knuth’s work as “gospel for all serious programmers.” Though
Raskin is mostly concerned with the economic relevance of good
commenting practice, I welcome his criticism of modern programming languages:
any language that does not allow “full flowing and arbitrarily long comments is seriously
behind the times.” It seems inevitable that as the free and open source
software community continues to grow, the need for “literate” programming
techniques will increase exponentially. After all, programmers that
no one understands (much less admires) are unlikely to win much influence,
despite their cleverness.
Coding: art or science?
One of the many intriguing topics that Knuth has contemplated over the
years is whether programming should be considered an art or a science.
Always something of a linguist, Knuth examines the etymology of both
terms in a 1974 essay called “Computer Programming as an Art.” His results
indicate that real confusion exists about how to interpret the terms
“art” and “science,” even though we seem to know what we mean
when we claim that computer programming is a “science” and not an “art.”
We call the study of computers “computer science,” Knuth writes, because
“there is something undesirable about an area of human activity that
is classified as an ‘art’; it has to be a Science before it has any
real stature” (667). Yet Knuth argues that “when we prepare a program,
it can be like composing poetry or music” (670). The key to this transformation
is to embrace “art for art’s sake,” that is, to freely and unashamedly
write code for fun. Coding doesn’t always have to be
for the sake of utility. Artful coding can be done for its own sake,
without any thought about how it might eventually serve some useful
purpose.
Daniel Kohanski, author of a wonderful little book entitled The
Philosophical Programmer, has much to say about what he calls the
“aesthetics of programming.” Now, when most folks talk about aesthetics,
they are speaking about what makes the beautiful so beautiful. If I
see a young lady and tell you that I find her aesthetically pleasing,
I’m not talking about how much she can bench-press or how accurately
she can shoot. Yet this seems to be what Kohanski means when he talks
of aesthetical programming:
While aesthetics might be dismissed as merely expressing a concern
for appearances, its encouragement of elegance does have practical
advantages. Even so prosaic an activity as digging a ditch is improved
by attention to aesthetics; a ditch dug in a straight line is both
more appealing and more useful than one that zigzags at random,
although both will deliver the water from one place to the other.
(11)
I feel a sad irony that Kohanski chooses the metaphor of a ditch
to describe what he considers aesthetic code. Coders have been stuck
in this rut for quite some time. We take something as wonderful and
amazing as programming, and compare it to perhaps the lowliest manual
labor on earth: the digging of ditches. If conciseness, durability,
and efficiency are all that matters, programmers work without art and
grace and might as well wield shovels instead of keyboards.
Let me set a few things straight here. When most people try to establish
“Science and Art” as binary oppositions, they would generally do better
to use the terms “Engineers and Artists.” Computer programming can
be thought of from a strictly engineering perspective—that is, an application
of the principles of science towards the service of humanity. Civil
engineering, for instance, involves building safe and secure bridges.
According to the Oxford English Dictionary, the word engineer
was first used as a term for those who constructed siege engines—war
machinery. The word still carries a very practical connotation; we expect
engineers to be precise, clever, and so on, but expect a far different
set of qualities from those we term artists. Whereas the stereotypical
engineer is an introvert with a pocket protector and calculator wristwatch,
the stereotypical artist is someone like Salvador Dali—a wild, eccentric
type who is poorly understood, yet wildly revered. We expect our artists
to be unpredictable and delightfully social beings—who really understand
the human condition. We expect engineers to be pretty dull folks to
have around at parties.
Such oppositions are seldom useful and more often misleading. We
might think of the man insisting that programming is a “science” as
equally intelligent as his companion, Tweedledum, who insists that it
is quite obviously an art. The truth, according to Knuth, is that programming
is “both a science and an art, and that the two aspects nicely complement
each other” (669). Like civil engineering, programming involves the
application of mathematics. Like poetry, programming involves the application
of aesthetics. As with bridges, some programs are mundane things that
clearly serve only to get folks across bodies of water, whereas others,
like the Golden Gate Bridge, are magnificent structures rightly regarded
as national landmarks. Unfortunately, the modern discourse surrounding
computer programming is far too slanted towards the banal; even legends
of the field cannot bring themselves to see their calling as anything
but a useful but dull craft. They are the painters who have convinced
themselves that, because they cannot sell their frescoes, painting
houses is the only sensible thing one can do with a paintbrush.
The future of programming as art
Computer programming is not limited to engineering, nor must coders
always think first of efficiency. Programming is also an art, and, what’s
more, it’s an art that shouldn’t be limited to what is “optimal”. Even
though programs are usually written to be parsed and executed by computers,
they are also read by other human beings, some of whom, I dare say,
exercise respectable taste and appreciate good style. We’ve misled ourselves
into thinking that computer programming is some “exact science,” more
akin to applied physics than fine art, yet my argument here is that
what’s really important in the construction of programs isn’t always
how efficiently they run on a computer—or even if they work at all.
What’s important is whether they are beautiful and inspiring to behold;
if they are sublime and share some of the same features that make masterful
plays, compositions, sculptures, paintings, or buildings so magnificent.
A programmer who defines a good program simply as “one that best does
the job with the least use of a computer’s resources” may get the job
done, but he certainly is a dull, uninspiring fellow. I wish to celebrate
programmers who are willing to dispense with this slavish devotion to
efficiency and see programming as an art in its own right; having not
so much to do with computers as other human beings who have the knowledge
and temperament to appreciate its majesty.
It is all too easy to transpose historical developments in literature
and literary criticism onto computer programming. Undoubtedly, such
a practice is at best simplistic—at worst it is myopic. Comparisons
to poetry, as Gabriel and Goldman point out, are all too tempting. Like
poetry, coding is at once imaginative and restricted:
Release is reined in by restraint: requirements of form, grammar,
sentence-making, echoes, rhyme, rhythm. Without release there could
be nothing worth reading; the erotic pleasure of pure meandering
would be unapproached. Without restraint there cannot be sense enough
to make the journey worth taking.
It is quite possible to look at the source code of a C++ program
and imagine it to be a poem; some experiment with “free verse” making
clever use of programming conventions. Such comparisons, while certainly
intriguing, are not what I’m interested in pursuing. Likewise, I am
not arguing that artistic coding is simply inserting well-written comments.
I would not be interested in someone’s effort to integrate a Shakespearean
sonnet into the header file of an e-mail client.
Instead, I’ve tried to assert that coding itself can be artistic;
that eloquent commenting can complement, but not substitute
for, eloquent coding. To do so would be to claim that it is more important
for artists to know how to describe their paintings than to paint them.
Clearly, the future of programming as art will involve both types of
skills; but, more importantly, the most artistic among us will be those
who have defected from the rank and file of engineers and refused to
kneel before the altar of efficiency. For these future Byrons and Shelleys,
the scripts unfolding beneath their fingers are not some disposable
materials for the commercial benefit of some ignorant corporate juggernaut.
Instead, they will be sacred works; digital manifestations of the spirit
of these artists. We should treat them with the same care and respect
we offer hallowed works in other genres, such as Rodin’s Thinker,
Virgil’s Aeneid, Dante’s Inferno, or Pope’s Essay
on Criticism. Like these other masterpieces, the best programs
will stand the test of time and remain impervious to the raging rivers
of technological and social change that crash against them.
This question of permanence is perhaps where we find ourselves
stumbling in our apology for programming. How can we talk of a program
as a “masterpiece”, knowing that, given the rate of technological development,
it may soon become so obsolete as not to function in our computers?
Yet here is the reason that I have stressed how insignificant it is
that a program actually works for it to be rightly considered
magnificent. Indeed, I find it almost certain that we will find ourselves
with programs whose utter brilliance we will not be capable of recognizing
for decades, if not centuries. We can imagine, for instance, a videogame
written for systems more sophisticated than any in production today.
Likewise, any programmer with any maturity whatsoever can appreciate
the inventiveness of the early pioneers, who wrought miracles far more
impressive in scope than the humble achievements so brazenly trumpeted
in the media today. To really appreciate the fine art of computer programming,
we must separate what works well in a given computer from
what represents artistic genius, and never conflate the two—for
the one is a fleeting, forgettable thing, but the other will never die.
Highlight is a universal converter from source code to HTML, XHTML,
RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style
Sheets. It supports more than 100 programming languages, and includes
40 highlighting color themes. It's possible to easily enhance the parsing
database. The converter includes some features to provide a consistent
layout of the input code.
Release focus: Minor bugfixes
Changes:
This release fixes XML parsing and adds a new option to set the CSS
class name prefix for HTML output.
The following paragraphs discuss
the main benefits of traditional
literate programming. Note: none of these benefits
depends on printed output.
Design and coding happen at the highest possible level.
The names of
sections are constrained only by one's design skill, not by any rules
of language. You say what you mean, and that becomes both the design and
the code. You never have to simulate a concept because concepts become
section names.
The visual weight of code is separate from its actual length.
The visual weight of a
section is simply the length and complexity of the
section name, regardless of how complex the actual definition of the
section is. The results of this separation are spectacular. No longer is
one reluctant to do extensive error handling (or any other kind of minutiae)
for fear that it would obscure the essence of the program. Donald Knuth
stresses this aspect of literate programming and I fully agree.
Sections show relations between snippets of code.
Sections can show and enforce relationships between apparently unrelated
pieces of code. Comments, macros or functions are other ways to indicate
such relationships, but often sections are ideal. Indeed, a natural progression
is to create sections as a matter of course. I typically convert a section
to a function only when it becomes apparent that a function's greater generality
outweighs the inconvenience of having to declare and define the function.
Complex section names invite improvements. A
section name is complex when it implies unwholesome dependencies between
the caller (user) of the section and the
section itself. Such section names tend to be conspicuous, so that the
programmer is led to revise both the section name and its purpose. Many
times my attention has been drawn to a poorly conceived section because
I didn't like what its name implied. I have always been able to revise the
code to improve the design, either by splitting a section into parts or
by simplifying its relation to colleagues.
Sections create a place for extensive comments. One
of the most surprising things about
literate programming is how severely traditional programming tends to
limit comments. In a conventional program the formatting of code must indicate
structure, and comments obscure that formatting.
Sections in literate programming provide a place for lengthy comments
that do not clutter the code at the place the section is
referenced.
Section names eliminate mundane comments. The
section name often says it all. The
reference to the section says everything that the user needs to know,
and the section name at the point of definition also eliminates the need
for many comments.
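The way named sections are stitched back into compilable code can be sketched in a few lines. This is only an illustration of the idea, not the implementation used by WEB or Leo: the `<<name>>` reference syntax is borrowed from the noweb convention, and the function name `tangle` and the sample section names are assumptions made for the example.

```python
import re

# A reference to a section is written <<section name>> on a line of its own,
# optionally indented. (Assumed syntax, borrowed from noweb.)
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def tangle(sections, root):
    """Expand the section named `root`, recursively replacing each
    reference with the lines of the section it names."""
    def expand(name, indent, stack):
        if name in stack:
            raise ValueError(f"circular section reference: {name}")
        out = []
        for line in sections[name].splitlines():
            m = REF.match(line)
            if m:
                # Carry the indentation of the reference into the expansion.
                out.extend(expand(m.group(2), indent + m.group(1), stack + [name]))
            else:
                out.append(indent + line)
        return out
    return "\n".join(expand(root, "", []))

# Hypothetical sections: the caller reads cleanly, the details live elsewhere.
sections = {
    "main": "def copy(src, dst):\n    <<open files>>\n    <<copy data>>",
    "open files": "fin = open(src, 'rb')\nfout = open(dst, 'wb')",
    "copy data": "fout.write(fin.read())",
}
print(tangle(sections, "main"))
```

The reader of "main" sees only the section names; the definitions can be arbitrarily long and heavily commented without disturbing the place where they are referenced.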
"A cloned node is a copy of a
node that changes when the original changes. Changes to the
children,
grandchildren, etc. of a node are simultaneously made to the corresponding
nodes contained in all cloned nodes. A small red arrow in icon boxes marks
clones.
Please take a few moments to experiment with clones. Start with a single
node, say a
node whose
headline is A. Clone node A using the CloneNode
command in Leo's Outline menu. Both clones are identical; there is no
distinction between the original node and any of its clones.
Type some text into the body of either node A. The same text appears
in the bodies of all other
clones of A. Now insert a node, say B, as a child of any of the A nodes.
All the A nodes now have a B child. See what happens if you clone B. See
what happens if you insert, delete or move nodes that are
children of A. Verify that when the second-to-last cloned node is deleted
the last cloned node becomes a regular node again.
Clones are much more than a cute feature. Clones allow multiple
views of data to exist within a single outline. The ability to
create multiple views of data is crucial; you don't have to try to decide
what is the 'correct' view of data. You can create as many views as you
like, each tailored exactly to the task at hand."
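The behaviour described in this quote can be modelled, very roughly, by letting several outline positions refer to one shared node object. The sketch below is not Leo's actual data model; the class names `Node` and `Position` are assumptions chosen for the illustration.

```python
class Node:
    """Shared content: headline, body text and children."""
    def __init__(self, headline, body=""):
        self.headline = headline
        self.body = body
        self.children = []   # list of Position objects

class Position:
    """One place in the outline where a node appears."""
    def __init__(self, node):
        self.node = node

    def clone(self):
        # A clone is just another position referring to the same node,
        # so there is no distinction between "original" and "copy".
        return Position(self.node)

a1 = Position(Node("A"))
a2 = a1.clone()

a1.node.body = "some text"                     # edit the body through one clone
a2.node.children.append(Position(Node("B")))   # add a child through the other

print(a2.node.body)                                   # some text
print([c.node.headline for c in a1.node.children])    # ['B']
```

Because both positions share one node, an edit made through either is visible through the other, which is what makes multiple views of the same data possible.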
"I am using Leo since a few weeks and I brim over with enthusiasm
for it. I think it is the most amazing software since the invention
of the spreadsheet."
"We who use Leo know that it is a breakthrough tool and a whole new
way of writing code." -- Joe Orr
"I am a huge fan of Leo. I think it's quite possibly the most revolutionary
programming tool I have ever used and it (along with the Python language)
has utterly changed my view of programming (indeed of writing) forever."
-- Shakeeb Alireza
"Thank you very much for Leo. I think my way of working with data
will change forever... I am certain [Leo] will be a revolution. The
revolution is as important as the change from sequential linear organization
of a book into a web-like hyperlinked pages. The main concept that impress
me is that the source listing isn't the main focus any more. You focus
on the non-linear, hierarchical, collapsible outline of the source code."
-- Korakot Chaovavanich
"Leo is a quantum leap for me in terms of how many projects I can
manage and how much information I can find and organize and store in
a useful way." -- Dan Winkler
"Wow, wow, and wow...I finally understand how to use clones and I
realized that this is exactly how I want to organize my information.
Multiple views on my data, fully interlinkable just like my thoughts."
-- Anon
"Edward... you've come up with perhaps the most powerful new concept
in code manipulation since VI and Emacs." -- David McNab
"Leo is...a revolutionary step in the right direction for programming."
-- Brian Takita
The Doxygen configuration is kept
in abi/src/.doxygen.cfg.
The INPUT
variable contains the list of directories to be scanned when generating
documentation. At present only the text directory (the AbiWord
backend) is actually scanned - but it's simple to add other directories.
Each component of AbiWord has an
overview description stored in a README.TXT
file. This is where you want to put the grand overview - and please
add text if you gain insight on stuff not presently documented in the
README.TXT
files.
From the
README.TXT files
you can refer to class/function names and the outcome is a nice guided
tour where people can read the overview description and dive into the
code from there. It is of course also possible to just go directly to
the various hierarchies and lists at the top of all pages.
AbiWord Doxygen Style Guide
Just a few guidelines for now. See
fp_Container which adheres to these (I think) and is comment complete.
Please try to adhere to these as it makes
for more consistent documentation (looks as well as content) - which
gives a more professional feel to it. If you have ideas for other guidelines,
please post them to the developer list and we'll discuss it.
KISS! We don't want the source code to drown in fancy formatted
comments.
Comments should be kept in raw ASCII where possible. If you
feel structure or typeface commands would help, use the HTML tags
which most people understand.
The first line of a comment block is the brief description (do
not use \brief). Follow it by input/output descriptions, then a
longer comment if necessary. Finally add \note, \bug, \see, \fixme
as necessary.
Put the descriptions by the function definition, not the declaration.
Always use the /*! ... */ variant of the comment marker, and leave the
opening and closing markers on empty lines:
/*!
  Short description
  \param param1 Param 1 Description
                long descriptions should be indented like this
  <repeat as necessary>
  \retval paramN+1 Return value ParamN+1 description
  <repeat as necessary>
  \return Return value description
  Long description
  ...
  \note Note ...
  <repeat as necessary>
  \fixme FIXME description 1
  <repeat as necessary>
  \bug Bug description 1 <you can add URL to bugzilla here>
  <repeat as necessary>
  \see otherClass::otherFunction1
  <repeat as necessary>
*/
In the brief line, describe what the function does,
not how it does it. Leave the input/output details to the appropriate
lines (accessors excepted). See
fp_Container::isEmpty.
Always add input/output details for a function: \param, \retval
(return value via pointer parameter), \return (actual function return
value).
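To make the convention concrete, here is a small sketch that pulls the brief line and the backslash commands out of a comment block written in this style. It is not how doxygen itself parses comments; the C++ function `findRun` inside the sample and the helper name `summarize` are invented for the illustration.

```python
import re

# Hypothetical C++ snippet written to follow the guidelines above;
# the function and its parameters are invented for illustration.
SOURCE = r'''
/*!
  Find the run containing a document offset
  \param offset Offset into the block
  \retval found Set to true if a run was found
  \return Index of the run, or -1 if none

  Long description would go here.
  \note Linear search; fine for short lines.
*/
int findRun(int offset, bool *found)
{
    return -1;
}
'''

# Opening marker alone on its line, then everything up to the closing marker.
BLOCK = re.compile(r"/\*!\s*\n(.*?)\*/", re.DOTALL)

def summarize(source):
    """Return (brief, tags) for each /*! ... */ block: the first non-empty
    line is taken as the brief description, and the names of the
    backslash commands are collected in order."""
    results = []
    for m in BLOCK.finditer(source):
        lines = [ln.strip() for ln in m.group(1).splitlines() if ln.strip()]
        if not lines:
            continue
        brief = lines[0]
        tags = [ln.split()[0][1:] for ln in lines if ln.startswith("\\")]
        results.append((brief, tags))
    return results

for brief, tags in summarize(SOURCE):
    print(brief, tags)
```

Note how the brief line says what the function does, while the input/output details sit on their own \param, \retval and \return lines.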
A list of quick
hints about doxygen syntax. Please see
www.doxygen.org
for the full syntax.
Suppress links with % (doxygen will add links to any function
or filename).
Note that "-" characters can be used for simple bullet list
creation. I wonder if we should suppress that in favor of HTML tags.
Sometimes you want to write a class name or similar in plural,
but doxygen will not add a link. You can work around that by something
like "\link
fp_Run fp_Runs \endlink" but it's horrible to look at in the
raw text. So do without it, rephrase, put the singular word in parentheses
"fp_Runs (fp_Run)",
or assume the reader can find the class in the class list.
Add references to named sections of the documentation with \ref
(e.g.
Formatter which links to the README.TXT in the
text/fmt directory on account of it having a \page
command).
We may also want to discuss allowing simple figures for documenting
hairy code. I think it should be possible - but it should not be done
at the expense of comment text: the programmer should not be required to
look at the doxygen output to understand the code!
Do we want the brief descriptions and return/param text to be in
a certain language style? Would help make the doc look consistent, but
may be too much detail for people to bother with complying. Please see
fp_Container for a suggested style (i.e., compute vs. computes).
Andrew Johnson's new book
Elements
of Programming with Perl uses literate programming techniques
extensively, and shows the source code for a literate programming
system written in Perl.
Finally,
the
Literate Programming web site has links to many other resources,
including literate programming environments that you can try out
yourself.
I think the author missed the main appeal of literate programming: by
trying to document a program while writing it, you improve the quality of the
program even if nobody ever reads the documentation.
It may, however, very well be worthwhile and useful
to consider more symmetric relationships between program and documentation.
Thus, instead of embedding one kind of information into the other, we
can instead model documentation and program fragments as separate entities
tied together with relations. The relations can be implemented in a
number of different ways, e.g., as hypertext links or via database technology.
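As a rough picture of the database variant, the sketch below keeps code fragments and documentation entries in separate tables and ties them together with a third table. The schema, the table names and the sample rows are all assumptions made up for this illustration; the quoted text does not prescribe any particular design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE fragment (id INTEGER PRIMARY KEY, file TEXT, name TEXT);
    CREATE TABLE doc      (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    -- The relation lives in its own table, so neither side embeds the other.
    CREATE TABLE explains (doc_id INTEGER REFERENCES doc(id),
                           fragment_id INTEGER REFERENCES fragment(id));
""")
db.execute("INSERT INTO fragment VALUES (1, 'tree.c', 'insert')")
db.execute("INSERT INTO fragment VALUES (2, 'tree.c', 'split_node')")
db.execute("INSERT INTO doc VALUES (1, 'Insertion', 'How keys are inserted and nodes split.')")
db.executemany("INSERT INTO explains VALUES (?, ?)", [(1, 1), (1, 2)])

def fragments_for(title):
    """Follow the relation from a documentation entry to its code fragments."""
    rows = db.execute(
        "SELECT f.file, f.name FROM fragment f "
        "JOIN explains e ON e.fragment_id = f.id "
        "JOIN doc d ON d.id = e.doc_id "
        "WHERE d.title = ? ORDER BY f.id", (title,))
    return rows.fetchall()

print(fragments_for("Insertion"))
```

The same relation can be followed in the other direction (from a fragment to every text that explains it), which is the symmetry the passage argues for.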
Literate programming systems have the following properties:
1. Code and extended, detailed comments are intermingled.
2. The code sections can be written in whatever order
   is best for people to understand, and are re-ordered
   automatically when the computer needs to run the program.
3. The program and its documentation can be handsomely
   typeset into a single article that explains the program
   and how it works. Indices and cross-references are generated
   automatically.
POD only does task 1, but the other tasks are much more
important.
Literate programming is an interesting idea, and worth
looking into, but if we think that we already know all about
it, we won't bother. Let's bother. For an introduction,
see Knuth's original paper which has a short but complete
example. For a slightly longer example,
here's a
library I wrote in literate style that manages 2-3 trees
in C.
Doxygen
is a documentation system for C++, C, Java, IDL (Corba and Microsoft flavors)
and to some extent PHP and C#.
It can help you in three ways:
It can generate an on-line documentation browser (in HTML) and/or
an off-line reference manual (in LaTeX) from
a set of documented source files. There is also support for generating
output in RTF (MS-Word), PostScript, hyperlinked PDF, compressed HTML,
and Unix man pages. The documentation is extracted directly from the
sources, which makes it much easier to keep the documentation consistent
with the source code.
You can
configure doxygen to extract the code structure from undocumented
source files. This is very useful to quickly find your way in large
source distributions. You can also visualize the relations between the
various elements by means of include dependency graphs, inheritance
diagrams, and collaboration diagrams, which are all generated automatically.
You can even `abuse' doxygen for creating normal documentation (as
I did for this manual).
Doxygen is developed under
Linux, but is set-up
to be highly portable. As a result, it runs on most other Unix flavors as
well. Furthermore, executables for Windows 9x/NT and Mac OS X are available.
Projects using doxygen: I have compiled a
list of projects
that use doxygen. If you know other projects, let me know and I'll add them.
Although doxygen is used successfully by a lot of people already, there
is always room for improvement. Therefore, I have compiled a
todo/wish list
of possible and/or requested enhancements.
Development has now moved to
sourceforge. See the
development section below for more information.
The Linux Cross-Reference project is
the testbed application of a general hypertext cross-referencing tool.
(Or the other way around.)
The main goal of the project is to create
a versatile cross-referencing tool for relatively large code repositories.
The project is based on stock web technology, so the codeview client
may be chosen from the full range of available web browsers. On the
server side, the prototype implementation is based on an
Apache web server,
but any Unix-based web server with cgi-script capability should do nicely.
(The prototype implementation is running on a dual Pentium Pro Linux
box.)
The main feature of the indexer is of
course the ability to jump easily to the declaration of any global identifier.
Indeed, even all references to global identifiers are indexed.
Quick access to function declarations, data (type) definitions and preprocessor
macros makes code browsing just that tad more convenient. An at-a-glance
overview of, e.g., which code areas will be affected by changing
a function or type definition should also come in useful during development
and debugging.
Other bits of hypertextual sugar, such
as e-mail and include file links, are provided as well, but are, on the
whole, well, sugar. Some minimal visual markup is also done. (Style
sheets are considered as a way to do this in the future.)
Technicalities
The index generator is written in
Perl and relies heavily
on Perl's regular expression facilities. The algorithm used is very
brute force and extremely sloppy. The rationale behind the sloppiness
is that too little information renders the database useless, while too
much information simply means the users have to think and navigate at
the same time.
The Linux source code, with which the
project has initially been linked, presents the indexer with some very
tough obstacles. Specifically, the heavy use of preprocessor macros
makes the parsing a virtual nightmare. We want to index the information
in the preprocessor directives as well as the actual C code, so we have
to parse both at once, which leads to no end of trouble. (Strict parsing
is right out.) Still, we're pretty satisfied with what the indexer manages
to get out of it.
There's also the question of actually
broken code. We want to reasonably index all code portions, even if
some of it is not entirely syntactically valid. This is another reason
for the sloppiness.
There are obviously disadvantages to
this approach. No scope checking is done, and the most annoying effect
of this is mistaking local identifiers for references to global ones
with the same name. This particular problem (and others) can only be
solved by doing (almost) full parsing. The feasibility of combining
this with the fuzzy way indexing is currently done is being looked into.
An identifier is a macro, typedef, struct,
enum, union, function, function prototype or variable. For the Linux
source code, between 50000 and 60000 identifiers are collected. The individual
files of the source code are formatted on the fly and presented with
clickable identifiers.
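The flavour of such a "sloppy" regular-expression indexer can be shown with a short sketch. LXR itself is written in Perl and is far more elaborate; the patterns, the function name `build_index` and the two sample files below are assumptions made for this illustration only.

```python
import re
from collections import defaultdict

# Deliberately loose patterns, in the spirit of the "sloppy" approach:
# no preprocessing, no scope tracking, no real C grammar.
DEFINITIONS = [
    ("macro",    re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)")),
    ("struct",   re.compile(r"^\s*struct\s+([A-Za-z_]\w*)\s*\{")),
    # Unindented line, some type words, a name, "(" and no ";" on the line.
    ("function", re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$")),
]
WORD = re.compile(r"[A-Za-z_]\w*")

def build_index(files):
    """`files` maps a file name to its text. Returns (defs, refs):
    defs maps identifier -> [(file, line, kind)],
    refs maps identifier -> [(file, line)] for every textual occurrence
    (the defining line itself is listed too)."""
    defs, refs = defaultdict(list), defaultdict(list)
    for name, text in files.items():
        for no, line in enumerate(text.splitlines(), 1):
            for kind, pat in DEFINITIONS:
                m = pat.match(line)
                if m:
                    defs[m.group(1)].append((name, no, kind))
                    break
    for name, text in files.items():
        for no, line in enumerate(text.splitlines(), 1):
            for word in WORD.findall(line):
                if word in defs:
                    refs[word].append((name, no))
    return defs, refs

files = {
    "a.c": "#define LIMIT 10\nint count(int *v, int n)\n{\n    return n > LIMIT ? LIMIT : n;\n}\n",
    "b.c": "void show(void)\n{\n    int count = 0;\n}\n",
}
defs, refs = build_index(files)
print(defs["count"])
print(refs["count"])
```

The local variable `count` in b.c is reported as a reference to the function `count` in a.c, which is exactly the scope problem the text describes: without (almost) full parsing, the indexer cannot tell the two apart.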
It is possible to search among the identifiers
and the entire kernel source text. The freetext search is implemented
using Glimpse,
so all the capabilities of Glimpse are available. Especially the regular
expression search capabilities are useful.
Availability
The source code for the LXR engine is
of course available. It is released under the
GNU
Copyleft license. Version 0.3 can now be
downloaded.
You can use it to index your own projects. Version 0.3 includes C++
support and a much nicer diff markup than before. Please tell us if
you have trouble with the installation. Also, be aware that the documentation
is still rather incomplete. Jim Greer has been kind enough to write
some more comprehensive installation instructions. If you have trouble
look at his
installation
instructions.
In this paper we introduce HyperCode, a HyperText
representation of program source code. Using HTML for code presentation,
HyperCode provides links from uses of functions, types, variables, and
macros to their respective definition sites; similarly, definitions
are linked to lists-of-links back to use sites. Standard HTML browsers
such as Mosaic thereby become powerful tools for understanding
program control flow, functional dependencies, data structures, and
macro and variable utilization. Supporting HyperCode with a code database
front-ended by a WWW server enables software sharing and development
on a global scale by leveraging the programming, debugging, and computing
power brought together by the World-Wide Web.
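The core of the idea, uses linked to definition sites, fits in a short sketch. This is not the HyperCode implementation: the function name `hyperlink` and the sample source are assumptions, the definition table is supplied by hand rather than extracted by a parser, and the back links from definitions to use sites are left out.

```python
import html
import re

WORD = re.compile(r"[A-Za-z_]\w*")

def hyperlink(source, definitions):
    """Render `source` as HTML. `definitions` maps an identifier to the
    1-based line number where it is defined. The occurrence on the defining
    line becomes an anchor; every other occurrence links to that anchor."""
    out = []
    for no, line in enumerate(source.splitlines(), 1):
        pieces, pos = [], 0
        for m in WORD.finditer(line):
            # Escape the text between identifiers; identifiers need no escaping.
            pieces.append(html.escape(line[pos:m.start()]))
            name = m.group(0)
            if name not in definitions:
                pieces.append(name)
            elif definitions[name] == no:
                pieces.append(f'<a id="{name}">{name}</a>')
            else:
                pieces.append(f'<a href="#{name}">{name}</a>')
            pos = m.end()
        pieces.append(html.escape(line[pos:]))
        out.append("".join(pieces))
    return "<pre>\n" + "\n".join(out) + "\n</pre>"

source = "int square(int x) { return x * x; }\nint area = square(3) < 10;"
print(hyperlink(source, {"square": 1}))
```

Any HTML browser can then jump from the call of `square` on the second line to its definition on the first.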
code2html by Peter Palfrader
(Weasel) is a Perl script which converts program source code to syntax
highlighted HTML. It may be called from the command line or as a CGI script.
It can also handle include commands in HTML files. Currently supports: Ada
95, C, C++, HTML, Java, JavaScript, Makefile, Pascal, Perl, SQL, AWK, M4,
and Groff.
code2html is a Perl script which converts program source code to syntax
highlighted HTML. It may be called from the command line or as a CGI script.
It can also handle include commands in HTML files. It really should be rewritten
eventually since the code is so ugly.
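What such a converter does can be sketched briefly. This is not code2html's algorithm (code2html is a Perl script with per-language rule sets); the tiny keyword list, the CSS class names and the function name `highlight` are assumptions for the illustration, and only a fragment of C is handled.

```python
import html
import re

KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}

# Order matters: comments and strings are tried before words so that
# keywords inside them are not highlighted.
TOKEN = re.compile(
    r"(?P<comment>/\*.*?\*/)"
    r'|(?P<string>"(?:\\.|[^"\\])*")'
    r"|(?P<word>[A-Za-z_]\w*)",
    re.DOTALL,
)

def highlight(code):
    """Return `code` as an HTML <pre> block with comments, strings and a
    few C keywords wrapped in <span> elements."""
    out, pos = [], 0
    for m in TOKEN.finditer(code):
        out.append(html.escape(code[pos:m.start()]))   # text between tokens
        text = html.escape(m.group(0))
        kind = m.lastgroup
        if kind == "word":
            kind = "keyword" if m.group(0) in KEYWORDS else None
        out.append(f'<span class="{kind}">{text}</span>' if kind else text)
        pos = m.end()
    out.append(html.escape(code[pos:]))
    return "<pre>" + "".join(out) + "</pre>"

print(highlight('if (n < 0) return "if"; /* if negative */'))
```

Only the first `if` and the `return` are marked as keywords; the `if` inside the string and the one inside the comment are left alone because those tokens are matched first.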
Cxref is a program that will produce documentation (in LaTeX, HTML, RTF
or SGML) including cross-references from C program source code. The program
comes with more detailed information. There is a
README, which contains an example of the output of the program. (also
available in
PostScript or
plaintext.)
A fuller example of the output of the program can be seen in the
cxref output for the cxref source code itself.
To help with problems encountered in using the program, there is a
FAQ.
It has been designed to work with ANSI C, incorporating K&R, and most
popular GNU extensions.
(The cxref program only works for C, not C++; I have no plans to produce
a C++ version.)
The documentation for the program is produced from comments in the code
that are appropriately formatted. The cross referencing comes from the code
itself and requires no extra work.
The documentation is produced for each of the following:
Files
    A comment that applies to the whole file.
Functions
    A comment for the function, including a description of each of the arguments
    and the return value.
Variables
    A comment for each of a group of variables and/or individual variables.
#include
    A comment for each included file.
#define
    A comment for each pre-processor symbol definition, and for macro arguments.
Type definitions
    A comment for each defined type and for each element of a structure
    or union type.
Any or all of these comments can be present in suitable places in the
source code.
As an example, the file
README.c has been put through cxref to give
HTML output.
The cross referencing is performed for the following items:
Files
    The files that the current file is included in (even when included
    via other files).
#includes
    Files included in the current file.
    Files included by these files etc.
Variables
    The location of the definition of external variables.
    The files that have visibility of global variables.
    The files / functions that use the variable.
Functions
    The file that the function is prototyped in.
    The functions that the function calls.
    The functions that call the function.
    The files and functions that reference the function.
    The variables that are used in the function.
Each of these items is cross referenced in the output.
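The two function entries "functions that the function calls" and "functions that call the function" are inverses of each other, which a few lines can illustrate. The sketch does not parse C and is not how cxref works internally; it assumes the call information has already been extracted, and the function names in the sample are hypothetical.

```python
from collections import defaultdict

def cross_reference(calls):
    """`calls` maps each function to the functions it calls.
    Returns the inverse map: each function -> sorted list of its callers."""
    called_by = defaultdict(list)
    for caller, callees in calls.items():
        for callee in callees:
            called_by[callee].append(caller)
    return {name: sorted(callers) for name, callers in called_by.items()}

# Hypothetical call information for a small program.
calls = {
    "main": ["parse_args", "process"],
    "process": ["read_file", "report"],
    "report": ["format_line"],
    "retry": ["process"],
}
called_by = cross_reference(calls)
print(called_by["process"])   # ['main', 'retry']
```

Since the inverse is derived from the code itself, the "called by" lists require no extra work from the programmer, which is the point the text makes about cross referencing.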
The latest released version available is version 1.5e.
Version 1.5e of cxref released Sun June 29 2003
Bug fixes
Don't lose the comment or value when C++ style comments follow a #define.
Updated to work with newer version of flex and SUN version of yacc.
Handle references for local functions with the same name in several files.
Remove some extra ';' from the HTML output.
Handle macros with variable args like MACRO(a,b,...) as well as MACRO(a,b...).
GCC changes
Handle gcc-3.x putting all of its internal #defines in the output.
Compile cxref-cpp if using gcc-3.x that drops comment on same line as #define.
Version 1.5d of cxref released Sun May 5 2002
Bug fixes
Fixes to HTML and SGML outputs (invalid character entities). Fix bug that
stopped -R/ from working. Fix links to HTML source files in certain cases.
Keep the sign of negative numbers in #define output. Improve the lex code
(flex -s). Add some missing ';' to yacc code. Fix the bison debugging
output. Change the use of IFS in cxref-ccc script.
Configure/Make changes
Fix Makefile to compile using non-GNU make programs.
Add flex specific options to the Makefile if using it.
Fixes for build/configure outside the source tree.
Include DESTDIR in the Makefile to help installation.
Configure makes a guess what to do with cxref-cpp if gcc is not installed.
GCC changes
Accept the gcc-3.0 __builtin_va_list type as-if it were a valid C type.
Handle the GCC __builtin_va_arg extension keyword.
Handle the GCC floating point hex extension data format.
Allow the use of gcc-3.x instead of the cxref-cpp program.
Version 1.5c of cxref released Sat Apr 28 2001
Bug fixes
Better Comment handling. Allow the __restrict keyword. Allow bracketed
function declarations. Remove gcc compilation warnings. Allow the
configure script to be run from a different directory.
Optimisation
Speed up the lex code.
Version 1.5b of cxref released Sun Sep 26 1999
Bug fixes
Comments that use the '+html+' convention appear correctly in the HTML source
output. More configurable Makefile (CFLAGS and LDFLAGS options to configure).
Increase the length of static arrays for getcwd(). Fix NAME_MAX compilation
problem. Fix dereferencing NULL pointer problem.
Optimisation
Speed up the cross referencing, especially for the first pass with no outputs.
Version 1.5a of cxref released Fri Jun 18 1999
Bug fixes
Fix the "+html+" etc in comments. Make verbatim comments work in LaTeX
output. Allow $ in function and variable names. Allow the configure to force
cxref-cpp instead of gcc. Tidy the Makefiles. Increase the size of
statically allocated arrays in cross referencing. Remove the problem of #line
directives causing confusion. Handle more GNU C extensions. Fix references
to the source file from the HTML. Handle C++ comments following #defines.
Output
The full cxref and cpp command lines are displayed as comments in output files.
Version 1.5 of cxref released Sun Feb 21 1999
Bug fixes
Fix the FAQ to HTML converter. Stop comments in header files leaking out.
Configuration
Use the GNU autoconf program to create a configure script.
Now uses gcc instead of cxref-cpp if it is new enough (version >= 2.8.0).
Now compiles and runs under MS Win32 with the cygwin library.
Output
Added SGML (Linuxdoc DTD) output.
Added RTF (Rich Text Format) output.
Added HTML 3.2 output (with tables).
Added an HTML version of the source file with links into it.
Tools
Provided a Perl script to automatically determine required header files.
Version 1.4b of cxref released Sat Apr 18 1998
... ... ...
The full version history is in the
NEWS file distributed with the program.
Mailing List
There is a mailing list available for announcements about new versions
of Cxref. This will only be used by me to send announcements about new
versions of Cxref; it is not for Cxref discussions.
You can alternatively send an e-mail to cxref-announce-request at
gedanken.demon.co.uk with subscribe in the body.
Cxref (in various versions) has been tested on the following systems:
Linux 1.[123].x, Linux 2.[01234].x, SunOS 4.1.x, Solaris 2.x, HPUX 10.x,
AIX, Irix, (Free|Net)BSD, Win32 with Cygnus development kit.
The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with.
We do not warrant the correctness of the information provided or its
fitness for any purpose.
This site is in no way associated with, nor does it endorse, cybersquatters
using
the term "softpanorama" with other main or country domains (e.g. softpanorama.com) with
bad-faith intent to profit from the goodwill belonging to
someone else.
Created Jan 1, 1996;
Last modified:
August 15, 2009