"Without wanting to be elitist,
the thing that will prevent literate programming from becoming a mainstream
method is that it requires thought and discipline. The mainstream
is established by people who want fast results while using roughly the
same methods that everyone else seems to be using, and literate programming
is never going to have that kind of appeal. This doesn't take away from
its usefulness as an approach."
Patrick TJ McPhee
Literate programming is, in essence, content management applied to program
sources. Donald Knuth proposed it in 1984 in his article "Literate Programming",
published in The Computer Journal (a British Computer Society publication).
While the term got some traction, the idea itself unfortunately cannot be counted
among Knuth's complete successes. Unlike TAOCP or TeX, it never caught on and had
a rather cool initial reception. That does not mean the idea was or is without
merit, and some of its components later became a standard part of every decent IDE.
The whole idea of literate programming has three key components.
The first is essentially the discovery of the benefits of a hypertext
representation of a program and its documentation for software writing.
As originally conceived by Don Knuth, literate programming involves prettyprinting
code: displaying it using several fonts with proper nesting and systematic line
breaks. It was probably inspired by the "publication syntax" of Algol 60.
Now it is easy to convert any program text to HTML, as converters exist for
almost any language. For this purpose HTML is a better markup language than
TeX, but we need to remember that HTML did not exist when Knuth wrote his article
(the idea was described in his 1984 paper in The Computer Journal). TeX itself was
written using this approach, in a bootstrap fashion, and the approach already
contained several key notions connected with the idea of hypertext.
The philosophy behind WEB is that an experienced system programmer, who
wants to provide the best possible documentation of his or her software
products, needs two things simultaneously: a language like TeX for formatting,
and a language like C for programming. Neither type of language can provide
the best documentation by itself; but when both are appropriately combined,
we obtain a system that is much more useful than either language separately.
The structure of a software program may be
thought of as a web that is made up of many interconnected pieces.
To document such a program we want to explain each individual part of the
web and how it relates to its neighbors. The typographic tools provided
by TeX give us an opportunity to explain the local structure of each part
by making that structure visible, and the programming tools provided by
languages such as C or Fortran make it possible for us to specify the algorithms
formally and unambiguously. By combining the two,
we can develop a style of programming that maximizes our ability to
perceive the structure of a complex piece of software, and
at the same time the documented programs can be mechanically translated
into a working software system that matches the documentation.
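The "mechanical translation" mentioned in the passage above is usually called
tangling. A minimal sketch in Python, assuming a made-up chunk notation loosely
modeled on noweb: `<<name>>=` opens a code chunk, a line containing only `@`
closes it, and `<<name>>` inside a chunk refers to another chunk. The functions
`parse_chunks` and `tangle` and the sample document are illustrative and are not
part of WEB or CWEB.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # line that opens a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # line that refers to a chunk


def parse_chunks(text):
    """Collect code chunks by name; everything outside a chunk is prose."""
    chunks, current = {}, None
    for line in text.splitlines():
        opened = CHUNK_DEF.match(line)
        if opened:
            current = opened.group(1)
            chunks.setdefault(current, [])   # a repeated name extends the chunk
        elif line.strip() == "@":
            current = None                   # back to documentation
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent="", seen=()):
    """Expand chunk `name` recursively, keeping the indentation of each reference."""
    if name in seen:
        raise ValueError(f"circular chunk reference: {name}")
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1), seen + (name,)))
        else:
            out.append(indent + line if line else line)
    return out


DOCUMENT = """\
The program greets the reader. Its outline is:
<<main>>=
def main():
    <<build the greeting>>
    return greeting
@
The greeting is assembled from a fixed prefix and a name.
<<build the greeting>>=
greeting = "Hello, " + "reader"
@
"""

source = "\n".join(tangle(parse_chunks(DOCUMENT), "main"))
namespace = {}
exec(source, namespace)          # hand the extracted program to the interpreter
print(namespace["main"]())
```

The prose order (outline first, details later) is chosen for the reader; the
tangling step rearranges the chunks into the order the interpreter needs.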
Now it is simpler both to discuss and to implement the ideas of literate
programming in the HTML context, as HTML is now the dominant markup language.
It is actually a historical accident that the markup language for the Web was
created on the basis of SGML and not on the basis of TeX. But right now HTML rules,
and a Web server can be the cornerstone of an implementation of a literate
programming platform. Most utilities and WWW browsers will convert HTML back to
plain text: for example, the Linemode browser or Lynx can do this, and Netscape
and Internet Explorer let you "save as" plain text.
The second is the view of program writing as a new type of
literary work. The key finding is that writing documentation and notes
along with the program improves the quality of both, often dramatically. An ideal
program, Knuth used to say, can be read by the fireside, like good prose. This idea
was field tested by Knuth himself while writing TeX and was first described in his
1984 paper in The Computer Journal. Knuth essentially reiterated the old maxim that
the very act of communicating one's work clearly
to other people will improve the work itself.
The key idea is that the relationship between program and documentation is more
symmetric than is usually assumed, and that such classic features as folding and
outlining are very useful in working with program code. Attempts to view a program
as a book were not new, and isolated components of Knuth's vision were refined long
before TeX. For example, the whole XPL language compiler was documented
in the book A Compiler Generator by McKeeman, Horning and Wortman, published
by Prentice-Hall, 1970, ISBN 13-155077-2. See also the Orthodox Editors page.
What was new was the idea of tools that can make this method
of writing programs smoother and more efficient.
The third idea is that additional representations of the program,
like a cross-reference table, have a tremendous effect on minimizing the initial
number of errors in the program and so make debugging less labor intensive.
Knuth did not anticipate the usefulness of slicing, hypertext language references
and code fragment libraries, but those are a natural extension of his approach.
Instead of writing code containing documentation, literate programming
suggests writing documentation containing code. Knuth indicated that he chose
the name "literate programming" in part to contrast with "structured programming",
which was the fashion of the time and which he apparently felt pointed programmers
in the completely wrong direction (and he was 100% right on this; now nobody even
remembers all those fundamentalist ramblings, and only positive things like enhanced
control structures survived the test of time from all the structured programming
blah-blah-blah ;-).
"The very act of communicating one's work clearly to other people will improve
the work itself."
In his later book on the topic (p. 99), Knuth stressed
the importance of writing programs and documentation
as a single interrelated process, not as two separate processes.
I believe that the time is ripe for significantly better documentation
of programs, and that we can best achieve this by considering programs
to be works of literature. Hence, my title: "Literate Programming."
Let us change our traditional attitude to the construction
of programs: Instead of imagining that our main task is to instruct a computer
what to do, let us concentrate rather on explaining to human beings what we
want a computer to do.
The practitioner of literate programming can be regarded
as an essayist, whose main concern is with exposition and excellence of style.
Such an author, with thesaurus in hand, chooses the names of variables carefully
and explains what each variable means. He or she strives for a program
that is comprehensible because its concepts have been introduced in an order
that is best for human understanding, using a mixture of formal and informal
methods that reinforce each other.
Now, after so many years, and with the Web and HTML firmly entrenched,
we can reformulate the idea of literate programming in WWW terms. The use
of TeX, while a tremendous step forward, is not optimal for literate programming
and actually hurt the subsequent acceptance of "literate programming" as
a technology. In Web terms we can view literate programming as a specialized
wiki framework with several distinct features:
Sections of the wiki that represent code are automatically converted
into a "neat" format using pretty printing and syntax highlighting for the program
source (this is already old hat; such typographical niceties are now
pretty much standard in any programming environment GUI).
Documentation sections of the program can hyperlink to
code sections and to the cross-reference table.
Automatic code extraction, with or without documentation sections,
and submission of the resulting text file to a compiler or interpreter. This
should be completely automatic (BTW, this is achievable in many modern HTML editors,
including FrontPage, Dreamweaver, etc.). Web servers provide server-side includes,
which can be used to include program fragments in a composite document.
XREF tables as an important part of the programming environment
(currently the best way to generate them is to use an editor with pipe
execution capabilities like SlickEdit or vim, or to generate them into a separate
window in the browser). Various class browsers were developed for partial symbol
table generation.
Incorporation of outlining and slicing into the programming environment
(extraction of documentation or code from a mixed document is a special case
of outlining).
Availability of blog-type sections that can document the
progress of the work.
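The automatic code extraction feature in the list above can be sketched with
Python's standard `html.parser` module. This is only an illustration under
assumptions: code sections are taken to be `<pre class="code">` elements (a
convention invented here, not one any particular wiki prescribes), and the class
`CodeExtractor` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class CodeExtractor(HTMLParser):
    """Collect the text of every <pre class="code"> element, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &gt; and friends arrive decoded
        self.fragments = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "pre" and "code" in classes:
            self._inside = True
            self.fragments.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.fragments[-1] += data


PAGE = """
<h2>Computing a total</h2>
<p>We first define the helper that adds up the prices.</p>
<pre class="code">def total(prices):
    return sum(p for p in prices if p &gt; 0)
</pre>
<p>Then we call it; negative entries are treated as data errors and skipped.</p>
<pre class="code">result = total([5, -1, 7])
</pre>
"""

extractor = CodeExtractor()
extractor.feed(PAGE)
extractor.close()
program = "".join(extractor.fragments)

namespace = {}
exec(program, namespace)   # stands in for "submit to the compiler or interpreter"
print(namespace["result"])
```

The documentation paragraphs are simply skipped, so the same page serves both as
the readable article and as the program source.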
All the sub-technologies gathered under the umbrella of literate
programming are well known and have been used by programmers for a long time.
But nobody before Knuth managed to link them together into a coherent
meta-technology and style of programming. For example, cross-reference tools have
been part of any good programmer's toolset since the early 1960s. Pretty printing
of programs also dates from the early 1960s. Syntax highlighting in pretty printing
is almost as old as pretty printing itself and reappeared in editors with the
introduction of color displays. I also know that in the early 1990s many
programmers used document editors like MS Word, with its outstanding outlining
capabilities, instead of a programming editor, with considerable success. Orthodox
editors like Kedit and SlickEdit are close to this approach due
to their support of programmable folding.
At the same time, while serving as an integration point for previously
isolated technologies, literate programming creates a qualitatively new paradigm
of program development. It also allows each of the underlying technologies to
develop further in new directions. For example, software visualization
is a much broader and much more complex subject than just TeX-based program
representation. What is really important is viewing the program as a
literary work, a book or an article, not just the ability to manipulate the program
in various ways that increase understanding of it.
XREF tools, syntax highlighting and outlining are just three facets
of this complex problem. The "missing link" (integration of all three
into a wiki-style environment) can probably explain the rather cool acceptance of
the idea. Also, HTML is preferable to TeX here. TeX proved to be a not very
flexible way to develop a complex system, and as such it does not provide
any significant advantages over a well-developed IDE with an elaborate code browser
and project management tools, like Visual Studio or Eclipse.
Now, with HTML widely used and wiki technologies available, it might
be a good time to take a second look at the initial ideas and re-implement the most
sound concepts in a new way, with much better integration and flexibility than TeX
permits. The great advantage of HTML is its simplicity and the availability
of excellent HTML editors like FrontPage.
The real question is how to integrate cross references, indices, outlining and "syntax
highlighted" sources in an attractive, flexible system.
Key Problems with Literate Programming
Literate programming is not without problems, and this might explain
the low level of adoption of this technology.
There is no silver bullet in program understanding or program writing,
and like any technology, "book-style" representation is better for some purposes
and worse for others.
The key problem with literate programming is that it is a static
representation. Understanding (and writing) a program requires a flexible,
dynamic representation. Also, generating the program text from the markup
representation creates the classic problem of two texts, although it is less severe
than the problems that arise in macro substitution and can be mitigated by a
wiki-style environment where access to the underlying representation is available
only for editing.
Also omitted from the concept of literate programming is the idea
of folding and outlining, two powerful tools that simplify the writing of
complex documents and programs.
Generating XREF tables is only one approach to the visibility
of variables in the program. There are many others. Again, the programmer can
benefit from many views of his variables, not just a single table. Typical SQL-style
queries might be useful.
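To make the remark about SQL-style queries concrete, here is a minimal sketch
that loads identifier occurrences into an in-memory SQLite table and then asks
two different questions of it. It uses Python's standard `ast` and `sqlite3`
modules; the table layout (`xref` with `name`, `line` and `role` columns), the
helper `build_xref` and the sample program are assumptions made for the
illustration, and only simple variable names are tracked.

```python
import ast
import sqlite3

SAMPLE = """\
total = 0
count = 0
for price in [5, 7]:
    total = total + price
average = total / 2
print(average)
"""


def build_xref(source):
    """Return an in-memory database with one row per occurrence of a simple name."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE xref (name TEXT, line INTEGER, role TEXT)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Store context means the name is assigned; anything else counts as a use.
            role = "def" if isinstance(node.ctx, ast.Store) else "use"
            db.execute("INSERT INTO xref VALUES (?, ?, ?)", (node.id, node.lineno, role))
    return db


db = build_xref(SAMPLE)

# View 1: every line that touches a given variable.
total_lines = [row[0] for row in db.execute(
    "SELECT DISTINCT line FROM xref WHERE name = 'total' ORDER BY line")]

# View 2: names that are assigned somewhere but never read.
unused = db.execute(
    "SELECT DISTINCT name FROM xref WHERE role = 'def' "
    "AND name NOT IN (SELECT name FROM xref WHERE role = 'use') ORDER BY name"
).fetchall()

print("total appears on lines:", total_lines)
print("assigned but never read:", [row[0] for row in unused])
```

Once the occurrences are in a table, each new "view" of the variables is just
another query rather than another tool.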
Most published examples of literate programming are pretty dull
and actually do more to discredit the technology than to attract people to it.
Andrew Binstock and Donald Knuth converse on the success of
open source, the problem with multicore architecture, the disappointing lack
of interest in literate programming, the menace of reusable code, and that
urban legend about winning a programming contest with a single compilation.
Andrew Binstock: You are one of the fathers of the open-source
revolution, even if you aren’t widely heralded as such. You previously have
stated that you released TeX
as open source because of the problem of proprietary implementations at the
time, and to invite corrections to the code—both of which are key drivers
for open-source projects today. Have you been surprised by the success of
open source since that time?
Donald Knuth: The success of open source code is perhaps the only thing
in the computer field that hasn’t surprised me during the past
several decades. But it still hasn’t reached its full potential; I believe
that open-source programs will begin to be completely dominant as the
economy moves more and more from products towards services, and as more and
more volunteers arise to improve the code.
For example, open-source code can produce thousands of binaries, tuned
perfectly to the configurations of individual users, whereas commercial
software usually will exist in only a few versions. A generic binary
executable file must include things like inefficient "sync" instructions
that are totally inappropriate for many installations; such wastage goes
away when the source code is highly configurable. This should be a huge win
for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always be
superior to competitors like the Gimp—for some reason, I really don’t know
why! I’m quite willing to pay good money for really good software, if I
believe that it has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly
suspect, since I’m just an educator and scientist. I understand almost
nothing about the marketplace.
Andrew: A story states that you once entered a programming
contest at Stanford (I believe) and you submitted the winning entry, which
worked correctly after a single compilation. Is this story true? In
that vein, today’s developers frequently build programs writing small code
increments followed by immediate compilation and the creation and running of
unit tests. What are your thoughts on this approach to software development?
Donald: The story you heard is typical of legends that are based on only
a small kernel of truth. Here’s what actually happened:
John McCarthy decided in 1971 to have a Memorial Day Programming Race.
All of the contestants except me worked at his AI Lab up in the hills above
Stanford, using the WAITS time-sharing system; I was down on the main
campus, where the only computer available to me was a mainframe for which I
had to punch cards and submit them for processing in batch mode. I used
Wirth’s ALGOL W system
(the predecessor of Pascal). My program didn’t work the first time,
but fortunately I could use Ed Satterthwaite’s excellent offline debugging
system for ALGOL W, so I needed only two runs. Meanwhile, the folks using
WAITS couldn’t get enough machine cycles because their machine was so
overloaded. (I think that the second-place finisher, using that "modern"
approach, came in about an hour after I had submitted the winning entry with
old-fangled methods.) It wasn’t a fair contest.
As to your real question, the idea of immediate compilation and "unit
tests" appeals to me only rarely, when I’m feeling my way in a totally
unknown environment and need feedback about what works and what doesn’t.
Otherwise, lots of time is wasted on activities that I simply never need to
perform or even think about. Nothing needs to be "mocked up."
Andrew: One of the emerging problems for developers, especially
client-side developers, is changing their thinking to write programs in
terms of threads. This concern, driven by the advent of inexpensive
multicore PCs, surely will require that many algorithms be recast for
multithreading, or at least to be thread-safe. So far, much of the work
you’ve published for Volume 4 of The Art
of Computer Programming (TAOCP) doesn’t seem to touch
on this dimension. Do you expect to enter into problems of concurrency and
parallel programming in upcoming work, especially since it would seem to be
a natural fit with the combinatorial topics you’re currently working on?
Donald: The field of combinatorial algorithms is so vast that I’ll be
lucky to pack its sequential aspects into three or four physical
volumes, and I don’t think the sequential methods are ever going to be
unimportant. Conversely, the half-life of parallel techniques is very short,
because hardware changes rapidly and each new machine needs a somewhat
different approach. So I decided long ago to stick to what I know best.
Other people understand parallel machines much better than I do; programmers
should listen to them, not me, for guidance on how to deal with
simultaneity.
Andrew: Vendors of multicore processors have expressed
frustration at the difficulty of moving developers to this model. As a
former professor, what thoughts do you have on this transition and how to
make it happen? Is it a question of proper tools, such as better native
support for concurrency in languages, or of execution frameworks? Or are
there other solutions?
Donald: I don’t want to duck your question entirely. I might as well
flame a bit about my personal unhappiness with the current trend toward
multicore architecture. To me, it looks more or less like the hardware
designers have run out of ideas, and that they’re trying to pass the blame
for the future demise of Moore’s Law to the software writers by giving us
machines that work faster only on a few key benchmarks! I won’t be surprised
at all if the whole multithreading idea turns out to be a flop, worse than
the "Itanium" approach
that was supposed to be so terrific—until it turned out that the wished-for
compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over
a thousand programs, many of which have substantial size. I can’t think of
even five of those programs that would have been enhanced
noticeably by parallelism or multithreading. Surely, for example, multiple
processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these
promised machines of the future? I hear almost nothing but grief from
software people, although the hardware folks in our department assure me
that I’m wrong.
I know that important applications for parallelism exist—rendering
graphics, breaking codes, scanning images, simulating physical and
biological processes, etc. But all these applications require dedicated code
and special-purpose techniques, which will need to be changed substantially
every few years.
Even if I knew enough about such methods to write about them in TAOCP,
my time would be largely wasted, because soon there would be little reason
for anybody to read those parts. (Similarly, when I prepare the third
edition of Volume 3 I plan to rip out much of the material about how to sort on
magnetic
tapes. That stuff was once one of the hottest topics in the whole software
field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only
when I’m running two independent jobs at the same time; that’s nice, but it
happens only a few minutes every week. If I had four processors, or eight,
or more, I still wouldn’t be any better off, considering the kind of work I
do—even though I’m using my computer almost every day during most of the
day. So why should I be so happy about the future that hardware vendors
promise? They think a magic bullet will come along to make multicores speed
up my kind of work; I think it’s a pipe dream. (No—that’s the wrong
metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the
word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably
will get better with multicores. I’ve been talking about my technical work,
however, not recreation. I also admit that I haven’t got many bright ideas
about what I wish hardware designers would provide instead of multicores,
now that they’ve begun to hit a wall with respect to sequential computation.
(But my MMIX
design contains several ideas that would substantially improve the current
performance of the kinds of programs that concern me most—at the cost of
incompatibility with legacy x86 programs.)
Andrew: One of the few projects of yours that hasn’t been
embraced by a widespread community is literate programming.
What are your thoughts about why literate programming didn’t catch on? And
is there anything you’d have done differently in retrospect regarding
literate programming?
Donald: Literate programming is a very personal thing. I think it’s
terrific, but that might well be because I’m a very strange person. It has
tens of thousands of fans, but not millions.
In my experience, software created with literate programming has turned
out to be significantly better than software developed in more traditional
ways. Yet ordinary software is usually okay—I’d give it a grade of C (or
maybe C++), but not F; hence, the traditional methods stay with us. Since
they’re understood by a vast community of programmers, most people have no
big incentive to change, just as I’m not motivated to learn Esperanto even
though it might be preferable to English and German and French and Russian
(if everybody switched).
Jon Bentley
probably hit the nail on the head when he once was asked why literate
programming hasn’t taken the whole world by storm. He observed that a small
percentage of the world’s population is good at programming, and a small
percentage is good at writing; apparently I am asking everybody to be in
both subsets.
Yet to me, literate programming is certainly the most important thing
that came out of the TeX project. Not only has it enabled me to write and
maintain programs faster and more reliably than ever before, and been one of
my greatest sources of joy since the 1980s—it has actually been indispensable at times. Some of my major programs, such as the MMIX
meta-simulator, could not have been written with any other methodology that
I’ve ever heard of. The complexity was simply too daunting for my limited
brain to handle; without literate programming, the whole enterprise would
have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded
machines, I would expect the discovery to come from people who routinely use
literate programming. Literate programming is what you need to rise above
the ordinary level of achievement. But I don’t believe in forcing ideas on
anybody. If literate programming isn’t your style, please forget it and do
what you like. If nobody likes it but me, let it die.
On a positive note, I’ve been pleased to discover that the conventions of
CWEB are already standard equipment within preinstalled software such as
Makefiles, when I get off-the-shelf Linux these days.
Andrew: In Fascicle 1 of Volume 1, you reintroduced the MMIX computer,
which is the 64-bit upgrade to the venerable MIX machine comp-sci students
have come to know over many years. You previously described MMIX in great
detail in MMIXware.
I’ve read portions of both books, but can’t tell whether the Fascicle
updates or changes anything that appeared in MMIXware, or whether it’s a
pure synopsis. Could you clarify?
Donald: Volume 1 Fascicle 1 is a programmer’s introduction, which
includes instructive exercises and such things. The MMIXware book is a
detailed reference manual, somewhat terse and dry, plus a bunch of literate
programs that describe prototype software for people to build upon. Both
books define the same computer (once the errata to MMIXware are incorporated
from my website). For most readers of TAOCP, the first fascicle
contains everything about MMIX that they’ll ever need or want to know.
I should point out, however, that MMIX isn’t a single machine; it’s an
architecture with almost unlimited varieties of implementations, depending
on different choices of functional units, different pipeline configurations,
different approaches to multiple-instruction-issue, different ways to do
branch prediction, different cache sizes, different strategies for cache
replacement, different bus speeds, etc. Some instructions and/or registers
can be emulated with software on "cheaper" versions of the hardware. And so
on. It’s a test bed, all simulatable with my meta-simulator, even though
advanced versions would be impossible to build effectively until another
five years go by (and then we could ask for even further advances just by
advancing the meta-simulator specs another notch).
Suppose you want to know if five separate multiplier units and/or
three-way instruction issuing would speed up a given MMIX program. Or maybe
the instruction and/or data cache could be made larger or smaller or more
associative. Just fire up the meta-simulator and see what happens.
Andrew: As I suspect you don’t use unit testing with MMIXAL,
could you step me through how you go about making sure that your code works
correctly under a wide variety of conditions and inputs? If you have a
specific work routine around verification, could you describe it?
Donald: Most examples of machine language code in TAOCP appear
in Volumes 1-3; by the time we get to Volume 4, such low-level detail is
largely unnecessary and we can work safely at a higher level of abstraction.
Thus, I’ve needed to write only a dozen or so MMIX programs while preparing
the opening parts of Volume 4, and they’re all pretty much toy
programs—nothing substantial. For little things like that, I just use
informal verification methods, based on the theory that I’ve written up for
the book, together with the MMIXAL assembler and MMIX simulator that are
readily available on the Net (and described in full detail in the MMIXware
book).
That simulator includes debugging features like the ones I found so
useful in Ed Satterthwaite’s system for ALGOL W, mentioned earlier. I always
feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still
thriving, primarily as the foundation for LaTeX. While TeX
has been effectively frozen at your request, are there features that you
would want to change or add to it, if you had the time and bandwidth? If so,
what are the major items you add/change?
Donald: I believe changes to TeX would cause much more harm than good.
Other people who want other features are creating their own systems, and
I’ve always encouraged further development—except that nobody should give
their program the same name as mine. I want to take permanent responsibility
for TeX and Metafont,
and for all the nitty-gritty things that affect existing documents that rely
on my work, such as the precise dimensions of characters in the Computer
Modern fonts.
Andrew: One of the little-discussed aspects of software
development is how to do design work on software in a completely new domain.
You were faced with this issue when you undertook TeX: No prior art was
available to you as source code, and it was a domain in which you weren’t an
expert. How did you approach the design, and how long did it take before you
were comfortable entering into the coding portion?
Donald: That’s another good question! I’ve discussed the answer in great
detail in Chapter 10 of my book
Literate Programming, together with Chapters 1 and 2 of my book
Digital Typography. I think that anybody who is really interested in
this topic will enjoy reading those chapters. (See also Digital
Typography Chapters 24 and 25 for the complete first and second drafts
of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear
concern for limiting memory usage—an important problem for systems of that
era. Today, the concern for memory usage in programs has more to do with
cache sizes. As someone who has designed a processor in software, the issues
of cache-aware and cache-oblivious algorithms surely must have crossed your radar
screen. Is the role of processor caches on algorithm design something that
you expect to cover, even if indirectly, in your upcoming work?
Donald: I mentioned earlier that MMIX provides a test bed for many
varieties of cache. And it’s a software-implemented machine, so we can
perform experiments that will be repeatable even a hundred years from now.
Certainly the next editions of Volumes 1-3 will discuss the behavior of
various basic algorithms with respect to different cache parameters.
In Volume 4 so far, I count about a dozen references to cache memory and
cache-friendly approaches (not to mention a "memo cache," which is a
different but related idea in software).
Andrew: What set of tools do you use today for writing TAOCP?
Do you use TeX? LaTeX? CWEB? Word processor? And what do you use for the
coding?
Donald: My general working style is to write everything first with pencil
and paper, sitting beside a big wastebasket. Then I use Emacs to enter the
text into my machine, using the conventions of TeX. I use tex, dvips, and gv
to see the results, which appear on my screen almost instantaneously these
days. I check my math with Mathematica.
I program every algorithm that’s discussed (so that I can thoroughly
understand it) using CWEB, which works splendidly with the GDB debugger. I
make the illustrations with MetaPost (or, in rare cases, on a Mac with Adobe Photoshop or
Illustrator). I have some homemade tools, like my own spell-checker for TeX
and CWEB within Emacs. I designed my own bitmap font for use with Emacs,
because I hate the way the ASCII apostrophe and the left open quote have
morphed into independent symbols that no longer match each other visually. I
have special Emacs modes to help me classify all the tens of thousands of
papers and notes in my files, and special Emacs keyboard shortcuts that make
bookwriting a little bit like playing an organ. I prefer rxvt to xterm for
terminal input. Since last December, I’ve been using a file backup system
called backupfs, which meets my need beautifully to archive the daily state
of every file.
According to the current directories on my machine, I’ve written 68
different CWEB programs so far this year. There were about 100 in 2007, 90
in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely
convenient "change file" mechanism, with which I can rapidly create multiple
versions and variations on a theme; so far in 2008 I’ve made 73 variations
on those 68 themes. (Some of the variations are quite short, only a few
bytes; others are 5KB or more. Some of the CWEB programs are quite
substantial, like the 55-page BDD package that I completed in January.)
Thus, you can see how important literate programming is in my life.
I currently use Ubuntu Linux, on a
standalone laptop—it has no Internet connection. I occasionally carry flash
memory drives between this machine and the Macs that I use for network
surfing and graphics; but I trust my family jewels only to Linux.
Incidentally, with Linux I much prefer the keyboard focus that I can get
with classic FVWM to the
GNOME and KDE environments that other people seem to like better. To each
his own.
Andrew: You state in the preface of Fascicle 0 of Volume 4 of
TAOCP that Volume 4 surely
will comprise three volumes and possibly more. It’s clear from the text that
you’re really enjoying writing on this topic. Given that, what is your
confidence in the note posted on the TAOCP website that Volume 5
will see light of day by 2015?
Donald: If you check the Wayback Machine for previous incarnations of
that web page, you will see that the number 2015 has not been constant.
You’re certainly correct that I’m having a ball writing up this material,
because I keep running into fascinating facts that simply can’t be left
out—even though more than half of my notes don’t make the final cut.
Precise time estimates are impossible, because I can’t tell until getting
deep into each section how much of the stuff in my files is going to be
really fundamental and how much of it is going to be irrelevant to my book
or too advanced. A lot of the recent literature is academic one-upmanship of
limited interest to me; authors these days often introduce arcane methods
that outperform the simpler techniques only when the problem size exceeds
the number of protons in the universe. Such algorithms could never be
important in a real computer application. I read hundreds of such papers to
see if they might contain nuggets for programmers, but most of them wind up
getting short shrift.
From a scheduling standpoint, all I know at present is that I must
someday digest a huge amount of material that I’ve been collecting and
filing for 45 years. I gain important time by working in batch mode: I don’t
read a paper in depth until I can deal with dozens of others on the same
topic during the same week. When I finally am ready to read what has been
collected about a topic, I might find out that I can zoom ahead because most
of it is eminently forgettable for my purposes. On the other hand, I might
discover that it’s fundamental and deserves weeks of study; then I’d have to
edit my website and push that number 2015 closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer.
How is your health today?
Donald: Naturally, the cancer will be a serious concern. I have superb
doctors. At the moment I feel as healthy as ever, modulo being 70 years old.
Words flow freely as I write TAOCP and as I write the literate
programs that precede drafts of TAOCP. I wake up in the morning
with ideas that please me, and some of those ideas actually please me also
later in the day when I’ve entered them into my computer.
On the other hand, I willingly put myself in God’s hands with respect to
how much more I’ll be able to do before cancer or heart disease or senility
or whatever strikes. If I should unexpectedly die tomorrow, I’ll have no
reason to complain, because my life has been incredibly blessed. Conversely,
as long as I’m able to write about computer science, I intend to do my best
to organize and expound upon the tens of thousands of technical papers that
I’ve collected and made notes on since 1962.
Andrew: On your website, you mention that the Peoples Archive
recently made a series of videos in which you reflect on your past life. In
segment 93, "Advice to Young People," you advise that people shouldn’t do
something simply because it’s trendy. As we know all too well, software
development is as subject to fads as any other discipline. Can you give some
examples that are currently in vogue, which developers shouldn’t adopt
simply because they’re currently popular or because that’s the way they’re
currently done? Would you care to identify important examples of this
outside of software development?
Donald: Hmm. That question is almost contradictory, because I’m basically
advising young people to listen to themselves rather than to others, and I’m
one of the others. Almost every biography of every person whom you would
like to emulate will say that he or she did many things against the
"conventional wisdom" of the day.
Still, I hate to duck your questions even though I also hate to offend
other people’s sensibilities—given that software methodology has always been
akin to religion. With the caveat that there’s no reason anybody should care
about the opinions of a computer scientist/mathematician like me regarding
software development, let me just say that almost everything I’ve ever heard
associated with the term "extreme programming" sounds like exactly the wrong
way to go...with one
exception. The exception is the idea of working in teams and reading each
other’s code. That idea is crucial, and it might even mask out all the
terrible aspects of extreme programming that alarm me.
I also must confess to a strong bias against the fashion for reusable
code. To me, "re-editable code" is much, much better than an untouchable
black box or toolkit. I could go on and on about this. If you’re totally
convinced that reusable code is wonderful, I probably won’t be able to sway
you anyway, but you’ll never convince me that reusable code isn’t mostly a
menace.
Here’s a question that you may well have meant to ask: Why is the new
book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1? The answer
is that computer programmers will understand that I wasn’t ready to begin
writing Volume 4 of TAOCP at its true beginning point, because we
know that the initialization of a program can’t be written until the program
itself takes shape. So I started in 2005 with Volume 4 Fascicle 2, after
which came Fascicles 3 and 4. (Think of Star Wars, which began with
Episode 4.)
About: Highlight is a universal converter from source code to HTML,
XHTML, RTF, TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style
Sheets. It supports more than 100 programming languages, and includes 40 highlighting
color themes. It's possible to easily enhance the parsing database. The converter
includes some features to provide a consistent layout of the input code.
Changes: Embedded output instructions specific to the output document
format were added. Support for Arc and Lilypond was added.
Linux Cross Referencing, or LXR, is a very versatile tool for generating
cross-referenced HTML files for source code in C (and C++, I think).
For example, you can use it to browse through the Linux source code.
Literate programming systems have the following properties:
1. Code and extended, detailed comments are intermingled.
2. The code sections can be written in whatever order is best for people
   to understand, and are re-ordered automatically when the computer needs
   to run the program.
3. The program and its documentation can be handsomely typeset into a single
   article that explains the program and how it works. Indices and cross-references
   are generated automatically.
POD only does task 1, but the other tasks are much more important.
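The re-ordering step is easier to see in a small example. The following is only a minimal sketch of a "tangle" pass, assuming a noweb-like notation chosen for illustration (`<<name>>=` opens a code section, `<<name>>` on its own line refers to one, and `@` returns to documentation). The function names `parse` and `tangle` are hypothetical and this is not the algorithm of WEB, CWEB, or any other real tool.

```python
import re

# Notation assumed for this sketch only (loosely modelled on noweb):
#   <<name>>=   opens the definition of a code section
#   <<name>>    alone on a line inside a section refers to another section
#   @           closes the section and returns to documentation
DEF = re.compile(r"^<<(.+)>>=\s*$")
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")

def parse(source):
    """Collect code sections by name; documentation lines are skipped."""
    sections, current = {}, None
    for line in source.splitlines():
        m = DEF.match(line)
        if m:
            current = sections.setdefault(m.group(1), [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            current.append(line)
    return sections

def tangle(sections, name, indent=""):
    """Expand section `name` recursively into the order the computer needs."""
    out = []
    for line in sections[name]:
        m = REF.match(line)
        if m:
            # A reference is replaced by the body of the section it names.
            out.extend(tangle(sections, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return out

LITERATE = """\
The interesting part is the greeting, so we explain it first.
<<greet>>=
print("hello, " + name)
@
Only then do we show the boring setup and the overall shape.
<<setup>>=
name = "world"
@
<<main>>=
<<setup>>
<<greet>>
@
"""

program = "\n".join(tangle(parse(LITERATE), "main"))
print(program)
```

The sections appear in the order that suits the explanation (greeting first), while the tangled program puts the setup before the greeting. A real system would also detect circular references and produce the typeset document and index, which this sketch leaves out.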
Literate programming is an interesting idea, and worth looking into, but
if we think that we already know all about it, we won't bother. Let's bother.
For an introduction, see Knuth's original paper, which has a short but complete
example. For a slightly longer example, here's a library I wrote in literate
style that manages 2-3 trees in C.
Andrew Johnson's new book Elements
of Programming with Perl uses literate programming techniques extensively,
and shows the source code for a literate programming system written in Perl.
Finally, the Literate Programming
web site has links to many other resources, including literate programming
environments that you can try out yourself.
Whatever the origins of literate programming, there's no doubt that its fame
and/or infame comes from the great King Knuth. For 'twas he, the
noble El Don, who first propounded his version of the "literate" approach to
coding in that spooky, fatidic year 1984. His ideas were later amplified
and published in 1992 (Literate Programming. Lecture notes, Center for the Study
of Language and Information, Stanford). Those who in-joke about the publicational
time gap between volumes 3 and 4 of Knuth's magnum opus, TAOCP (The Art of Computer
Programming), should remember this and all the other fine work that has distracted
him, especially his Herculean efforts in typesetting and typographical computing.
Barton's plea under the mantras, "Code isn't just for computers" and "Reading
programs for pleasure," is to promote code that humans can enjoy reading for
the sheer fun of it, in the same way, for example, that they can enjoy curling
up in bed with their favorite Trollope (an author carefully chosen for a cheap
thrill unworthy of this august journal). We note first a possible confusion
or overlap between literary and literate programming. Dijkstra tends to stress
literacy in the sense of fluent command of one's working/publishing tongue (and
that really means English for most practical purposes), so that all text not
directly compilable, such as comments and explanations, would be written crisply
and free from ambiguities. Barton seems to be seeking a literary flair in the
code itself.
... ... ...
Back comes the cry: "But debugging and maintenance
demand code legibility." Here follows a bifurcation in the literate
programming route. Ray Giguette sees a helpful literary role right at the start
of the project, using literary analogies to shape our approach to software design.
Robert McGrath dismisses this too brusquely, I believe, while admitting that
even weak analogies may help to improve understanding between humans involved
in design and coding.
2005-08-05 (freesoftwaremagazine.com)
Diomidis Spinellis, author of Code Reading: The Open Source Perspective,
is one of the first of what we will come to know as the literary critics of
code. His book is unlike any other programming book that came before it and
for a very exciting reason. What makes it unique is that Spinellis is teaching
us how to read source code instead of merely how to write it. Spinellis
hopes that after reading his book, “You may read code purely for your own pleasure,
as literature” (2). What I want to emphasize here is that word pleasure.
As long as we merely view code as something practical; as a means designed,
for better or worse, to reach certain practical ends, then we will
never see the flourishing of the literature that Spinellis describes. What must
happen first is the cultivation of a new audience for code. We desire a readership
that derives a different sort of pleasure from reading magnificent code than
those who have come before them. Whereas, generally speaking, most readers of
code today judge code based on the familiar criteria of precision,
concision, efficiency, and correctness, these future
readers will speak of the beauty of code and the artistry
of a well-wrought script. We will, perhaps, print out the programs of our favorite
coders and read them in the bathtub. Furthermore, we will do so for no other
reason than that we will enjoy doing so; we will as eagerly await the
next Miguel de Icaza as we would the novels of our favorite author or the films
of our favorite director. Even now, the first rays of this new art are shooting
across the horizon; tomorrow, we will shield our eyes against its brilliance.
Richard P. Gabriel and Ron Goldman’s fabulous essay
Mob Software: The Erotic
Life of Code makes many of the points that I will attempt to explicate here.
One of their theses is that “When software became
merchandise, the opportunity vanished of teaching software development as a
craft and as artistry”. For Gabriel and Goldman, faceless corporations
have reduced coding to a lowly craft; code is just another disposable product
that is only useful for furthering some corporate agenda. Such base motives
have prevented coding from flourishing as a literature. Gabriel and Goldman
describe the pitfalls of proprietary software development and ask a rather compelling
question:
It’s as if all writers had their own private companies and only people in
the Melville company could read Moby-Dick and only those in Hemingway’s
could read The Sun Also Rises. Can you imagine developing a rich
literature under these circumstances?
... ... ...
Author of the classic Art of Computer Programming books, Knuth firmly
believes that programming can reach literary proportions. As early as 1974,
Knuth was arguing that computer programming is more artistic than most people
realize. “When I speak about computer programming as an art,” writes Knuth,
“I am thinking primarily of it as an art form, in an aesthetic sense.
The chief goal of my work is to help people learn how to write beautiful
programs” (670). Knuth’s passion and zeal for artistic coding is revealed in
such lines as “it is possible to write grand programs, noble
programs, truly magnificent ones!” (670). For Knuth, this means that
programmers must think of far more than how effectively their code will compile.
... ... ...
The fine art of coding
In a 1983 article entitled “Literate Programming,” Knuth argues that
“the time is ripe for significantly better documentation
of programs, and that we can best achieve this by considering programs to be
works of literature” (1). Knuth’s project at that time was
literate programming, which is a combination of a document formatting
language and a programming language. The idea was to greatly extend what can
be done with embedded comments; in short, to make source code as readable as
documentation that might accompany it. The goal was not to necessarily make
code that would run more efficiently on a computer; the point was to make code
more interesting and enlightening to human beings. The result of Knuth’s efforts
was WEB, a combination of PASCAL and TeX, and the newer CWEB, which offers C,
C++, or JAVA instead of PASCAL. WEB and CWEB allow programmers like Knuth to
write “essays” on coding that resemble Pope’s essay on poetry.
One of Knuth’s projects was to take the Will Crowther masterpiece ADVENTURE
and rewrite it with CWEB. The results are marvellous. It is a joy to read this
code. The best way I can describe the pleasure I derive from reading it is to
compare it to listening to really good director’s commentary on a special-edition
DVD. It’s like having a wizened and witty old friend reading along with me as
I study the code. How many source code files have you read with comments like
this:
Now here I am, 21 years later, returning to the great Adventure after having
indeed had many exciting adventures in Computer Science. I believe people
who have played this game will be able to extend their fun by reading its
once-secret program. Of course I urge everybody to play the game first,
at least ten times, before reading on. But you cannot fully appreciate the
astonishing brilliance of its design until you have seen all of the surprises
that have been built in.
Knuth has something here. Knuth’s CWEB “commentary” of Adventure
isn’t the heavily abbreviated, arcane gibberish that passes for comments in
most source code, nor is it slavishly didactic and only concerned with teaching.
It is in many ways comparable to Pope’s essay; we have a coder representing
in code what is magnificent about code and how one ought to judge it.
It is something we will likely be studying fifty years from now with the
same reverence with which we approach “The Essay on Criticism” today.
Jef Raskin, author of The Humane Interface, recently presented us
with an essay entitled “Comments are More Important Than Code.” He refers to
Knuth’s work as “gospel for all serious programmers.” Though Raskin is mostly
concerned with the economic relevance of good commenting practice,
I welcome his criticism of modern programming languages: any language “that does
not allow full flowing and arbitrarily long comments is seriously behind the times.”
It seems inevitable that as the free and open source software community continues
to grow, the need for “literate” programming techniques will increase exponentially.
After all, programmers that no one understands (much less admires) are unlikely
to win much influence, despite their cleverness.
Coding: art or science?
One of the many intriguing topics that Knuth has contemplated over the years
is whether programming should be considered an art or a science. Always something
of a linguist, Knuth examines the etymology of both terms in a 1974 essay called
“Computer Programming as an Art.” His results indicate that real confusion exists
about how to interpret the terms “art” and “science,” even though we seem
to know what we mean when we claim that computer programming is a “science”
and not an “art.” We call the study of computers “computer science,” Knuth writes,
because “there is something undesirable about an area of human activity that
is classified as an ‘art’; it has to be a Science before it has any real stature”
(667). Yet Knuth argues that “when we prepare a program, it can be like composing
poetry or music” (670). The key to this transformation is to embrace “art for
art’s sake,” that is, to freely and unashamedly write code for fun.
Coding doesn’t always have to be for the sake of utility. Artful coding
can be done for its own sake, without any thought about how it might eventually
serve some useful purpose.
Daniel Kohanski, author of a wonderful little book entitled The Philosophical
Programmer, has much to say about what he calls the “aesthetics of programming.”
Now, when most folks talk about aesthetics, they are speaking about
what makes the beautiful so beautiful. If I see a young lady and tell you that
I find her aesthetically pleasing, I’m not talking about how much she can bench-press
or how accurately she can shoot. Yet this seems to be what Kohanski means when
he talks of aesthetical programming:
While aesthetics might be dismissed as merely expressing a concern for appearances,
its encouragement of elegance does have practical advantages. Even so prosaic
an activity as digging a ditch is improved by attention to aesthetics; a
ditch dug in a straight line is both more appealing and more useful than
one that zigzags at random, although both will deliver the water from one
place to the other. (11)
I feel a sad irony that Kohanski chooses the metaphor of a ditch
to describe what he considers aesthetic code. Coders have been stuck in this
rut for quite some time. We take something as wonderful and amazing as programming,
and compare it to perhaps the lowliest manual labor on earth: the digging of
ditches. If conciseness, durability, and efficiency are all that matters, programmers
work without art and grace and might as well wield shovels instead of keyboards.
Let me set a few things straight here. When most people try to establish
“Science and Art” as binary oppositions, they would generally do better to use
the terms “Engineers and Artists.” Computer programming can be thought
of from a strictly engineering perspective—that is, an application of the principles
of science towards the service of humanity. Civil engineering, for instance,
involves building safe and secure bridges. According to the Oxford English
Dictionary, the word engineer was first used as a term for those
who constructed siege engines—war machinery. The word still carries a very practical
connotation; we expect engineers to be precise, clever, and so on, but expect
a far different set of qualities from those we term artists. Whereas
the stereotypical engineer is an introvert with a pocket protector and calculator
wristwatch, the stereotypical artist is someone like Salvador Dali—a wild, eccentric
type who is poorly understood, yet wildly revered. We expect our artists to
be unpredictable and delightfully social beings—who really understand the human
condition. We expect engineers to be pretty dull folks to have around at parties.
Such oppositions are seldom useful and more often misleading. We might think
of the man insisting that programming is a “science” as equally intelligent
as his companion, Tweedledum, who insists that it is quite obviously an art.
The truth, according to Knuth, is that programming is “both a science and an
art, and that the two aspects nicely complement each other” (669). Like civil
engineering, programming involves the application of mathematics. Like poetry,
programming involves the application of aesthetics. As with bridges, some programs
are mundane things that clearly serve only to get folks across bodies of water,
whereas others, like the Golden Gate Bridge, are magnificent structures rightly
regarded as national landmarks. Unfortunately, the modern discourse surrounding
computer programming is far too slanted towards the banal; even legends of the
field cannot bring themselves to see their calling as anything but a useful
but dull craft. They are the painters who have convinced themselves that because
they cannot sell their frescoes, that painting houses is the only sensible thing
one can do with a paintbrush.
The future of programming as art
Computer programming is not limited to engineering, nor must coders always
think first of efficiency. Programming is also an art, and, what’s more, it’s
an art that shouldn’t be limited to what is “optimal”. Even though programs
are usually written to be parsed and executed by computers, they are also read
by other human beings, some of whom, I dare say, exercise respectable taste
and appreciate good style. We’ve misled ourselves into thinking that computer
programming is some “exact science,” more akin to applied physics than fine
art, yet my argument here is that what’s really important in the construction
of programs isn’t always how efficiently they run on a computer—or even if they
work at all. What’s important is whether they are beautiful and inspiring to
behold; if they are sublime and share some of the same features that make masterful
plays, compositions, sculptures, paintings, or buildings so magnificent. A programmer
who defines a good program simply as “one that best does the job with the least
use of a computer’s resources” may get the job done, but he certainly is a dull,
uninspiring fellow. I wish to celebrate programmers who are willing to dispense
with this slavish devotion to efficiency and see programming as an art in its
own right; having not so much to do with computers as with other human beings who
have the knowledge and temperament to appreciate its majesty.
It is all too easy to transpose historical developments in literature and
literary criticism onto computer programming. Undoubtedly, such a practice is
at best simplistic—at worst it is myopic. Comparisons to poetry, as Gabriel
and Goldman point out, are all too tempting. Like poetry, coding is at once
imaginative and restricted:
Release is reined in by restraint: requirements of form, grammar, sentence-making,
echoes, rhyme, rhythm. Without release there could be nothing worth reading;
the erotic pleasure of pure meandering would be unapproached. Without restraint
there cannot be sense enough to make the journey worth taking.
It is quite possible to look at the source code of a C++ program and imagine
it to be a poem; some experiment with “free verse” making clever use of programming
conventions. Such comparisons, while certainly intriguing, are not what I’m
interested in pursuing. Likewise, I am not arguing that artistic coding is simply
inserting well-written comments. I would not be interested in someone’s effort
to integrate a Shakespearean sonnet into the header file of an e-mail client.
Instead, I’ve tried to assert that coding itself can be artistic; that eloquent
commenting can complement, but not substitute for, eloquent coding.
To do so would be to claim that it is more important for artists to know how
to describe their paintings than to paint them. Clearly, the future of programming
as art will involve both types of skills; but, more importantly, the most artistic
among us will be those who have defected from the rank and file of engineers
and refused to kneel before the altar of efficiency. For these future Byrons
and Shelleys, the scripts unfolding beneath their fingers are not some disposable
materials for the commercial benefit of some ignorant corporate juggernaut.
Instead, they will be sacred works; digital manifestations of the spirit of
these artists. We should treat them with the same care and respect we offer
hallowed works in other genres, such as Rodin’s Thinker, Virgil’s
Aeneid, Dante’s Inferno, or Pope’s Essay on Criticism.
Like these other masterpieces, the best programs will stand the test of time
and remain impervious to the raging rivers of technological and social change
that crash against them.
This question of permanence is perhaps where we find ourselves stumbling
in our apology for programming. How can we talk of a program as a “masterpiece”,
knowing that, given the rate of technological development, it may soon become
so obsolete as not to function in our computers? Yet here is the reason that
I have stressed how insignificant it is that a program actually works
for it to be rightly considered magnificent. Indeed, I find it almost certain
that we will find ourselves with programs whose utter brilliance we will not
be capable of recognizing for decades, if not centuries. We can imagine, for
instance, a videogame written for systems more sophisticated than any in production
today. Likewise, any programmer with any maturity whatsoever can appreciate
the inventiveness of the early pioneers, who wrought miracles far more impressive
in scope than the humble achievements so brazenly trumpeted in the media today.
To really appreciate the fine art of computer programming, we must separate
what works well in a given computer from what represents artistic
genius, and never conflate the two—for the one is a fleeting, forgettable
thing, but the other will never die.
Highlight is a universal converter from source code to HTML, XHTML, RTF,
TeX, LaTeX, and XML. (X)HTML output is formatted by Cascading Style Sheets.
It supports more than 100 programming languages, and includes 40 highlighting
color themes. It's possible to easily enhance the parsing database. The converter
includes some features to provide a consistent layout of the input code.
Release focus: Minor bugfixes
Changes:
This release fixes XML parsing and adds a new option to set the CSS class name
prefix for HTML output.
The following paragraphs discuss the main benefits of traditional literate
programming. Note: none of these benefits depends on printed output.
Design and coding happen at the highest possible level. The names of sections
are constrained only by one's design skill, not by any rules of language.
You say what you mean, and that becomes both the design and the code. You never
have to simulate a concept because concepts become section names.
The visual weight of code is separate from its actual length. The visual weight
of a section is simply the length and complexity of the section name, regardless
of how complex the actual definition of the section is. The results of this
separation are spectacular. No longer is one reluctant to do extensive error
handling (or any other kind of minutiae) for fear that it would obscure the
essence of the program. Donald Knuth stresses this aspect of literate programming
and I fully agree.
Sections show relations between snippets of code.
Sections can show and enforce relationships between apparently unrelated pieces
of code. Comments, macros or functions are other ways to indicate such relationships,
but often sections are ideal. Indeed, a natural progression is to create sections
as a matter of course. I typically convert a section to a function only when it
becomes apparent that a function's greater generality outweighs the inconvenience
of having to declare and define the function.
Complex section names invite improvements. A section name is complex when it
implies unwholesome dependencies between the caller (user) of the section and
the section itself. Such section names tend to be conspicuous, so that the
programmer is led to revise both the section name and its purpose. Many times my
attention has been drawn to a poorly conceived section because I didn't like what
its name implied. I have always been able to revise the code to improve the design,
either by splitting a section into parts or by simplifying its relation to colleagues.
Sections create a place for extensive comments. One of the most surprising things
about literate programming is how severely traditional programming tends to limit
comments. In a conventional program the formatting of code must indicate structure,
and comments obscure that formatting. Sections in literate programming provide a
place for lengthy comments that do not clutter the code at the place the section
is referenced.
Section names eliminate mundane comments. The section name often says it all. The
reference to the section says everything that the user needs to know, and the
section name at the point of definition also eliminates the need for many comments.
"A cloned node is a copy of a
node that changes when the original changes. Changes to the children,
grandchildren, etc. of a node are simultaneously made to the corresponding nodes
contained in all cloned nodes. A small red arrow in icon boxes marks clones.
Please take a few moments to experiment with clones. Start with a single node,
say a node whose headline is A. Clone node A using the CloneNode
command in Leo's Outline menu. Both clones are identical; there is no distinction
between the original node and any of its clones.
Type some text into the body of either node A. The same text appears in the bodies
of all other clones of A. Now insert a node, say B, as a child of any of the A
nodes. All the A nodes now have a B child. See what happens if you clone B. See
what happens if you insert, delete or move nodes that are children of A. Verify
that when the second-to-last cloned node is deleted the last cloned node becomes
a regular node again.
Clones are much more than a cute feature. Clones allow multiple views
of data to exist within a single outline. The ability to create multiple
views of data is crucial; you don't have to try to decide what is the 'correct'
view of data. You can create as many views as you like, each tailored exactly to
the task at hand."
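One way to picture the behaviour described in that passage is to treat every clone as a second appearance of the same underlying object, so that body text and children are shared rather than copied. The sketch below is an assumption made for illustration only: the `Node` class and the `insert` and `delete` helpers are hypothetical names, not Leo's actual classes or API.

```python
class Node:
    """Shared data of an outline node: headline, body text, children.

    Hypothetical model for illustration; not Leo's actual implementation.
    """

    def __init__(self, headline, body=""):
        self.headline = headline
        self.body = body
        self.children = []   # child Node objects, shared by every clone
        self.parents = []    # one entry for each place the node appears

    def is_cloned(self):
        # A node counts as a clone while it appears in more than one place.
        return len(self.parents) > 1


def insert(parent, child):
    """Make `child` appear under `parent` (a first insertion or a clone)."""
    parent.children.append(child)
    child.parents.append(parent)


def delete(parent, child):
    """Remove one appearance of `child` from under `parent`."""
    parent.children.remove(child)
    child.parents.remove(parent)


root = Node("root")
a = Node("A")
insert(root, a)            # the original A
insert(root, a)            # cloning A: a second appearance of the same object

a.body = "some text"       # typed into either A; every appearance shows it
b = Node("B")
insert(a, b)               # every A now has a B child

first, second = root.children
print(first is second, a.is_cloned())   # True True

delete(root, a)            # delete the second-to-last appearance
print(a.is_cloned())       # False
```

Because both appearances are one object, there is no distinction between "original" and "clone", and deleting all but one appearance turns the survivor back into a regular node, matching the experiment suggested in the quoted text.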
"I am using Leo since a few weeks and I brim over with enthusiasm for it.
I think it is the most amazing software since the invention of the spreadsheet."
"We who use Leo know that it is a breakthrough tool and a whole new way of
writing code." -- Joe Orr
"I am a huge fan of Leo. I think it's quite possibly the most revolutionary
programming tool I have ever used and it (along with the Python language) has
utterly changed my view of programming (indeed of writing) forever." -- Shakeeb
Alireza
"Thank you very much for Leo. I think my way of working with data will change
forever... I am certain [Leo] will be a revolution. The revolution is as important
as the change from sequential linear organization of a book into a web-like
hyperlinked pages. The main concept that impress me is that the source listing
isn't the main focus any more. You focus on the non-linear, hierarchical, collapsible
outline of the source code." -- Korakot Chaovavanich
"Leo is a quantum leap for me in terms of how many projects I can manage
and how much information I can find and organize and store in a useful way."
-- Dan Winkler
"Wow, wow, and wow...I finally understand how to use clones and I realized
that this is exactly how I want to organize my information. Multiple views on
my data, fully interlinkable just like my thoughts." -- Anon
"Edward... you've come up with perhaps the most powerful new concept in code
manipulation since VI and Emacs." -- David McNab
"Leo is...a revolutionary step in the right direction for programming." --
Brian Takita
The Doxygen configuration is kept in
abi/src/.doxygen.cfg.
The INPUT variable
contains the list of directories to be scanned when generating documentation.
At present only the text directory (the AbiWord backend) is actually scanned,
but it's simple to add other directories.
Each component of AbiWord has an overview
description stored in a README.TXT
file. This is where you want to put the grand overview - and please add text
if you gain insight on stuff not presently documented in the
README.TXT files.
From the
README.TXT files you can
refer to class/function names and the outcome is a nice guided tour where people
can read the overview description and dive into the code from there. It is of
course also possible to just go directly to the various hierarchies and lists
at the top of all pages.
AbiWord Doxygen Style Guide
Just a few guidelines for now. See
fp_Container which adheres to these (I think) and is comment complete.
Please try to adhere to these as it makes for
more consistent documentation (looks as well as content) - which gives a more
professional feel to it. If you have ideas for other guidelines, please post
them to the developer list and we'll discuss it.
KISS! We don't want the source code to drown in fancy formatted comments.
Comments should be kept in raw ASCII where possible. If you feel structure
or typeface commands would help, use the HTML tags which most people understand.
The first line of a comment block is the brief description (do not use
\brief). Follow it by input/output descriptions, then a longer comment if
necessary. Finally add \note, \bug, \see, \fixme as necessary.
Put the descriptions by the function definition, not the declaration.
Always use the /*! */ variant of the comment marker, and leave the opening
and closing markers on empty lines:
    /*!
      Short description
      \param param1 Param 1 Description
                    long descriptions should be indented like this
      <repeat as necessary>
      \retval paramN+1 Return value ParamN+1 description
      <repeat as necessary>
      \return Return value description
      Long description
      ...
      \note Note ...
      <repeat as necessary>
      \fixme FIXME description 1
      <repeat as necessary>
      \bug Bug description 1 <you can add URL to bugzilla here>
      <repeat as necessary>
      \see otherClass::otherFunction1
      <repeat as necessary>
    */
In the brief line, describe what the function does, not how
it does it. Leave the input/output details to the appropriate lines (accessors
excepted). See
fp_Container::isEmpty.
Always add input/output details for a function: \param, \retval (return
value via pointer parameter), \return (actual function return value).
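As a concrete illustration of the guidelines above, here is a minimal sketch of a C++ function documented in this style. The function countWords and its parameters are hypothetical and are not part of AbiWord; they only show the brief line, the \param, \retval and \return lines, the longer description and a \note in the order the guide asks for.

```cpp
#include <cassert>
#include <cstddef>

/*!
  Count the words in a character buffer
  \param text Buffer to scan; it need not be NUL-terminated
  \param length Number of characters in text
  \retval longest Length of the longest word found (0 if there are none)
  \return Number of words in the buffer

  A word is a maximal run of characters other than space, tab and
  newline. The buffer is scanned once from left to right.

  \note Hypothetical example function; it is not part of AbiWord.
*/
std::size_t countWords(const char* text, std::size_t length, std::size_t* longest)
{
    assert(text != nullptr && longest != nullptr);
    std::size_t words = 0;
    std::size_t current = 0;   // length of the word currently being scanned
    *longest = 0;
    for (std::size_t i = 0; i < length; ++i) {
        const char c = text[i];
        const bool blank = (c == ' ' || c == '\t' || c == '\n');
        if (blank) {
            current = 0;
        } else {
            if (current == 0) ++words;   // first character of a new word
            ++current;
            if (current > *longest) *longest = current;
        }
    }
    return words;
}
```

The comment sits by the definition, the brief line says what the function does rather than how, and the value handed back through the pointer parameter is documented with \retval while the ordinary return value uses \return.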
A list of quick
hints
about doxygen syntax. Please see
www.doxygen.org for the
full syntax.
Suppress links with % (doxygen will add links to any function or filename).
Note that "-" characters can be used for simple bullet list creation.
I wonder if we should suppress that in favor of HTML tags.
Sometimes you want to write a class name or similar in plural, but doxygen
will not add a link. You can work around that by something like "\link
fp_Run fp_Runs \endlink" but it's horrible to look at in the raw text.
So do without it, rephrase, put the singular word in parenthesis "fp_Runs
(fp_Run)",
or assume the reader can find the class in the class list.
Add references to named sections of the documentation with \ref (e.g.
Formatter which links to the README.TXT in the text/fmt
directory on account of it having a \page command).
We may also want to discuss allowing simple figures for documenting hairy
code. I think it should be possible, but not at the expense of the
comment text: the programmer should not be required to look at the doxygen output
to understand the code!
Do we want the brief descriptions and return/param text to be in a certain
language style? Would help make the doc look consistent, but may be too much
detail for people to bother with complying. Please see
fp_Container for a suggested style (i.e., compute vs. computes).
Andrew Johnson's new book
Elements of
Programming with Perl uses literate programming techniques extensively,
and shows the source code for a literate programming system written in Perl.
Finally,
the Literate
Programming web site has links to many other resources, including literate
programming environments that you can try out yourself.
I think the author missed the main appeal of literate programming:
by trying to document the program while writing it, you improve the quality of the program
even if nobody ever reads the documentation.
It may, however, very well be worthwhile and useful to consider
more symmetric relationships between program and documentation. Thus, instead
of embedding one kind of information into the other, we can instead model documentation
and program fragments as separate entities tied together with relations. The
relations can be implemented in a number of different ways, e.g., as hypertext
links or via database technology.
Literate programming systems have the following properties:
1. Code and extended, detailed comments are intermingled.
2. The code sections can be written in whatever order is best
for people to understand, and are re-ordered automatically when
the computer needs to run the program.
3. The program and its documentation can be handsomely typeset
into a single article that explains the program and how it works.
4. Indices and cross-references are generated automatically.
POD only does task 1, but the other tasks are much more important.
Literate programming is an interesting idea, and worth looking
into, but if we think that we already know all about it, we won't
bother. Let's bother. For an introduction,
see Knuth's
original paper which has a short but complete example. For a
slightly longer example,
here's a library
I wrote in literate style that manages 2-3 trees in C.
Doxygen
is a documentation system for C++, C, Java, IDL (Corba and Microsoft flavors) and
to some extent PHP and C#.
It can help you in three ways:
It can generate an on-line documentation browser (in HTML) and/or an off-line
reference manual (in LaTeX) from a set
of documented source files. There is also support for generating output in RTF
(MS-Word), PostScript, hyperlinked PDF, compressed HTML, and Unix man pages.
The documentation is extracted directly from the sources, which makes it much
easier to keep the documentation consistent with the source code.
You can
configure doxygen to extract the code structure from undocumented source
files. This is very useful to quickly find your way in large source distributions.
You can also visualize the relations between the various elements by means of
include dependency graphs, inheritance diagrams, and collaboration diagrams,
which are all generated automatically.
You can even `abuse' doxygen for creating normal documentation (as I did
for this manual).
Doxygen is developed under Linux,
but is set-up to be highly portable. As a result, it runs on most other Unix flavors
as well. Furthermore, executables for Windows 9x/NT and Mac OS X are available.
Projects using doxygen: I have compiled a
list of projects
that use doxygen. If you know other projects, let me know and I'll add them.
Although doxygen is used successfully by a lot of people already, there is always
room for improvement. Therefore, I have compiled a
todo/wish list of
possible and/or requested enhancements.
Development has now moved to
sourceforge. See the development
section below for more information.
The Linux Cross-Reference project is the testbed
application of a general hypertext cross-referencing tool. (Or the other way
around.)
The main goal of the project is to create a versatile
cross-referencing tool for relatively large code repositories. The project is
based on stock web technology, so the codeview client may be chosen from the
full range of available web browsers. On the server side, the prototype implementation
is based on an Apache web
server, but any Unix-based web server with cgi-script capability should do nicely.
(The prototype implementation is running on a dual Pentium Pro Linux box.)
The main feature of the indexer is of course
the ability to jump easily to the declaration of any global identifier. Indeed,
even all references to global identifiers are indexed. Quick access to
function declarations, data (type) definitions and preprocessor macros makes
code browsing just that tad more convenient. An at-a-glance overview of, e.g., which
code areas will be affected by changing a function or type definition should
also come in useful during development and debugging.
Other bits of hypertextual sugar, such as e-mail
and include file links, are provided as well, but are on the whole, well, sugar.
Some minimal visual markup is also done. (Style sheets are considered as a way
to do this in the future.)
Technicalities
The index generator is written in
Perl and relies heavily on
Perl's regular expression facilities. The algorithm used is very brute force
and extremely sloppy. The rationale behind the sloppiness is that too little
information renders the database useless, while too much information simply
means the users have to think and navigate at the same time.
The Linux source code, with which the project
has initially been linked, presents the indexer with some very tough obstacles.
Specifically, the heavy use of preprocessor macros makes the parsing a virtual
nightmare. We want to index the information in the preprocessor directives as
well as the actual C code, so we have to parse both at once, which leads to
no end of trouble. (Strict parsing is right out.) Still, we're pretty satisfied
with what the indexer manages to get out of it.
There's also the question of actually broken
code. We want to reasonably index all code portions, even if some of it is not
entirely syntactically valid. This is another reason for the sloppiness.
There are obviously disadvantages to this approach.
No scope checking is done, and the most annoying effect of this is mistaking
local identifiers for references to global ones with the same name. This particular
problem (and others) can only be solved by doing (almost) full parsing. The
feasibility of combining this with the fuzzy way indexing is currently done
is being looked into.
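To make the trade-off described above concrete, here is a minimal sketch of a deliberately sloppy, regular-expression-based indexer. It is written in C++ with std::regex rather than the Perl of the original, and the patterns, the Index structure and the buildIndex function are hypothetical illustrations, not LXR's actual code. Definitions are guessed line by line (a #define, or a line starting in column 0 that looks like "type name("), and every later occurrence of a known name counts as a reference, with no scope checking at all.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Result of indexing one source text (line numbers are 1-based).
struct Index {
    std::map<std::string, int> definitions;            // name -> line of its definition
    std::map<std::string, std::set<int>> references;   // name -> other lines mentioning it
};

Index buildIndex(const std::string& source)
{
    // Deliberately sloppy patterns: no real parsing, just line-by-line guesses.
    static const std::regex macroDef(R"(^\s*#\s*define\s+([A-Za-z_]\w*))");
    // A line starting in column 0 that looks like "type words name(".
    static const std::regex funcDef(R"(^(?:[A-Za-z_]\w*[\s*]+)+([A-Za-z_]\w*)\s*\()");
    static const std::regex identifier(R"([A-Za-z_]\w*)");

    std::vector<std::string> lines;
    std::istringstream in(source);
    for (std::string line; std::getline(in, line);) lines.push_back(line);

    Index index;
    // Pass 1: collect what look like global definitions.
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::smatch m;
        if (std::regex_search(lines[i], m, macroDef) ||
            std::regex_search(lines[i], m, funcDef)) {
            index.definitions.emplace(m[1].str(), static_cast<int>(i) + 1);
        }
    }
    // Pass 2: every occurrence of a known name elsewhere counts as a reference.
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string& text = lines[i];
        const int lineNo = static_cast<int>(i) + 1;
        for (std::sregex_iterator it(text.begin(), text.end(), identifier), end;
             it != end; ++it) {
            const auto def = index.definitions.find(it->str());
            if (def != index.definitions.end() && def->second != lineNo) {
                index.references[def->first].insert(lineNo);
            }
        }
    }
    return index;
}
```

Because the second pass only compares names, a local variable that happens to share its name with a global function is recorded as a reference to that function. This is the scope problem mentioned above, and it cannot be fixed without something closer to real parsing.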
An identifier is a macro, typedef, struct, enum,
union, function, function prototype or variable. For the Linux source code, between
50000 and 60000 identifiers are collected. The individual files of the source code
are formatted on the fly and presented with clickable identifiers.
It is possible to search among the identifiers
and the entire kernel source text. The freetext search is implemented using
Glimpse, so all the capabilities
of Glimpse are available. Especially the regular expression search capabilities
are useful.
Availability
The source code for the LXR engine is of course
available. It is released under the
GNU Copyleft
license. Version 0.3 can now be
downloaded.
You can use it to index your own projects. Version 0.3 includes C++ support
and a much nicer diff markup than before. Please tell us if you have trouble
with the installation. Also, be aware that the documentation is still rather
incomplete. Jim Greer has been kind enough to write some more comprehensive
installation instructions. If you have trouble, look at his
installation instructions.
In this paper we introduce HyperCode, a HyperText
representation of program source code. Using HTML for code presentation, HyperCode
provides links from uses of functions, types, variables, and macros to their
respective definition sites; similarly, definitions are linked to lists-of-links
back to use sites. Standard HTML browsers such as Mosaic thereby become
powerful tools for understanding program control flow, functional dependencies,
data structures, and macro and variable utilization. Supporting HyperCode with
a code database front-ended by a WWW server enables software sharing and development
on a global scale by leveraging the programming, debugging, and computing power
brought together by the World-Wide Web.
code2html by Peter Palfrader
(Weasel) is a Perl script which converts program source code to syntax-highlighted
HTML. It may be called from the command line or as a CGI script. It can also handle
include commands in HTML files. Currently supports: Ada 95, C, C++, HTML, Java,
JavaScript, Makefile, Pascal, Perl, SQL, AWK, M4, and Groff.
It really should be rewritten eventually since the code is so ugly.
Cxref is a program that will produce documentation (in LaTeX, HTML, RTF or SGML)
including cross-references from C program source code. The program comes with more
detailed information. There is a
README,
which contains an example of the output of the program. (also available in
PostScript
or plaintext.)
A fuller example of the output of the program can be seen in the
cxref
output for the cxref source code itself.
To help with problems encountered in using the program, there is a
FAQ.
It has been designed to work with ANSI C, incorporating K&R, and most popular
GNU extensions.
(The cxref program only works for C, not C++; I have no plans to produce a
C++ version.)
The documentation for the program is produced from comments in the code that
are appropriately formatted. The cross referencing comes from the code itself and
requires no extra work.
The documentation is produced for each of the following:
Files
  A comment that applies to the whole file.
Functions
  A comment for the function, including a description of each of the arguments
  and the return value.
Variables
  A comment for each of a group of variables and/or individual variables.
#include
  A comment for each included file.
#define
  A comment for each pre-processor symbol definition, and for macro arguments.
Type definitions
  A comment for each defined type and for each element of a structure or union
  type.
Any or all of these comments can be present in suitable places in the source
code.
As an example, the file
README.c
has been put through cxref to give
HTML
output.
The cross referencing is performed for the following items:
Files
  The files that the current file is included in (even when included via
  other files).
#includes
  Files included in the current file.
  Files included by these files etc.
Variables
  The location of the definition of external variables.
  The files that have visibility of global variables.
  The files / functions that use the variable.
Functions
  The file that the function is prototyped in.
  The functions that the function calls.
  The functions that call the function.
  The files and functions that reference the function.
  The variables that are used in the function.
Each of these items is cross referenced in the output.
The latest released version available is version 1.5e.
Version 1.5e of cxref released Sun June 29 2003
Bug fixes
Don't lose the comment or value when C++ style comments follow a #define.
Updated to work with newer version of flex and SUN version of yacc.
Handle references for local functions with the same name in several files.
Remove some extra ';' from the HTML output.
Handle macros with variable args like MACRO(a,b,...) as well as MACRO(a,b...).
GCC changes
Handle gcc-3.x putting all of its internal #defines in the output.
Compile cxref-cpp if using gcc-3.x that drops comment on same line as #define.
Version 1.5d of cxref released Sun May 5 2002
Bug fixes
Fixes to HTML and SGML outputs (invalid character entities). Fix bug that
stopped -R/ from working. Fix links to HTML source files in certain cases.
Keep the sign of negative numbers in #define output. Improve the lex code
(flex -s). Add some missing ';' to yacc code. Fix the bison debugging
output. Change the use of IFS in cxref-ccc script.
Configure/Make changes
Fix Makefile to compile using non-GNU make programs.
Add flex specific options to the Makefile if using it.
Fixes for build/configure outside the source tree.
Include DESTDIR in the Makefile to help installation.
Configure makes a guess what to do with cxref-cpp if gcc is not installed.
GCC changes
Accept the gcc-3.0 __builtin_va_list type as-if it were a valid C type.
Handle the GCC __builtin_va_arg extension keyword.
Handle the GCC floating point hex extension data format.
Allow the use of gcc-3.x instead of the cxref-cpp program.
Version 1.5c of cxref released Sat Apr 28 2001
Bug fixes
Better Comment handling. Allow the __restrict keyword. Allow bracketed
function declarations. Remove gcc compilation warnings. Allow the
configure script to be run from a different directory.
Optimisation
Speed up the lex code.
Version 1.5b of cxref released Sun Sep 26 1999
Bug fixes
Comments that use the '+html+' convention appear correctly in the HTML source
output. More configurable Makefile (CFLAGS and LDFLAGS options to configure).
Increase the length of static arrays for getcwd(). Fix NAME_MAX compilation
problem. Fix dereferencing NULL pointer problem.
Optimisation
Speed up the cross referencing, especially for the first pass with no outputs.
Version 1.5a of cxref released Fri Jun 18 1999
Bug fixes
Fix the "+html+" etc in comments. Make verbatim comments work in LaTeX
output. Allow $ in function and variable names. Allow the configure to force
cxref-cpp instead of gcc. Tidy the Makefiles. Increase the size of
statically allocated arrays in cross referencing. Remove the problem of #line
directives causing confusion. Handle more GNU C extensions. Fix references
to the source file from the HTML. Handle C++ comments following #defines.
Output
The full cxref and cpp command lines are displayed as comments in output files.
Version 1.5 of cxref released Sun Feb 21 1999
Bug fixes
Fix the FAQ to HTML converter. Stop comments in header files leaking out.
Configuration
Use the GNU autoconf program to create a configure script.
Now uses gcc instead of cxref-cpp if it is new enough (version >= 2.8.0).
Now compiles and runs under MS Win32 with the cygwin library.
Output
Added SGML (Linuxdoc DTD) output.
Added RTF (Rich Text Format) output.
Added HTML 3.2 output (with tables).
Added an HTML version of the source file with links into it.
Tools
Provided a Perl script to automatically determine required header files.
Version 1.4b of cxref released Sat Apr 18 1998
... ... ...
The full version history is in the
NEWS
file distributed with the program.
Mailing List
There is a mailing list available for announcements about new versions of
Cxref. This will only be used by me to send announcements about new versions
of Cxref, it is not for Cxref discussions.
You can alternatively send an e-mail to cxref-announce-request at gedanken.demon.co.uk
with subscribe in the body.
Cxref (in various versions) has been tested on the following systems: Linux
1.[123].x, Linux 2.[01234].x, SunOS 4.1.x, Solaris 2.x, HPUX 10.x, AIX, Irix,
(Free|Net)BSD, Win32 with Cygnus development kit.
Standard disclaimer: The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with. We do not warrant the correctness
of the information provided or its fitness for any purpose.