Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


The Second Love: Typography

"Knuth began TeX because he had become annoyed at the declining quality of the typesetting in volumes I-III of his monumental "Art of Computer Programming" (see Knuth, also bible). In a manifestation of the typical hackish urge to solve the problem at hand once and for all, he began to design his own typesetting language. He thought he would finish it on his sabbatical in 1978; he was wrong by only about 8 years."

The Jargon File

Knuth took a decade off from writing The Art of Computer Programming to create the TeX typesetting language. The third edition of The Art of Computer Programming is now typeset in TeX, and that in itself was another landmark event. As Donald Knuth noted in his Amazon.com interview:

I've been accumulating corrections and emendations in my own personal copies of the books for 25 years, and people have written to me and said, "Don, do you know that there's a typo on page such and such?" I always knew about these mistakes and I wasn't happy to see thousands of copies printed every year having these mistakes in them. But I also knew that correcting them was a lot of work, as there are many, many cross-references. And my biggest project was to work on the volumes that haven't yet been finished. So, my original plan was simply to make an errata list for volumes 1, 2, and 3 that I could put up on the Web. I created a big database of corrections--there were quite a lot, about 200 pages for each volume--and posted them. When I showed this list of changes to a meeting of the TeX user group [TeX is a computer typesetting system developed by Mr. Knuth. Ed.], one of the guys in the audience volunteered for the hard work of putting the revisions in electronic form. He wound up creating many megabytes of data from which to generate the book. All I needed to do was double-check the corrections. All in all, several volunteers spent a couple of years of their lives doing the detail work and getting the revisions ready. In January of this year, I received volumes 1, 2, and 3 in electronic form, and used them to generate 2,000 laser-printed pages incorporating my hundreds of pages of errata, which looked something like The Art of Computer Programming. When a book exists as a computer file, you have a different feeling about it because you know it's something that you can easily improve. This is my life's work after all--I've spent 35 years on it--and I saw many, many places where I could make it better. So I spent the last seven months making this book into something special. Of course, I'm not unbiased, but in my humble opinion, I've gotten close to something that I can be really proud of. 
It's a much better book than I would have dared to attempt with the old method of correcting galleys by hand.

Many people do not realize that one of the first major open source projects was neither GNU nor Linux. It was TeX. Knuth developed the first version of TeX in 1977-1978 in order to avoid problems with the typesetting of the second edition of his TAoCP volumes. The program proved popular, and he produced a second version (in 1982), which is the basis of what we use today. The language is described in The TeXbook (Addison-Wesley, 1984, ISBN 0-201-13447-0, paperback ISBN 0-201-13448-9), and the whole text of the program was published as a companion volume, TeX: The Program (Addison-Wesley, 1986). At that time there were several proprietary typesetting systems, but they were not very flexible and had problems with complex formulas and sophisticated layouts.

Paradoxically, the development of TeX was done largely in parallel with the development of troff at Bell Labs, and in a way Donald Knuth "outprogrammed" the very talented researchers there. Bell Labs was one of the first organizations that tried to create a set of typesetting tools for computers, and it started experimenting with typesetting several years before Knuth did. J.E. Saltzer had written "runoff" for CTSS. Bob Morris moved it to the 635 and called it "roff". Ritchie rewrote that as "rf" for the PDP-7, before there was UNIX. At the same time, in the summer of 1969, Doug McIlroy rewrote roff in BCPL (...), extending and simplifying it. Joseph Ossanna wrote troff and maintained it until his death in 1977.

As many people probably know, the initial version of Unix for the PDP-11 was formally developed to support publishing. In 1971 the Unix developers wanted to get a PDP-11 for further work on the operating system. In order to justify the cost of this system, they proposed to implement a document formatting system for the AT&T patents division. The key component of this system was roff/troff. Ted Dolotta later created the memorandum (-mm) macros, with a lot of input from John Mashey. Thereafter, Eric Allman wrote the BSD -me macros. Because troff required a commercial license, it was later reimplemented for use with free versions of Unix. The groff formatter suite used on most free BSD systems was written by James Clark, starting in 1989, as a replacement for UNIX troff, with ideas from SoftQuad and other extended versions of troff. Here is a relevant quote from Groff History:

 `troff' can trace its origins back to a formatting program called  `runoff', written by J. E. Saltzer, which ran on MIT's CTSS operating  system in the mid-sixties.  This name came from the common phrase of  the time "I'll run off a document."  Bob Morris ported it to the 635  architecture and called the program `roff' (an abbreviation of  `runoff').  It was rewritten as `rf' for the PDP-7 (before having  UNIX), and at the same time (1969), Doug McIllroy rewrote an extended  and simplified version of `roff' in the BCPL programming language.

The first version of UNIX was developed on a PDP-7 which was sitting around Bell Labs.  In 1971 the developers wanted to get a PDP-11 for further work on the operating system.  In order to justify the cost for this system, they proposed that they would implement a document formatting system for the AT&T patents division.  This first formatting program was a reimplementation of McIllroy's `roff', written by J. F. Ossanna.
 
When they needed a more flexible language, a new version of `roff'  called `nroff' ("Newer `roff'") was written.  It had a much more complicated syntax, but provided the basis for all future versions.  When they got a Graphic Systems CAT Phototypesetter, Ossanna wrote a version of `nroff' that would drive it.  It was dubbed `troff', for  "typesetter `roff'", although many people have speculated that it actually means "Times `roff'" because of the use of the Times font family in `troff' by default.  As such, the name `troff' is pronounced `t-roff' rather than `trough'.

With `troff' came `nroff' (they were actually the same program except for some `#ifdef's), which was for producing output for line printers and character terminals.  It understood everything `troff'  did, and ignored the commands which were not applicable (e.g. font changes).

Since there are several things which cannot be done easily in `troff', work on several preprocessors began.  These programs would  transform certain parts of a document into `troff', which made a very  natural use of pipes in UNIX.

The `eqn' preprocessor allowed mathematical formulae to be specified in a much simpler and more intuitive manner.  `tbl' is a preprocessor for formatting tables.  The `refer' preprocessor (and the similar  program, `bib') processes citations in a document according to a bibliographic database.

Unfortunately, Ossanna's `troff' was written in PDP-11 assembly language and produced output specifically for the CAT phototypesetter.  He rewrote it in C, although it was now 7000 lines of uncommented code and still dependent on the CAT.  As the CAT became less common, and was no longer supported by the manufacturer, the need to make it support other devices became a priority.  However, before this could be done, Ossanna was killed in an auto accident.

So, Brian Kernighan took on the task of rewriting `troff'.  The  newly rewritten version produced a device independent code which was very easy for postprocessors to read and translate to the appropriate printer codes.  Also, this new version of `troff' (called `ditroff' for "device independent `troff'") had several extensions, which included drawing functions.  

Due to the additional abilities of the new version of `troff',  several new preprocessors appeared.  The `pic' preprocessor provides a wide range of drawing functions.  Likewise the `ideal' preprocessor did the same, although via a much different paradigm.  The `grap'  preprocessor took specifications for graphs, but, unlike other preprocessors, produced `pic' code.
 
James Clark began work on a GNU implementation of `ditroff' in early 1989.  The first version, `groff' 0.3.1, was released June 1990.  `groff' included:

Development of GNU `troff' progressed rapidly, and saw the additions of a replacement for `refer', an implementation of the `ms' and `mm'  macros, and a program to deduce how to format a document (`grog').
 
It was declared a stable (i.e. non-beta) package with the release of version 1.04 around November 1991.
 
Beginning in 1999, `groff' has new maintainers (the package was an orphan for a few years).  As a result, new features and programs like  `grn', a preprocessor for gremlin images, and an output device to  produce HTML output have been added.

I think that one of the reasons for performing this colossal work, and essentially beating a very talented AT&T team on their own turf, was Donald Knuth's love for typesetting. Otherwise, working with some vendor like Adobe to improve an existing tool, so that it became suitable for texts with complex mathematical formulas such as TAOCP, would have been a better solution to the problem. After all, TAOCP is not an open book. And free software is only free if you don't place a monetary value on your own time. From this point of view, eight years of Donald Knuth's time was a huge investment, which probably could have been spent more productively on writing another volume of TAOCP. After all, troff can be extended to do all the things TeX can, and that could have been done by somebody other than Knuth. The only benefit that I see is that computer science development stalled after, say, 1990, and due to the decade consumed by the work on TeX, Knuth got into a better position to systematize a more or less static body of knowledge. The field just lost its dynamism. You can see this loss of dynamism and the beginning of the "sclerosis" of computer science for yourself by browsing old CACM issues: while in the 80s almost every issue contained groundbreaking articles, by the late 80s they became much less impressive, almost completely dull reading, with articles of questionable quality, not only because the editors were asleep at the wheel but because there was nothing better.

Anyway, eight years were lost to a project that was peripheral from the point of view of TAOCP goals, but as a side effect Donald Knuth emerged as one of the open source pioneers. Unlike most open source pioneers, he published not only his code but the major algorithms as well, and that puts him above pure "code junkies" ;-)

Here is how Donald Knuth explained his motives for writing TeX in his Advogato interview:

Advogato: The first questions that I have are about free software. TeX was one of the first big projects that was released as free software and had a major impact. These days, of course, it's a big deal. But I think when TeX came out it was just something you did, right?

Prof. Knuth: I saw that the whole business of typesetting was being held back by proprietary interests, and I didn't need any claim to fame. I had already been successful with my books and so I didn't have to stake it all on anything. So it didn't matter to me whether or not whether I got anything financial out of it.

There were people who saw that there was a need for such software, but each one thought that they were going to lock everyone into their system. And pretty much there would be no progress. They wouldn't explain to people what they were doing. They would have people using their thing; they couldn't switch to another, and they couldn't get another person to do the typesetting for them. The fonts would be only available for one, and so on.

But I was thinking about FORTRAN actually, the situation in programming in the '50s, when IBM didn't make FORTRAN an IBM-only thing. So it became a lingua franca. It was implemented on all different machines. And I figured this was such a new subject that whatever I came up with probably wouldn't be the best possible solution. It would be more like FORTRAN, which was the first fairly good solution [chuckle]. But it would be better if it was available to everybody than if there were all kinds of things that people were keeping only on one machine.

So that was part of the thinking. But partly that if I hadn't already been successful with my books, and this was my big thing, I probably would not have said, "well, let's give it away." But since I was doing it really for the love it and I didn't have a stake in it where I needed it, I was much more concerned with the idea that it should be usable by everybody. It's partly also that I come out of traditional mathematics where we prove things, but we don't charge people for using what we prove.

So this idea of getting paid for something over and over again, well, in books that seems to happen. You write a book and then the more copies you sell the more you get, even though you only have to write the book once. And software was a little bit like that.

It is interesting to note that TeX was started a few years after Unix troff but was finished long after Unix became a major player in the OS arena. Actually, one can say that Knuth wrote an open source alternative to Unix troff (see above).

TeX is the composition engine (strictly speaking, an interpreter, not a compiler). It is essentially a batch engine, although a limited amount of interactivity is possible while a file is being processed, to allow error recovery and diagnostics.

To produce his own books, Knuth had to support the rich set of academic publishing conventions and typesetting styles: footnotes, floating insertions (figures and tables), and so on. To simplify his work he developed an input language that permits typesetting complex math expressions. As a markup language TeX is relatively low level (skip so much space, change to font X, set this string of words in paragraph form, ...), but it can be enhanced by macro commands.

The handling of footnotes and similar structures is so well behaved that "style files" have been created for TeX to process critical editions and legal tomes. It is also (after some highly useful enhancements in about 1990) able to handle the composition of many different languages according to their own traditional rules, and for this reason (as well as for its low cost) it is quite widely used in Eastern Europe.

From the start, it became very popular among mathematicians, to the extent that many mathematical publications accept papers only in TeX. Some of the algorithms in TeX have not been bettered in any of the composition tools devised in the years since TeX appeared. The most obvious example is paragraph typesetting with sophisticated line breaking.
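To give a feel for what "sophisticated line breaking" means, here is a minimal sketch in Python. It is not TeX's actual algorithm, which works with boxes, glue, penalties and hyphenation; it only illustrates the underlying idea of choosing all the breaks of a paragraph together instead of greedily filling one line at a time. The function name `break_paragraph` and the cost function (squared trailing space on every line except the last) are assumptions made for this illustration.

```python
from typing import List


def break_paragraph(words: List[str], width: int) -> List[str]:
    """Break words into lines of at most `width` characters.

    Simplified "whole paragraph" criterion (an assumption of this sketch):
    minimize the sum of squared unused space over all lines except the last.
    """
    n = len(words)
    inf = float("inf")
    # best[i] is the minimal cost of setting words[i:];
    # next_break[i] is the index of the first word of the following line.
    best = [inf] * (n + 1)
    next_break = [n] * (n + 1)
    best[n] = 0.0

    for i in range(n - 1, -1, -1):
        line_len = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            line_len += 1 + len(words[j - 1])
            if line_len > width:
                break
            slack = width - line_len
            cost = 0.0 if j == n else float(slack ** 2)  # last line is free
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                next_break[i] = j

    if best[0] == inf:
        raise ValueError("a word is longer than the line width")

    # Follow the recorded break points from the start of the paragraph.
    lines = []
    i = 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


if __name__ == "__main__":
    for line in break_paragraph("aaa bb cc ddddd".split(), 6):
        print(line)
```

With the words "aaa bb cc ddddd" and a width of 6, a greedy formatter would set "aaa bb" on the first line and leave "cc" alone on the second, while this sketch prefers "aaa" / "bb cc" / "ddddd", which spreads the unused space more evenly.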

TeX produces a "device independent" output file – .dvi – that must then be translated to the particular output device being used (a laser printer, inkjet printer, typesetter; in the "old days" even daisy-wheel printers were used). The DVI translator actually accesses the font shapes, either as bitmaps, Type 1 fonts, or pointers to fonts installed in a printer with the shapes not otherwise accessible. PostScript is one of the most popular "final" output forms for TeX.

Despite its age, TeX still holds its ground in the production of books and journal articles in research mathematics. Few other tools, proprietary or otherwise, can handle complex formulas and layouts so well and produce high-quality, publication-ready output.


Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright of original materials belongs to the respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: April 28, 2008