Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Softpanorama C Webliography
The Tao gave birth to machine language.
Machine language gave birth to the assembler.
The assembler gave birth to the compiler.
Now there are ten thousand languages.
Each language has its purpose, however humble.
Each language expresses the Yin and Yang of software.
Each language has its place within the Tao.
But do not program in COBOL if you can avoid it.
The Tao of
Programming
All through my life, I've always used the programming language
that blended best with the debugging system and operating system that I'm using.
If I had a better debugger for language X,
and if X went well with the operating system, I would be using that.
Donald
Knuth
Note: The material on this page somewhat overlaps with the
C++ page, as I consider C++ to be mainly "a better C" and
I am deeply skeptical about the OO approach. Some materials that are missing from this
page can probably be found on the C++ page.
C is often referred to as a "high-level assembly language."
That means that it is not optimal as a first programming language. The
ability to perform the low-level operations needed for systems programming
is actually a distinctive feature of the language. I believe that the language
was a very important invention for its time: the first widespread
machine-independent system programming language. As Alex Stepanov aptly
noted in his Dr. Dobb's Journal interview:
Let's consider now why C is a
great language. It is commonly believed that C is a hack which was successful
because Unix was written in it. I disagree. Over a long period of time
computer architectures evolved, not because of some clever people figuring
how to evolve architectures---as a matter of fact, clever people were pushing
tagged architectures during that period of time---but because of the demands
of different programmers to solve real problems. Computers that were able to
deal just with numbers evolved into computers with byte-addressable memory,
flat address spaces, and pointers. This was a natural evolution reflecting
the growing set of problems that people were solving. C, reflecting the
genius of Dennis Ritchie, provided a minimal model of the computer that had
evolved over 30 years. C was not a quick hack. As computers
evolved to handle all kinds of problems, C, being the minimal model of such a
computer, became a very powerful language to solve all kinds of problems in
different domains very effectively. This is the secret of C's
portability: it is the best representation of an abstract computer that we
have. Of course, the abstraction is done over the set of real computers, not
some imaginary computational devices. Moreover, people could understand the
machine model behind C. It is much easier for an average engineer to
understand the machine model behind C than the machine model behind Ada or
even Scheme. C succeeded because it was doing the right thing, not
because of AT&T promoting it or Unix being written with it.
Right now it has got its second life as the lower-level language in
"dual language" programming (in combination with scripting languages).
TCL+C dual language programming techniques are especially easy to learn.
I strongly advise any serious C programmer to learn TCL.
Otherwise you will deprive yourself of a lot of important
concepts and methods of program development and will probably never be as
productive as you can be.
C is a simple and elegant language that introduced a lot of new
ideas into language design.
While borrowing features from PL/1 and BCPL, it elegantly
integrated the concept of pointers into a PL/1-style framework, provided a practical
set of high-level control structures, and introduced shortcuts for
increment/decrement style operations. As
Donald Knuth remarked:
The way C
handles pointers, for example, was a brilliant innovation; it solved a lot of
problems that we had before in data structuring and made the programs look
good afterwards. C isn't the perfect language, no language is, but I think it
has a lot of virtues, and you can avoid the parts you don't like. I do like C
as a language, especially because it blends in with the operating system (if
you're using UNIX, for example).
All through my life, I've always used the programming language that blended
best with the debugging system and operating system that I'm using. If I had
a better debugger for language X, and if X went well with the operating
system, I would be using that.
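To make the remarks about pointers and increment shortcuts more concrete, here is a minimal sketch in C. The names copy_string, push_front, list_sum and list_free are invented for this illustration and do not come from any of the quoted sources. The first function copies a string in a single loop condition; the others build and walk a linked list, the kind of data structuring Knuth refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Copy a null-terminated string with pointers and post-increment.
   The caller must ensure dst has room for src plus the terminator. */
char *copy_string(char *dst, const char *src)
{
    char *start = dst;

    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0')
        ;   /* copying and advancing both happen in the condition */
    return start;
}

/* A singly linked list node, the classic pointer-based data structure. */
struct node {
    int value;
    struct node *next;
};

/* Insert a value at the head of the list; returns 0 on success,
   -1 if memory could not be allocated. */
int push_front(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/* Add up all values by following the chain of next pointers. */
int list_sum(const struct node *head)
{
    int sum = 0;

    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}

/* Release every node of the list. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

Pushing 1, 2 and 3 onto an empty list and calling list_sum gives 6; the list must be released with list_free afterwards, since C has no run-time system to do it for you.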
And believe me, despite the existence of C++ (and partially due to it
;-) C will be around for a long time. As Dennis Ritchie aptly put it in
one of his interviews:
LinuxWorld.com:
C and Unix have exhibited remarkable stability, popularity, and longevity in
the past three decades. How do you explain that unusual phenomenon?
Dennis Ritchie:
Somehow, both hit some sweet spots. The longevity is a bit remarkable -- I
began to observe a while ago that both have been around, in not astonishingly
changed form, for well more than half the lifetime of commercial computers. This
must have to do with finding the right point of abstraction of computer
hardware for implementation of the applications.
The basic Unix idea -- a hierarchical file
system with simple operations on it (create/open/read/write/delete with I/O
operations based on just descriptor/buffer/count) -- wasn't new even in 1970,
but has proved to be amazingly adaptable in many ways. Likewise, C managed to
escape its original close ties with Unix as a useful tool for writing
applications in different environments. Even more than Unix, it is a
pragmatic tool that seems to have flown at the right height.
Both Unix and C gained from accidents of
history. We picked the very popular PDP-11 during the 1970s, then the VAX
during the early 1980s. [See Resources for links to
both.] And AT&T and Bell Labs maintained policies about software distribution
that were, in retrospect, pretty liberal. It wasn't today's notion of open
software by any means, but it was close enough to help get both the language
and the operating system accepted in many places, including universities, the
government, and in growing companies.
LinuxWorld.com:
Five or ten years from now, will C still be as popular and indispensable as
it is today, especially in system programming, networking, and embedded
systems, or will newer programming languages take its place?
Dennis Ritchie:
I really don't know the answer to this, except to observe that software is
much harder to change en masse than hardware. C++ and Java, say, are
presumably growing faster than plain C, but I bet C will still be around. For
infrastructure technology, C will be hard to displace. The same could be
said, of course, of other languages (Pascal versions, Ada for example). But
the ecological niches you mention are well occupied.
What is changing is that higher-level
languages are becoming much more important as the number of computer-involved
people increases. Things that began as neat but small tools, like Perl or
Python, say, are suddenly more central in the whole scheme of things. The
kind of programming that C provides will probably remain similar absolutely
or slowly decline in usage, but relatively, JavaScript or its variants, or
XML, will continue to become more central. For that matter, it may be that
Visual Basic is the most heavily used language around the world. I'm not
picking a winner here, but higher-level ways of instructing machines will
continue to occupy more of the center of the stage.
But C is notoriously
difficult to learn as a first language. Other things being equal, the best way
to learn C is to learn assembly language first or
to learn the two in parallel. This way you will probably find C constructs,
and especially pointer arithmetic, quite natural. If you have never
programmed in assembly language, you may be frustrated by the syntax and convoluted
semantics of pointer arithmetic, the treatment of array names as pointers, and so on. The
main problem here is that the language was designed for people who are already
able to program in assembler. It was designed for the writers of an
operating system (Unix), and that is noticeable. In any case, you should
understand that C was designed by accomplished system programmers for
accomplished programmers, so do not expect much help in finding errors
either from the compiler or from the run-time system (which is non-existent in
pure C, and is not very helpful in C++). Both space and time efficiency and the
ability to stay close to machine language constructs were necessary on the 24K
PDP-11 where it was first implemented. See
The Development of the C
Language by Dennis Ritchie for more historical information.
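The two points that most often confuse newcomers, array names acting as pointers and pointer arithmetic, can be shown in a few lines. This is only a sketch; the function names sum_indexed, sum_pointer and element_distance are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints. Although the parameter is written as an array, the
   function actually receives a pointer to the first element. */
long sum_indexed(const int a[], size_t n)
{
    long total = 0;
    size_t i;

    assert(a != NULL);
    for (i = 0; i < n; i++)
        total += a[i];          /* a[i] is defined as *(a + i) */
    return total;
}

/* The same loop written with explicit pointer arithmetic. */
long sum_pointer(const int *p, size_t n)
{
    const int *end;
    long total = 0;

    assert(p != NULL);
    end = p + n;                /* one past the last element */
    while (p < end)
        total += *p++;          /* read the element, then advance */
    return total;
}

/* Number of elements between two pointers into the same array.
   Pointer subtraction counts elements, not bytes. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}
```

Anyone who has written an indexed addressing loop in assembler will recognize sum_pointer immediately; for everyone else, comparing it line by line with sum_indexed is probably the gentlest way in.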
You will be much better off if you have already taken a course in some
classic programming language like Basic, Fortran or Pascal (Turbo
Pascal is just great as a first language, with Modula-2 as a logical
continuation). This way you will be able to understand the language by comparing
the C way of doing things with the Pascal way of doing things that you already know.
The problem of learning C at high schools and universities is
often complicated by teachers ;-). Many teachers have forgotten the problems they faced
when they studied the language themselves and try to feed students as much
material as possible in the very first course. IMHO the attempts to teach both C
and C++ in a one-semester course are really pretty close to a crime. No, it's worse
than a crime -- it's a blunder ;-(. In any case, after such a course usually
more than 50% of students hate programming in general and C in particular...
Again, I would like to stress that Pascal (in its Turbo Pascal incarnation) is a
much better first language, but if you are unfortunate enough to have C as your
first language, try to slow down and spend the first seven weeks without pointers and
structures. The language will be much better understood if you do not rush to
complex constructs and instead master a reasonable subset before jumping into the complex
stuff. Bad books can also complicate things considerably. Please read the
[alt.comp.lang.learn.c-c++] -
FAQ for some useful recommendations on how to avoid typical pitfalls and
problems in C.
Due to the presence of the preprocessor, the diagnostics of lexical
and syntax errors in C compilers are exceptionally bad, and to avoid frustration
you had better write with a minimum number of mistakes. That means having a good
textbook and consulting it often. Moreover,
it means that you need to check your program manually for typical mistakes (like
missing semicolons, "=" instead of "==" in comparisons, etc.). You can
actually save a lot of time this way.
For me, a list of "gotchas" -- errors that I had already made and that
took me considerable time to discover -- was really helpful. You can use some
of the lists available on the Web as a starting point, but
you will be much better off creating your own from scratch. For example, due
to my many years of previous experience with PL/1, I still sometimes use "=" instead
of "==" in if statements and loops. This is an annoying error. See
The Top 10 Ways to
get screwed by the C programming language.
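The "=" versus "==" gotcha is easy to demonstrate. The sketch below uses two hypothetical functions, is_zero_buggy and is_zero, that differ by a single character. In the buggy version the extra pair of parentheses is there only to keep the compiler quiet; written as a plain if (x = 0), GCC and Clang report the suspicious assignment when run with -Wall, which is a good reason to always compile with warnings enabled.

```c
#include <assert.h>

/* Buggy on purpose: "=" assigns 0 to x, and the value of the
   assignment (0) is then used as the condition, so the branch
   is never taken, whatever the caller passed in. */
int is_zero_buggy(int x)
{
    if ((x = 0))    /* extra parentheses only silence the warning */
        return 1;
    return 0;
}

/* Correct: "==" compares and leaves x untouched. */
int is_zero(int x)
{
    if (x == 0)
        return 1;
    return 0;
}
```

Both functions compile cleanly, and the buggy one answers 0 for every input, including 0 itself. That is exactly why this error can take so long to find by testing alone.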
Also annoying and difficult to uncover were cases when I forgot
to place the & operator before the name of a variable and passed a value instead
of an address. The C method of representing strings as arrays of characters that end
with a null was pretty interesting in the 1970s, but lost much of its appeal on
computers with several gigabytes of memory. The necessity of having a null at the
end of the string leads to subtle errors. That's why you will sometimes see
recommendations like this one in the
SDM C Style
Guide:
4.7 Standard: Explicit +1 in String Length Declaration for
\0
Character arrays used as strings (i.e., to
hold ASCII text and terminated by a null character) should have a defined
length that explicitly includes the "+ 1" character for the null string
terminator.
    #define NAME_LEN (20 + 1)
    char name[NAME_LEN];
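Here is a small sketch of both pitfalls just mentioned: the room needed for the terminating null and the address-of operator. The helper names set_name and reset_counter are hypothetical. The macro is parenthesized so that it still expands correctly inside a larger expression such as NAME_LEN * 2, and the copy relies on the standard snprintf, which never writes more than the given size and always terminates the result when that size is non-zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN (20 + 1)   /* 20 visible characters plus the '\0' */

/* Copy src into a NAME_LEN-byte buffer, truncating if necessary and
   always leaving the result null-terminated. Returns the stored length. */
size_t set_name(char name[NAME_LEN], const char *src)
{
    assert(name != NULL && src != NULL);
    snprintf(name, NAME_LEN, "%s", src);
    return strlen(name);
}

/* A function can change the caller's variable only if it is given
   the variable's address, hence the pointer parameter. */
void reset_counter(int *counter)
{
    assert(counter != NULL);
    *counter = 0;
}
```

A call looks like reset_counter(&hits); forgetting the & passes the value of hits instead of its address, which a modern compiler rejects or warns about for a prototyped function like this one but cannot always catch for variadic functions such as scanf.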
Style is important too. One needs to (no, actually one should)
use indent or another pretty-printer.
They are really important, as they simplify catching errors by
creating a pattern of indentation that is distinct from what you might
expect. At the same time, one should not go overboard with style by imposing
upon oneself things that make no sense at all. Although I like the idea of using
high-level control constructs whenever possible, I really hate structured
programming pundits who teach avoiding goto, break, continue, and global variables
no matter what -- a really religious attitude. These "structured programming
fundamentalists" are not as bad as the (now mostly extinct) verification proponents
a la Professor E.W. Dijkstra (who, BTW, originated the
"considered harmful"
cliche in his influential Go To
Statement Considered Harmful paper, published in Communications of the
ACM, Vol. 11, No. 3, March 1968, pp. 147-148), but they still make
programming more difficult instead of trying to make it easier. As B.
Kernighan noted in his famous
Why Pascal is Not My
Favorite Programming Language:
There is no 'break' statement for exiting
loops. This is consistent with the one entry-one exit philosophy espoused by
proponents of structured programming, but it does lead to nasty
circumlocutions or duplicated code, particularly when coupled with the
inability to control the order in which logical expressions are evaluated.
Consider this common situation, expressed in C or Ratfor:
    while (getnext(...)) {
        if (something)
            break
        rest of loop
    }
With no 'break' statement, the first attempt
in Pascal is
    done := false;
    while (not done) and (getnext(...)) do
        if something then
            done := true
        else begin
            rest of loop
        end
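The same contrast can be reproduced entirely in C. The following sketch (function names invented for the example) sums array elements up to the first negative value, once with break and once with the flag-and-else structure that the Pascal version quoted above is forced into.

```c
#include <assert.h>
#include <stddef.h>

/* Sum values up to (not including) the first negative one, using break. */
int sum_until_negative(const int *v, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (v[i] < 0)
            break;              /* leave the loop at once */
        sum += v[i];            /* "rest of loop" */
    }
    return sum;
}

/* The same logic without break: a flag plus an else branch. */
int sum_until_negative_flag(const int *v, size_t n)
{
    int sum = 0;
    int done = 0;
    size_t i = 0;

    while (!done && i < n) {
        if (v[i] < 0) {
            done = 1;
        } else {
            sum += v[i];        /* "rest of loop" now lives in the else */
            i++;
        }
    }
    return sum;
}
```

Both functions return the same results; the second simply needs an extra variable and an extra level of nesting to say the same thing.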
The scientific grounds for these attempts to avoid certain constructs
are completely non-existent. So, as Donald
Knuth pointed out, feel free to use them with no feeling of guilt if you are trying to
implement a higher-level control construct that is simply not available in a given
language. For example, I like many recommendations of the
SDM C Style
Guide and just ignore the half-dozen "structured programming
fundamentalism"-based recommendations, like avoiding global variables, continue
statements and gotos (see Donald Knuth's
famous article about this issue for more details), but your mileage may vary.
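One higher-level construct that C lacks is a multi-level break, and goto is the usual way to build it. The sketch below is an illustration under that assumption, not an example taken from Knuth's article or the style guide; find_in_grid and COLS are invented names.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 3

/* Find the first occurrence of target in a rows x COLS grid.
   Returns 1 and stores the position if found, 0 otherwise. */
int find_in_grid(int grid[][COLS], size_t rows, int target,
                 size_t *row_out, size_t *col_out)
{
    size_t r = 0, c = 0;
    int found = 0;

    for (r = 0; r < rows; r++) {
        for (c = 0; c < COLS; c++) {
            if (grid[r][c] == target) {
                found = 1;
                goto done;      /* leave both loops at once */
            }
        }
    }
done:
    if (found) {
        *row_out = r;
        *col_out = c;
    }
    return found;
}
```

Without goto, the inner break would have to be followed by a second test in the outer loop, which is the same kind of circumlocution Kernighan complains about in Pascal.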
Paradoxically, run-time errors are easier to find, as most
implementations have pretty decent debuggers with step-by-step execution.
Borland's is probably the best, but I was really impressed by the debugger
built into Visual Studio 5.0. Maybe it was written by Borland people who were
bought by Microsoft just before Borland renamed itself Inprise ;-)
But the deficiencies of C are a logical continuation of its strong
points. And there are a lot of strong points in C: its popularity proves that
it is one of the best system programming languages around. People are flexible
enough to adapt to the language, and top programmers can produce up to a couple
of thousand lines of code over a weekend. GCC is the main Linux compiler,
and in order to use it one needs to learn C. So C is the key to open source
software. One deficiency of C that I hate is the fact that it does not
support coroutines, but there are
libraries for GCC that can help.
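For simple cases a coroutine-like generator can be emulated in plain C without any library, using the switch-based trick popularized by Simon Tatham's essay "Coroutines in C". The sketch below is not one of the GCC libraries mentioned above, and the names range_gen, range_init and range_next are hypothetical. The function remembers where it stopped in a state field and jumps back into the middle of its loop on the next call.

```c
#include <assert.h>

/* State of a generator that yields lo, lo + 1, ..., hi, one per call. */
struct range_gen {
    int state;      /* 0 = not started, 1 = suspended in loop, 2 = finished */
    int current;
    int hi;
};

void range_init(struct range_gen *g, int lo, int hi)
{
    g->state = 0;
    g->current = lo;
    g->hi = hi;
}

/* Resume the generator. Stores the next value in *out and returns 1,
   or returns 0 once the range is exhausted. */
int range_next(struct range_gen *g, int *out)
{
    switch (g->state) {
    case 0:
        for (; g->current <= g->hi; g->current++) {
            *out = g->current;
            g->state = 1;
            return 1;           /* "yield": suspend here */
    case 1:;                    /* the next call resumes here */
        }
        g->state = 2;
        /* fall through */
    default:
        return 0;
    }
}
```

After range_init(&g, 3, 5), three calls to range_next produce 3, 4 and 5, and every later call returns 0. The price is that all local state must live in the struct, which is why real coroutine libraries switch stacks instead.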
It is silly to consider C to be a weaker programming language
than C++. C++ is a decent language, but it is to a certain extent overkill and
is a much more complex programming language than C. It also magnifies problems
that exist in C, making debugging even more difficult. Contrary to what OO advocates claim,
C++ is not always better than C.
Actually, in many cases one can do much better by using the
tandem of TCL + C. TCL has a very simple structure. Each line starts with
a command, such as dotask, followed by a number of arguments. Each command
is implemented as a C function. This function is responsible for handling all
the arguments. See my TCL
page. C programmers can also benefit from learning
Expect. It is the
greatest testing tool in existence. See also DejaGnu.
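The "each command is a C function" idea can be sketched with a toy dispatcher in plain C. To be clear, this is not the real Tcl C API (which registers commands with an interpreter); it only imitates the structure, and the commands add and count, as well as run_line and MAX_ARGS, are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 8

/* Every command is a C function that receives all words of the line. */
typedef int (*command_fn)(int argc, char *argv[]);

/* Hypothetical command: adds its integer arguments. */
int cmd_add(int argc, char *argv[])
{
    int i, sum = 0;

    for (i = 1; i < argc; i++)
        sum += atoi(argv[i]);
    return sum;
}

/* Hypothetical command: counts its arguments. */
int cmd_count(int argc, char *argv[])
{
    (void)argv;                 /* the words themselves are not needed */
    return argc - 1;
}

struct command {
    const char *name;
    command_fn fn;
};

static const struct command commands[] = {
    { "add", cmd_add },
    { "count", cmd_count },
};

/* Split line into words in place, look up the first word and call the
   matching function. Returns -1 for an empty line or unknown command. */
int run_line(char *line)
{
    char *argv[MAX_ARGS];
    char *word;
    int argc = 0;
    size_t i;

    for (word = strtok(line, " \t"); word != NULL && argc < MAX_ARGS;
         word = strtok(NULL, " \t"))
        argv[argc++] = word;
    if (argc == 0)
        return -1;
    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(argv[0], commands[i].name) == 0)
            return commands[i].fn(argc, argv);
    return -1;
}
```

Passing a writable buffer holding "add 1 2 3" to run_line returns 6. Adding a new command means writing one more C function and one more table entry, which is the essence of the dual-language approach: the scripting layer stays trivial and the real work happens in C.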
Important:
- Turbo C version 2.01 is now free from Borland -- see the link to the Zip file
here! This is a great free compiler for beginners. It provides
everything you need, with all of the tools included in one
environment. Turbo C 2.01 provides tight integration between the editor,
compiler, linker, and debugger and can be used on any computer. 1M of memory
is enough ;-) It can run on Linux in an emulator. See also the
Borland Community Museum (free registration required).
- The Intel compiler is probably the best in terms of the quality of
the generated code on the Intel platform. It
is available with a non-commercial license, meaning that anyone can
download and use the full compiler for non-profit work. This is the best
optimizing compiler you can get. The installation of the Intel compiler is far
faster and easier than the installation of Visual Studio .NET.
The Intel compiler scores are approximately 2.5 times
better than gcc 3.2.1 for the Monte-Carlo simulation, which is a considerably
larger margin than for any of the other parts of the SciMark 2.0 benchmark.
For the other parts it outperforms gcc by only small margins of 10% or less. See
Benchmarking Intel C++ against GNU gcc on Linux.
- The high-quality Borland C++ Compiler 5.5 is now free too, and one can use it
for writing programs in Windows.
- For DOS: soon the Watcom C/C++ and Fortran compilers will be open source and
will be a much better deal than Borland. It's another top-of-the-line C compiler -- I think
it's better than Microsoft's. See
http://www.openwatcom.org
And last but not least: sometimes you will feel a block --
you cannot do it any more, no matter what. Here are several possible ways
to overcome this condition:
- Do some intense physical activity for several hours, like
running, diving, bicycling, fast swimming, etc. Then take a shower and try
again... This is usually very helpful.
- Switch to another project for at least a couple of days...
- Go off on a tangent: sleep, or read something that isn't specific to
the source of the frustration but is still connected with programming, for
example:
  - Anything by Donald Knuth
  - Back issues (the older the better ;-) of Byte, Dr. Dobb's,
  etc.
- Go to the library and browse programming books at random for several
hours (this one is for those who like such an activity, like me ;-)
Good luck!
Dr. Nikolai Bezroukov
Notes:
- These pages are written by people for whom English is not a
native language. Some grammar and spelling errors
should be expected.
- This is a Spartan WHYFF (We Help You For Free) site. It
cannot replace the best teachers and the best books.
- The site contains some obsolete pages, as it develops like a
living tree... Some links on older pages
are broken. Please
try to use Google, Open Directory, etc. to find a replacement link
(see HOWTO search the WEB for details).
We would appreciate it if you could
mail us a correct link.
By Donald E. Knuth, Andrew Binstock
Date: Apr 25, 2008
Andrew Binstock and Donald Knuth converse on the success of
open source, the problem with multicore architecture, the disappointing lack
of interest in literate programming, the menace of reusable code, and that
urban legend about winning a programming contest with a single compilation.
Andrew Binstock: You are one of the fathers of the open-source
revolution, even if you aren’t widely heralded as such. You previously have
stated that you released
TeX
as open source because of the problem of proprietary implementations at the
time, and to invite corrections to the code—both of which are key drivers
for open-source projects today. Have you been surprised by the success of
open source since that time?
Donald Knuth: The success of open source code is perhaps the only thing
in the computer field that hasn’t surprised me during the past
several decades. But it still hasn’t reached its full potential; I believe
that open-source programs will begin to be completely dominant as the
economy moves more and more from products towards services, and as more and
more volunteers arise to improve the code.
For example, open-source code can produce thousands of binaries, tuned
perfectly to the configurations of individual users, whereas commercial
software usually will exist in only a few versions. A generic binary
executable file must include things like inefficient "sync" instructions
that are totally inappropriate for many installations; such wastage goes
away when the source code is highly configurable. This should be a huge win
for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always be
superior to competitors like the Gimp—for some reason, I really don’t know
why! I’m quite willing to pay good money for really good software,
if I
believe that it has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly
suspect, since I’m just an educator and scientist. I understand almost
nothing about the marketplace.
Andrew: A story states that you once entered a programming
contest at Stanford (I believe) and you submitted the winning entry, which
worked correctly after a single compilation. Is this story true? In
that vein, today’s developers frequently build programs writing small code
increments followed by immediate compilation and the creation and running of
unit tests. What are your thoughts on this approach to software development?
Donald: The story you heard is typical of legends that are based on only
a small kernel of truth. Here’s what actually happened:
John McCarthy decided in 1971 to have a Memorial Day Programming Race.
All of the contestants except me worked at his AI Lab up in the hills above
Stanford, using the WAITS time-sharing system; I was down on the main
campus, where the only computer available to me was a mainframe for which I
had to punch cards and submit them for processing in batch mode. I used
Wirth’s ALGOL W system
(the predecessor of Pascal). My program didn’t work the first time,
but fortunately I could use Ed Satterthwaite’s excellent offline debugging
system for ALGOL W, so I needed only two runs. Meanwhile, the folks using
WAITS couldn’t get enough machine cycles because their machine was so
overloaded. (I think that the second-place finisher, using that "modern"
approach, came in about an hour after I had submitted the winning entry with
old-fangled methods.) It wasn’t a fair contest.
As to your real question, the idea of immediate compilation and "unit
tests" appeals to me only rarely, when I’m feeling my way in a totally
unknown environment and need feedback about what works and what doesn’t.
Otherwise, lots of time is wasted on activities that I simply never need to
perform or even think about. Nothing needs to be "mocked up."
Andrew: One of the emerging problems for developers, especially
client-side developers, is changing their thinking to write programs in
terms of threads. This concern, driven by the advent of inexpensive
multicore PCs, surely will require that many algorithms be recast for
multithreading, or at least to be thread-safe. So far, much of the work
you’ve published for Volume 4 of
The Art
of Computer Programming (TAOCP) doesn’t seem to touch
on this dimension. Do you expect to enter into problems of concurrency and
parallel programming in upcoming work, especially since it would seem to be
a natural fit with the combinatorial topics you’re currently working on?
Donald: The field of combinatorial algorithms is so vast that I’ll be
lucky to pack its sequential aspects into three or four physical
volumes, and I don’t think the sequential methods are ever going to be
unimportant. Conversely, the half-life of parallel techniques is very short,
because hardware changes rapidly and each new machine needs a somewhat
different approach. So I decided long ago to stick to what I know best.
Other people understand parallel machines much better than I do; programmers
should listen to them, not me, for guidance on how to deal with
simultaneity.
Andrew: Vendors of multicore processors have expressed
frustration at the difficulty of moving developers to this model. As a
former professor, what thoughts do you have on this transition and how to
make it happen? Is it a question of proper tools, such as better native
support for concurrency in languages, or of execution frameworks? Or are
there other solutions?
Donald: I don’t want to duck your question entirely. I might as well
flame a bit about my personal unhappiness with the current trend toward
multicore architecture. To me, it looks more or less like the hardware
designers have run out of ideas, and that they’re trying to pass the blame
for the future demise of Moore’s Law to the software writers by giving us
machines that work faster only on a few key benchmarks! I won’t be surprised
at all if the whole multithreading idea turns out to be a flop, worse than
the "Itanium" approach
that was supposed to be so terrific—until it turned out that the wished-for
compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over
a thousand programs, many of which have substantial size. I can’t think of
even five of those programs that would have been enhanced
noticeably by parallelism or multithreading. Surely, for example, multiple
processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these
promised machines of the future? I hear almost nothing but grief from
software people, although the hardware folks in our department assure me
that I’m wrong.
I know that important applications for parallelism exist—rendering
graphics, breaking codes, scanning images, simulating physical and
biological processes, etc. But all these applications require dedicated code
and special-purpose techniques, which will need to be changed substantially
every few years.
Even if I knew enough about such methods to write about them in TAOCP,
my time would be largely wasted, because soon there would be little reason
for anybody to read those parts. (Similarly, when I prepare the third
edition of
Volume
3 I plan to rip out much of the material about how to sort on magnetic
tapes. That stuff was once one of the hottest topics in the whole software
field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only
when I’m running two independent jobs at the same time; that’s nice, but it
happens only a few minutes every week. If I had four processors, or eight,
or more, I still wouldn’t be any better off, considering the kind of work I
do—even though I’m using my computer almost every day during most of the
day. So why should I be so happy about the future that hardware vendors
promise? They think a magic bullet will come along to make multicores speed
up my kind of work; I think it’s a pipe dream. (No—that’s the wrong
metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the
word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably
will get better with multicores. I’ve been talking about my technical work,
however, not recreation. I also admit that I haven’t got many bright ideas
about what I wish hardware designers would provide instead of multicores,
now that they’ve begun to hit a wall with respect to sequential computation.
(But my MMIX
design contains several ideas that would substantially improve the current
performance of the kinds of programs that concern me most—at the cost of
incompatibility with legacy x86 programs.)
Andrew: One of the few projects of yours that hasn’t been
embraced by a widespread community is
literate programming.
What are your thoughts about why literate programming didn’t catch on? And
is there anything you’d have done differently in retrospect regarding
literate programming?
Donald: Literate programming is a very personal thing. I think it’s
terrific, but that might well be because I’m a very strange person. It has
tens of thousands of fans, but not millions.
In my experience, software created with literate programming has turned
out to be significantly better than software developed in more traditional
ways. Yet ordinary software is usually okay—I’d give it a grade of C (or
maybe C++), but not F; hence, the traditional methods stay with us. Since
they’re understood by a vast community of programmers, most people have no
big incentive to change, just as I’m not motivated to learn Esperanto even
though it might be preferable to English and German and French and Russian
(if everybody switched).
Jon Bentley
probably hit the nail on the head when he once was asked why literate
programming hasn’t taken the whole world by storm. He observed that a small
percentage of the world’s population is good at programming, and a small
percentage is good at writing; apparently I am asking everybody to be in
both subsets.
Yet to me, literate programming is certainly the most important thing
that came out of the TeX project. Not only has it enabled me to write and
maintain programs faster and more reliably than ever before, and been one of
my greatest sources of joy since the 1980s—it has actually been indispensable at times. Some of my major programs, such as the MMIX
meta-simulator, could not have been written with any other methodology that
I’ve ever heard of. The complexity was simply too daunting for my limited
brain to handle; without literate programming, the whole enterprise would
have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded
machines, I would expect the discovery to come from people who routinely use
literate programming. Literate programming is what you need to rise above
the ordinary level of achievement. But I don’t believe in forcing ideas on
anybody. If literate programming isn’t your style, please forget it and do
what you like. If nobody likes it but me, let it die.
On a positive note, I’ve been pleased to discover that the conventions of
CWEB are already standard equipment within preinstalled software such as
Makefiles, when I get off-the-shelf Linux these days.
Andrew: In
Fascicle 1 of Volume 1, you reintroduced the MMIX computer,
which is the 64-bit upgrade to the venerable MIX machine comp-sci students
have come to know over many years. You previously described MMIX in great
detail in
MMIXware.
I’ve read portions of both books, but can’t tell whether the Fascicle
updates or changes anything that appeared in MMIXware, or whether it’s a
pure synopsis. Could you clarify?
Donald: Volume 1 Fascicle 1 is a programmer’s introduction, which
includes instructive exercises and such things. The MMIXware book is a
detailed reference manual, somewhat terse and dry, plus a bunch of literate
programs that describe prototype software for people to build upon. Both
books define the same computer (once the errata to MMIXware are incorporated
from my website). For most readers of TAOCP, the first fascicle
contains everything about MMIX that they’ll ever need or want to know.
I should point out, however, that MMIX isn’t a single machine; it’s an
architecture with almost unlimited varieties of implementations, depending
on different choices of functional units, different pipeline configurations,
different approaches to multiple-instruction-issue, different ways to do
branch prediction, different cache sizes, different strategies for cache
replacement, different bus speeds, etc. Some instructions and/or registers
can be emulated with software on "cheaper" versions of the hardware. And so
on. It’s a test bed, all simulatable with my meta-simulator, even though
advanced versions would be impossible to build effectively until another
five years go by (and then we could ask for even further advances just by
advancing the meta-simulator specs another notch).
Suppose you want to know if five separate multiplier units and/or
three-way instruction issuing would speed up a given MMIX program. Or maybe
the instruction and/or data cache could be made larger or smaller or more
associative. Just fire up the meta-simulator and see what happens.
Andrew: As I suspect you don’t use unit testing with MMIXAL,
could you step me through how you go about making sure that your code works
correctly under a wide variety of conditions and inputs? If you have a
specific work routine around verification, could you describe it?
Donald: Most examples of machine language code in TAOCP appear
in Volumes 1-3; by the time we get to Volume 4, such low-level detail is
largely unnecessary and we can work safely at a higher level of abstraction.
Thus, I’ve needed to write only a dozen or so MMIX programs while preparing
the opening parts of Volume 4, and they’re all pretty much toy
programs—nothing substantial. For little things like that, I just use
informal verification methods, based on the theory that I’ve written up for
the book, together with the MMIXAL assembler and MMIX simulator that are
readily available on the Net (and described in full detail in the MMIXware
book).
That simulator includes debugging features like the ones I found so
useful in Ed Satterthwaite’s system for ALGOL W, mentioned earlier. I always
feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still
thriving, primarily as the foundation for
LaTeX. While TeX
has been effectively frozen at your request, are there features that you
would want to change or add to it, if you had the time and bandwidth? If so,
what are the major items you add/change?
Donald: I believe changes to TeX would cause much more harm than good.
Other people who want other features are creating their own systems, and
I’ve always encouraged further development—except that nobody should give
their program the same name as mine. I want to take permanent responsibility
for TeX and Metafont,
and for all the nitty-gritty things that affect existing documents that rely
on my work, such as the precise dimensions of characters in the Computer
Modern fonts.
Andrew: One of the little-discussed aspects of software
development is how to do design work on software in a completely new domain.
You were faced with this issue when you undertook TeX: No prior art was
available to you as source code, and it was a domain in which you weren’t an
expert. How did you approach the design, and how long did it take before you
were comfortable entering into the coding portion?
Donald: That’s another good question! I’ve discussed the answer in great
detail in Chapter 10 of my book
Literate Programming, together with Chapters 1 and 2 of my book
Digital Typography. I think that anybody who is really interested in
this topic will enjoy reading those chapters. (See also Digital
Typography Chapters 24 and 25 for the complete first and second drafts
of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear
concern for limiting memory usage—an important problem for systems of that
era. Today, the concern for memory usage in programs has more to do with
cache sizes. As someone who has designed a processor in software, the issues
of cache-aware and
cache-oblivious algorithms surely must have crossed your radar
screen. Is the role of processor caches on algorithm design something that
you expect to cover, even if indirectly, in your upcoming work?
Donald: I mentioned earlier that MMIX provides a test bed for many
varieties of cache. And it’s a software-implemented machine, so we can
perform experiments that will be repeatable even a hundred years from now.
Certainly the next editions of Volumes 1-3 will discuss the behavior of
various basic algorithms with respect to different cache parameters.
In Volume 4 so far, I count about a dozen references to cache memory and
cache-friendly approaches (not to mention a "memo cache," which is a
different but related idea in software).
Andrew: What set of tools do you use today for writing TAOCP?
Do you use TeX? LaTeX? CWEB? Word processor? And what do you use for the
coding?
Donald: My general working style is to write everything first with pencil
and paper, sitting beside a big wastebasket. Then I use Emacs to enter the
text into my machine, using the conventions of TeX. I use tex, dvips, and gv
to see the results, which appear on my screen almost instantaneously these
days. I check my math with Mathematica.
I program every algorithm that’s discussed (so that I can thoroughly
understand it) using CWEB, which works splendidly with the GDB debugger. I
make the illustrations with MetaPost (or, in rare cases, on a Mac with Adobe Photoshop or
Illustrator). I have some homemade tools, like my own spell-checker for TeX
and CWEB within Emacs. I designed my own bitmap font for use with Emacs,
because I hate the way the ASCII apostrophe and the left open quote have
morphed into independent symbols that no longer match each other visually. I
have special Emacs modes to help me classify all the tens of thousands of
papers and notes in my files, and special Emacs keyboard shortcuts that make
bookwriting a little bit like playing an organ. I prefer
rxvt to xterm for terminal
input. Since last December, I’ve been using a file backup system called
backupfs, which meets
my need beautifully to archive the daily state of every file.
According to the current directories on my machine, I’ve written 68
different CWEB programs so far this year. There were about 100 in 2007, 90
in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely
convenient "change file" mechanism, with which I can rapidly create multiple
versions and variations on a theme; so far in 2008 I’ve made 73 variations
on those 68 themes. (Some of the variations are quite short, only a few
bytes; others are 5KB or more. Some of the CWEB programs are quite
substantial, like the 55-page BDD package that I completed in January.)
Thus, you can see how important literate programming is in my life.
I currently use Ubuntu Linux, on a
standalone laptop—it has no Internet connection. I occasionally carry flash
memory drives between this machine and the Macs that I use for network
surfing and graphics; but I trust my family jewels only to Linux.
Incidentally, with Linux I much prefer the keyboard focus that I can get
with classic FVWM to the
GNOME and KDE environments that other people seem to like better. To each
his own.
Andrew: You state in the preface of
Fascicle 0 of Volume 4 of
TAOCP that Volume 4 surely
will comprise three volumes and possibly more. It’s clear from the text that
you’re really enjoying writing on this topic. Given that, what is your
confidence in the note posted on the TAOCP website that Volume 5
will see light of day by 2015?
Donald: If you check the Wayback Machine for previous incarnations of
that web page, you will see that the number 2015 has not been constant.
You’re certainly correct that I’m having a ball writing up this material,
because I keep running into fascinating facts that simply can’t be left
out—even though more than half of my notes don’t make the final cut.
Precise time estimates are impossible, because I can’t tell until getting
deep into each section how much of the stuff in my files is going to be
really fundamental and how much of it is going to be irrelevant to my book
or too advanced. A lot of the recent literature is academic one-upmanship of
limited interest to me; authors these days often introduce arcane methods
that outperform the simpler techniques only when the problem size exceeds
the number of protons in the universe. Such algorithms could never be
important in a real computer application. I read hundreds of such papers to
see if they might contain nuggets for programmers, but most of them wind up
getting short shrift.
From a scheduling standpoint, all I know at present is that I must
someday digest a huge amount of material that I’ve been collecting and
filing for 45 years. I gain important time by working in batch mode: I don’t
read a paper in depth until I can deal with dozens of others on the same
topic during the same week. When I finally am ready to read what has been
collected about a topic, I might find out that I can zoom ahead because most
of it is eminently forgettable for my purposes. On the other hand, I might
discover that it’s fundamental and deserves weeks of study; then I’d have to
edit my website and push that number 2015 closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer.
How is your health today?
Donald: Naturally, the cancer will be a serious concern. I have superb
doctors. At the moment I feel as healthy as ever, modulo being 70 years old.
Words flow freely as I write TAOCP and as I write the literate
programs that precede drafts of TAOCP. I wake up in the morning
with ideas that please me, and some of those ideas actually please me also
later in the day when I’ve entered them into my computer.
On the other hand, I willingly put myself in God’s hands with respect to
how much more I’ll be able to do before cancer or heart disease or senility
or whatever strikes. If I should unexpectedly die tomorrow, I’ll have no
reason to complain, because my life has been incredibly blessed. Conversely,
as long as I’m able to write about computer science, I intend to do my best
to organize and expound upon the tens of thousands of technical papers that
I’ve collected and made notes on since 1962.
Andrew: On your website, you mention that the
Peoples Archive
recently made a series of videos in which you reflect on your past life. In
segment 93, "Advice to Young People," you advise that people shouldn’t do
something simply because it’s trendy. As we know all too well, software
development is as subject to fads as any other discipline. Can you give some
examples that are currently in vogue, which developers shouldn’t adopt
simply because they’re currently popular or because that’s the way they’re
currently done? Would you care to identify important examples of this
outside of software development?
Donald: Hmm. That question is almost contradictory, because I’m basically
advising young people to listen to themselves rather than to others, and I’m
one of the others. Almost every biography of every person whom you would
like to emulate will say that he or she did many things against the
"conventional wisdom" of the day.
Still, I hate to duck your questions even though I also hate to offend
other people’s sensibilities—given that software methodology has always been
akin to religion. With the caveat that there’s no reason anybody should care
about the opinions of a computer scientist/mathematician like me regarding
software development, let me just say that almost everything I’ve ever heard
associated with the term "extreme
programming" sounds like exactly the wrong way to go...with one
exception. The exception is the idea of working in teams and reading each
other’s code. That idea is crucial, and it might even mask out all the
terrible aspects of extreme programming that alarm me.
I also must confess to a strong bias against the fashion for reusable
code. To me, "re-editable code" is much, much better than an untouchable
black box or toolkit. I could go on and on about this. If you’re totally
convinced that reusable code is wonderful, I probably won’t be able to sway
you anyway, but you’ll never convince me that reusable code isn’t mostly a
menace.
Here’s a question that you may well have meant to ask: Why is the new
book called Volume 4 Fascicle 0, instead of Volume 4 Fascicle 1? The answer
is that computer programmers will understand that I wasn’t ready to begin
writing Volume 4 of TAOCP at its true beginning point, because we
know that the initialization of a program can’t be written until the program
itself takes shape. So I started in 2005 with Volume 4 Fascicle 2, after
which came Fascicles 3 and 4. (Think of Star Wars, which began with
Episode 4.)
About: Xcoral is a multi-window mouse-based text editor
for Unix/X11 with syntax highlighting and auto-indentation. A
built-in browser enables you to navigate through C functions, C++
and Java classes, methods, files, and attributes. This browser is
very fast and self-updates automatically after file modifications.
An ANSI C Interpreter (Smac) is also built-in to dynamically extend
the editor's facilities (with user functions, keybindings, modes,
etc).
Changes: Bugfixes.
About: Sunifdef is a command line tool for eliminating
superfluous preprocessor clutter from C and C++ source files. It
is a more powerful successor to the FreeBSD 'unifdef' tool.
Sunifdef is most useful to developers of constantly evolving
products with large code bases, where preprocessor conditionals
are used to configure the feature sets, APIs or implementations
of different releases. In these environments, the code base
steadily accumulates #ifdef-pollution as transient configuration
options become obsolete. Sunifdef can largely automate the
recurrent task of purging redundant #if logic from the code.
Changes: Six bugs are fixed in this release. Five of
these fixes tackle longstanding defects of sunifdef's parsing
and evaluation of integer constants, a niche that has received
little scrutiny since the tool branched from unifdef. This
version provides robust parsing of hex, decimal, and octal
numerals and arithmetic on them. However, sunifdef still
evaluates all integer constants as ints and performs signed
integer arithmetic upon them. This falls short of emulating the
C preprocessor's arithmetic in limit cases, which is an unfixed
defect.
About: ATF is a collection of libraries and utilities
designed to ease unattended application testing in the hands of
developers and end users of a specific piece of software. Tests can
currently be written in C/C++ or POSIX shell and, contrary to other
testing frameworks, ATF tests are installed into the system
alongside any other application files. This allows the end user to
easily verify that the software behaves correctly on her system.
Furthermore, the results of the test suites can be collected into
nicely-formatted reports to simplify their visualization and
analysis.
Changes: This release adds preliminary documentation on
the C++ and shell interfaces to write tests, mainly directed to
developers wishing to adopt ATF. It adds a way to specify required
architectures and machines for given tests through the require.arch
and require.machine properties; if the platform running the tests
does not fulfill the requirements, the tests are simply skipped. It
adds the ability to limit the maximum time a test case can last
through the timeout property, killing tests that get stalled. There
are many portability fixes, especially to SunOS, and small
improvements all around.
SWIG is a software development tool that connects programs
written in C and C++ with a variety of high-level
programming languages. SWIG is primarily used with common
scripting languages such as Perl, PHP, Python, Tcl/Tk, and
Ruby; however, the list of supported languages also includes
non-scripting languages such as C#, Common Lisp (CLISP,
Allegro CL, UFFI), Java, Modula-3, OCAML, and R. Also
several interpreted and compiled Scheme implementations
(Guile, MzScheme, Chicken) are supported. SWIG is most
commonly used to create high-level interpreted or compiled
programming environments, user interfaces, and as a tool for
testing and prototyping C/C++ software. SWIG can also export
its parse tree in the form of XML and Lisp s-expressions.
Release focus: Minor feature enhancements
Changes:
shared_ptr support was added for Java and C#. STL support
for Ruby was enhanced. Windows support for R was added. A
long-standing memory leak in the PHP module was fixed.
Numerous fixes and minor enhancements were made for
Allegrocl, C#, cffi, Chicken, Guile, Java, Lua, Ocaml, Perl,
PHP, Python, Ruby, and Tcl. Warning support was improved.
Getting the output of a shell command from
a C program using popen
Sometimes it's necessary to access the output of a shell
command (more than just the return value) in a C program. One way
is to redirect the output to a file and then read that file. The other
is to use the popen function.
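#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *cmd = "ls -l";
    FILE *fptr;
    char out[256];
    int ret;

    fptr = popen(cmd, "r");   /* run cmd via the shell and read its stdout */
    if (fptr == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }
    /* fgets returns NULL at end of output or on a read error */
    while (fgets(out, sizeof out, fptr) != NULL)
        fputs(out, stdout);   /* each line already ends with '\n' */
    ret = pclose(fptr);       /* wait status of the command, or -1 on error */
    return ret == -1 ? EXIT_FAILURE : EXIT_SUCCESS;
}
/* Note: tested with S10 gcc only. */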
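The example above reads the command's output; a caller often wants the command's exit status as well. The following is a minimal sketch of that, assuming a POSIX system. The helper name capture_command and its buffer-truncation policy are illustrative choices, not part of the original listing.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose under strict -std= modes */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through the shell, copy its stdout into buf
 * (always NUL-terminated, silently truncated to bufsize - 1 bytes), and
 * return the command's exit status. Returns -1 if the command could not
 * be started or did not exit normally. */
int capture_command(const char *cmd, char *buf, size_t bufsize)
{
    FILE *fp;
    char line[256];
    size_t used = 0;
    int status;

    if (cmd == NULL || buf == NULL || bufsize == 0)
        return -1;
    buf[0] = '\0';

    fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;

    /* Keep reading until EOF even when buf is full, so the child
     * never blocks on a full pipe. */
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = strlen(line);
        if (n > bufsize - 1 - used)
            n = bufsize - 1 - used;
        memcpy(buf + used, line, n);
        used += n;
        buf[used] = '\0';
    }

    status = pclose(fp);             /* wait status, as from waitpid() */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);      /* decode the shell's exit code */
}
```

Calling capture_command("ls -l", out, sizeof out) with a char array out fills the array and returns 0 when ls succeeds.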
Splint is a tool for statically checking C programs for security
vulnerabilities and coding mistakes. With minimal effort, Splint can be used
as a better lint. If additional effort is invested adding annotations to
programs, Splint can perform stronger checking than can be done by any
standard lint.
About 10 months ago, I was writing a library. As I was writing
it, I started to look at the whole issue of notifying the caller of
errors. In typical fashion, I tried to optimize the error handling
problem rather than just do the right thing, and just use error
codes. I did a ton of research. Here is a current list of links and
articles on the subject.
Getting Started
To get you started here are some good starting points. They both
received a lot of attention on the internet.
A colorful
post by
Damien Katz.
A nice
opinion piece that is pro-error codes by the famous Joel of
Joel on Software.
Read my
original post with excellent comments by
Daniel Lyons, Paul Clegg, and
Neville of the North.
Nutshell
The default and standard way of handling errors since the
beginning is to just use error codes with some convention of noticing
them. For example, you could document the error condition with an
api and then set a global variable for the actual code. It is up to
the programmer calling the function to notice the error and do the
right thing.
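A minimal sketch of that convention in C, with a sentinel return value plus a global variable holding the actual code. The names parse_port, last_error, and the ERR_* constants are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

/* Hypothetical error codes for this sketch. */
enum { ERR_NONE = 0, ERR_EMPTY, ERR_NOT_NUMBER, ERR_RANGE };

/* Global that holds the reason for the most recent failure. */
int last_error = ERR_NONE;

/* Parse a TCP port number. Returns the port (1..65535) on success.
 * On failure returns -1 and stores the reason in last_error; it is up
 * to the caller to notice the -1 and inspect last_error. */
int parse_port(const char *text)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0') {
        last_error = ERR_EMPTY;
        return -1;
    }
    value = strtol(text, &end, 10);
    if (*end != '\0') {              /* trailing junk, or no digits at all */
        last_error = ERR_NOT_NUMBER;
        return -1;
    }
    if (value < 1 || value > 65535) {
        last_error = ERR_RANGE;
        return -1;
    }
    last_error = ERR_NONE;
    return (int)value;
}
```

Nothing forces the caller to check the result, which is exactly the weakness of this style: a forgotten check compiles and runs without complaint.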
This is the technique used by operating systems and most
libraries. Historically, these systems have never been consistent or
compatible with other conventions. The most evolved system for this
would probably be the
Microsoft COM system. All functions return an HRESULT, which is
essentially an error code.
The next system was the ‘exception-handling’ system. In this
system errors cannot be ignored. Exception handlers are declared,
optionally, at a given scope. If an exception is thrown (i.e.,
an error has occurred), handlers are searched up the stack until a
matching handler is found.
IMHO, the exception system isn’t used properly in 90% of the
cases. There is a fine balance between a soft error and something
exceptional. The syntax also tends to get in the way for even the
simplest of errors. I agree that there should be errors that are not
ignored, but there has to be a better way.
So, the old skoolers say, ‘we use error codes, and we like
them, dammit’ - aka super-disciplined programming, usually
for real-time, embedded and smaller systems.
The new schoolers say, ‘you have to be kidding about error codes,
use exceptions’ - aka, yeah, we use exceptions, that is what the
language gives us… and btw, no, we don’t mind typing on our
keyboards a lot.
Somehow, there has to be a better way. Maybe it will be system-
or application-specific.
Moving On - Old / New Ideas
If you don’t mind it being a C++ article,
here is an
amazing one from Andrei Alexandrescu and Petru Marginean. (Andrei is
widely known for his great work on Policy Based design with C++,
which is excellent.) The article is well written and practical. In
fact, the idea was so good, the language ‘D’ made it part of the
language.
Here is an example:
void User::AddFriend(User& newFriend)
{
friends_.push_back(&newFriend);
try
{
pDB_->AddFriend(GetName(), newFriend.GetName());
}
catch (...)
{
friends_.pop_back();
throw;
}
}
That is 10 lines, and this is the super-simple example. The same function written with ScopeGuard:
void User::AddFriend(User& newFriend)
{
friends_.push_back(&newFriend);
ScopeGuard guard = MakeObjGuard(friends_, &UserCont::pop_back);
pDB_->AddFriend(GetName(), newFriend.GetName());
guard.Dismiss();
}
In D it would look even cleaner:
void User::AddFriend(User& newFriend)
{
friends_.push_back(&newFriend);
scope(failure) friends_.pop_back();
pDB_->AddFriend(GetName(), newFriend.GetName());
}
IMHO, I think exception handling will move more towards systems
like this. Higher level, simpler and cleaner.
Other interesting systems are the ones developed for Common Lisp,
Erlang, and Smalltalk. I’m sure Haskell has something to say about
this as well.
The Common Lisp and Smalltalk ones are similar. Instead of
forcing a mechanism like most exception handlers, these systems give
the exception ‘catcher’ the choice of retrying or doing something
different at the point of the exception. Very powerful.
Speaking of Smalltalk, here is an excellent
article called
Subsystem Exception Handling in Smalltalk. I highly recommend
it.
My Recommendation
If you are building a library, use error codes. Error codes are
much easier to turn into exceptions by the language wrapper that
will eventually be built on top.
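As an illustration of that recommendation, here is a sketch of a tiny C library interface in which every function returns a status code and results travel through out-parameters. All names (int_stack, stack_push, stack_strerror, and so on) are hypothetical. A wrapper in a higher-level language could map any non-OK status onto an exception, using the message function for its text.

```c
#include <stddef.h>

#define STACK_CAPACITY 4

/* Every library call returns one of these; 0 always means success. */
typedef enum {
    STACK_OK = 0,
    STACK_ERR_NULL,
    STACK_ERR_FULL,
    STACK_ERR_EMPTY
} stack_status;

typedef struct {
    int items[STACK_CAPACITY];
    size_t count;
} int_stack;

stack_status stack_push(int_stack *s, int value)
{
    if (s == NULL)
        return STACK_ERR_NULL;
    if (s->count == STACK_CAPACITY)
        return STACK_ERR_FULL;
    s->items[s->count++] = value;
    return STACK_OK;
}

/* The popped value is delivered through *out, leaving the return
 * value free to carry only the status. */
stack_status stack_pop(int_stack *s, int *out)
{
    if (s == NULL || out == NULL)
        return STACK_ERR_NULL;
    if (s->count == 0)
        return STACK_ERR_EMPTY;
    *out = s->items[--s->count];
    return STACK_OK;
}

/* Human-readable text for a status, for logs or wrapper exceptions. */
const char *stack_strerror(stack_status st)
{
    switch (st) {
    case STACK_OK:        return "success";
    case STACK_ERR_NULL:  return "null pointer argument";
    case STACK_ERR_FULL:  return "stack is full";
    case STACK_ERR_EMPTY: return "stack is empty";
    }
    return "unknown error";
}
```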
When programming, don’t get trapped into thinking about the little
picture. A lot of these errors are just pawns in the grand scheme of
assuring that you have all of your resources in place before you
begin your task at hand. If you present your code in that manner, it
will be much easier to understand for all parties.
More Links
Error Codes vs. Exceptions by Damien Katz.
opinion piece that is pro-error codes by the famous Joel of
Joel on Software.
Read my
original post with excellent comments by
Daniel Lyons, Paul Clegg, and
Neville of the North.
Microsoft COM
D
Language - Exception Safe Programming
Subsystem Exception Handling in Smalltalk - nice section on
history as well
http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html
A nice long thread on comp.lang.c++.moderated
*Slightly Wacky, But Neat *
http://www.halfbakery.com/idea/C20exception20handling_20macros
http://www.nicemice.net/cexcept/ http://home.rochester.rr.com/bigbyofrocny/GEF/
http://www.on-time.com/ddj0011.htm
About:
Doxygen is a cross-platform, JavaDoc-like documentation system
for C++, C, Objective-C, C#, Java, IDL, Python, and PHP. Doxygen
can be used to generate an on-line class browser (in HTML)
and/or an off-line reference manual (in LaTeX or RTF) from a set
of source files. Doxygen can also be configured to extract the
code-structure from undocumented source files. This includes
dependency graphs, class diagrams and hyperlinked source code.
This type of information can be very useful to quickly find your
way in large source distributions.
Changes: This
release fixes a number of bugs that could cause it to crash
under certain conditions or produce invalid output.
Make allows a programmer to easily keep track of a project by maintaining
current versions of their programs from separate sources. Make can automate
various tasks for you: not only compiling the proper branch of source code from
the project tree, but also cleaning
directories, organizing output, and even debugging.
I agree with your ramblings, although by chance I happen to have one
counter-example - John Carmack of id Software. The first Quake really was an
amazing technical achievement (real-time texture-mapped 3D graphics done in
software that looked good on a Pentium 75?!?).
And if you look at the source code (which you can download for free), it's
some of the prettiest, easy-to-follow C code I've ever seen.
And aside from a few interviews, Carmack hasn't written smack.
Error reporting in C programs
C is the most commonly used
programming language on UNIX platforms. Despite the popularity
of other languages on UNIX (such as Java™, C++, Python, or
Perl), all of the application programming interfaces (APIs) of
systems have been created for C. The standard C library, part of
every C compiler suite, is the foundation upon which UNIX
standards, such as Portable Operating System Interface (POSIX)
and the Single UNIX Specification, were created.
When C and UNIX were developed in the early 1970s, the
concept of exceptions, which interrupt the flow of an
application when some condition occurs, was fairly new or
non-existent. The libraries had to use other conventions for
reporting errors.
While you're poring over the C library, or almost any other
UNIX library, you'll discover two common ways of reporting
failures:
- The function returns an error or success code; if it's
an error code, the code itself can be used to figure out
what went wrong.
- The function returns a specific value (or range of
values) to indicate an error, and the global variable
errno is set to indicate the cause of the problem.
The errno global variable (or, more accurately,
symbol, since on systems with a thread-safe C library,
errno is actually a function or macro that ensures each
thread has its own errno) is defined in the <errno.h>
system header, along with all of its possible values
defined as standard constants.
Many of the functions in the first category actually return
one of the standard errno codes, but it's
impossible to tell how a function behaves and what it returns
without checking the Returns section of the manual page. If
you're lucky, the function's man page lists all of its possible
return values and what they mean in the context of this
particular function. Third party libraries often have a single
convention that's followed by all of the functions in the
library but, again, you'll have to check the library's
documentation before making any assumptions.
Let's take a quick look at some code demonstrating
errno and a couple of functions that you can use to
transform that error code into something more human-readable.
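The code from the original article is not reproduced on this page. As a stand-in, here is a minimal sketch of the second convention using fopen together with the standard strerror and perror functions. It assumes a POSIX system, where fopen sets errno on failure (ISO C alone does not promise that). The helper name try_open is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open path for reading.
 * Returns 0 on success, or the errno value describing the failure. */
int try_open(const char *path)
{
    FILE *fp;
    int saved;

    errno = 0;
    fp = fopen(path, "r");
    if (fp == NULL) {
        /* Save errno at once: any later library call may overwrite it. */
        saved = errno;

        /* strerror() maps the numeric code to a message string. */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));

        /* perror() reads errno itself and prints "<path>: <message>". */
        errno = saved;
        perror(path);
        return saved;
    }
    fclose(fp);
    return 0;
}
```

The returned value can be compared against the constants from <errno.h>, such as ENOENT for a missing file.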
[Feb 14, 2006] Free Microsoft compilers
- Get a Free Copy of Visual Studio 2005 Express Editions
Download a copy of Visual Studio 2005
Express Editions today – easy to use tools for the hobbyist, novice and
student developer.
-
Visual C++ Toolkit 2003 The Microsoft Visual
C++ Toolkit 2003 includes the core tools developers
need to compile and link C++-based applications for
Windows and the .NET Common Language Runtime –
compiler, linker, libraries, and sample code.
[Nov 9, 2005]
10
Things I Hate About (U)NIX: a primitive and misguided view of C. The main
value of this piece is that it contains most of the typical arguments
that people who have no clue about software engineering use to attack the language. The
author does not understand that when a higher-level language is needed, TCL or a similar
scripting language should be used along with C, not instead of it.
The C language was
written to enable UNIX to be portable. It's designed to
produce good code for the PDP-11, and very closely maps
to that machine's capabilities. There's no support for
concurrency in C, for example. In a modern language such
as Erlang, primitives exist in the language for creating
different threads of execution and sending messages
between them. This is very important today, when it's a
lot cheaper to buy two computers than one that's twice
as fast.
C also lacks a number of
other features present in modern languages. The most
obvious is lack of support for strings. The lack of
bounds-testing on arrays is another example—one
responsible for a large number of security holes in UNIX
software. Another aspect of C that's responsible for
several security holes is the fact that integers in C
have a fixed size—if you try to store something that
doesn't fit, you get an overflow. Unfortunately, this
overflow isn't handled nicely. In Smalltalk, the
overflow would be caught transparently to the developer
and the integer increased in size to fit it. In other
low-level languages, the assignment would generate an
error that could be handled by the program. In C, it's
silently ignored. And how big is the smallest value that
won't fit in a C integer? Well, that's up to the
implementation.
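For context on the overflow complaint: in C, unsigned arithmetic wraps around, while signed overflow is undefined behavior, so a program that cares has to test before it adds. A minimal sketch of such a test follows. The function name checked_add is hypothetical, and the range comes from INT_MAX and INT_MIN in <limits.h>, which is where the implementation-defined limits are published.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add a and b without ever overflowing.
 * Returns true and stores the result in *sum when a + b fits in an int;
 * returns false and leaves *sum untouched otherwise. */
bool checked_add(int a, int b, int *sum)
{
    /* The comparisons are arranged so that no intermediate value
     * can itself overflow. */
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```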
Next, we get to the
woefully inadequate C preprocessor. The preprocessor in
C works by very simple token substitution—it has no
concept of the underlying structure of the code. One
obvious example of the limitations of this setup is when
you try adding control structures to the language. With
Smalltalk, this is trivial—blocks of code in Smalltalk
can be passed as arguments, so any message call can be a
control statement. In LISP, the preprocessor can be used
to encode design patterns, greatly reducing the amount
of code needed. C can just about handle simple
inline-function equivalents.
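The token-substitution behavior is easy to see with a classic example. In this sketch the macro and function names are illustrative only: the naive macro pastes its argument text into the expression without regard to operator precedence, the parenthesized macro fixes that, and a C99 inline function avoids the issue altogether.

```c
/* Naive macro: SQUARE_NAIVE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5. */
#define SQUARE_NAIVE(x) x * x

/* Parenthesized macro: SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)).
 * The argument is still evaluated twice, so SQUARE(i++) remains a bug. */
#define SQUARE(x) ((x) * (x))

/* Inline function: the argument is evaluated once and type-checked. */
static inline int square(int x)
{
    return x * x;
}

/* Small wrappers so the three variants can be compared side by side. */
int naive_result(void)  { return SQUARE_NAIVE(1 + 2); }
int macro_result(void)  { return SQUARE(1 + 2); }
int inline_result(void) { return square(1 + 2); }
```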
The real problem with C,
however, is that it's the standard language for UNIX
systems. All system calls and common libraries expose C
functions, because C is the lowest common
denominator—and C is very low. C was designed when the
procedural paradigm was only just gaining acceptance,
when Real Programmers used assembly languages and
structured programming was something only people in
universities cared about. If you want to create an
object-oriented library on UNIX, you either expose it in
the language in which it was written—forcing other
developers to choose the same language as you—or you
write a cumbersome wrapper in C. Hardly an ideal
solution.
Making
Wrong Code Look Wrong - Joel on Software. The main problem with C critics is
not that C is perfect (it is far from perfect), but that the critics are
ignorant. Joel rehashes old C warts without a real understanding of the solutions
available. For example, indent is one of the
simplest solutions to the "deceptive nesting" problem in C. BTW, the problem was
present even in languages with better, more flexible code blocks, like PL/1.
PL/1 permits a label on each closing bracket to match it with the opening
bracket; it also permits multiple block closure with a single labeled bracket,
like
a: begin; ...
begin; ... begin ... end a; /* end a closes all 3 blocks */
Anyway here is his rant:
...As you get more proficient at writing code in a
particular environment, you start to learn to see other things. Things that
may be perfectly legal and perfectly OK according to the coding convention,
but which make you worry.
For example, in C:
char* dest, src;
This is legal code; it may conform to
your coding convention, and it may even be what was intended, but when you’ve
had enough experience writing C code, you’ll notice that this declares
dest as
a char
pointer while declaring src
as merely a char,
and even if this might be what you wanted, it probably isn’t. That
code smells a little bit dirty.
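The claim about that declaration can be checked mechanically. The sketch below assumes a C11 compiler and uses _Generic to report each variable's type. The IS_CHAR and IS_CHAR_PTR macros and the dest2/src2 names are hypothetical additions for illustration.

```c
/* Evaluate to 1 when the expression has exactly the named type (C11). */
#define IS_CHAR(x)     _Generic((x), char: 1, default: 0)
#define IS_CHAR_PTR(x) _Generic((x), char *: 1, default: 0)

char* dest, src;      /* the declaration from the text: only dest is a pointer */
char *dest2, *src2;   /* the '*' binds to each name, so both are pointers */

/* Returns 1 when the first declaration really yields a pointer and a char. */
int first_declaration_is_mixed(void)
{
    return IS_CHAR_PTR(dest) && IS_CHAR(src);
}

/* Returns 1 when the second declaration yields two pointers. */
int second_declaration_is_all_pointers(void)
{
    return IS_CHAR_PTR(dest2) && IS_CHAR_PTR(src2);
}
```

Writing the asterisk next to the variable name rather than the type makes the second form the visually natural one.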
Even more subtle:
if (i != 0)
foo(i);
In this case the code is 100% correct; it
conforms to most coding conventions and there’s nothing wrong with it, but the
fact that the single-statement body of the if
statement is not enclosed in braces may be bugging you, because you might be
thinking in the back of your head, gosh, somebody might insert another line of
code there
if (i != 0)
bar(i);
foo(i);
… and forget to add the braces, and thus
accidentally make foo(i) unconditional!
So when you see blocks of code that aren’t in braces, you might sense just a
tiny, wee, soupçon of uncleanliness which makes you uneasy.
OK, so far I’ve mentioned three levels of
achievement as a programmer:
1. You don’t know clean from unclean.
2. You have a superficial idea of cleanliness,
mostly at the level of conformance to coding conventions.
3. You start to smell subtle hints of
uncleanliness beneath the surface and they bug you enough to reach out and
fix the code.
There’s an even higher level, though, which is
what I really want to talk about:
4. You deliberately architect your code in such
a way that your nose for uncleanliness makes your code more likely to be
correct.
This is the real art:
making robust code by literally inventing conventions that
make errors stand out on the screen.
So now I’ll walk you through a little example,
and then I’ll show you a general rule you can use for inventing these
code-robustness conventions, and in the end it will lead to a defense of a
certain type of Hungarian Notation, probably not the type that makes people
carsick, though, and a criticism of exceptions in certain circumstances,
though probably not the kind of circumstances you find yourself in most of the
time.
But if you’re so convinced that Hungarian
Notation is a Bad Thing and that exceptions are the best invention since the
chocolate milkshake and you don’t even want to hear any other opinions, well,
head on over to Rory’s and read the
excellent comix
instead; you probably won’t be missing much here anyway; in fact in a minute
I’m going to have actual code samples which are likely to put you to sleep
even before they get a chance to make you angry. Yep. I think the plan will be
to lull you almost completely to sleep and then to sneak the Hungarian=good,
Exceptions=bad thing on you when you’re sleepy and not really putting up much
of a fight.
An Example
Right. On with the example. Let’s pretend that
you’re building some kind of a web-based application, since those seem to be
all the rage with the kids these days.
Now, there’s a security vulnerability called
the Cross Site Scripting Vulnerability, a.k.a.
XSS. I won’t go
into the details here: all you have to know is that when you build a web
application you have to be careful never to repeat back any strings that the
user types into forms.
So for example if you have a web page that says
“What is your name?” with an edit box and then submitting that page takes you
to another page that says, Hello, Elmer! (assuming the user’s name is Elmer),
well, that’s a security vulnerability, because the user could type in all
kinds of weird HTML and JavaScript instead of “Elmer” and their weird
JavaScript could do narsty things, and now those narsty things appear to come
from you, so for example they can read cookies that you put there and forward
them on to Dr. Evil’s evil site.
Let’s put it in pseudocode. Imagine that
s = Request("name")
reads input (a POST argument) from the HTML
form. If you ever write this code:
Write "Hello, " & Request("name")
your site is already vulnerable to XSS attacks.
That’s all it takes.
Instead you have to encode it before you
copy it back into the HTML. Encoding it means replacing
" with &quot;, replacing > with &gt;, and so forth. So
Write "Hello, " &
Encode(Request("name"))
is perfectly safe.
All strings that originate from the user are
unsafe. Any unsafe string must not be output without encoding it.
Let’s try to come up with a coding convention
that will ensure that if you ever make this mistake, the code will just look wrong. If wrong code, at least,
looks wrong, then it has a
fighting chance of getting caught by someone working on that code or reviewing
that code.
V IDE
V IDE works with GNU g++, Borland C++ 5.5 and Java and runs
on Windows and Linux. It includes a syntax highlighting editor
for C/C++, Java, Perl, Fortran, TeX and HTML. It has a built-in
code beautifier, macro support, ctags support, project manager,
integrated support for the V applications generator and icon
editor, integrated support for the GNU gdb and Sun's jdb (for
Java), etc.
Slashdot Optimizations - Programmer vs. Compiler
Re:Clear Code (Score:5, Insightful)
by Rei (128717) on Friday
February 25, @04:56PM (#11782241)
(http://www.cursor.org/)
An important lesson that I wish I had learned when I was younger
;) It is crazy to start optimizing before you know where your
bottlenecks are. Don't guess - run a profiler. It's not hard, and you'll
likely get some big surprises.
Another thing to remember is this: the compiler isn't stupid; don't
pretend that it is. I had senior developers at an earlier job mad at me
because I wasn't creating temporary variables for the limits of my loop
indices (on unprofiled code, nonetheless!). It took actually digging up
an article on the net to show that all modern compilers automatically
dereference any const references (be they arrays, linked lists, const
object functions, etc) before starting the loop.
Another example: function calls. I've heard some people be insistent
that the way to speed up an inner loop is to remove the code from
function calls so that you don't have function call overhead. No! Again,
compilers will do this for you. As compilers were evolving, they added
the "inline" keyword, which does this for you. Eventually, the compilers
got smart enough that they started inlining code on their own when not
specified and not inlining it when coders told it to be inline if it
would be inefficient. Due to coder pressure, at least one compiler that
I read about had an "inline damnit" (or something to that effect) keyword
to force inlining when you're positive that you know better than the
compiler ;)
Once again, the compiler isn't stupid. If an optimization seems
"obvious" to you, odds are pretty good that the compiler will take care
of it. Go for the non-obvious optimizations. Can you remove a loop from
a nested set of loops by changing how you're representing your data? Can
you replace a hack that you made with standard library code (which tends
to be optimized like crazy)? Etc. Don't start dereferencing variables,
removing the code from function calls, or things like this. The compiler
will do this for you.
If possible, work with the compiler to help it. Use "restrict". Use
"const". Give it whatever clues you can.
Write C for C programmers (Score:5, Insightful)
by swillden (191260) on Friday February 25, @03:55PM (#11781306)
With regard to your example, I can't imagine any modern compiler
wouldn't treat the two as equivalent.
However, in your example, I actually prefer "if (!ptr)" to "if (ptr
== NULL)", for two reasons. First the latter is more error-prone,
because you can accidentally end up with "if (ptr = NULL)". One common
solution to avoid that problem is to write "if (NULL == ptr)", but that
just doesn't read well to me. Another is to turn on warnings, and let
your compiler point out code like that -- but that assumes a decent
compiler.
The second, and more important, reason is that to anyone who's been
writing C for a while, the compact representation is actually clearer
because it's an instantly-recognizable idiom. To me, parsing the "ptr ==
NULL" format requires a few microseconds of thought to figure out what
you're doing. "!ptr" requires none. There are a number of common idioms
in C that are strange-looking at first, but soon become just another
part of your programming vocabulary. IMO, if you're writing code in a
given language, you should write it in the style that is most
comfortable to other programmers in that language. I think proper use of
idiomatic expressions *enhances* maintainability. Don't try to write
Pascal in C, or Java in C++, or COBOL in, well, anything, but that's a
separate issue :-)
Oh, and my answer to your more general question about whether or not
you should try to write code that is easy for the compiler... no. Don't
do that. Write code that is clear and readable to programmers and let
the compiler do what it does. If profiling shows that a particular piece
of code is too slow, then figure out how to optimize it, whether by
tailoring the code, dropping down to assembler, or whatever. But not
before.
Check out the LLVM demo page (Score:5, Interesting)
by sabre (79070) on Friday
February 25, @03:58PM (#11781354)
(http://www.nondot.org/~sabre/)
LLVM is an aggressive compiler that is able to do many cool things.
Best yet, it has a demo page here: http://llvm.org/demo [llvm.org], where you can
try two different things and see how they compile.
One of the nice things about this is that the code is printed in a
simple abstract assembly language that is easy to read and understand.
The compiler itself is very cool too btw, check it out.
:)
If you're not willing to TIME it... (Score:4, Insightful)
by dpbsmith (263124) on
Friday February 25, @04:30PM (#11781872)
(http://world.std.com/~dpbsmith)
...then the code isn't important enough to optimize. Plain and
simple.
Never try to optimize anything unless you have measured the speed of the
code before optimizing and have measured it again after optimizing.
Optimized code is almost always harder to understand, contains more
possible code paths, and more likely to contain bugs than the most
straightforward code. It's only worth it if it's really faster...
And you simply cannot tell whether it's faster unless you actually time
it. It's absolutely mindboggling how often a change you are certain will
speed up the code has no effect, or a truly negligible effect, or slows
it down.
This has always been true. In these days of heavily optimized compilers
and complex CPUs that are doing branch prediction and God knows what
all, it is truer than ever. You cannot tell whether code is fast just by
glancing at it. Well, maybe there are processor gurus who can accurately
visualize the exact flow of all the bits through the pipeline, but I'm
certainly not one of them.
A corollary is that since the optimized code is almost always trickier,
harder to understand, and often contains more logic paths than the most
straightforward code, you shouldn't optimize unless you are committed to
spending the time to write a careful unit-test fixture that exercises
everything tricky you've done, and write good comments in the code.
Premature Optimization (Score:5, Insightful)
by fizban (58094) <fizban@umich.edu> on Friday February 25,
@05:06PM (#11782376)
(http://www.sophicstudios.com/)
Premature Optimization is the DEVIL! I repeat, it is the gosh darn
DEVIL! Don't do it. Write clear code so that I don't have to spend days
trying to figure out what you are trying to do.
The biggest mistake I see in my professional (and unprofessional) life
is programmers who try to optimize their code in all sorts of "733+"
ways, trying to "trick" the compiler into removing 1 or 2 lines of
assembly, yet completely disregard that they are using a map instead of
a hash_map, or doing a linear search when they could do a binary search,
or doing the same lookup multiple times, when they could do it just
once. It's just silly, and goes to show that lots of programmers don't
know how to optimize effectively.
Compilers are good. They optimize code well. Don't try to help them out
unless you know your code has a definite bottleneck in a tight loop that
needs hand tuning. Focus on using correct algorithms and designing your
code from a high level to process data efficiently. Write your code in a
clear and easy to read manner, so that you or some other programmer can
easily figure out what's going on a few months down the line when you
need to add fixes or new functionality. These are the ways to build
efficient and maintainable systems, not by writing stuff that you could
enter in an obfuscated code contest.
valgrind (Score:4, Informative)
by cyco/mico (178541) on
Friday February 25, @05:12PM (#11782431)
If in doubt, use valgrind and kcachegrind [sourceforge.net].
One run with callgrind gives you all the information you want:
- How often are functions called (and branches taken)
- Which functions take most of the time
- See the assembler code for each line with a mouse click (no need
to guess anymore)
callgrind/kcachegrind is by far the easiest profiling solution I ever
tried, and it seems to answer more or less all of your questions.
Rules for writing fast code (aka optimization) (Score:4,
Insightful)
by MSBob (307239) on Friday
February 25, @05:56PM (#11782860)
First: Avoid doing what you don't have to do. Sounds obvious
but I rarely see code that does the absolute minimum it needs to. Most
of the code I've seen to date seems to precalculate too much stuff, read
too much data from external storage, redraw too much stuff on screen
etc...
Second: Do it later. There are thousands of situations where you
can postpone the actual computations. Imagine writing a Matrix class
with the invert() method. You can actually postpone calculating the
inverse of the matrix until there is a call to access one of the fields
in the matrix. Also you can calculate only the field being accessed. Or
at some sensible threshold you may assume that the user code will read
the entire inverted matrix and you can just calculate the remaining
inverted fields... the options are endless.
Most string class implementations already make good use of this rule by
copying their buffers only when the "copied" buffer changes.
Third: Apply minimum algorithmic complexity. If you can use a
hashmap instead of a treemap, use the hash version: it's O(1) vs O(log n).
Use quicksort for just about any kind of sorting you need to do.
Fourth: Cache your data. Download or buy a good caching class
or use some facilities your language provides (eg. Java SoftReference
class) for basic caching. There are some enormous performance gains that
can be realized with smart caching strategies.
Fifth: Optimize using your language constructs. Use the
register keyword, use language idioms that you know compile into faster
code etc... Scratch this rule! If you're applying rules one to
four you can forget about this one and still have fast AND readable
code.
The never overused example that I have (Score:4, Informative)
by roman_mir (125474) on
Friday February 25, @05:56PM (#11782867)
(http://slashdot.org/)
I got this job as a contractor 4 years ago now where the project was
developed by over 30 junior developers and one crazy overpaid lady (hey,
Julia,) who wouldn't let people touch her code so fragile it was (and it
was the main action executor,) she would rather fight you for hours than
make one change in the code (she left 2 months before the project
release.) Now, I have never witnessed such monstrosity of a code base
before - the business rules were redefined about once every 2 weeks for
1.5 years straight. You can imagine.
So, the client decided not to pay the last million of dollars because
the performance was total shit. On a weblogic cluster of 2 Sun E45s they
could only achieve 12 concurrent transactions per second. So the client
decided they really did not want to pay and asked us to make it at least
200 concurrent transactions per second on the same hardware. If I may
make a wild guess, I would say the client really did not want to pay the
last million, no matter what, so they upped the numbers a bit from what
they needed. But anyway.
Myself and another developer (hi, Paul) spent 1.5 months - removing
unnecessary db calls (the app was incremental, every page would ask you
more questions that needed to be stored, but the app would store all
questions from all pages every time,) cached XML DOM trees instead of
reparsing them on every request, removed most of the session object,
reduced it from 1Mb to about 8Kb, removed some totally unnecessary and
bizarre code (the app still worked,) desynchronized some of the calls
with a message queue etc.
At the end the app was doing 320 and over concurrent transactions per
second. The company got their last million.
The lesson? Build software that is really unoptimized first and then
save everyone's ass by optimizing this piece of shit and earn total
admiration of the management - you are a miracle worker now.
The reality? Don't bother trying to optimize code when the business
requirements are constantly changing, the management has no idea how to
manage an IT dep't, the coders are so nube - there is a scent of
freshness in the air and there is a crazy deadline right in front of
you. Don't optimize, if the performance becomes an issue, optimize then.
Highlight UnMatched Brackets - Capture those unmatched brackets while u r still
in insert-mode vim online
It's really irksome when your compiler
complains for any unmatched "{" or "(" or "[".
With this plugin you can highlight all
those unmatched "{" or "(" or "[" as you type.
This helps you to keep track of where
the exact closing bracket should come.
This plugin also warns you of any extra "}" or ")" or "]" you typed.
Customization:
- Specifying Additional Bracket-pairs.
User can specify additional matching pairs in
the global option 'matchpairs', see :help 'matchpairs'
For eg: set mps+=<:> (to include <> pair)
put the above setting in .vimrc file and restart
vim.
- To get rid of highlighting when you quit insert
mode, add this mapping in your vimrc
noremap! <Esc> <Esc>:match NONE<CR>
To test how this plugin works type something like
{
( ) [ ]
( ( ( ) ) )
}
Happy vimming.
ShowFunc.vim - Creates a list of all tags - functions from a window, all windows
or buffers. vim online
This script creates a hyperlink list of all the tags (i.e.
functions, subroutines, classes, macros or procedures) from a single
buffer, all open windows or buffers and displays them in a dynamically
sized cwindow.
Supported File types with Exuberant Ctags version 5.5.4 (and newer): Asm,
Asp, Awk, Beta, C, C++, c#, Cobol, Eiffel, Erlang, Fortran, Java,
Javascript, Lisp, Lua, Make, Pascal, Perl, PHP, Python, PL/SQL, REXX,
Ruby, Scheme, Shell, SLang, SML, SQL, Tcl, Vera, Verilog, Vim, YACC......and
any user defined (i.e. --regex-lang=) types.
Default Key Mappings:
<F1> Run scan and open cwindow.
To reassign add the following to your .vimrc:
map NewKey <Plug>ShowFunc
map! NewKey <Plug>ShowFunc
For example to change the <F1> mapping to <F7>
map <F7> <Plug>ShowFunc
map! <F7> <Plug>ShowFunc
ShowFunc Window commands:
c Close cwindow.
h Display help dialog.
r Refresh.
s Change file sort, results will appear in either alphabetical or
file order. (Default: file order)
t Change scan type, results will be from either the current file,
all open windows or all open buffers. (Default: all open buffers)
install details
Put this file in the vim plugins directory (~/.vim/plugin/) to load
it automatically, or load it with :so ShowFunc.vim.
You need Exuberant CTags installed for this function to work.
Website: http://ctags.sourceforge.net/
Source:
http://prdownloads.sourceforge.net/ctags/ctags-5.5.4.tar.gz
Redhat/Fedora RPM: http://prdown |