Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Nikolai Bezroukov. Portraits of Open Source Pioneers
For readers with high sensitivity to grammar errors access to this page is explicitly prohibited :-)
Introduction to the Unix shell history
Unix was one of the first operating systems that did not make the command
interpreter a part of the operating system or a privileged task. The shell
became an application program, somewhat similar to a compiler. This idea, like
many other Unix ideas, was taken from Multics. This architectural decision
made Unix fertile ground for scripting development, and Unix essentially pioneered
scripting as we know it. It was the environment in which AWK, C-shell, ksh and,
later, Perl emerged and gained popularity.
I would like to stress again that the shell in Unix was a stand-alone program, a
command interpreter with few special permissions or kernel-level calls. This
rather novel concept proved to be fruitful and has led to a succession of better
and better shells.
Unix shells have a long history, and we can talk about four distinct generations
of Unix shells:
- First generation shells (Thompson shell and Mashey shell)
- Second generation shells (C-shell and Bourne shell, which were developed
in parallel and in which Bill Joy's team provided really strong competition
to professionals from AT&T ;-). In many ways C-shell represented a real
breakthrough, as it was more programmable (even the name implies closeness to
C) and more flexible than any previous generation of shells
- Third generation shells (tcsh and ksh88)
- Fourth generation shells (ksh93, bash, zsh and, especially,
Microsoft PowerShell)
First generation shells
The first generation shells were descendants of the Multics shell. The first of them
was the Thompson shell. It was the first Unix shell and as such it was very primitive,
with only basic control structures and no variables. The shell's design was intentionally
minimalistic; even the if and goto statements, essential for control
of program flow, were implemented as separate external commands [Thompson shell
- Wikipedia, the free encyclopedia]. Despite its shortcomings the Thompson shell was
a definite improvement over the shell that people used most (IBM JCL).
An important early feature of the Thompson shell, new in comparison with Multics,
was a more compact syntax for input/output redirection. The Multics shell used separate
commands for redirection of the input or output of a command. One command was needed
to start redirection and one to stop it; in Unix, one could simply add to the command
line an argument consisting of the "<" symbol followed by a filename for input, or the
">" symbol followed by a filename for output, and the shell would redirect I/O for the
duration of the command. This syntax was already present by the release of the first version
of Unix in 1971. A later addition to the Thompson shell was the concept of pipes.
At the suggestion of Douglas McIlroy, the redirection syntax was expanded so that
the output of one command could be passed to the input of another command. By Version
4, the "|" symbol was adopted for pipes. Both the redirection symbols and the pipe
symbol, together with their semantics, survive to this day and are used in other
operating systems, including DOS and Windows.
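The redirection and pipe syntax described above can be sketched in a modern POSIX-style shell. This is only an illustration, not Thompson shell code: the file names, the sample data and the use of mktemp and $(...) are assumptions made for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the example leaves nothing behind.
tmpdir=$(mktemp -d)

# ">" sends the output of a command to a file for the duration of that command.
printf 'pear\napple\nfig\n' > "$tmpdir/fruit.txt"

# "<" takes the input of a command from a file; here both are combined.
sort < "$tmpdir/fruit.txt" > "$tmpdir/sorted.txt"

# "|" passes the output of one command to the input of the next.
first=$(sort < "$tmpdir/fruit.txt" | head -n 1)
count=$(wc -l < "$tmpdir/fruit.txt" | tr -d ' ')

echo "first line after sorting: $first"
echo "number of lines: $count"

rm -rf "$tmpdir"
```

Each redirection applies only to the command it is attached to, which is exactly the compactness gained over the separate start and stop commands of Multics.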
The Mashey shell (also known as the PWB shell) was the second representative of the
first generation of Unix shells. It was written and maintained by John
Mashey (now a chief scientist at Silicon Graphics, who since 1998 has owned the UNIX
license plate in California, previously owned by Ted Dolotta ;-). It was distributed
with Programmer's Workbench UNIX, one of the early versions of Unix, which existed from
1975 to 1977.
As Mashey recollected in Languages, Levels, Libraries, and Longevity:
In 1970, “real computers” were still mainframes, although
minicomputers were seeing increasing use. The DEC (Digital Equipment
Corporation) 16-bit PDP-11 was introduced in 1970, and of particular
importance, the PDP-11/45 appeared in 1972, with up to 248 KB of MOS
(metal-oxide semiconductor) memory. By 1975, the PDP-11/70 allowed a
huge increase to 4 MB, although each program was still restricted to
64 KB instructions and 64 KB data. Some sites supported 16
simultaneous users on an 11/45, and with heroic effort, 48 on an
11/70. The VAX-11/780 was introduced in 1977 and spread lower-cost
32-bit computing more widely. By the end of the decade,
minicomputers were “real computers,” and 32-bit microcomputers were
beginning to appear.
In 1970, there was widespread use of applications languages such
as Fortran, Cobol, and PL/I, but many applications’ and most
systems’ codes were still written in assembly language, and the idea
that an operating system would be portable among machines was
laughable. In the end, Unix was ported to many systems, C was widely
used, and applications were being written with various combinations
of higher-level tools.
In 1973, the PWB (Programmer’s Workbench) began in a Bell Labs
software tools department.4 It supported a 1,000-person division
that produced database and communications application software
products that ran on various mainframes and minicomputers. It wished
to move many programming activities off expensive mainframes onto a
common Unix-based development environment to avoid the creation of
unique support software for each target system. Programming
departments needed to be convinced to change their ways, that Unix
was a good thing, that minicomputers were not toys, and that they
should transfer budget to the tools department for more PDP-11s.
In 1973, most of the several dozen existing Unix systems were the
property of individual departments, used by small numbers of people
for their own projects and administered informally, sometimes with
minimal security. The PWB site was the first in Bell Labs to run a
“Unix computer center” for shared general use among departments,
including typing pools. For years it was the largest single Unix
site, and it often endured early encounters with problems of
scalability, system administration, charging, security, automation,
and usability for nontechnical users.
In 1973, Ken Thompson’s Unix shell was primarily used as an
interactive interpreter, but had some rudimentary scripting ability,
including separate IF and GOTO commands. In 1974, I used shell
scripts to build a small document management package for a potential
client department and found this to be a great way to build such
software quickly, but with awkward restrictions. In 1975 and 1976
the PWB’s shell got simple variables, better control structures
(IF-THEN-ELSE-ENDIF, SWITCH, WHILE), and interrupt-catching. The
variables that later became $HOME (home directory) and $PATH
(variable search path for commands) date from this effort.
Shell programming rapidly became a widespread mechanism for PWB
users to help automate their work.5 Substantial CPU time was
consumed by shell procedures, to the point where previously separate
commands, such as IF, GOTO, and SWITCH, were moved into the shell
itself with substantial performance improvements. Steve Bourne was
then working on a brand-new shell in Computing Research and, after
much discussion, evolved a fresh design whose performance and
features were interesting enough to eventually replace the PWB
shell. The variables got generalized into the “environment
variables” designed for 7th Edition Unix.
Al Aho, Peter Weinberger, and Brian Kernighan had written awk,
the philosophical ancestor of some popular current scripting
languages. In the 1970s, Bell Labs was busily constructing computer
systems to improve Bell System operations, and many were built on
Unix and even used scripting languages in delivered software. CRAS
(Cable Repair Administrative System) was a data-mining software
package that integrated data from several other systems, was
distributed between IBM mainframes and Unix minicomputers, had to be
deployed quickly, and was sensitive to organization-dependent
requirements in a time of major reorganization.6 The first version
included 10 KLOC (thousands of lines of code) of C plus 15 KLOC of
shell+awk scripts and was modified quickly in the field to adapt to
newly revealed customer requirements. A large listing of these
scripts appeared on Kernighan’s desk—to his great surprise, as the
awk writers had never expected such extensive use in production.
Shell scripting used late-binding, high-level interpretation to
combine higher-performance, compiled components. Awk gave us a more
flexible language above C, although we sometimes later converted
heavily used awk to C for performance, after requirements had
settled. We sometimes wished for an awk compiler. Raising the
language level enabled vast improvements in productivity in that
decade, as C replaced assembly language, and script-level languages
greatly augmented C.
While the Mashey shell was a derivative of the Thompson shell, it introduced several
features that made the shell more suitable for programming. Among the important
innovations introduced by the Mashey shell:
- The if and goto commands
were made internal to the shell, and switch and while constructs were
introduced.
- Simple variables could be used, although their names were limited to
one letter and some letters were reserved for special purposes.
- The $ character, used previously for identifying arguments to a shell script,
became the marker for dereferencing a variable, and could be used to insert a variable's
value into a string in double quotes (this feature became standard in many scripting
languages, such as Perl and PHP).
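The last point, inserting a variable's value into a double-quoted string, looks roughly like this in a present-day POSIX shell. This is a sketch, not Mashey shell syntax: multi-letter variable names and single-quote literals are assumptions that hold for modern shells only.

```shell
#!/bin/sh
# "name" is a hypothetical variable used only for illustration.
name="world"

# Inside double quotes the $ marker dereferences the variable.
double="Hello, $name"

# Inside single quotes the text is kept literally, $ included.
single='Hello, $name'

echo "$double"
echo "$single"
```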
The second generation of shells
The second generation of shells was defined by C-shell and Bourne shell,
which were developed in parallel. That parallelism actually shows the
tremendous difference in talent between Bill Joy and Steve Bourne.
While the Bourne shell was hampered by compatibility requirements, Steve
Bourne did a remarkably poor job in the area of lexical structure and
syntax. It is fair to say that the Bourne shell has neither in any clean form, so
any reference to his previous experience with Algol 68 compilers might well be considered
a joke. It is a shame for any compiler writer. In a way we are still
suffering from Steve Bourne's mistakes and blunders. Later he
tried to promote a revisionist version of history, and in his
latest interview with Computerworld (Bourne
shell, or sh - Computerworld) forgot to mention that the
prototype for his shell was the Mashey shell, not the original Thompson
shell, and that the former did have switch and while constructs
(see above; also see the Mashey shell man page):
My own interest, before I went to Bell Labs, was in programming language design
and compilers. At Cambridge I had worked on the language ALGOL68 with Mike Guy. A
small group of us wrote a compiler for ALGOL68 that we called ALGOL68C. We also
made some additions to the language to make it more usable. As an aside we
bootstrapped the compiler so that it was also written in ALGOL68C.
When I arrived at Bell Labs a number of people were looking at ways to add
programming capabilities such as variables and control flow primitives to the
original shell. One day [mid 1975?] Dennis [Ritchie] and I came out of a meeting
where somebody was proposing yet another variation by patching over some of the
existing design decisions that were made in the original shell that Ken wrote.
And so I looked at Dennis and he looked at me and I said “you know we have to
re-do this and re-think some of the original design decisions that were made
because you can’t go from here to there without changing some fundamental
things”. So that is how I got started on the new shell.
Was there a particular problem that the language aimed to solve?
The primary problem was to design the shell to be a fully programmable scripting
language that could also serve as the interface to users typing commands
interactively at a terminal.
First of all, it needed to be compatible with the existing usage that people
were familiar with. There were two usage modes. One was scripting and even
though it was very limited there were already many scripts people had written.
Also, the shell or command interpreter reads and executes the commands you type
at the terminal. And so it is constrained to be both a command line interpreter
and a scripting language. As the Unix command line interpreter, for example, you
wouldn’t want to be typing commands and have all the strings quoted like you
would in C, because most things you type are simply uninterpreted strings. You
don’t want to type ls directory and have the directory name in string quotes
because that would be such a royal pain. Also, spaces are used to separate
arguments to commands. The basic design is driven from there and that determines
how you represent strings in the language, which is as un-interpreted text.
Everything that isn’t a string has to have something in front of it so you know
it is not a string. For example, there is $ sign in front of variables. This is
in contrast to a typical programming language, where variables are names and
strings are in some kind of quote marks. There are also reserved words for
built-in commands like for loops but this is common with many programming
languages.
So that is one way of saying what the problem was that the Bourne Shell was
designed to solve. I would also say that the shell is the interface to the Unix
system environment and so that’s its primary function: to provide a fully
functional interface to the Unix system environment so that you could do
anything that the Unix command set and the Unix system call set will provide
you. This is the primary purpose of the shell.
One of the other things we did, in talking about the problems we were trying to
solve, was to add environment variables to Unix system. When you execute a
command script you want to have a context for that script to operate in. So in
the old days, positional parameters for commands were the primary way of passing
information into a command. If you wanted context that was not explicit then the
command could resort to reading a file. This is very cumbersome and in practice
was only rarely used. We added environment variables to Unix. These were named
variables that you didn’t have to explicitly pass down from the parent to the
child process. They were inherited by the child process. As an example you could
have a search path set up that specifies the list of directories to be used when
executing commands. This search path would then be available to all processes
spawned by the parent where the search path was set. It made a big difference to
the way that shell programming was done because you could now see and use
information that is in the environment and the guy in the middle didn’t have to
pass it to you. That was one of the major additions we made to the operating
system to support scripting.
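The inheritance of environment variables described in this answer can be illustrated with a short sketch for a modern POSIX shell. The variable names GREETING and LOCAL_ONLY are hypothetical, and the $(...) form of command substitution is the modern spelling rather than the original backquote syntax.

```shell
#!/bin/sh
# GREETING and LOCAL_ONLY are hypothetical names used only for illustration.
unset LOCAL_ONLY

GREETING="hello from the parent"
export GREETING                 # mark the variable for inheritance by children

# The child shell finds GREETING in its environment; nothing is passed
# through positional parameters.
inherited=$(sh -c 'printf "%s" "$GREETING"')

# A variable that is not exported stays private to the parent shell.
LOCAL_ONLY="not exported"
hidden=$(sh -c 'printf "%s" "${LOCAL_ONLY:-unset}"')

echo "child saw GREETING as: $inherited"
echo "child saw LOCAL_ONLY as: $hidden"
```

The same mechanism is what makes a search path set once in a login shell visible to every command started from it.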
How did it improve on the Thompson shell?
I did change the shell so that command scripts could be used as filters. In the
original shell this was not really feasible because the standard input for the
executing script was the script itself. This change caused quite a disruption to
the way people were used to working. I added variables, control flow and command
substitution. The case statement allowed strings to be easily matched so that
commands could decode their arguments and make decisions based on that. The for
loop allowed iteration over a set of strings that were either explicit or by
default the arguments that the command was given.
I also added an additional quoting mechanism so that you could do variable
substitutions within quotes. It was a significant redesign with some of the
original flavor of the Thompson shell still there. Also I eliminated goto in
favour of flow control primitives like if and for. This was also considered
rather radical departure from the existing practice.
Command substitution was something else I added because that gives you very
general mechanism to do string processing; it allows you to get strings back
from commands and use them as the text of the script as if you had typed it
directly. I think this was a new idea that I, at least, had not seen in
scripting languages, except perhaps LISP.
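The three features named in this answer (the case statement, the for loop and command substitution) can be seen together in a small sketch for a modern POSIX shell. The helper classify and the sample arguments are hypothetical; shell functions and the $(...) notation are later additions, while the original Bourne shell wrote command substitution with backquotes.

```shell
#!/bin/sh
# classify is a hypothetical helper: the case statement matches a string
# against patterns so that a script can decode its arguments.
classify() {
    case "$1" in
        -*)    echo "option" ;;
        *.txt) echo "text-file" ;;
        *)     echo "other" ;;
    esac
}

# for iterates over an explicit list of strings; without the "in" list it
# would iterate over the arguments the script was given.
result=""
for arg in -v notes.txt data; do
    # Command substitution: the output of classify is used as text
    # of the script, as if it had been typed directly.
    result="$result$(classify "$arg") "
done

echo "$result"
```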
How long did this process take?
It didn’t take very long; it’s surprising. The direct answer to the question is
about maybe 3-6 months at the most to make the basic design choices and to get
it working. After that I iterated the design and fixed bugs based on user
feedback and requests.
I honestly don’t remember exactly but there were a number of design things I
added at the time. One thing that I thought was important was to have no limits
imposed by the shell on the sizes of strings or the sizes of anything else for
that matter. So the memory allocation in the implementation that I wrote was
quite sophisticated. It allowed you to have strings that were any length while
also maintaining a very efficient string processing capability because in those
days you couldn’t use up lots of instructions copying strings around. It was the
implementation of the memory management that took the most time. Bugs in that
part of any program are usually the hardest to find. This part of the code was
worked on after I got the initial design up and running.
The memory management is an interesting part of the story. To avoid having to
check at run time for running out of memory for string construction I used a
less well known property of the sbrk system call. If you get a memory fault you
can, in Unix, allocate more memory and then resume the program from where it
left off. This was an infrequent event but made a significant difference to the
performance of the shell. I was assured at the time by Dennis that this was part
of the sbrk interface definition. However, everyone who ported Unix to another
computer found this out when trying to port the shell itself. Also at that time
at Bell Labs, there were other scripting languages that had come into existence
in different parts of the lab. These were efforts to solve the same set of
problems I already described. The most widely used “new” shell was in the
programmer’s workbench -- John Mashey wrote that. And so there was quite an
investment in these shell scripts in other parts of the lab that would require
significant cost to convert to the new shell.
The hard part was convincing people who had these scripts to convert them. While
the shell I wrote had significant features that made scripting easier, the way I
convinced the other groups was with a performance bake off. I spent time
improving the performance, so that probably took another, I don’t know, 6 months
or a year to convince other groups at the lab to adopt it. Also, some changes
were made to the language to make the conversion of these scripts less painful.
How come it fell on you to do this?
The way it worked in the Unix group [at Bell Labs] was that if you were
interested in something and nobody else owned the code then you could work on
it. At the time Ken Thompson owned the original shell but he was visiting
Berkeley for the year and he wasn’t considering working on a new shell so I took
it on. As I said I was interested in language design and had some ideas about
making a programmable command language.
A very influential development, AWK, happened on the sidelines. It came from a
far more talented team and had a far more advanced design, a better
understanding of scripting, and clean lexical and syntax levels. As in any large
organization, the left hand at Bell Labs did not know what the right hand was
doing. Due to this unfortunate fact and the complete lack of architectural vision
among Unix project managers, AWK did not get enough traction to be extended into
a replacement for the shell or integrated more tightly into it.
BTW, the fact that Steve Bourne was unable or unwilling to borrow ideas from
AWK development is also pretty telling.
Another parallel development was REXX, which was later
used as a shell in the VM/CMS environment:
- AWK was originally written in 1977
and distributed with Version 7 Unix. The book The AWK Programming Language,
the first scripting book, was published in 1988. The designers of AWK were
higher-class language designers than Stephen Bourne, but unfortunately they did
not get enough traction or support to extend AWK into a full-fledged shell. The notion
that Unix should consist of a small number of tools backfired in this case.
Not only did AWK become instantly popular when it was introduced, it
also stimulated the creation of the next generation of scripting
languages: 10 years later Larry Wall created Perl, which is now one
of the most popular scripting languages in the world. Despite its age
AWK remains an important standard Unix utility.
The prototype for the development of AWK was GREP. But GREP had a very limited
form of pattern-action processing, so the authors generalized the capabilities
of GREP considerably. They also managed to exploit the similarity between string
matching and the compiler construction tools LEX (a generator of lexical
scanners) and YACC (a generator of parsers). These compiler construction
utilities were widely used in Bell Labs to create specialised little languages
(Brian Kernighan was one of the first users). For example, LEX provides a
language with a distinct "lexical" level: a program can be converted into a
sequence of lexemes, or tokens. The latter are sequences of characters that make
up logical units such as an identifier, a quoted string, a comment, etc. This is
the classic first pass that compilers perform on the input, and a distinct
lexical structure is a must for any solid scripting language.
AWK was designed as a specialized language for processing text files treated as
a sequence of lines, but proved to be more powerful than that. An AWK program
is a sequence of pattern-action statements. AWK reads the input a line at a
time, splitting records into fields. Each line is scanned for the patterns
present in the AWK program, and for each match the associated action is executed.
The patterns can be Boolean combinations of strings and numbers; the actions
can be statements in a C-like programming language.
Here is how Aho recollected the events (AWK - Computerworld):
AWK was developed by three people: me, Brian Kernighan and Peter Weinberger.
Peter Weinberger was interested in what Brian and I were doing right from the
start. We had created a grammatical specification for AWK but hadn't yet created
the full run-time environment. Weinberger came along and said 'hey, this looks
like a language I could use myself', and within a week he created a working run
time for AWK. This initial form of AWK was very useful for writing the data
processing routines that we were all interested in but more importantly it
provided an evolvable platform for the language.
One of the most interesting parts of this project for me was that I got to know
how Kernighan and Weinberger thought about language design: it was a really
enlightening process! With the flexible compiler construction tools we had at
our disposal, we very quickly evolved the language to adopt new useful syntactic
and semantic constructs. We spent a whole year intensely debating what
constructs should and shouldn't be in the language.
Language design is a very personal activity and each person brings to a language
the classes of problems that they'd like to solve, and the manner in which
they'd like them to be solved. I had a lot of fun creating AWK, and working with
Kernighan and Weinberger was one of the most stimulating experiences of my
career. I also learned I would not want to get into a programming contest with
either of them however! Their programming abilities are formidable.
Interestingly, we did not intend the language to be used except by the three of
us. But very quickly we discovered lots of other people had the need for the
routine kind of data processing that AWK was good for. People didn't want to
write hundred-line C programs to do data processing that could be done with a
few lines of AWK, so lots of people started using AWK.
For many years AWK was one of the most popular commands on UNIX, and today, even
though a number of other similar languages have come on the scene, AWK still
ranks among the top 25 or 30 most popular programming languages in the world.
And it all began as a little exercise to create a utility that the three of us
would find useful for our own use.
- Steve Bourne, at Bell Labs, worked on an extension of the Mashey shell starting in
1975, and this shell was released in 1977-1978 as the Bourne shell. Steve had previously
been involved with the development of an Algol 68 compiler, but that influenced
his design only very superficially, as Algol 68 syntactic sugar in the if
statement and loops (the if ... fi syntax is the trademark
of Algol 68; paradoxically, a non-Algol 68 syntax was used for
the for loop: for ... do ... done). As a language it was a huge
step back from AWK, almost a disaster, and the inability of AT&T management to see
the difference is a huge black spot in Unix history and Bell Labs
history. In retrospect the Algol 68
syntactic sugar looks bizarre and completely out of line with the rest of Unix,
and first of all with the C language. And what is really strange for
a designer who came from an Algol 68 compiler development
environment, the first version did not provide for comments (as
Sven Mascheck
noted, comments were available via the ":" (colon) command, but not
without side effects). Still, despite all the shortcomings and weaknesses
of Steve Bourne as a developer, the Bourne shell made some steps in formulating the
Unix component development paradigm: use many small, flexible programs as tools
glued together by the shell. Despite huge shortcomings it was "good enough", and it became
the classic Unix shell and slowed down shell development for many decades.
Actually it is a shame that the weight of Bell Labs prevented the
adoption of C-shell as the standard shell and further development
along C-shell lines.
The Bourne shell became the default Unix shell of
Unix Version 7. Despite huge shortcomings, until recently the Bourne
shell for some strange reason remained the default shell in Solaris and
a couple of other commercial Unixes.
- C shell (csh) was a groundbreaking shell written by Bill Joy and first distributed
with BSD in 1978-79. Bill Joy essentially wiped the floor with the AT&T shell design
team (and personally with Steve Bourne), introducing a more elegant scripting language,
more compatible with C, which generated many ideas and a tremendous following. Unfortunately Bill Joy
did not continue the development and the shell stagnated. Despite this, C-shell and
its modification (tcsh) remained the best interactive shells until probably the year
2000.
Not only was C-shell a vastly superior interactive shell; its innovations also
brought it closer to a scripting language. Among them were "built-ins": several
of the most common utilities were built into the shell. Another very important
innovation was a C-like syntax that made it easier for C programmers to use the
shell and made it look like a new, albeit slightly strange, programming language,
not just a job control language. Features like built-in arithmetic operations
and C language expression syntax were a huge move forward in comparison with the
Bourne shell. That happened partially due to the increased capacity of the machines
that became standard for this period. But again, syntax compatibility
with C was the key idea that made C-shell the closest ancestor of modern
scripting languages.
- The first important non-Unix scripting
language and shell that was more or less widely used was probably
REXX, which was
designed and first implemented between 1979 and mid-1982 by Mike Cowlishaw of
IBM. It trails AWK by several years, but by and large can be considered a
parallel, independent reinvention of the concept. The pioneering
feature of REXX was that from the beginning it was designed as a macro
language that can be used as an internal language for applications.
One such application (Xedit) has survived from that dinosaur epoch. REXX was used as a shell
in VM/CMS and later in Amiga OS and OS/2.
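The pattern-action model of AWK described in the AWK entry above can be sketched as follows, with awk called from a POSIX shell script. The sample records and the threshold of 20 are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample data: a name and a score, one record per line.
data='alice 42
bob 17
carol 58'

# Each statement is a pattern followed by an action in braces. awk splits
# every input line into fields ($1, $2, ...) and runs the action of each
# pattern that matches. The END pattern runs once after all input is read.
report=$(printf '%s\n' "$data" | awk '
    $2 > 20 { print $1; n++ }
    END     { print n " matched" }
')

echo "$report"
```

The equivalent C program would need its own loop for reading lines, splitting fields and converting numbers, which is the "hundred-line C program" that Aho mentions.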
The third generation of shells
The third generation of shells was by and
large a reaction to C-shell. It includes the extension of C-shell called tcsh
and the Korn shell (the version which later became known as ksh88). The POSIX shell belongs
to the same category.
- In 1981 C-shell was extended into tcsh, which for a long time
(probably until 2000) was the most popular interactive shell in Unix. Tcsh is C shell with file
name completion, command line editing and other features such as enhanced variable
modifiers. All this made tcsh one of the most convenient shells for interactive users.
In tcsh, variables are defined in both upper and lower case. Usually uppercase
variables are those seen by all shells (the global
environment), while lowercase variables
are specific to the particular instance of the shell. As an example, the path variable is set both as PATH and as path.
Uppercase variables are listed by the built-in command setenv run without arguments, and lowercase
variables are listed or set by the built-in command set (listed when it is run without arguments).
Tcsh was influenced by the TENEX OS. TENEX and TOPS-20, up until version
3, had command completion via a user-code-level subroutine library
called ULTCMD. Tcsh provides emacs and vi command-line
editing modes. While it was based on Joy's original source, many people have
added to tcsh since. Over 50 names are listed as contributors in the 6.08 man
page.
- The Korn shell (ksh), a reaction to C-shell, was developed by David Korn
(AT&T Bell Laboratories) in the early 1980s. It is backwards compatible with the
Bourne shell and includes many features of the C shell, as well as new features like
command history, which was inspired by the requests of Bell Labs users.
It was released in 1983. Essentially Korn attempted to incorporate the main
ideas of C-shell into the Bourne shell framework. Ksh proved to be a more powerful and
flexible shell than the Bourne shell, but it failed to match either C-shell
or tcsh as an interactive shell. Among other things it lacked arrow history browsing,
the "!!" and "!$" features of C-shell, etc.
David Korn was a more talented shell designer than Steve Bourne, and
the shell was much more solid, with cleaner semantics (the syntax and
lexical levels remained a mess due to compatibility requirements).
The main problem with ksh was that, while in power it approached
scripting languages, it did not have a built-in debugger.
The original version of the Korn shell, which is now usually called ksh88, became
the dominant shell for commercial Unixes. It introduced several interesting attributes
(read-only, uppercase, lowercase attributes for strings, etc.). Due to restrictions
on distribution (it was a commercial product, not open source) several free
open source clones such as pdksh and bash were created. Due to the popularity of
bash, usage of the Korn shell is now mainly limited to commercial environments
and power users who understand the difference in the quality of implementation
or need a more compatible implementation between Unix and Windows.
- Some attempts to standardize the shell were made by the POSIX
group as a reaction from IBM and others to the threat of the AT&T-Sun alliance. The
level of standardization was very low, as they took ksh88 as a base ("status
quo standardization"), and the standard did not propose any interesting features.
Fourth generation of shells
Perl, the first "post-modern" scripting language, was first released
in 1987. It represented a breakthrough that lifted scripting to a
qualitatively new level. It was immediately clear that for Unix scripts
Perl was a much better language than the shells, although in some respects
it was a lower-level language (limited support of pipes). All subsequent
popular scripting languages, such as Python, PHP and Ruby, to name a few, are in
a way derivatives of Perl. The fourth generation of shells was also
influenced by Perl's popularity, and one of them (ksh93) was, in part, a reaction to
it.
Another important development was the creation of Tcl. Tcl was the first scripting
language designed to be a macro language for other applications. While Perl was
an attempt to integrate the functionality of the shell and AWK into a single product,
Tcl extended shell functionality into a new area. Tcl never managed to replace
the shell (and it was never intended to), although it is probably not so difficult to
create a shell based on Tcl (one was created on the base of the Korn shell by David
Korn's son). There were also some attempts to use Perl as a shell (Perl shell).
Two new "fourth-generation" shells were written after Perl was created and
incorporated some innovations introduced by Perl and Tcl, but in the old shell-style
framework:
- In 1990, Paul Falstad wrote zsh, a superset of ksh88 which also had
many csh features. Z-shell was the latest attempt to merge C-shell and
Korn shell, and it provided enhancements in globbing and other areas in a non-orthogonal
Perl-style manner. Some people dismiss zsh as too bloated and suffering from
feature creep. Still, along with ksh93, it is probably the
most powerful shell for Unix, and many power users like its power and flexibility.
As David Korn remarked:
"There are two different areas of functionality
in shells. First is interactive use and the second is scripting. Much of
the debate about shells has focused on interactive use only. For example,
tcsh is an acceptable shell for interactive use but practically unusable
for scripting."
It proved to be very difficult to reconcile those two areas and zsh remains
one of the important early attempts in this direction.
-
Ksh93 was a later and more coherent attempt to merge the shell with
some functionality of Tcl and Perl and integrate their most interesting
features back into the shell framework. As a programming language, it has comparable
speed and functionality to each of these languages. But string manipulation
was still weaker, and due to the "compatibility curse" it used pretty wild syntax. That
was an important drawback.
Also, David Korn made the strategic mistake of not providing a built-in debugger,
which was necessary for such a complex language.
All in all, the attempt to match Tcl and Perl largely failed, but the power of
the shell is impressive and it became an inspiration for other shell developers (bash
3.0 is one example, see below, but many complain that it's too bloated). Ksh93
can be considered a superset of the POSIX 1003.2 shell standard. Paradoxically,
due to the quality of Uwin, ksh93 is widely used on Windows (it is a part of
the Uwin environment; in terms of technology, Uwin's process management was certainly
much faster, and its I/O subsystem much better, than Cygwin's. The system runtime
uses the AST library, which is also very well done). One of the major Indian outsourcers,
Wipro, for some time supported Uwin as a product. Like Tcl, ksh93 is extensible
and embeddable with a C language API. There are also two graphical shells based
on ksh93: dtksh, a Motif-based language developed by Novell (somewhat
popular on Solaris, as it is included in the CDE environment), and Tksh, written by
Jeff Korn at Princeton University (it includes Tcl functionality directly in
the shell). Neither got any significant following.
In March 2000 AT&T opened ksh93 and the whole Uwin package and
licensed them under the very liberal Common Public
License Version 1.0 (CPL-1.0). Here is the announcement from
kornshell.com:
March 1, 2000: I am happy to
announce the 'i' point release of ksh93 is now available for download. For
the first time, source is available as well as binaries for several architectures.
If you build binaries for new architectures, and send them to us, we can
add them to the download site. The download
page has been completely revised in a manner that hopefully will be
easier to use. ksh93 is part of the ast-open package. tksh (ksh with tk
support) is also part of this package.
-
bash at the beginning was a "retro" shell, a shell which did
not introduce any important innovations and which was written as an alternative
to the commercial Korn shell for purely ideological reasons: it was part of the GNU
project. With time it was extended, incorporating most ksh88 and, later,
some ksh93 features. Bash was originally written by Brian Fox of the Free Software Foundation.
In 1997 Bash 2.0 closed the gap in power with ksh88, although some
features remained broken (pipe implementation). Later, bash 3.2 became pretty
close in functionality to ksh93. With version 3.2, bash became a prominent fourth
generation shell in its own right, eclipsing all other members of this family.
In 2004 Bash 3.0 introduced a built-in debugger, a feature
that unfortunately was absent in previous shells (although there were add-on
scripts for ksh that provided some debugging capabilities). At the same
time, the size of the code raises suspicions about a "bloatware dead end" for this open
source product.
-
Hamilton C shell was another C-shell extension.
See the on-line user guide for details. It is a commercial shell but
a free demo version is available for download.
-
Windows PowerShell (formerly known as Monad) is a ksh-style shell
with some Microsoft-specific extensions. It integrates with
the .NET Framework and provides an environment to perform administrative tasks
by execution of cmdlets (pronounced commandlets), which are specialized
.NET classes implementing a particular operation. Windows PowerShell also provides
a hosting mechanism with which the Windows PowerShell runtime can be embedded
inside other applications, which can then leverage Windows PowerShell functionality
to implement certain operations, including those exposed via the graphical interface.
Back into the future -- Retro shells
There was also a distinct line of development that can be called retro-shells.
One of the most prominent members of this family was pdksh, which was an attempt
to re-implement ksh88. It did not get much traction, and only bash,
another retro-shell, remains viable
these days, as it is used as the standard shell in Linux.
Possible future developments
Still, one needs to understand that shells are currently a pretty archaic family
of scripting languages, and outside of interactive usage they have generally outlived
their usefulness. That's why for more or less complex tasks Perl is usually used
instead of shells.
While shells have continued to improve since the original C-shell and Korn shell (the Bourne
shell, from the language standpoint, is just a joke), the shell syntax is frozen in
space and time and now looks completely archaic. There are a large number
of problems with this syntax, as it does not cleanly separate lexical analysis from
syntax analysis.
Some syntax features in the shell are idiosyncratic, as Steve Bourne played with Algol
68 before starting work on the shell. He proved to be a bad language designer: there
is very little logic in how different types of blocks are ended. Conditional
statements end with a broken version of the classic Algol 68 reversed-keyword syntax: 'if
condition; then echo yes; else echo no; fi', but loops are structured
like a perverted version of PL/1 (do; done;), individual case branch blocks end
with ';;', and functions have C-style bracketing "{", "}". M.
D. McIlroy should be ashamed of the result ;-).
Also, the original Bourne shell was an almost pure macro language. It performed variable
substitution, tokenization and other operations on one line at a time without understanding
the underlying syntax. This results in many unexpected side effects. Consider a
simple command: rm $file
If the variable $file accidentally contains a space, its value is treated as two
separate arguments to the rm command, with possibly nasty side effects. To fix
this, the user has to make sure every use of a variable is enclosed in quotes, as
in rm "$file".
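The effect is easy to reproduce without deleting anything. The following is a minimal sketch, assuming a POSIX-style shell such as bash or dash with the default IFS; count_args is a hypothetical helper (not part of any shell) that simply reports how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments
# it was called with, standing in for a command such as rm.
count_args() { echo "$#"; }

file="my report.txt"            # a value with an embedded space

unquoted=$(count_args $file)    # the expansion is split on whitespace
quoted=$(count_args "$file")    # the quotes keep the value as one word

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With the unquoted form the command sees two words, "my" and "report.txt"; with the quoted form it sees the single file name that was intended.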
Variable assignments in Bourne shell are whitespace sensitive. 'foo=bar'
is an assignment, but 'foo = bar' is not. This is another strange idiosyncrasy.
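What the spaced form actually does can be made visible with a small sketch. It assumes a POSIX-style shell; the function named greeting is hypothetical and exists only so that the command the shell tries to run is one we control.

```shell
#!/bin/sh
greeting=hello                  # no spaces: this is an assignment

# Hypothetical function that happens to share the variable's name. It only
# reports how it was called, to show what the spaced form turns into.
greeting() { echo "command 'greeting' called with $# arguments: $*"; }

result=$(greeting = world)      # spaces: runs the command with "=" and "world"

echo "variable: $greeting"      # the variable is untouched by the second line
echo "$result"
```

In other words, 'foo = bar' is parsed as the command foo with the two arguments '=' and 'bar', which is why it usually ends in a "command not found" error rather than an assignment.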
There is also an overlap between aliases and functions. Aliases are positional
macros that are recognized only as the first word of the command, as in the classic
alias ll='ls -l'. Because of this, aliases have several
limitations:
- You can only redirect input/output to the last command in the alias.
- You can only specify arguments to the last command in the alias.
- Alias definitions are a single text string, which means complex functions
are nearly impossible to create.
Functions are not positional and can in most cases emulate alias functionality:
ll() { ls -l "$@"; }
The curly brackets are a sort of pseudo-command, so skipping the semicolon in
the example above results in a syntax error. As there is no clean separation between
lexical analysis and syntax analysis, removing the whitespace between the opening
bracket and 'ls' will also result in a syntax error.
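A function also lifts the alias restriction that arguments can only reach the last command. The sketch below assumes a POSIX-style shell with the standard head and tr utilities; upper_first is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# Function form of ll; "$@" passes every argument on unchanged,
# even if an argument contains spaces.
ll() { ls -l "$@"; }

# upper_first is a hypothetical helper: it keeps the first N lines of its
# input and upper-cases them. The argument $1 goes to head, the first
# command of the pipeline, which an alias could not arrange.
upper_first() { head -n "$1" | tr '[:lower:]' '[:upper:]'; }

sample=$(printf 'one\ntwo\nthree\n' | upper_first 2)
echo "$sample"
```

Note the space after the opening brace and the semicolon before the closing one in both definitions; these are the two details the paragraph above warns about.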
Since the use of variables as commands is allowed, it is impossible to reliably
check the syntax of a script, as substitution can accidentally result in a keyword,
as in an example that I found in the paper about fish (not that I like or recommend
fish):
if true; then if [ $RANDOM -lt 1024 ]; then END=fi; else END=true; fi; $END
Both bash and zsh try to determine if the command in the current buffer is finished
when the user presses the return key, but because of issues like this, they will
sometimes fail.
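How a given shell treats a keyword that arrives through expansion can be checked directly. The sketch below assumes a POSIX-style shell such as bash or dash, where reserved words are matched while the script text is parsed, before any expansion takes place; the variable names word and END are used for illustration only. Under that assumption a 'fi' coming from a variable is looked up as an ordinary command name and does not close a block.

```shell
#!/bin/sh
word=fi     # hypothetical variable holding the text of a reserved word

# The expansion of $word is run as a command named "fi". No such command
# normally exists, so the subshell fails; 127 is the conventional
# "command not found" exit status.
if ( $word ) 2>/dev/null; then
    status=0
else
    status=$?
fi
echo "running \$word exited with status $status"

# A variable holding an ordinary command name works as expected, and the
# block still has to be closed by a literal fi.
END=true
if true; then result="block closed by a literal fi"; $END; fi
echo "$result"
```

The practical consequence is the same as the one described above: a script that uses variables as commands cannot be fully judged from its text alone, because what it will try to run is only known at run time.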
One way to alleviate those problems is the introduction of a deprecation mechanism.
It can be done via the POSIX framework (which is currently stagnant).
A clean definition of the lexical level in shells can help too. As the number
of legacy scripts is substantial, any improvements can be made only with the simultaneous
introduction of automatic converters.
Paradoxically, the only sizable entity currently working on the improvement of shells
is Microsoft. According to Wikipedia:
Microsoft is working on the next version of PowerShell and has made a
CTP release of the same publicly available. It includes changes to the scripting
language and hosting API, in addition to including a number of cmdlets. A non-exhaustive
list of the new features is:[27]
- PowerShell Remoting: Using WS-Management, PowerShell 2.0 allows
scripts and cmdlets to be invoked
on a remote machine or a large set of remote machines.
- Background Jobs: Also called a PSJob, it allows a command
sequence (script) or pipeline to be invoked asynchronously. Jobs can be
run on the local machine or on multiple remote machines. A PSJob cannot
include interactive cmdlets.
- ScriptCmdlets: These are cmdlets written using the PowerShell
scripting language.
- SteppablePipelines: This allows the user to control when the
BeginProcessing(), ProcessRecord() and EndProcessing()
functions of a cmdlet are called.
- Data Language: A domain-specific subset of the PowerShell scripting
language, that allows data definitions to be decoupled from the scripts
and allow localized string resources to be imported into the script at runtime.
- Script Debugging: It allows breakpoints
to be set in a PowerShell script or function. Breakpoints can be set on
lines, line & columns, commands and read or write access of variables. It
includes a set of cmdlets to control the breakpoints via script.
- New Cmdlets: Including Out-GridView, which displays tabular
data in the WPF GridView object.
- New Operators: -Split -Join and @ operators.
- New APIs: The new APIs range from handing more control over the
PowerShell parser and runtime to the host, to creating and managing collection
of Runspaces (Runspace Pools) as well as the ability to create restricted
Runspaces which only allow a configured subset of PowerShell to be invoked.
- Graphical PowerShell: PowerShell 2.0 includes a GUI-based PowerShell
host that provides integrated debugger,
syntax highlighting and up to 8 PowerShell consoles (Runspaces) in a
tabbed UI, as well as to run only the selected parts in a script. However,
it is a very early release which suffers from performance problems with
large scripts and does not include tab completion.
Copyright © 1996-2009 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Original materials copyright belongs to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Disclaimer:
- The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author present and former employers, SDNP or any other
organization the author may be associated with.
- We do not warrant the correctness of the information provided or its
fitness for any purpose
- In no way this site is associated with or endorse cybersquatters
using
the term "softpanorama" with other main or country domains (e.g. softpanorama.com) with
bad faith intent to profit from the goodwill belonging to
someone else.
Last modified:
August 15, 2009