Softpanorama
May the source be with you, but remember the KISS principle ;-)
The UNIX system was one of the first operating systems that did not make the command interpreter a part of the operating system or a privileged task. This idea, like many other Unix ideas, was taken from Multics. This architectural decision made Unix fertile ground for scripting development, and Unix essentially pioneered scripting as we know it. It was the environment in which AWK, C-shell, and later Perl emerged and gained popularity.
The Unix shell was a stand-alone program, a command interpreter with few special permissions and kernel-level calls. This rather novel concept proved fruitful and led to a succession of better and better shells.
Unix shells have a long history and we can talk about four distinct generations of Unix shells:
The first generation shells were descendants of the Multics shell. The first of them was the Thompson shell. It was the first Unix shell and as such it was very primitive, with only basic control structures and no variables. The shell's design was intentionally minimalistic; even the if and goto statements, essential for control of program flow, were implemented as separate external commands [Thompson shell - Wikipedia, the free encyclopedia]. Despite its shortcomings, the Thompson shell was a definite improvement over the command language that most people used at the time (IBM JCL).
An important early feature of the Thompson shell, new in comparison with Multics, was a more compact syntax for input/output redirection. The Multics shell used separate commands for redirecting the input or output of a command: one command was needed to start redirection and one to stop it. In Unix, one could simply add to the command line an argument consisting of the "<" symbol followed by a filename for input, or the ">" symbol for output, and the shell would redirect I/O for the duration of the command. This syntax was already present in the first version of Unix, released in 1971. A later addition to the Thompson shell was the concept of pipes. At the suggestion of Douglas McIlroy, the redirection syntax was expanded so that the output of one command could be passed to the input of another command. By Version 4, the "|" symbol was adopted for pipes. Both the redirection symbols and the pipe symbol, together with their semantics, survive to this day and were adopted by other operating systems, including DOS and Windows.
The Mashey shell (also known as the PWB shell) was the second representative of the first generation of Unix shells. It was written and maintained by John Mashey (now a chief scientist at Silicon Graphics, who since 1998 owns a nameplate UNIX in California, previously owned by Ted Dolotta ;-). It was distributed with Programmer's Workbench UNIX, one of the early versions of Unix, which existed from 1975 to 1977. While it was a derivative of the Thompson shell, it introduced several features that made the shell more suitable for programming. The if and goto commands were made internal to the shell, and switch and while constructs were introduced. Simple variables could be used, although their names were limited to one letter and some letters were reserved for special purposes. The $ character, used previously for identifying arguments to a shell script, became the marker for dereferencing a variable, and could be used to insert a variable's value into a string in double quotes (this feature became standard in many scripting languages, such as Perl and PHP).
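The redirection, pipe, and $ conventions described above still work essentially unchanged. A minimal sketch, assuming a modern POSIX-style shell (sh or bash) rather than the Thompson or Mashey shell itself; the file contents and variable names are made up for illustration:

```shell
# Work in a temporary file so the sketch leaves nothing behind.
tmp=$(mktemp)

# ">" sends the output of a command to a file, for that one command only.
printf 'pear\napple\nfig\n' > "$tmp"

# "<" feeds a file to a command as its standard input.
sorted=$(sort < "$tmp")

# "|" connects the output of one command to the input of the next.
# tr strips the padding that some wc implementations add.
count=$(sort < "$tmp" | wc -l | tr -d ' ')

# "$" dereferences a variable, including inside double quotes.
summary="sorted $count lines"

rm -f "$tmp"
echo "$sorted"
echo "$summary"
```

Note that the $(...) form used to capture output is a much later addition and is used here only to keep the sketch short.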
The second generation of shells was defined by C-shell and Bourne shell, which were developed in parallel. A side but very influential development was AWK, which was far more advanced than Bourne shell but did not get enough traction to be extended into a replacement for it. Another parallel development was REXX, which was later used as a shell in the VM/CMS environment:
C shell (csh) was a groundbreaking shell that was written by Bill Joy and first distributed
with BSD in 1978-79. Bill Joy essentially wiped the floor with the AT&T shell design
team by introducing a scripting language that was more elegant and more compatible with C,
and that generated many ideas and a tremendous following. Unfortunately Bill Joy
did not continue the development and the shell stagnated. Despite this, C-shell and
its modification (tcsh) remained the best interactive shells until probably the year
2000.
Not only was C-shell a vastly superior interactive shell; its innovations also
brought it closer to a scripting language. Among them were "built-ins": several
of the most common utilities were built into the shell. Another very important
innovation was the C-like syntax, which made it easier for C programmers to use the
shell and made it look like a new, albeit slightly strange, programming language,
not just a job control language. Features like built-in arithmetic operations
and C language expression syntax were a huge move forward in comparison with
Bourne shell. That happened partially due to the increased capacity of the machines
that became standard in this period. But again, the idea of syntax compatibility
with C was the key idea that made C-shell the closest ancestor of modern
scripting languages.
The first important non-Unix scripting language and shell that was more or less widely used was probably REXX, which was designed and first implemented between 1979 and mid-1982 by Mike Cowlishaw of IBM. It trails AWK by several years, but by and large it can be considered a parallel, independent reinvention of the concept. It was used as a shell in VM/CMS and later in Amiga OS and OS/2.
The third generation of shells was by and large a reaction to C-shell. It includes the extension of C-shell called tcsh and the Korn shell (the version which later became known as ksh88). The POSIX shell belongs to the same category.
In 1981 C-shell was extended into tcsh, which for a long time
was the most popular interactive shell in Unix. Tcsh is C shell with file
name completion, command line editing and other features such as enhanced variable
modifiers that make tcsh one of the most convenient shells for interactive users.
In tcsh environment variables are defined in upper and lowercase. Usually uppercase
parameters refer to parameters seen by all shells, while lowercase parameters
are tcsh specific. As an example the path variable is set both as PATH and path.
Uppercase parameters are listed by the built-in command environment and lowercase
parameters are listed/set by the built-in command set run without options.
Tcsh was influenced by the TENEX OS. TENEX and TOPS-20, up until version
3, had command completion via a user-code-level subroutine library
called ULTCMD. Tcsh provides emacs and vi command-line
editing modes. While it was based on Joy's original source, many people have
added to tcsh since. Over 50 names are listed as contributors in the 6.08 man
page.
The Korn shell (ksh) was a reaction to C-shell. It was developed by David Korn
(AT&T Bell Laboratories) in the early 1980s. It is backwards compatible with the
Bourne shell and includes many features of the C shell as well as new features like a
command history, which was inspired by the requests of Bell Labs users.
It was released in 1983. Essentially, Korn attempted to incorporate the main
ideas of C-shell into the Bourne shell framework. Ksh proved to be a more powerful and
flexible shell than the Bourne shell, but it failed to match either C-shell
or tcsh as an interactive shell. Among other things it lacked arrow history browsing,
the "!!" and "!$" features of C-shell, etc.
The original version of the Korn shell, which is now usually called ksh88, became
the dominant shell on commercial Unixes. It introduced several interesting attributes
(read-only, uppercase and lowercase attributes for strings, etc.). Due to restrictions
on distribution (it was a commercial product, not open source), several free
open source clones like pdksh and bash were created. Due to the popularity of
bash, usage of the Korn shell is now mainly limited to commercial environments
and to power users who understand the difference in the quality of implementation
or need a more compatible implementation between Unix and Windows.
Some attempts to standardize the shell were made by the POSIX group, as a reaction by IBM and others to the threat of the AT&T/Sun alliance. The level of standardization was very low, as they took ksh88 as a base (status quo standardization) and the standard did not propose any interesting features.
Perl, the first "post-modern" scripting language, was first released in 1987. It represented a breakthrough that lifted scripting to a qualitatively new level. It was immediately clear that for Unix scripts Perl was a much better language than the shells, although in some respects it was a lower-level language (limited support for pipes). All subsequent popular scripting languages, such as Python, PHP and Ruby, to name a few, are in a way derivatives of Perl. The fourth generation of shells was also influenced by the popularity of Perl, and one of them (ksh93) was a reaction to it.
Another important development was the creation of Tcl. Tcl was the first scripting language designed to be a macro language for other applications. While Perl was an attempt to integrate the functionality of the shell and AWK into a single product, Tcl extended shell functionality into a new area. Tcl never managed to replace the shell (and it was never intended to), although it is probably not so difficult to create a shell based on Tcl (one was created on the base of the Korn shell by David Korn's son). There were also some attempts to use Perl as a shell (Perl shell).
Two new "fourth-generation" shells were written after Perl was created; they incorporated some innovations introduced by Perl and Tcl, but within the old shell-style framework:
"There are two different areas of functionality in shells. First is interactive use and the second is scripting. Much of the debate about shells has focused on interactive use only. For example, tcsh is an acceptable shell for interactive use but practically unusable for scripting."
It proved to be very difficult to reconcile those two areas, and zsh remains one of the important early attempts in this direction.
Ksh93 was a later and more coherent attempt to merge the shell with
some functionality of Tcl and Perl and to integrate their most interesting
features back into the shell framework. As a programming language, it has comparable
speed and functionality to each of these languages. But string manipulation
was still weaker, and due to the "compatibility curse" it used pretty wild syntax. That
was an important drawback.
Also, David Korn made the strategic mistake of not providing a built-in debugger,
which was necessary for such a complex language.
All in all, the attempt to match Tcl and Perl largely failed, but the power of
the shell is impressive and it became an inspiration for other shell developers (bash
3.0 is one example, see below, although many complain that it is too bloated). Ksh93
can be considered a superset of the POSIX 1003.2 shell standard. Paradoxically,
due to the quality of Uwin, ksh93 is widely used on Windows (it is a part of the
Uwin environment; in terms of technology, Uwin's process management was certainly
much faster, and its I/O subsystem is much better, than Cygwin's. The system runtime
uses the AST library, which is also very well done). One of the major Indian outsourcers,
Wipro, for some time supported Uwin as a product. Like Tcl, ksh93 is extensible
and embeddable with a C language API. There are also two graphical shells based
on ksh93: dtksh, a Motif-based language developed by Novell (somewhat
popular on Solaris, as it is included in the CDE environment), and Tksh, written by
Jeff Korn at Princeton University (it incorporates Tcl functionality directly into
the shell). Neither gained any significant following.
In March 2000 AT&T opened ksh93 and the whole Uwin package and licensed them under the very liberal Common Public License Version 1.0 (CPL-1.0). Here is the announcement from kornshell.com:
March 1, 2000: I am happy to announce the 'i' point release of ksh93 is now available for download. For the first time, source is available as well as binaries for several architectures. If you build binaries for new architectures, and send them to us, we can add them to the download site. The download page has been completely revised in a manner that hopefully will be easier to use. ksh93 is part of the ast-open package. tksh (ksh with tk support) is also part of this package.
Bash was at the beginning a "retro" shell: a shell which did
not introduce any important innovations, and which was written as an alternative
to the commercial Korn shell for purely ideological reasons, as part of the GNU
project. With time it was extended, incorporating most ksh88 and later ksh93
features. In 1997 Bash 2.0 closed the gap in power with ksh88, although some
features remained broken (pipe implementation). Later, bash 3.0 became pretty
close in functionality to ksh93. With version 3.2 it became a prominent fourth
generation shell in its own right, eclipsing all other members of this family.
In 2004 Bash 3.0 introduced a built-in debugger, a feature
that unfortunately was absent in previous shells (although there were add-on
scripts for ksh that provided some debugging capabilities). At the same
time, the size of the code raises suspicions of a "bloatware dead end" for this open
source product.
Hamilton C shell was another C-shell extension. See the on-line user guide for details. It is a commercial shell, but a free demo version is available for download.
Windows PowerShell (formerly known as Monad) is a ksh-style shell with some Microsoft-specific extensions. It integrates with the .NET Framework and provides an environment for performing administrative tasks by executing cmdlets (pronounced commandlets), which are specialized .NET classes implementing a particular operation. Windows PowerShell also provides a hosting mechanism with which the Windows PowerShell runtime can be embedded inside other applications, which can then leverage Windows PowerShell functionality to implement certain operations, including those exposed via the graphical interface.
There was also a distinct line of development that can be called retro-shells. One of the most prominent members of this family was pdksh, which was an attempt to re-implement ksh88. It did not get much traction, and only bash remains viable these days, as it is used as the standard shell in Linux.
Still, one needs to understand that shells are currently a pretty archaic family of scripting languages and that outside of interactive usage they have generally outlived their usefulness. That is why, for more or less complex tasks, Perl is usually used instead of shells.
While shells have continued to improve since the original C-shell and Korn shell (Bourne shell, from the language standpoint, is just a joke), the shell syntax is frozen in space and time and now looks completely archaic. There are a large number of problems with this syntax, as it does not cleanly separate lexical analysis from syntax analysis.
Some syntax features in the shell are idiosyncratic, as Steve Bourne played with Algol 68 before starting work on the shell. He proved to be a bad language designer: there is very little logic in how different types of blocks are ended. Conditional statements end with the classic Algol 68 reversed-keyword syntax ('if condition; then echo yes; else echo no; fi'), but loops are structured like a perverted version of PL/1 (do ... done), individual case branches end with ';;', and functions have C-style bracketing with "{" and "}". M. D. McIlroy should be ashamed of the result ;-).
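The four block-ending conventions mentioned above can be seen side by side in one short function. This is only an illustrative sketch for a POSIX-style shell; the function name classify and its sample arguments are hypothetical:

```shell
classify() {                       # function body: C-style braces
    for n in "$@"; do              # loop: do ... done
        if [ "$n" -lt 0 ]; then    # conditional: if ... fi
            echo "$n negative"
        else
            case "$n" in           # case ... esac, each branch ends with ;;
                0) echo "$n zero" ;;
                *) echo "$n positive" ;;
            esac
        fi
    done
}

result=$(classify -1 0 5)
echo "$result"
```

Four kinds of block, four different ways of closing them: "}", "done", "fi", and "esac" with ";;" inside.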
Also, the original Bourne shell was an almost pure macro language. It performed variable substitution, tokenization and other operations on one line at a time without understanding the underlying syntax. This results in many unexpected side effects. Consider a simple command:
rm $file
If the variable $file accidentally contains a space, its value is treated as two separate arguments to the rm command, with possibly nasty side effects. To fix this, the user has to make sure every use of a variable is enclosed in quotes, as in rm "$file".
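The splitting can be observed safely without deleting anything. A minimal sketch, assuming sh or bash with the default IFS; count_args is a hypothetical helper that stands in for rm and just reports how many arguments it received:

```shell
# count_args prints the number of arguments it was called with.
count_args() { echo "$#"; }

file="my report.txt"              # hypothetical file name containing a space

unquoted=$(count_args $file)      # value is split on the space
quoted=$(count_args "$file")      # quotes keep the value as one argument

echo "unquoted: $unquoted argument(s)"
echo "quoted: $quoted argument(s)"
```

With rm in place of count_args, the unquoted form would try to remove two files named "my" and "report.txt" instead of the intended one.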
Variable assignments in Bourne shell are whitespace sensitive. 'foo=bar' is an assignment, but 'foo = bar' is not. This is another strange idiosyncrasy.
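What the spaced form actually does can be made visible by supplying a command for it to run. A small sketch for sh or bash; the function foo is defined here only for the demonstration (normally the shell would report that no command named foo exists):

```shell
foo=bar                 # no spaces: assigns the string "bar" to variable foo
echo "variable: $foo"

# With spaces, the first word is taken as a command name and
# "=" and "bar" become its arguments. This stand-in shows them.
foo() { echo "command foo got $# arguments: $*"; }

result=$(foo = bar)
echo "$result"
```

Variables and functions live in separate namespaces, which is why the same name can be used for both here.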
There is also an overlap between aliases and functions. Aliases are positional macros that are recognized only as the first word of a command, as in the classic alias ll='ls -l'. Because of this, aliases have several limitations. The same command can also be written as a function:
ll() { ls -l $*; }
The curly brackets are some sort of pseudo-commands, so skipping the semicolon in
the example above results in a syntax error. As there is no clean separation between
lexical analysis and syntax analysis removing the whitespace between the opening
bracket and 'ls' will also result in a syntax error.
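Both syntax errors can be checked without running ls. A sketch assuming bash or a similar POSIX-style sh; parses and check are hypothetical helpers, and each snippet only defines a function, so evaluating it has no side effects outside the subshell:

```shell
# parses succeeds if the shell accepts the snippet. The subshell keeps
# a syntax error from affecting (or terminating) the calling shell.
parses() { ( eval "$1" ) 2>/dev/null; }

check() {
    if parses "$1"; then echo "ok"; else echo "syntax error"; fi
}

good=$(check 'll() { ls -l $*; }')          # as written in the text
no_semicolon=$(check 'll() { ls -l $* }')   # "}" is swallowed as an argument
no_space=$(check 'll() {ls -l $*; }')       # "{ls" is read as a single word

echo "original:       $good"
echo "no semicolon:   $no_semicolon"
echo "no space:       $no_space"
```

In practice a newline before the closing brace serves the same purpose as the semicolon.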
Since the use of variables as commands is allowed, it is impossible to reliably check the syntax of a script, as substitution can accidentally result in a keyword, as in this example that I found in the paper about fish (not that I like or recommend fish):
if true; then if [ $RANDOM -lt 1024 ]; then END=fi; else END=true; fi; $END
Both bash and zsh try to determine if the command in the current buffer is finished when the user presses the return key, but because of issues like this, they will sometimes fail.
One way to alleviate those problems is the introduction of a deprecation mechanism. It could be done via the POSIX framework (which is currently stagnant).
A clean definition of the lexical level in shells would help too. As the number of legacy scripts is substantial, any improvements can be made only with the simultaneous introduction of automatic converters.
Paradoxically, the only sizable entity currently working on improving shells is Microsoft. According to Wikipedia:
Microsoft is working on the next version of PowerShell and has made a CTP release of the same publicly available. It includes changes to the scripting language and hosting API, in addition to including a number of cmdlets. A non-exhaustive list of the new features is:[27]
- PowerShell Remoting: Using WS-Management, PowerShell 2.0 allows scripts and cmdlets to be invoked on a remote machine or a large set of remote machines.
- Background Jobs: Also called a PSJob, it allows a command sequence (script) or pipeline to be invoked asynchronously. Jobs can be run on the local machine or on multiple remote machines. A PSJob cannot include interactive cmdlets.
- ScriptCmdlets: These are cmdlets written using the PowerShell scripting language.
- SteppablePipelines: This allows the user to control when the BeginProcessing(), ProcessRecord() and EndProcessing() functions of a cmdlet are called.
- Data Language: A domain-specific subset of the PowerShell scripting language, that allows data definitions to be decoupled from the scripts and allow localized string resources to be imported into the script at runtime.
- Script Debugging: It allows breakpoints to be set in a PowerShell script or function. Breakpoints can be set on lines, line & columns, commands and read or write access of variables. It includes a set of cmdlets to control the breakpoints via script.
- New Cmdlets: Including Out-GridView, which displays tabular data in the WPF GridView object.
- New Operators: -Split -Join and @ operators.
- New APIs: The new APIs range from handing more control over the PowerShell parser and runtime to the host, to creating and managing collection of Runspaces (Runspace Pools) as well as the ability to create restricted Runspaces which only allow a configured subset of PowerShell to be invoked.
- Graphical PowerShell: PowerShell 2.0 includes a GUI-based PowerShell host that provides integrated debugger, syntax highlighting and up to 8 PowerShell consoles (Runspaces) in a tabbed UI, as well as the ability to run only the selected parts in a script. However, it is a very early release which suffers from performance problems with large scripts and does not include tab completion.
Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: April 02, 2008