Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)




Introduction to the Unix shell history

The UNIX system was one of the first operating systems that didn't make the command interpreter a part of the operating system or a privileged task. This idea, like many other Unix ideas, was taken from Multics. This architectural decision made Unix fertile ground for scripting development, and Unix essentially pioneered scripting as we know it. It was the environment in which AWK, the C-shell and, later, Perl emerged and gained popularity.

The Unix shell was a stand-alone program: a command interpreter with no special privileges that used the same system calls as any other user program. This rather novel concept proved to be fruitful and has led to a succession of increasingly capable shells.

Unix shells have a long history and we can talk about four distinct generations of Unix shells:

  1. First generation shells (Thompson shell and Mashey shell)
  2. Second generation shells (C-shell and Bourne shell, which were developed in parallel and in which Bill Joy's team provided really strong competition to the professionals from AT&T ;-). In many ways the C-shell represented a real breakthrough, as it was more programmable (even the name implies closeness to C) and more flexible than any shell of the previous generation
  3. Third generation shells (tcsh and ksh88)
  4. Fourth generation shells (ksh93, bash and zsh)

First generation shells

The first generation shells were descendants of the Multics shell. The first of them was the Thompson shell. It was the first Unix shell and as such it was very primitive, with only basic control structures and no variables. The shell's design was intentionally minimalistic; even the if and goto statements, essential for controlling program flow, were implemented as separate external commands [Thompson shell - Wikipedia, the free encyclopedia]. Despite its shortcomings, the Thompson shell was a definite improvement over the command language that most people used at the time (IBM JCL).

An important early feature of the Thompson shell, new in comparison with Multics, was a more compact syntax for input/output redirection. The Multics shell used separate commands for redirecting the input or output of a command: one command was needed to start redirection and another to stop it. In Unix, one could simply add to the command line an argument consisting of the "<" symbol followed by a filename for input, or the ">" symbol for output, and the shell would redirect I/O for the duration of the command. This syntax was already present by the release of the first version of Unix in 1971. A later addition to the Thompson shell was the concept of pipes. At the suggestion of Douglas McIlroy, the redirection syntax was expanded so that the output of one command could be passed to the input of another command. By Version 4, the "|" symbol had been adopted for pipes. Both the redirection and the pipe symbols, together with their semantics, have survived to this day and were adopted by other operating systems, including DOS and Windows.
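The redirection and pipe notation described above can be sketched in a modern POSIX shell. This is a minimal illustration, not historical Thompson shell code; it assumes that "mktemp -d" is available (it is on Linux and the BSDs, although POSIX does not require it), and the file names are hypothetical.

```shell
# Work in a scratch directory so the example leaves no files behind.
tmpdir=$(mktemp -d)

# ">" sends a command's standard output to a file.
printf '%s\n' banana apple cherry > "$tmpdir/fruits.txt"

# "<" reads a command's standard input from a file (combined here with ">").
sort < "$tmpdir/fruits.txt" > "$tmpdir/sorted.txt"
sorted_all=$(cat "$tmpdir/sorted.txt")

# "|" (a pipe) connects one command's output to the next command's input.
first=$(sort < "$tmpdir/fruits.txt" | head -n 1)
echo "$first"    # prints apple

rm -rf "$tmpdir"
```

Each operator applies only for the duration of the single command it is attached to, which is exactly the property that made the Unix notation more compact than the separate start/stop redirection commands of Multics.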

The Mashey shell (also known as the PWB shell) was the second representative of the first generation of Unix shells. It was written and maintained by John Mashey (now a chief scientist at Silicon Graphics, who since 1998 has owned the "UNIX" nameplate in California, previously owned by Ted Dolotta ;-). It was distributed with Programmer's Workbench UNIX (PWB/UNIX), one of the early versions of Unix, which existed from 1975 to 1977. While it was a derivative of the Thompson shell, it introduced several features that made the shell more suitable for programming. The if and goto commands were made internal to the shell, and switch and while constructs were introduced. Simple variables could be used, although their names were limited to one letter and some letters were reserved for special purposes. The $ character, previously used to identify arguments to a shell script, became the marker for dereferencing a variable and could be used to insert a variable's value into a string in double quotes (this feature later became standard in many scripting languages, such as Perl and PHP).
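For comparison, here is how the ideas introduced by the PWB shell (built-in control flow, while loops, switch-style branching and $ dereferencing inside double quotes) look in today's POSIX shell syntax. This is a sketch in modern sh, not PWB shell syntax, which differed in detail; the variable names are hypothetical.

```shell
# Modern POSIX sh, not PWB shell syntax. n and i are single-letter names,
# as the PWB shell required; later shells also allow longer names like "out".
n=3
i=1
out=""
while [ "$i" -le "$n" ]; do
    case $i in                 # case plays the role of a switch statement
        1) word=one ;;
        2) word=two ;;
        *) word=many ;;
    esac
    out="$out$i=$word "        # $ dereferences variables inside double quotes
    i=$((i + 1))
done
echo "$out"                    # prints: 1=one 2=two 3=many (plus a trailing space)
echo 'no expansion: $out'      # single quotes suppress dereferencing
```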

The second generation of shells

The second generation of shells was defined by the C-shell and the Bourne shell, which were developed in parallel. A side, but very influential, development was AWK, which was far more advanced than the Bourne shell but did not get enough traction to be extended into a replacement for it. Another parallel development was REXX, which was later used as a shell in the VM/CMS environment.

The third generation of shells

The third generation of shells was by and large a reaction to the C-shell. It includes tcsh, an extension of the C-shell, and the Korn shell (the version that later became known as ksh88). The POSIX shell belongs to the same category.

The fourth generation of shells

Perl, the first "post-modern" scripting language, was first released in 1987. It represented a breakthrough that lifted scripting to a qualitatively new level. It was immediately clear that for Unix scripts Perl was a much better language than the shells, although in some respects it was a lower-level language (for example, its support for pipes was limited). All subsequent popular scripting languages, such as Python, PHP and Ruby, to name a few, are in a way derivatives of Perl. The fourth generation of shells was also influenced by Perl's popularity, and one of them (ksh93) was a reaction to it.

Another important development was the creation of Tcl, the first scripting language designed to be a macro language for other applications. While Perl was an attempt to integrate the functionality of the shell and AWK into a single product, Tcl extended shell functionality into new areas. Tcl never replaced the shell (and it was never intended to), although it is probably not too difficult to create a shell based on Tcl (one was built on top of the Korn shell by David Korn's son). There were also some attempts to use Perl as a shell (Perl shell).

Two new "fourth-generation" shells were written after Perl was created; they incorporated some innovations introduced by Perl and Tcl, but within the old shell-style framework.

Back into the future -- Retro shells

There was also a distinct line of development that can be called retro shells. One of the most prominent members of this family was pdksh, which was an attempt to reimplement ksh88. It did not get much traction, and only bash remains viable these days, as it is used as the standard shell in Linux.

Possible future developments

Still, one needs to understand that shells are currently a rather archaic family of scripting languages and that, outside of interactive use, they have generally outlived their usefulness. That is why Perl is usually used instead of shells for more or less complex tasks.

While shells have continued to improve since the original C-shell and Korn shell (the Bourne shell, from the language standpoint, is just a joke), shell syntax is frozen in time and now looks completely archaic. This syntax has a large number of problems, because it does not cleanly separate lexical analysis from syntax analysis.

Some syntax features of the shell are idiosyncratic because Steve Bourne had worked with Algol 68 before starting work on the shell. He proved to be a bad language designer: there is very little logic in how different types of blocks are ended. Conditional statements end with a mangled version of the classic Algol 68 reversed-keyword syntax ('if condition; then echo yes; else echo no; fi'), but loops are structured like a perverted version of PL/1 (do; done;), and individual case branches end with ';;'. Functions use C-style bracketing ("{", "}"). M. D. McIlroy should be ashamed of the result ;-).
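The inconsistent block terminators are easiest to see side by side. The following sketch assumes any POSIX sh; the function classify is hypothetical and exists only to pack all four kinds of block syntax into a few lines.

```shell
# Hypothetical function that uses every kind of block terminator.
classify() {                        # function body: C-style braces { }
    if [ "$1" -gt 0 ]; then         # if ... fi: Algol 68-style reversed keyword
        sign=positive
    else
        sign=non-positive
    fi
    case $1 in                      # case ... esac; each branch ends with ;;
        0) kind=zero ;;
        *) kind=nonzero ;;
    esac
    echo "$1: $sign, $kind"
}

n=2
while [ "$n" -ge 0 ]; do            # while ... do ... done
    classify "$n"
    n=$((n - 1))
done
```

Run as is, the loop prints "2: positive, nonzero", "1: positive, nonzero" and "0: non-positive, zero", one per line.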

Also, the original Bourne shell was an almost pure macro language. It performed variable substitution, tokenization and other operations on one line at a time without understanding the underlying syntax. This results in many unexpected side effects. Consider a simple command:
rm $file
If the variable $file accidentally contains a space, its value is split into two separate arguments to the rm command, with possibly nasty side effects. To fix this, the user has to make sure that every use of a variable is enclosed in double quotes, as in rm "$file".
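The splitting can be demonstrated safely, without calling rm, by using a hypothetical helper count_args that only prints how many arguments it received. This sketch assumes sh, bash or ksh with the default IFS; zsh, by contrast, does not split unquoted parameter expansions by default.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() { echo "$#"; }

file="my notes.txt"      # a file name that contains a space

count_args $file         # unquoted: split into "my" and "notes.txt", prints 2
count_args "$file"       # quoted: passed as one argument, prints 1
```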

Variable assignments in the Bourne shell are whitespace sensitive: 'foo=bar' is an assignment, but 'foo = bar' is not; it runs a command named foo with the two arguments '=' and 'bar'. This is another strange idiosyncrasy.
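The whitespace rule can be checked directly. In this sketch foo is a hypothetical function, defined only so that the mistaken form produces visible output; without it, 'foo = bar' would simply fail with a "command not found" error. Functions and variables live in separate namespaces in POSIX sh, so the assignment does not disturb the function.

```shell
# Hypothetical function named foo, so the mistaken form has a visible effect.
foo() { echo "command foo called with: $*"; }

foo=bar          # assignment: sets the variable foo
echo "$foo"      # prints bar
foo = bar        # NOT an assignment: prints "command foo called with: = bar"
```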

There is also an overlap between aliases and functions. Aliases are positional macros that are recognized only as the first word of a command, as in the classic alias ll='ls -l'. Because of this, aliases have several limitations.

Functions are not positional and can, in most cases, emulate the functionality of aliases (the quoted "$@" passes each argument through intact, unlike an unquoted $*):
ll() { ls -l "$@"; }
The curly braces are reserved words (a kind of pseudo-command) rather than punctuation, so omitting the semicolon before the closing brace in the example above results in a syntax error. As there is no clean separation between lexical analysis and syntax analysis, removing the whitespace between the opening brace and 'ls' will also result in a syntax error.
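Both brace rules can be tested without running any code, because "sh -n" reads and parses commands without executing them. The helper check_syntax below is hypothetical, and the sketch assumes a POSIX sh such as dash or bash.

```shell
# Hypothetical helper: "sh -n -c STRING" only parses STRING, never runs it.
check_syntax() {
    if sh -n -c "$1" 2>/dev/null; then
        echo ok
    else
        echo "syntax error"
    fi
}

check_syntax 'll() { ls -l "$@"; }'   # well-formed definition: prints ok
check_syntax 'll() { ls -l "$@" }'    # no ";": "}" becomes an argument to ls, prints syntax error
check_syntax 'll() {ls -l "$@"; }'    # "{ls" is an ordinary word, not "{", prints syntax error
```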

Since variables may be used as commands, it is impossible to reliably check the syntax of a script, as substitution can accidentally produce a keyword, as in the following example that I found in the paper about fish (not that I like or recommend fish):

if true; then if [ $RANDOM -lt 1024 ]; then END=fi; else END=true; fi; $END
Both bash and zsh try to determine if the command in the current buffer is finished when the user presses the return key, but because of issues like this, they will sometimes fail.

One way to alleviate these problems is to introduce a deprecation mechanism. This could be done via the POSIX framework (which is currently stagnant).

A clean definition of the lexical level in shells can help too. As the number of legacy scripts is substantial, any improvements can be made only with the simultaneous introduction of automatic converters.

Paradoxically, the only sizable entity currently working on improving shells is Microsoft. According to Wikipedia:

Microsoft is working on the next version of PowerShell and has made a CTP release of the same publicly available. It includes changes to the scripting language and hosting API, in addition to including a number of cmdlets. A non-exhaustive list of the new features is:[27]

  1. PowerShell Remoting: Using WS-Management, PowerShell 2.0 allows scripts and cmdlets to be invoked on a remote machine or a large set of remote machines.
  2. Background Jobs: Also called a PSJob, it allows a command sequence (script) or pipeline to be invoked asynchronously. Jobs can be run on the local machine or on multiple remote machines. A PSJob cannot include interactive cmdlets.
  3. ScriptCmdlets: These are cmdlets written using the PowerShell scripting language.
  4. SteppablePipelines: This allows the user to control when the BeginProcessing(), ProcessRecord() and EndProcessing() functions of a cmdlet are called.
  5. Data Language: A domain-specific subset of the PowerShell scripting language, that allows data definitions to be decoupled from the scripts and allow localized string resources to be imported into the script at runtime.
  6. Script Debugging: It allows breakpoints to be set in a PowerShell script or function. Breakpoints can be set on lines, line & columns, commands and read or write access of variables. It includes a set of cmdlets to control the breakpoints via script.
  7. New Cmdlets: Including Out-GridView, which displays tabular data in the WPF GridView object.
  8. New Operators: -Split -Join and @ operators.
  9. New APIs: The new APIs range from handing more control over the PowerShell parser and runtime to the host, to creating and managing collection of Runspaces (Runspace Pools) as well as the ability to create restricted Runspaces which only allow a configured subset of PowerShell to be invoked.
  10. Graphical PowerShell: PowerShell 2.0 includes a GUI-based PowerShell host that provides integrated debugger, syntax highlighting and up to 8 PowerShell consoles (Runspaces) in a tabbed UI, as well as to run only the selected parts in a script. However, it is a very early release which suffers from performance problems with large scripts and does not include tab completion.


Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: April 02, 2008