Softpanorama
May the source be with you, but remember the KISS principle ;-)


A Slightly Skeptical View on Scripting Languages

Dr. Nikolai Bezroukov

Version 2.1


Note: An earlier version of this paper was published in Softpanorama Bulletin, Vol. 24, No. 4, 2012


Copyright 1998-2013, Dr. Nikolai Bezroukov. This is a copyrighted unpublished work. All rights reserved.

  Rule: "Don't Try To Force People"

Programmers are smart people. They are engaged in challenging tasks and need all the help they can get from a programming language as well as from other supporting tools and techniques. Trying to seriously constrain programmers to do "only what is right" is inherently wrongheaded and will fail. Programmers will find a way around rules and restrictions they find unacceptable. The language should support a range of reasonable design and programming styles rather than try to force people into adopting a single notion.

This does not imply that all ways of programming are equally good or that C++ should try to support every kind of programming style. [...] However, moralizing over how to use the features is kept to a minimum, language mechanisms are as far as possible kept policy free, and no feature is added to or subtracted from C++ exclusively to prevent a coherent style of programming.

I am well aware that not everyone appreciates choice and variety. However, people who prefer a more restrictive environment can impose one through style rules in C++ or choose a language designed to provide the programmer with a smaller set of alternatives.

-- Stroustrup "The Design and Evolution of C++", page 113

Contrary to popular delusion, the programming world is a lot more diverse than merely Open Systems, @pple, and Micro$oft.

Perl Power Tools Webpage


The paper argues that the compactness of code is a very important metric. It is one of the key metrics by which to judge programming languages, because the language in which the solution to a problem can be expressed in the most compact and, for humans, most transparent way (and shorter code is usually more transparent) will be the winner in the marketplace.

In this sense scripting languages represent a new generation of languages, called very high level languages, a class pioneered by SETL. In the best scripting languages the size of the codebase (in lexical tokens) for a solution to the same problem can be approximately ten times smaller than in Java (which is approximately on the same level as C++). That increases the productivity of programmers and reduces the number of bugs in the code.

The immense number of fairly complex LAMP-based e-commerce Web sites (including Yahoo) and the wide usage of Python by Google suggest that scripting languages, not Java or OO, represent the most promising way of developing complex software applications today and in the foreseeable future. Being the new Cobol, Java isn't going to die any time soon, but we are probably far beyond the hype part of the curve, so it will gradually lose developers to more productive languages. This process is already visible in the diminishing number of Java books published each year.

Introduction

Scripting languages are the main achievement of the open source movement. Moreover, scripting languages are probably the last bastion that can at least partially protect us from the current push toward software "overcomplexity" and the resulting bloatware, be it Microsoft or Linux style. And bloatware kills the idea of open source more effectively than anything else: when the implementation language becomes "a new assembler", as in many C-based open source projects, the level of openness of the codebase is open to discussion.

Anybody who has participated in large scale reengineering projects knows the feeling when you get thousands of pages of badly documented source code... That's not open source, that's an open mess. Modification, refactoring and rewriting costs grow exponentially with the size of the codebase: a fairly trivial change in a large codebase can cost a tremendous amount of money (and effort) to implement, as the infamous Year 2000 problem showed all too well. And we all know that Java is a very verbose language, and that refactoring is expensive in Java due to static types. That suggests that Java is a very expensive language to work with and, despite being the new Cobol, it opens possibilities for alternatives, first of all scripting languages, to enter the enterprise environment.

It is still possible to write compact, useful programs in scripting languages (programs whose source really can be modified by a person without spending a month in a closet staring at the pages), and as such they (along with Forth) might represent the last refuge for those who want to adhere to the KISS principle. The scripting languages "big seven", ksh93/bash, Perl, Python, PHP, Ruby, TCL, and JavaScript (JavaScript is prototype-based and fun to work with once you discover that), are definitely a new and very interesting development in software engineering, the first practical examples of "very high level languages" (VHL).

The concept of "very high level languages" was pioneered more than 30 years ago by SETL [Schwartz1970, Bacon2000]. In his doctoral dissertation David Bacon explains the value of SETL in the following way [Bacon2000]:

First of all, SETL strives to put the needs of the programmer ahead of those of the machine, as is reflected in the automatic memory management, in the fact that flexible structures can be employed as easily as size-constrained ones can, and in the presence of an interface to powerful built-in datatypes through a concise and natural syntax. This high-level nature makes SETL a pleasure to use, and has long been appreciated outside the world of distributed data processing. Flexibility does not itself ensure good discipline, but is highly desirable for rapid prototyping. This fills an important need, because experimentation is a crucial early phase in the evolution of most large software systems, especially those featuring novel designs[135,173,66,70,71].

Second, SETL's strong bias in favor of ``value semantics'' facilitates the distribution of work and responsibility over multiple processes in the client-server setting. The absence of pointers eliminates a major nuisance in distributed systems design, namely the question of how to copy data structures which contain pointers. SETL realizes Hoare's ideal of programming without pointers [114].

Third, the fact that every SETL object, except for atoms and procedure values, can be converted to a string and back (with some slight loss of precision in the case of floating-point values), and indeed will be so converted when a sender's writea call is matched by a receiver's reada, means that SETL programs are little inconvenienced by process boundaries, while they enjoy the mutual protections attending private memories. Maps and tuples can represent all kinds of data structures in an immediate if undisciplined way, and the syntactic extension presented in Section 2.15 [Field Selection Syntax for Maps], which allows record-style field selection on suitably domain-restricted maps to be made with a familiar dot notation, further abets the direct use of maps as objects in programs, complementing the ease with which they can be transmitted between programs. A similar freedom of notation exists in JavaScript, where associative arrays are identified with ``properties'' [152].

Open source is most valuable when you are still able to change the source to better suit your needs; when you have an adaptable solution instead of a fixed codebase; when you gradually improve the codebase instead of just maintaining it. Here the key advantage of scripting languages, the compactness of the code, is of great value. As with other VHLs, the compactness of scripting language code makes it possible to write the same application in a small fraction of the lines of code needed in traditional compiled (C, Pascal) or semi-compiled (Java) strongly typed languages. Although not all shorter programs are better, the mere fact that a typical scripting language allows the number of lines of code for a typical application to shrink several times means lower development costs, fewer bugs and potentially better architecture. In an enterprise environment you can also save on "software architect" positions: highly paid, lucrative positions that proliferated in Java-based application development :-). One interesting example of a project in which reference C code exists and direct comparisons of codebase sizes can be made is the Perl Power Tools project.

Please note that with the current level of complexity of applications, C has almost deteriorated to the assembler level. Java is just marginally higher, and the number of lines you need to write for even a trivial program in Java can be larger than in the equivalent C++ program. That fact alone suggests that the designers of the language did not understand the trend toward very high level languages.

As for the number of lines (or, more correctly, lexical tokens) needed to express a particular algorithm, Java looks like C++-: if this is called progress, that is a very strange definition of progress. Java's designers tried to create "C++ done right", but "done right" is not enough. The only valuable innovation was garbage collection. Pointers were removed, and for the level of language Java represents this is a very questionable decision, which suggests an inability to understand the difference between programming in the large and programming in the small.

While Java managed to displace Cobol, it did so not without the help of pure luck (timing is everything in the software world) and deep wallets (the huge amount of money spent by Sun and IBM to promote the language and to create the necessary infrastructure). Still, Java with its class-based OO has a serious problem that becomes very noticeable as system size grows. Programs rarely remain static, and invariably the original class structure becomes less useful with time. That results in more code being added in new classes, which largely negates the value of OO and leads to "class hell": the number of classes grows to the level where nobody can see the whole picture, and because of this different class libraries keep reinventing the wheel. Moreover, the number of class libraries often grows to the point where just loading them at startup consumes considerable time, making Java look very slow despite significant progress on the JVM side. It looks like Gosling, in his attempt to fix some problems with C++, badly missed the ideas of prototype-based programming, ideas that found their way into JavaScript. In a blog entry he even mentioned:

Over the years I've used and created a wide variety of scripting languages, and in general, I'm a big fan of them. When the project that Java came out of first started, I was originally planning to do a scripting language. But a number of forces pushed me away from that.

James Gosling, Dec 15, 2005

Of course, there is no free lunch, and sometimes you pay a price for using a VHL instead of a plain vanilla high level language, but with current 3GHz CPUs and 8-16GB of memory on desktops (and even laptops) it is not an unreasonable price for probably 80-95% of applications. A small critical part can always be rewritten in a lower level language, but only if necessary and only after this part has been identified exactly by profiling.

This approach of using two well connected languages, rather than a lower level language for everything, is the optimal solution to so-called "scalability" problems. The classic observation is that premature optimization (and the choice of a lower level language is a premature optimization) is the source of most problems in large programming projects. Scripting languages make it possible to use two languages in the same project, with the scripting language serving as glue (BTW that was an explicit design goal of TCL) and the other, typically the system language in which the scripting language interpreter is written, serving as the component development language. Such a "dual programming" approach to the creation of large software systems is a very interesting and efficient software development paradigm. It is especially well suited to the development of virtual machine appliances, where all the infrastructure is hidden within the virtual machine.
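The glue role can be sketched in the shell notation used later in this paper. In the minimal example below the script does no heavy lifting itself: compiled components written in C (tr, grep, sort, uniq, head, awk) do the work, and the shell only connects them. The function name word_freq and the sample input are hypothetical, chosen for illustration.

```shell
#!/usr/bin/env bash
# word_freq [N]: hypothetical helper name, used only for illustration.
# Prints the N most frequent words read from standard input (default 3).
word_freq() {
    local n=${1:-3}
    tr -cs '[:alpha:]' '\n' |            # split input into one word per line
        grep -v '^$' |                   # drop an empty line left by leading punctuation
        tr '[:upper:]' '[:lower:]' |     # fold case
        sort |                           # bring identical words together
        uniq -c |                        # count each group
        sort -k1,1nr -k2,2 |             # highest count first, ties alphabetical
        head -n "$n" |
        awk '{print $2, $1}'             # print "word count"
}

printf 'the cat and the dog and the bird\n' | word_freq 2
```

If one of the stages ever became a bottleneck, it could be replaced by a purpose-built component in C without touching the rest of the pipeline, which is the division of labor described above.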

At the same time I think that it's a little bit naive and premature to search for a scripting Eldorado. There is a lot of work that needs to be done. None of the existing scripting languages is perfect. I also think it's a good idea to be wary of "the one true way" of anything. Let's accept the fact that programmers benefit from the use of multiple paradigms and multiple languages.


And you do not necessarily need a scripting language with the most sophisticated OO system; the key idea of scripting languages is that everything is a string. Therefore, while OO is a nice thing to have as a way to partition the namespace and to provide some useful primitives for iterators (as in Ruby), it is not an end in itself. Moreover, if one really wants an OO language, other things being equal it might be beneficial to use a prototype-based OO implementation, which suits scripting languages better than a static class-based one. In this case each object is essentially a hash that contains slots for dynamic code, and methods can be assigned or added dynamically. Unfortunately, among the major scripting languages only JavaScript uses the prototype-based OO model.

But again, the key is powerful string handling capabilities. As Ronald P. Loui aptly noted, even one of the simplest scripting languages in existence (GAWK), which has decent string processing capabilities, can paradoxically be a powerful tool for complex software development tasks like AI prototyping; among his students, those who used GAWK for the class project turned out the best work:

Most people are surprised when I tell them what language we use in our undergraduate AI programming class.  That's understandable.  We use GAWK.  GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old pattern scanning language isn't even viewed as a programming language by most people.  Like PERL and TCL, most prefer to view it as a "scripting language."  It has no objects; it is not functional; it does no built-in logic programming. 

Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want; (b) with a single exception, the best work consistently results from those working in GAWK.  (footnote:  The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't even been close (we have not seen work in PROLOG or JAVA).

Such a paradoxical advantage in productivity might be partially explained by the fact that with a simple scripting language students struggle less with the language and can devote more time to the task itself. Perl and other more complex scripting languages have a steeper learning curve but can serve the same role for professionals. Still, it is interesting to note that AWK, while one of the first scripting languages invented, is simultaneously "the last of the Mohicans": a language without feature creep that even on old hardware could be used very efficiently with pipes. Old DOS AWK interpreters were as small as 160K of uncompressed executable (mawk.exe). That's simply incredible in the current world of "Hello world" programs bloated to 100K :-)

Paraphrased Greenspun's Tenth Rule of Programming

More compact and cleaner code that is achievable by using scripting languages often helps to achieve higher quality in ways that are not immediately obvious. Paraphrasing Greenspun's Tenth Rule of Programming we can suggest that:

Any sufficiently complicated C or Java program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Perl or TCL.


Programming in the small vs. programming in the large

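Another aspect of the same problem that is reflected in Greenspun's Tenth Rule of Programming is that there are two distinct types of programming:

  1. Programming in the large: scripting, that is, gluing existing components together.
  2. Programming in the small: creation of the component set.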

While those two activities intersect and can be accomplished in a single language, it is often beneficial, especially for large and complex projects, to use different but tightly coupled languages for them. In other words, for a large programming project a single language typically is not enough. For scripting languages, which cover "programming in the large" to a larger extent than "programming in the small" (although modern scripting languages cover a surprisingly large part of programming in the small), that means that the simplicity and transparency of connecting to a high level language (C or C++) should be considered a feature of language design, not a feature of language implementation. The best result is achieved when the second, lower level language is the language in which the interpreter for the particular scripting language is written.

This is a very important feature of large projects, as for them no language is best at everything. Sometimes you need lower level details. Sometimes you need a higher level, up to macro programming as known from VBA and programmable editors (REXX actually pioneered this area). Many programmers (for example Richard Stallman) just do not understand the difference between programming in the large (scripting) and programming in the small (creation of the component set) and think that a single language should be used for both activities (see the discussion of TCL vs. Guile for more information). This is not true. Components can be produced in another language, one that has a lower level and is more flexible in operating at the level of detail some of them require.

And the larger the project is, the more components are specialized enough to benefit from implementation in the second (typically lower level) language. Here TCL is really good, but Python and Ruby have more or less clean interfaces too: to C++ and C respectively.

In a sense, TCL is probably the most underappreciated scripting language in existence, as it directly promotes such a dual language approach to the construction of large programming systems. It has an almost zero learning curve and as such can be adopted almost instantly in any complex project that uses C as the second implementation language (for programming in the small). Few people understand that the use of TCL with C (TCL+C) is a unique and very powerful software development paradigm, probably superior in many real life scenarios to overhyped single complex OO languages, be it Java, Ruby, C++ or C with some STL-like library like glib.

There is another approach to the same problem, exemplified by the Microsoft .NET framework. This framework permits using the same runtime engine for multiple languages, so that you can use both IronPython and C# in the same project. The same advantage can be achieved using a scripting language that is compiled for the JVM (Ruby has one such implementation).

It is interesting to note that several important open source projects like MC got into a tar pit of too many C libraries and, because of limited manpower, might stagnate because nobody is able to see the whole picture and drive the architectural development. I think that deterioration of the quality of a C-based codebase once it exceeds a critical size is a more common reason for the stagnation of open source projects than many people would like to accept. If you look, for example, at MC, it is clear that mono-language programming (C programming in the case of MC) automatically runs into serious difficulties beyond a certain size, and without the funds and discipline typical of a commercial programming environment a crisis in open source project development is inevitable. In such cases it might be better to switch to a combination with a scripting language just to shrink the codebase to a manageable size. The second interesting aspect of this problem is that when a key developer leaves the project for personal reasons (typically burned out after several years), the gap in knowledge of the internals between the old and new generations of developers (and most of those internals are undocumented) leads to suboptimal or plainly wrong development decisions and a severe loss of architectural integrity of the project.

In dual-language (.NET + scripting language, or scripting language + Java) implementations each language supports the other in non-trivial ways. For example, C programmers can use libraries that were developed for the scripting language interpreter (and those libraries are usually of higher quality than a typical C library) and even polish them based on their understanding of the project, an understanding that also deepens their knowledge of the scripting language implementation. It is interesting to note that this crucial advantage of dual language programming, with "programming in the large" performed in a scripting language and "programming in the small" in a lower level strongly typed language, was never understood by Richard Stallman and was probably one of the reasons behind the stagnation of the GNU project: scripting languages are the essence of open source, but all major scripting languages originated outside the project and none adopted the "pure" GPL license (which is a good thing ;-). As John Ousterhout aptly put it:

I think that Stallman's objections to Tcl may stem largely from one aspect of Tcl's design that he either doesn't understand or doesn't agree with. This is the proposition that you should use *two* languages for a large software system: one, such as C or C++, for manipulating the complex internal data structures where performance is key, and another, such as Tcl, for writing small-ish scripts that tie together the C pieces and are used for extensions. For the Tcl scripts, ease of learning, ease of programming and ease of glue-ing are more important than performance or facilities for complex data structures and algorithms. I think these two programming environments are so different that it will be hard for a single language to work well in both. For example, you don't see many people using C (or even Lisp) as a command language, even though both of these languages work well for lower-level programming.

Thus I designed Tcl to make it really easy to drop down into C or C++ when you come across tasks that make more sense in a lower-level language. This way Tcl doesn't have to solve all of the world's problems. Stallman appears to prefer an approach where a single language is used for everything, but I don't know of a successful instance of this approach. Even Emacs uses substantial amounts of C internally, no?

I didn't design Tcl for building huge programs with 10's or 100's of thousands of lines of Tcl, and I've been pretty surprised that people have used it for huge programs. What's even more surprising to me is that in some cases the resulting applications appear to be manageable. This certainly isn't what I intended the language for, but the results haven't been as bad as I would have guessed.

Actually, all the posts in the infamous Stallman-initiated attack on TCL, known as the "TCL wars", deserve to be read just to understand the complex interplay of factors between programming in the large and programming in the small, and the level of misunderstanding of the difference between the two that is typical even of pretty gifted programmers like Stallman and Gosling. BTW the resulting Stallman-inspired alternative to TCL, the GNU scripting language (Guile), proved to be stillborn. So far no major GNU project has adopted it as a macro language and no important Linux application uses it. So by abandoning TCL (and failing to produce a viable alternative) Stallman essentially undermined the long term viability of the GNU project...

Level of integration with the OS, or why libraries are more important than the language

The proliferation of virtual machines brought out another interesting aspect of the "respect toward predecessors" idea in scripting language design: the level of integration with the major OSes. The quality and availability of "connectors" that permit use of the OS API (both built into the language and in external libraries) might account for 80% of the usability of the language in large and complex programming projects. In this (limited) sense libraries are more important than the language itself. And it takes a lot of time (or money, or both) for a language to acquire quality libraries.

But excessive usage of libraries (or classes in OO languages) is actually a mixed blessing and, in Java for example, hides the rather low level of the language itself. Class library hell in Java is a well known deficiency of the language: when everything is taken from libraries, the quality of the project suffers greatly, in a quite dialectical way (extremes meet).

Handling of strings and expressive power of the language

In many ways the expressive power of a language is connected with the availability of complex data structures, pointers and coroutines. Among them strings have a special place.

Just as in Unix everything is a file, in scripting languages everything is a string. And this idea of the "string as an ultimate method of unification and expression of complex data structures" proved to be a very powerful and flexible paradigm, especially with the prominence achieved by HTML and XML. At the same time strings are rather complex to implement efficiently, and they require garbage collection to be present in the language; that's why they were left out of C despite the fact that the major C prototype language (PL/1) had very rich support for strings. And that's why C++ is really deficient in this area: the string class is too little, too late, as strings need to be a built-in language type. Implementation details are also non-trivial. For example, mutable strings might be an advantage for certain very frequent operations like chop in Perl. Here you can see that not everything is actually an object; sometimes a string is simply a string :-).


The set of string operations needs to be very carefully designed, as shortcuts do matter and an orthogonal approach is rather naive in this area (think about the implicit value of Huffman encoding here: the most frequent operation must have the shortest representation :-). All existing languages are deficient in this area, and the set of string manipulation functions often looks like a student's diploma work, written by someone who never studied previous generations of languages (or was too preoccupied with other things) and/or did it in haste to get rid of the task. In many cases the set of operations lacks real expressive power and flexibility. For example, this set of functions should be an almost perfect superset of the operations on arrays, as every string can be viewed as an array of characters. Here good old Perl is still competitive with newer offerings despite all its warts. Even so, in Perl strings and arrays have completely distinct sets of operations: for example, substr for strings corresponds to splice for arrays, but the correspondence is rather fuzzy; chop is in effect an alias for substr($string, -1, 1) = '', but lacks the ability to chop chunks larger than one character; in a way chomp is a caricature of trim from REXX; and so on.
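To make the point about chop concrete, here is a minimal sketch in bash of how a small family of string operations can be layered on one general substring primitive, so that chopping n characters is no harder than chopping one. The function names substr, chop and trim are hypothetical and only mirror the operations discussed above; they are not Perl's or REXX's actual implementations.

```shell
#!/usr/bin/env bash
# All function names here are hypothetical, chosen for illustration.

# substr STRING OFFSET LENGTH: general substring primitive (0-based offset).
substr() {
    local s=$1 off=$2 len=$3
    printf '%s\n' "${s:off:len}"
}

# chop STRING [N]: drop the last N characters (default 1),
# expressed through substr rather than as a separate special case.
chop() {
    local s=$1 n=${2:-1}
    local keep=$(( ${#s} - n ))
    (( keep < 0 )) && keep=0
    substr "$s" 0 "$keep"
}

# trim STRING: strip leading and trailing whitespace.
trim() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"   # remove leading whitespace
    s="${s%"${s##*[![:space:]]}"}"   # remove trailing whitespace
    printf '%s\n' "$s"
}

chop "filename.txt" 4      # prints: filename
chop "abc"                 # prints: ab
trim "   padded value  "   # prints: padded value
```

The design choice illustrated here is the one argued for in the text: the frequent operation (chop) gets the short name, but it remains a thin shortcut over a general primitive instead of a dead end.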

Importance of availability of coroutines

Proper integration of coroutines into a language requires that all variables be allocated on the heap. Until recently that was too high a price to pay, and few languages implemented them (Modula and Modula-2 were two more or less popular languages with this construct).

Interestingly, a very useful integration of coroutines into the language has existed in the Unix shell for many, many years. It takes the limited form of "internal pipes" (usage of pipes with loops in shell):

Let's assume that we need to find all files that contain the string "19%", which is typical of print format strings like "19%2d":

cd /usr/bin
ls | while read file
do
    echo "$file"
    strings "$file" | grep '19%'
done

Here we use the ls command to generate the list of file names, and this list is piped into a loop. In the loop we echo the file name and then run strings piped to grep, looking for suspicious format strings.

Another example comes from O'Reilly's "Learning the Korn Shell" (first edition). Here we pipe awk output into the loop. This is a function that, given a pathname as argument, prints its equivalent in tilde notation if possible:

function tildize {
    if [[ $1 = $HOME* ]]; then
        print "\~/${1#$HOME}"
        return 0
    fi
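    # -F: sets the field separator before the first line is read
    awk -F: '{print $1, $6}' /etc/passwd |
        while read user homedir; do
            if [[ $homedir != / && $1 = ${homedir}?(/*) ]]; then
                print "\~$user/${1#$homedir}"
                # ksh runs the last stage of a pipeline in the current shell,
                # so this return leaves the function itself
                return 0
            fi
        done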
    print "$1"
    return 1
}

A loop can also serve as the source of input for a pipe. For example:

{ while read line'?adc> '; do
      print "$(alg2rpn $line)"
  done 
} | dc

As another example, assume that you want to go through all the files in a directory and, if they are readable by you, print their names converted to lowercase letters only. There are two major ways to accomplish this:

  1. The first, more traditional, variant calls tr inside the for loop:
    #!/bin/ksh
    for x in *
    do
      # quoting keeps names that contain spaces intact
      [ -r "$x" ] && echo "$x" | tr 'A-Z' 'a-z'
    done
    
  2. The second, more elegant variant uses a pipe to feed tr from the loop:
    #!/bin/ksh
    for x in *
    do
      [ -r "$x" ] && echo "$x"
    done | tr 'A-Z' 'a-z'
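Both variants print the lowercase names without renaming anything. A minimal sketch of a version that actually performs the rename might look as follows. The function name lowercase_names and the scratch directory are hypothetical, and the sketch assumes a case-sensitive filesystem; on a case-insensitive one the existence check would make it skip every file.

```shell
#!/usr/bin/env bash
# lowercase_names DIR: hypothetical helper; renames readable files in DIR
# so that their names contain lowercase letters only.
lowercase_names() {
    local dir=$1 path name lower
    for path in "$dir"/*; do
        [ -r "$path" ] || continue           # skip unreadable entries and an unmatched glob
        name=${path##*/}                      # strip the directory part
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        [ "$name" = "$lower" ] && continue    # already lowercase
        if [ -e "$dir/$lower" ]; then
            printf 'skipping %s: %s already exists\n' "$name" "$lower" >&2
            continue
        fi
        mv -- "$path" "$dir/$lower"
    done
}

# Usage on a scratch directory:
demo=$(mktemp -d)
touch "$demo/README.TXT" "$demo/My File.Doc" "$demo/notes"
lowercase_names "$demo"
ls "$demo"
rm -r "$demo"
```

Here tr is called once per file, as in the first variant; the second, piped variant cannot be used directly because each new name has to stay paired with its old one.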

Due to their expressive power coroutines should be transparently integrated into the language. Paradoxically,  Perl does not support coroutines.

In Python they were implemented as an afterthought.

In Ruby they were implemented from day 1 (I think).
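The pipes shown above carry data in one direction only. Bash also offers the coproc keyword, which gives a two-way channel to a process that keeps its own state between requests, somewhat closer to a coroutine in the sense used here. The sketch below assumes bash 4.1 or later; the names running_totals and ACC are hypothetical.

```shell
#!/usr/bin/env bash
# Requires bash 4.1 or later (coproc keyword, {varname}>&- redirection).
# running_totals and ACC are hypothetical names used for illustration.

# running_totals N...: feed each number to a co-process and print the
# running total it reports back after every step.
running_totals() {
    local n reply
    # The co-process keeps its own state (total) between requests,
    # much as a coroutine keeps its local variables between resumptions.
    coproc ACC {
        total=0
        while IFS= read -r n; do
            total=$(( total + n ))
            printf '%s\n' "$total"
        done
    }
    local in=${ACC[0]} out=${ACC[1]} pid=$ACC_PID

    for n in "$@"; do
        printf '%s\n' "$n" >&"$out"     # hand a value to the co-process
        IFS= read -r reply <&"$in"      # block until it answers
        printf '%s\n' "$reply"
    done

    exec {out}>&-                        # end-of-file lets the co-process exit
    wait "$pid"
}

running_totals 5 10 1
```

Control alternates strictly between the caller and the co-process: each write resumes the accumulator, and each read suspends the caller until the accumulator has produced its answer.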

Importance of availability of exceptions

Exceptions are best understood as stopped coroutines that are activated by external event (for example end of file).

PL/1 was the first (and might be the last) language to have exceptions fully and transparently implemented in a way that is usable by regular programmers for day to day programming tasks. In C++ they were just an afterthought, and the same crippled, limited implementation exists in Java. Both Python and Ruby do this the "Java way".

In general, as with coroutines, a proper implementation of exceptions requires that all variables be stored on the heap and garbage collected, which became possible in mainstream languages without penalty only recently (although LISP has had this feature from day one).

"Respect toward predecessors"

Respect toward predecessors is very important. One reason why Perl failed as the major Web development language and PHP succeeded, despite a very weak design and numerous flaws, is the higher level of respect for C in PHP. For example, the fact that in Perl string comparison requires "eq" while numeric comparison requires "==" is a source of a tremendous number of mistakes and of resentment toward the language. As Talleyrand said of a similar decision, "it is worse than a crime, it is a blunder".

A language should adhere to the principle of "least surprise" and should not break with previous languages (first of all the C/C++ family, as the dominant family of languages) unless the break is justified by some gain in power or transparency.

One classic example of violation of this principle is Perl redefinition of additional control statement in loops. Also many language designers understood that C-style code blocks ( { } ) waist two important symbols and are not shorter notation then the Algol style blocks (do - end, with do optional for a single statement). But they never fully implement Algol-style notation either. For example, Ruby does not address the problem of multiple closure of several blocks with one end statement like PL/1 did many years ago with labels.

Quality of debugging and profiling tools

Paradoxically, debugging for a scripting language can be more complex than for mainstream languages like C/C++, for which the level of the language is lower and the tools are definitely more mature, feature rich and often commercially supported. Debugging tools available even for popular scripting languages such as Perl, PHP and Python are still rather crude. This has a real impact when working on non-trivial programs. For complex application development the quality of the debugger is often as important as the quality of the language implementation; it is actually an important part of that quality. It is not accidental that Donald Knuth, who is probably one of the greatest computer scientists of all time, preferred to work with the language that is best integrated with the OS and has the best debugger. For a viable scripting language the debugger should be part of the language design and a key part of the implementation, not an afterthought. Scripting language designers are still slow to realize this paradigm shift. Significant progress is needed in this area. IDEs like ActiveState Komodo can help too (I can attest that they managed to eliminate the problems that haunted earlier versions, and version 3.1 is usable for Perl).

The performance of the virtual machine and of garbage collection can be improved, and better profiling tools are badly needed. While it is true that a virtual machine based scripting language will rarely equal the performance of optimized native code, and that the quality of the virtual machine implementation can often be significantly improved, the emphasis on ultimate performance is often very naive and completely misplaced. Only a very small part of an application (less than 20%) has a significant influence on the total time consumed in performing a particular function. Detecting and selectively optimizing those critical parts, and if necessary rewriting them in a compiled language, can help to achieve parity with, or even beat, Java and C++ based applications in raw performance. But for this we need adequate profiling tools. This is especially important because all scripting languages have automatic memory management with garbage collection. The latter can have a significant impact on the performance profile of a script if large volumes of data need to be processed. Much depends on the nature of the program: for real-time applications this might be an additional concern that needs to be thoroughly investigated with a profiler, but for many other programs it is not. Each language's garbage collection implementation differs, so the choice of scripting language may need to take this into account to find a better match with the application's needs. One way to bypass this problem is mimicry: using .NET or the JVM opens a really significant tool chest that might be too expensive and/or time consuming to develop for a particular language alone.
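
The arithmetic behind selective optimization can be made explicit with Amdahl's law, which the text above does not name but which expresses the same idea. Assume, purely for illustration, that the critical part accounts for a fraction p of the run time and that rewriting it in a compiled language speeds that part up by a factor s:

```latex
S = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
p = 0.8,\; s = 10 \;\Rightarrow\; S = \frac{1}{0.2 + 0.08} \approx 3.6
```

With these assumed numbers, optimizing only the hot spot makes the whole program about 3.6 times faster. The real value of p is exactly what a profiler has to supply.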

Availability of systematic process of removing warts

There should be a systematic process of removing warts via an obsolescence mechanism. While each of the scripting languages has innovative features in its design, the strong points that helped wide adoption of the language, in certain areas each of the existing scripting languages has problems that need to be recognized and rectified ASAP. For example, I cannot explain why Perl does not (yet) allow multiple (labeled) closures of nested blocks (as in PL/1), or why the problem of comparison of scalar variables (as in if ($a == $b) when both are strings) was not treated more seriously with interpreter warnings or some pragma constructs, or why the mode of access to a variable (read-only or read-write) can't be specified dynamically. See my older paper for the discussion of those issues in Perl [Bezroukov2005].

Typically scripting languages are typeless. While this is definitely a more reasonable compromise than the type-safety straitjacket of Java (extremes meet), it can also create some additional problems, which can easily be rectified by a more "moderate approach". The key insight here is that not all conversions are created equal: while some conversions are pretty benign (number to string), others are less so. Other elements of a more moderate approach are better interpreter diagnostics, a high quality cross-reference tool, name space diagrams, pretty printers, etc.

Also, the concept of a typeless variable can be made more flexible by allowing the programmer to specify the legitimate conversions (having no legitimate conversions at all is, in effect, the definition of the Java straitjacket).

Precisely because of weak typing, high quality cross-reference tools should be considered a part of any decent scripting language implementation, not an add-on. Unfortunately, support for those tools is horrible and urgently needs to be improved. IMHO currently only Perl and JavaScript have more or less adequate pretty printers and cross-reference tools.

But scripting languages evolve quickly, and each year they become more and more competitive with Java and C++ for developing mainstream enterprise applications. In its turn, Java tried to adapt by adding regular expressions and coroutine emulation (via threads) to the mix. While I was a critic of Java from the beginning, I have now started to realize that Java can serve as a lower level implementation language for scripting languages and that the usage of a common virtual machine environment represents a significant advantage that should not be overlooked.

Money and the role of powerful sponsor

Due to the tremendous push behind Java and the amount of money spent on creating the Java infrastructure (with huge sums coming from Sun and IBM), at the current stage of development of scripting languages it might be wise to use scripting languages that try to utilize the JVM. At least for large and demanding software projects, usage of the JVM (and all the corresponding infrastructure) is a big plus. This way you have space to retreat in case things go wrong, and can switch back to Java for those parts where the scripting language proved to be less suitable and a "programming in the small" language is needed. A dozen such languages already exist, with Jython as probably the most prominent example. As there is a JavaScript implementation in Java (Rhino), it should be seriously considered too. Other implementations like BeanShell and Groovy have their advantages too. Groovy is probably the most fashionable of the JVM-based scripting languages.

It's still unclear which scripting language will prevail in the long run; therefore right now one should probably diversify and experiment with several of them. The financial power of sponsors means a lot here, probably more than the technical features of the language per se (above some minimum quality).

For example, on Windows only scripting languages that support .NET can be major players. And it takes a lot of effort to create a .NET compiler. Python has one.

Different languages can be optimal for different parts of a project. But still, any large project should have a "principal" language, the language that you feel best matches the majority of the project's needs. It's just impossible to learn several scripting languages to an equal degree. I currently consider Perl to be my primary scripting language, but there is no JVM-based implementation of Perl, and that affects scalability. I also use Python for tasks that benefit from coroutines. Python also has the distinct advantage of having a JVM-based implementation (Jython). Still, Python imposes more restrictions than Perl and in this sense is a slightly lower level language. Python's innovative "indentation reveals the real block structure" solution partly compensates for that, as it produces more vertically compact programs. Moreover, you can choose your style of braces and pretty printing, as it's easy (and probably necessary) to imitate C-style curly braces using comments and a pretty printer. In this sense Python is the most modern language, one where the editor in the IDE should contain a pretty printer by default.

The level of the language and why it is too early to write off Perl

Currently, probably the highest level language in wide use is the Unix shell, which is used as the quintessential glue language.

Perl is probably next in line, as it has very good integration with Unix and can compete with the shell in the major tasks where the shell shines. This is the main reason I prefer Perl to Python. Python is somewhat detached from Unix culture and, while it directly supports exceptions and coroutines, it is a slightly lower level language than Perl.

Pointers, a very important programming concept, are directly available in Perl and indirectly (via classes) in Python. For me that's an extremely important consideration.

Perl provides advantages when I need the maximum power for rapid prototype development and am ready to pay for this power with some inconveniences. Also, unlike Python, Perl has mutable strings, which means that operations like chomp do not create a whole new string just to cut the last byte off the old one. At the same time, I would like to stress that everybody who likes Unix needs to know TCL at least to the level necessary to use Expect, a really brilliant, breakthrough application based on TCL.

Everybody who likes Unix needs to know Expect or the Expect module in his favorite scripting language

Perl and Python can be considered attempts to provide a "compromise" language that is usable both for programming in the large and for programming in the small. Here Python has an important advantage: unlike Perl, Python has a more or less usable interface to C++, so it can be used for dual-language programming, although such cases are still infrequent (Python's philosophy is generally the same as Perl's).

Despite the difficulties of managing huge and very complex interpreters, both Perl and Python have a very strong following, and nothing succeeds like success. Still, they differ only in nuances, not in principle, and both represent an approach opposite to TCL: both can be considered attempts to replicate the PL/1 approach to language design on a new level. Whether the "Right thing" approach (Python) is better than the "New Jersey" or "worse is better" approach (Perl) is currently unclear. Anyway, it's probably wise to use both languages when appropriate. Neither is a silver bullet that solves all software engineering problems. Moreover, as Brooks noted, no silver bullet exists.

Importance of programming language environment

One test of whether someone is a good programmer is to ask about the shortcomings of the tools he or she uses. Watch whether the answer covers only language constructs; if it does, he or she is probably a mediocre programmer. The programming language environment (language + IDE + debugger + libraries) is as important as, or more important than, the language itself. Someone who does not understand that the flaws and limitations of a favorite language can be compensated for by the environment, and who cannot view the language as a part of a larger development environment, is either unable to think analytically, and thus cannot be a good programmer, or is blindly partisan (i.e. a zealot), like many participants in the Perl vs. Python debate. But please note that even the worst participant in the Perl vs. Python debate is usually heads above the participants in the Linux vs. Windows advocacy wars...

Programming, at least as I understand it, is both an art and a science. The inability to see the larger picture of the environment in which the language is embedded, and to view the implementation of a programming environment as a continuation, or at least an important "feature", of the language design, is a serious intellectual limitation.

An interesting question is why the "worse is better" approach is so successful. Why can complex, non-orthogonal and far from elegant languages make it into the mainstream? I think a partial answer might be that pure luck (of which timing is one dimension and the place where the language was born another) plays a more important role in a language's success than one might think. Early comers that somehow managed to grab a niche have a tremendous advantage: it is one thing to read a language manual and appreciate how good the concepts are, and another to bet your project on a new, unproven language without good debuggers, manuals and, very importantly, libraries. The quality of the debugger and the level of integration with the underlying OS (libraries) are probably as important as, or more important than, the language itself. Think about the Perl debugger and CPAN (Comprehensive Perl Archive Network) as major contributors to the language's success. BTW, paradoxically, for Java only Microsoft had a development environment (J++) that was well integrated with the OS and had a good debugger. From this point of view, it's clear that Sun's management, in its infinite wisdom, killed with its lawsuit a little bit more than just "deviations from the standard language implementation".

From my point of view, languages are much like cars. For many people a car is the thing they use to get to work and to the shopping mall, and they are not very interested in whether the engine is inline or V-type or whether the transmission uses fuzzy logic. What they care about is safety, reliability, mileage, insurance and the size of the trunk. In this sense the success of the "worse is better" approach should not surprise anybody.

Fighting myths about scripting

The popular belief that scripting is an "unsafe", "second rate" or "prototype" solution is completely wrong. If the project has died, it does not matter what the implementation language was, so for any project that must succeed on a tough schedule a scripting language (especially in a dual combination of scripting language plus strongly typed language, for example TCL+C, any pair of .NET languages, or JVM-based pairs) might be a better blend than a single bulky OO language like C++ or Java. The flexibility and higher level that scripting languages provide are a strategic advantage for any complex software project, because experimentation is the crucial stage in the development of any large software project, especially one featuring a novel design. In this sense any large and complex programming project includes a tremendous amount of prototyping. Only experimentation can help move the project toward a solid architecture, which significantly increases the chances that a complex software project will succeed.

If the project has died, it does not matter what the implementation language was

Moreover, architecturally such an approach helps to separate architectural decisions from implementation details much better than any OO model with a huge number of beautiful-looking UML diagrams (and especially with a huge number of UML diagrams completely detached from reality, which is the most common paradigm of UML usage ;-). In this sense, firing people who overemphasize UML usage might be not a bad way of solving a programming project's problems; at least the manager gets space for new people in critical areas without going over budget :-).

Paradoxically, with 3GHz or better CPUs, even on desktops, even tasks that handle a fair amount of computation and data (computationally intensive tasks) have become more viable for such languages as Python and Perl. In some cases (but not often!) such solutions might even be competitive with C++, C# and, especially, Java. The reason is that when you are operating at a higher level, you are often able to find a better algorithm, better data structures, a better problem decomposition scheme, or all of the above. That's the same argument that many years ago was promoted by the adherents of high level languages against assembler, and it is still true on a new level.

Actually, if you know the history of language development, OO languages will remind you of the so-called compiler-compiler approach: the class of languages that extend themselves by accommodating new constructs. The problem with this approach is that the diagnostics are either run-time only, or suck, or both. That means that for complex projects the direct construction of a specialized language, with YACC+TCL+C as a poor man's compiler-compiler tool set, can be a better approach that might eliminate a lot of the run-time overhead inherent in OO.

Design of a set of classes can be (and often is) as time consuming (and politically charged) as the compiler-construction approach to software design (each large software project has a specialized language buried in it), and it actually shares a lot of similar challenges. At the same time, even a well-designed set of classes is inferior to a specialized compiler/interpreter in several major aspects, first of all in compile-time error checking and efficiency. In the absence of a specialized compiler, one can use TCL or Python to glue together low level C modules that implement specific language constructs, producing a more or less clean and debuggable "pseudo-compiler" solution.

In this respect I expect the general trend toward more expressive, "very high level" solutions, the trend that drove Perl into prominence, to continue. It is this trend that launched the LAMP (Linux-Apache-MySQL-Perl/Python/PHP) tool set into prominence. Here neither Linux nor MySQL plays a significant role. For this reason LAMP should probably be more correctly called WDS (Web server-database-scripting language). Solaris, FreeBSD or even Windows can be used instead of Linux with the same tool set, and sometimes with more success. The same is true for MySQL, which is just one database out of several possibilities. Note that WDS became the cornerstone of Web site development despite complete neglect of this topic in the CS curricula of all major universities. That means that we should treat Java and OO with the skepticism they deserve, as any proponent needs to explain the paradoxical fact that most commercial Web sites (which are actually pretty complex software applications) are now driven by LAMP (or, more correctly, by WDS). Even Yahoo now uses PHP for the development of its huge franchise, which (although it's a complex mix of informational and e-commerce sites) both in complexity and in traffic requirements probably belongs to the top dozen of the world's e-commerce sites. Moreover, the trends in hardware will probably help to preserve and extend the dominance of scripting languages in Web applications, despite the paradoxical inroads of Java on the server side in large enterprise environments as a new Cobol (and, as was the case with Cobol, not without some help from IBM, I think ;-).

Now let's briefly discuss debugging issues. One of the best things about scripting is that it encourages the creation of dramatically more compact programs than compiled languages or, god forbid, OO languages. The length of some trivial Java programs might lead to a suspicion that progress in CS simply stopped. And the length of the program is highly correlated with the number of bugs in the code, as we all should remember from the assembler vs. high level languages debate. For example, Python programs are typically 3-5 times shorter than equivalent Java programs. Like JavaScript (and unlike Java), Python supports a programming style that uses simple functions and variables without engaging in class definitions. Perl is even shorter. Both Perl and Python come "with batteries included" (code that connects to sockets, parses HTML, etc.): both have enormous standard libraries, which were lately enhanced with numeric processing, string processing and database interface modules.
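
As an illustration of how compact a script can be, here is a minimal sketch of a word-frequency counter built from standard Unix filters. The function name word_freq is hypothetical, and the pipeline is the well-known tr/sort/uniq idiom rather than anything taken from the text above:

```shell
#!/bin/sh
# word_freq N: read text on standard input and print the N most frequent
# words as "word count" lines, most frequent first (default N is 10).
# (Hypothetical helper name, used only for this illustration.)
word_freq() {
  tr -cs 'A-Za-z' '\n' |      # put every word on its own line
  tr 'A-Z' 'a-z' |            # fold to lower case
  sed '/^$/d' |               # drop the empty line a leading separator leaves
  sort | uniq -c |            # count identical words
  sort -rn |                  # largest count first
  head -n "${1:-10}" |
  awk '{ print $2, $1 }'
}

printf 'The cat saw the dog.\nThe dog ran.\n' | word_freq 2
```

For the sample input the two lines printed are "the 3" and "dog 2". No classes, declarations or explicit data structures are involved; the whole program is the pipeline.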

Shorter code not only leads to a much smaller number of bugs in the program (the complexity of a program grows at least as the square of the number of lines of code); a more expressive language also prevents reinventing the wheel and thus might save both execution and debugging time. In certain cases, when a small part of the program needs really top efficiency and consumes the largest amount of time, a scripting language can be used to generate C code for a particular special case, then compile it and execute this specialized module on the fly, instead of writing generic C code and parameterizing it to death. The time to compile and link a small C program on a typical modern server with several 3GHz CPUs is less than a second. It can be made much shorter by using a specialized instead of a general purpose compiler. In this sense "on the fly" compilation of computationally intensive parts of an application is a viable optimization strategy on today's hardware.

As for manageability of the code, the high-level coding that scripting languages promote is easier than writing millions of classes and can even be fun. One can also rather quickly become an expert in any scripting language, at least sooner than a Java expert, with Java's nightmare of class libraries that can do everything but at the same time can do nothing properly. String-oriented languages such as TCL and Perl also encourage uniform treatment of complex data formats and naturally blend with XHTML and XML.
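
The "generate, compile, execute" strategy can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the helper name gen_power_c and the chosen special case (raising a number to a fixed power) are made up, and the compile step runs only if a C compiler is installed under the name cc:

```shell
#!/bin/sh
# gen_power_c N: print the C source of a program that raises its first
# argument to the fixed power N with an unrolled chain of multiplications.
# (Hypothetical helper name, used only for this illustration.)
gen_power_c() {
  body=1
  i=0
  while [ "$i" -lt "$1" ]
  do
    body="$body * x"
    i=$((i + 1))
  done
  cat <<EOF
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    long x = argc > 1 ? atol(argv[1]) : 0;
    printf("%ld\n", $body);
    return 0;
}
EOF
}

# Generate, compile and run the specialized module, if a C compiler is present.
if command -v cc >/dev/null 2>&1
then
  dir=$(mktemp -d)
  gen_power_c 3 > "$dir/pow3.c"
  cc -o "$dir/pow3" "$dir/pow3.c" && "$dir/pow3" 5
  rm -rf "$dir"
fi
```

Here the exponent is baked into the generated source, so the compiled module contains no loop at all. The same pattern applies to any parameter that stays fixed for the duration of a long computation.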

Winner takes all nature of programming language landscape

As for competition inside the scripting languages family, I would like to note that the scripting language landscape mirrors the "winner-takes-all" mentality of the larger IS culture: in a set of competing languages, the largest will gain size at the expense of smaller ones, regardless of all but the most blatant discrepancies in the quality of the technology. That can be called a Softpanorama law of language design.

In a set of competing languages, the largest will gain size at the expense of smaller ones,
regardless of all but the most blatant discrepancies in the quality of the technology.

Big seven major scripting languages

With the maturity of the Web, which was the major driving force behind the scripting languages, the days of great surprises and surprise winners (for example, PHP's victory over Perl in Web site scripting) are over. Despite being open source efforts, the development of scripting languages has now become a cruel, unforgiving area ruled by the merciless dynamics of the marketplace. Of course there are other valuable scripting languages, like Icon, Scheme and Ruby, that also deserve study and might be successfully used for certain projects. But I think that the "big five" are here to stay. Please note that each of them has a particular strength that makes it uniquely valuable:

Conclusions

In several major dimensions, and first of all due to compactness of code, scripting languages are a more modern approach to software development than regular high level languages or Java. In this respect we should generally expect a repetition, on a new level, of the battle between high level languages and assembler, with a predictable result. As in the previous case, there will be no quick success, and the battle can take several decades (it took almost 30 years for high level languages to completely displace assembler in software development). The author argues that the compactness of the code is the crucial dimension by which scripting languages can be compared with the alternatives and with each other.

People often think that the most important factor in software development is not the tools and techniques used by the programmers, but rather the quality of the programmers themselves. I respectfully disagree. The best programmers need the best tools to fully realize their talents. Otherwise a large part of their productivity will be lost in the struggle with inadequate tools. Right now one of the most important tools for a gifted application programmer is a good scripting language.

The immense number of pretty complex LAMP-based e-commerce Web sites (Yahoo among them) suggests that scripting languages, not the Linux kernel, represent the most promising and the most innovative area of open source development.

Webliography

[Bacon2000] David Bacon. SETL for Internet Data Processing. Ph.D. dissertation, New York University, January 2000.

[Bezroukov2005] Nikolai Bezroukov. Perl Warts and Quirks.

[Bezroukov2006a] Nikolai Bezroukov. Softpanorama Scripting Languages Page.

[Bezroukov2006b] Nikolai Bezroukov. Classic Papers on Scripting (contains references to several papers that shaped the author's understanding of the problem).

[Bezroukov2006c] Nikolai Bezroukov. Softpanorama Anti-OO links.

[Schwartz1970] Jacob T. Schwartz. Set Theory as a Language for Program Specification and Programming. Courant Institute of Mathematical Sciences, New York University, 1970.

[Schwartz1981] Jacob T. Schwartz. SETL: A Very High Level Language Oriented to Software Systems Prototyping. ACM SIGAPL APL Quote Quad, Volume 12, Issue 1 (Proceedings of the International Conference on APL, APL '81), September 1981.

[Prechelt2000a] Lutz Prechelt. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl. Fakultät für Informatik, Universität Karlsruhe (Germany), March 14, 2000. Refereed journal paper.

[Prechelt2000b] Lutz Prechelt. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a search/string-processing program. Technical report, March 2000.

[Prechelt2003] Lutz Prechelt. Are Scripting Languages Any Good? A Validation of Perl, Python, Rexx, and Tcl against C, C++, and Java. 2003 study.





Copyright © 1996-2014 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

Disclaimer:

The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: February 19, 2014