Softpanorama
May the source be with you, but remember the KISS principle ;-)


Perl for Unix System Administrators


Introduction

This is a very limited effort to help Unix sysadmins learn Perl. It is based on my FDU lectures to CS students. We discuss mainly "simple Perl", and the site tries to counter the drive toward "excessive complexity" that is dominant in many Perl-related sites and publications. It can also be used to prepare for the Certified Internet Web Professional Exam. See also Introduction to Perl for Unix system administrators.

System administrators need to deal with many repetitive tasks in a complex, changing environment which often includes several different flavors of Unix. Perl is the only scripting language that is now included in all major Unix flavors. That means it provides the simplest way to automate such recurring tasks on multiple platforms.

IMHO the main advantage of using a powerful, complex language like Perl is the ability to write simple programs. Perhaps the world has gone overboard on this object-oriented thing. You do not need many of the tricks used in lower-level languages, as Perl itself provides high-level primitives for the task. This page is linked to several sub-pages.

All languages have quirks, and all inflict a lot of pain before one can adapt to them. Once learned, the quirks become incorporated into your understanding of the language. But there is no royal road to mastering a language. The more different one's background is, the more one needs to suffer. Generally any user of a new programming language needs to suffer a lot ;-)

When mastering a new language you first face a level of "cognitive overload" until the quirks of the language become easily handled by your unconscious mind. At that point, all of a sudden the quirky interaction becomes a "standard" way of performing the task. For example, regular expression syntax seems to be a weird base for serious programs, fraught with pitfalls, a big semantic mess as a result of outgrowing its primary purpose. On the other hand, in skilled hands it's a very powerful tool that can serve as a parser for complex data and as a substitute for string functions.

There are several notable steps in adapting to Perl idiosyncrasies for programmers who are used to other languages:

  1. One early sign is when you start to put $ on all scalar variables automatically. Most mistakes in which you omit the $ in front of a variable are diagnosed by the interpreter, but some, like $location{city}, are not. If you are using two languages, this stage might never come, and you will need a conscious effort all the time. This is the case with me.
  2. The next step is to overcome the notational difficulty of having two different comparison operators, one for numbers and the other for strings ("==" for numbers vs. eq for strings). When two variables are involved, the interpreter does not provide any warnings, so you need to be very careful; if you use another language in parallel with Perl, such errors creep into your scripts automatically (see the sketch after this list). If one of the operands of "==" is a string constant, automatic diagnostics can be provided.
  3. Conscious use of "==" as the equality predicate for numbers, if your previous language allowed "=" to be used both for checking equality and for assignment. The pitfall of using "=" for assignment results in the side effect of introducing errors into comparisons, for example if ($a=1)... instead of if ($a==1).... This problem was understood by the designers of Algol 60, created in the late 1950s; to avoid it they used := for assignment instead of plain =. But the designers of Fortran, PL/1, and later C as a derivative of PL/1 ignored this lesson (actually Fortran predates Algol 60, so its designers were pioneers and are not guilty). As Fortran and C (with its derivatives such as C++ and Java) became the dominant programming languages, we have what we have. Now talk about progress in programming language design, when a design blunder that language designers knew about 65 years ago is still present in the most popular languages used today ;-). In all languages that share the lexical structure of C, this blunder remains the richest source of subtle errors for novices, and naturally this list includes Perl, since the problem is also present in C and C++. C programmers are typically already trained to be aware of this pitfall. Protective measures include:
    1. Modify the syntax highlighting in your editor so that such cases are marked in bold red.
    2. Manually or automatically (a simple regex done in the editor covers 99% of cases) reorganize such comparisons, putting the constant on the left side of the comparison, as in if (1==$a)....
    3. Recent versions of the Perl interpreter provide a warning in this case, so checking your script with the option -cw also helps, especially if the IDE can display the results of the syntax check in one of its windows and jump to the line listed in the error or warning (this is a standard feature of all IDEs, and it can actually be done in most editors too).
  4. Learning to fight a missing closing "}". This problem is typical for all C-style languages and requires a pretty printer to spot. But the Perl interpreter has a blunder of its own: it ignores the fact that in Perl subroutines can't be nested within blocks, so instead of pointing to the first subroutine after the unclosed block as the diagnostic point, it points to the end of the script. The longer the program, the more acute this problem becomes. PL/1 had labeled closing statements, as in "mainloop: do... end mainloop", which close all intermediate constructs automatically; C and Perl failed to adopt this innovation. Neither Perl nor C uses the concept of "local labels" -- labels that exist only until they are redefined; typically such labels are numeric, not identifiers (see the discussion by Knuth).
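
To make the pitfall from point 2 concrete, here is a tiny illustrative sketch (the variable names are made up):

    my $x = "10";
    my $y = "10.0";
    print "numerically equal\n" if $x == $y;    # true: both convert to the number 10
    print "string equal\n"      if $x eq $y;    # false: "10" ne "10.0" as strings

    my $answer = "yes";
    print "match\n" if $answer == "no";   # WRONG: both strings numify to 0, so this is true
    print "match\n" if $answer eq "no";   # correct string comparison; false here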

Please note that because the syntax of Perl is complex, the diagnostics produced by the Perl interpreter are really bad. They are nowhere near the quality of diagnostics that mainframe programmers got from the IBM PL/1 diagnostic compiler, which is also probably 50 years old and ran on machines that are tiny by today's standards, with 256K of RAM and 7MB hard drives. The only comfort is that other scripting languages are even worse than Perl ;-).

C-style language traps in Perl

Avoiding the C-style language design blunder of "easy" mistyping of "=" instead of "=="

One of the most famous C design blunders is the tiny lexical difference between assignment and comparison (remember that Algol used := for assignment), caused by the design decision to make the language more compact (terminals at that time were not very reliable, and the number of symbols typed mattered greatly). In C, assignment is allowed in an if statement, but no attempt was made to make the language more failsafe by avoiding the possibility of mixing up "=" and "==". In C-style syntax, if ($a = $b) assigns the contents of $b to $a and executes the following code if $b is not equal to 0. It is easy to mix things up and write if ($a = $b) instead of if ($a == $b), which is a pretty nasty bug. You can often reverse the sequence and put the constant first, as in

if ( 1==$i ) ...
since the mistyped variant
if ( 1=$i ) ...
does not make any sense, and such a blunder will be detected at the syntax level.

Dealing with unbalanced "{" and "}" errors

One of the nasty problems with C, C++, Java, Perl, and other C-style languages is that missing curly brackets are pretty difficult to find. One effective solution, first implemented in PL/1, is the calculation of the nesting level (in the compiler listing) and the ability to close multiple blocks with a single labeled end statement (PL/1 did not use the brackets {}; they were introduced in C).

The optimal way to spot this problem is to use a pretty printer. In the absence of a pretty printer you can insert '}' in binary-search fashion until you find the spot where one is missing. This error actually discourages writing long Perl scripts, so there is a silver lining in each dark cloud.

One can use pseudo-comments that signify nesting level zero and check those points with a special program or an editor macro. One can also mark closing brackets with the name of the construct they are closing:

if (... ) { 

} # if 
You can write a simple script to do this automatically.
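
As an illustration, here is a minimal sketch of such a checker. It naively assumes that braces never occur inside strings or comments; a real script would have to strip those first:

    #!/usr/bin/perl
    # brace_depth.pl -- a naive brace-nesting checker: prints the nesting depth
    # for every line; the depth must return to zero at the end of the file.
    use strict;
    use warnings;

    my $depth = 0;
    while (my $line = <>) {
        my $opens  = () = $line =~ /\{/g;    # count "{" on this line
        my $closes = () = $line =~ /\}/g;    # count "}" on this line
        $depth += $opens - $closes;
        printf "%4d %3d %s", $., $depth, $line;
        warn "nesting went negative at line $.\n" if $depth < 0;
    }
    warn "unbalanced: $depth unclosed brace(s) at end of file\n" if $depth != 0;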

Many editors have the ability to jump from any given opening bracket to its closing bracket and vice versa. This is also useful, but it is a less efficient way to solve the problem.

The problem of a string literal left unclosed at the end of a line ("...")

Specifying a maximum length for literals is an effective way of catching a missing quote. This was implemented in PL/1 compilers. You can also have an option to limit a literal to a single line. In general, multi-line literals should have distinct lexical markers (like the "here document" construct in shell). Perl provides the opportunity to use the concatenation operator for splitting literals into multiple lines, which are "merged" at compile time, so there is no performance penalty for this construct. But there is no limit on the number of lines a string literal can occupy, so this does not help much.

If such a limit could be communicated via a pragma statement at compile time for a particular fragment of text, this would be an effective way to avoid the problem. Usually only a few places in a program use multiline literals, if any. Editors that use coloring help to detect the unclosed literal problem, but there are cases when they are useless.
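
As an illustration, here is a crude detector of runaway double-quoted literals, assuming literals are meant to stay on one line:

    # quotecheck.pl -- flag lines that contain an odd number of unescaped
    # double quotes, i.e. likely runaway string literals.
    use strict;
    use warnings;

    while (my $line = <>) {
        my $quotes = () = $line =~ /(?<!\\)"/g;   # count " not preceded by a backslash
        print "possible unclosed literal at line $.: $line" if $quotes % 2;
    }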

Benefits that Perl brings to system administration 

All-in-all Perl is a great language. But even the sun has dark spots... I have some doubts that Perl 6 is an improvement over Perl 5, but we will see. The benefits that Perl brings to system administration are nevertheless substantial.

In short, it makes sense to learn Perl, as it makes sysadmin life a lot easier, probably more so than any other tool in the sysadmin arsenal...

The problem of Perl complexity junkies

There is a type of Perl book author who enjoys the fact that Perl is a complex, non-orthogonal language and likes to drive this notion to the extreme. I would call them complexity junkies.

Be skeptical and do not take the recommendations of Perl advocates like Randal L. Schwartz or Tom Christiansen for granted :-) Fancy idioms are very bad for novices. Please remember the KISS principle and try to write simple Perl scripts without complex regular expressions and/or fancy idioms. Some Perl gurus' pathological preoccupation with idioms is definitely not healthy and is part of the problem, not a part of the solution...

Three main types of Perl complexity junkies can be distinguished.

Simplicity has great merits even if it goes against the current fashion.

Generally the problems mentioned above are more fundamental than the trivial "abstraction is the enemy of convenience". It is more that a badly chosen notational abstraction at one level can inhibit innovative notational abstraction at other levels.

Perl as a new programming paradigm

Perl + C, and especially Perl + Unix + shell, represents a new programming paradigm in which the OS becomes a part of your programming toolkit, and which is much more productive for a large class of programs than OO-style development (the OO cult ;-). It became especially convenient in virtual machine environments, where an application typically "owns" the machine. You can use the shell for file manipulation and pipelines, Perl for high-level data structure manipulation, and C when Perl is insufficient or too slow. For complex programs the latter question is non-trivial, and correct detection of bottlenecks requires careful measurement; generally Perl is fast enough for most system programs.
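
Here is a small sketch of this glue style (the pipeline and path are just an example, assuming a Linux box with the standard du/sort/head utilities):

    # Let the shell build the pipeline; let Perl hold the results
    # in a high-level data structure.
    use strict;
    use warnings;

    open my $pipe, '-|', 'du -a /var/log 2>/dev/null | sort -rn | head -10'
        or die "can't start pipeline: $!";

    my %size_of;                              # path => size in KB
    while (my $line = <$pipe>) {
        chomp $line;
        my ($kb, $path) = split /\s+/, $line, 2;
        $size_of{$path} = $kb;
    }
    close $pipe;

    printf "%8d KB  %s\n", $size_of{$_}, $_
        for sort { $size_of{$b} <=> $size_of{$a} } keys %size_of;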

The key idea here is that any sufficiently flexible and programmable environment -- and Perl is such an environment -- gradually begins to take on the characteristics of both a language and an operating system as it grows. See Stevey's Blog Rants: Get Famous By Not Programming for more about this effect.


The Unix shell can actually provide a good "in the large" framework for a complex programming system, serving as the glue for the components.

From the point of view of typical application-level programming, Perl is a very underappreciated and little understood language. Almost nobody is interested in the details of the interpreter, where the debugger is integrated with the language really brilliantly. The namespaces and OO constructs in Perl are also a very unorthodox and very interesting design.

References are a major Perl innovation

References are a Perl innovation: the classic CS view is that a scripting language should not contain references. The role of the list construct as the implicit subroutine argument list is also implemented non-trivially (elements are passed "by reference", not "by value"), against CS orthodoxy (which favors "by value" passing of arguments by default). There are many other unique things about the design of Perl. All-in-all, for a professional like me, who used to write compilers, Perl is one of the few relatively "new" languages that is not boring :-).
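
A tiny demonstration of this "by reference" behavior of the @_ argument list:

    sub increment { $_[0]++ }     # elements of @_ are aliases to the caller's arguments

    my $counter = 5;
    increment($counter);
    print "$counter\n";           # prints 6: the subroutine modified the caller's variable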

The syntax of Perl is pretty regular and compares favorably with the disaster that is the syntax of the Bourne shell and its derivatives, as well as with the syntax of C and C-derivatives. Larry Wall managed to avoid most classic pitfalls in creating the syntax of the language, pitfalls into which the creators of PHP readily fell (the "dangling else" is one example).

Perl has a great debugger

A debugger for a language is as important as the language itself. The Perl debugger is simply great; see Debugging Perl Scripts.
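
For the uninitiated, a typical session looks something like this (the script name and line numbers are hypothetical; b, c, n, s, x and q are standard debugger commands):

    $ perl -d myscript.pl      # run the script under the debugger
      DB<1> b 42               # set a breakpoint at line 42
      DB<2> c                  # continue until the breakpoint is reached
      DB<3> n                  # step over the next statement
      DB<4> s                  # step into a subroutine call
      DB<5> x \%config         # dump a data structure
      DB<6> q                  # quit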

Brilliance of Perl Artistic license

The Perl license is a work of real brilliance, an incredible feat from my point of view, taking into account when it was done. It provided peaceful co-existence with the GPL, which is no small feat ;-). Dual licensing was a neat, extremely elegant cultural hack that made Perl acceptable both to businesses and to the FSF.

It's very sad that there is no really good intro to Perl written from the point of view of a CS professional, despite the 100 or more books published.

Perl warts


A small, crocky feature that sticks out of an otherwise clean design. Something conspicuous for localized ugliness, especially a special-case exception to a general rule. ...

Jargon File's definition of the term "wart"

Perl extended C-style syntax in an innovative way. For example, the if statement always takes a {} block, never an individual statement, and the ; before } is optional. But Perl shares several C-style syntax shortcomings and introduced a couple of its own.

There are also several semantic problems with the language.

Absence of a good development environment

The R language has RStudio, which can probably be viewed as the gold standard of the minimal features needed in a scripting language GUI. While RStudio has a weak editor, it has syntax highlighting and integration with the debugger, and as such is adequate for medium-sized scripts.

There is no similar "established" de-facto standard GUI for Perl, although you can use an orthodox file manager (such as Midnight Commander, or on Windows FAR or Total Commander) as a poor man's IDE.

This is not a showstopper for system administrators, as they can use screen and mc, and they generally write mostly small scripts that can be written and debugged without an IDE as well as with one.

But this is a problem when you try to write a program with over 1K lines. Many things in a modern IDE help to avoid typical errors (for example, identifiers can be picked from a menu by right-clicking, braces are easier to match if the editor provides small, almost invisible vertical rulers, the color of the string helps to detect runaway string constants, etc.).

Currently Komodo and the free Komodo editor are almost the only viable game in town.


The most common versions of Perl  5 in production

RHEL now ships with Perl 5.10. Many classic Unixes still ship with Perl 5.8.8. Older versions of Solaris and HP-UX might have versions below Perl 5.8.8, but that's rare. So writing for Perl 5.8.8 virtually guarantees compatibility (there is an exception; see the note about Perl 5.22 below -- hopefully that version will never get into production). In other words, no "state" variables if you want "perfect" compatibility. Imperfect but acceptable compatibility for Linux deployment can be achieved by targeting version 5.10, which allows you to use "state" variables.
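
For reference, this is what a "state" variable looks like; the sketch below runs on Perl 5.10 and later, but not on 5.8.8:

    use strict;
    use warnings;
    use feature 'state';                # or simply: use v5.10;

    sub next_id {
        state $id = 0;                  # initialized once; persists between calls
        return ++$id;
    }

    print next_id(), "\n" for 1 .. 3;   # prints 1, 2, 3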

Lost development priorities

For a mature language the key area of development is not questionable enhancements, but improvement of interpreter diagnostics and efforts to prevent typical errors (which at this point are well known).

Perl version 5.10 was the version in which two useful enhancements to the language were introduced, most notably "state" variables.

Still, very little was done to improve the interpreter in order to help programmers avoid the most typical Perl errors. That means that the quality of the editor used by Perl programmers is of paramount importance. I would recommend, with reservations (it's a pretty buggy product), version 9.3 of the free Komodo editor. This version allows you to see the list of already declared variables in the program and thus avoid the classic "typo in the variable name" kind of error.

Not all enhancements that Perl developers adopted after version 5.10 have practical value. Some, such as the requirement to use a backslash before the repetition count in regular expressions (so that /\d{2}/ in "normal" Perl became /\d\{2}/ in version 5.22), are counterproductive. For that reason I do not recommend using version 5.22. You can also use the pragma

use v5.12.0

to avoid the stupid warnings that version 5.12 generates.

There are no attempts to standardize Perl and introduce enhancements via an orderly process negotiated by the major stakeholders, as is done with C or Fortran (every 11 years or so). At the same time, the quality of diagnostics of typical errors by the Perl interpreter remains weak.

Support for a couple of useful pragmas is absent: for example, the ability to limit the length of string constants to a given length (for example, 120) for certain parts of the script, or something similar like a "do not cross this line" limitation.

Local labels might help to close multiple levels of nesting (the problem of a missing curly bracket is typical of all C-style languages):

 1:if( $i==1 ){
     if( $k==0 ){
         if ($m==0 ){
   # the curly bracket below closes all blocks opened since the local label 1
 }:1

Multiple entry points into subroutines might help to organize namespaces.

Working with namespaces can and should be improved, and the rules for Perl namespaces should be much better documented. Like pointers, namespaces provide a powerful facility for structuring large programs, and they can be used with or without the module framework. This is a very nice and very powerful Perl feature that puts Perl in a class of its own for experienced programmers. Please note that modules are not the only game in town. Actually the way they were constructed has some issues, and the (sometimes stupid) overemphasis on OO only exacerbates those issues. Multiple entry points into subroutines would probably be a more useful and more efficient addition to the language, and one that is very easy to implement. The desire to be like the rest of the pack often backfires... From the SE point of view, a scripting language as a VHL stands above OO in the pecking order ;-). OO is mainly force-fed to low-level guys who suffer from Java...

Actually there are certain features that should probably be eliminated from Perl 5. For example, the use of unquoted words as indexes to hashes is definitely a language designer's blunder and should be gone. String functions and array functions should be better unified. An exception mechanism should be introduced. Assignment in if statements should be somehow restricted. Assignment of a constant to a variable in an if statement (and in all conditions) should be flagged as a clear error (as in if ($a=5) ...). I think the latest versions of the Perl interpreter do this already.

Problems with Perl 5.22

Attention: this release contains an obvious bug in the regex tokenizer, which now incorrectly requires a backslash before the number-of-repetitions part of basic regex constructs. For example, in the case of /\d{2}/ you now need to write /\d\{2}/ -- pretty illogical, as the curly brace here is part of the \d{2} construct, not a separate symbol (which of course would need to be escaped). This is a typical SNAFU.

This newly introduced bug (aka feature) also affects regexes that use an opening curly bracket as a delimiter, which is a minor but pretty annoying "change we can believe in" ;-). I think this idiosyncrasy will prevent the spread of this version into production versions of Linux and Unix for a long, long time (say 10 years), or forever. Imagine the task of modifying somebody else's 3-5K line Perl script to silence those warnings when it heavily uses curly braces in regexes, or uses \d{1,3} constructs for parsing IP addresses.

Dr. Nikolai Bezroukov



Old News ;-)

[Jun 28, 2017] A Short Guide to DBI

Notable quotes:
"... Structured Query Language ..."
"... database handle ..."
"... statement handle ..."
Jun 18, 2017 | www.perl.com
By Mark-Jason Dominus on October 22, 1999

Short guide to DBI (The Perl Database Interface Module)

General information about relational databases

Relational databases started to get to be a big deal in the 1970's, and they're still a big deal today, which is a little peculiar, because they're a 1960's technology.

A relational database is a bunch of rectangular tables. Each row of a table is a record about one person or thing; the record contains several pieces of information called fields . Here is an example table:

        LASTNAME   FIRSTNAME   ID   POSTAL_CODE   AGE  SEX
        Gauss      Karl        119  19107         30   M
        Smith      Mark        3    T2V 3V4       53   M
        Noether    Emmy        118  19107         31   F
        Smith      Jeff        28   K2G 5J9       19   M
        Hamilton   William     247  10139         2    M

The names of the fields are LASTNAME , FIRSTNAME , ID , POSTAL_CODE , AGE , and SEX . Each line in the table is a record , or sometimes a row or tuple . For example, the first row of the table represents a 30-year-old male whose name is Karl Gauss, who lives at postal code 19107, and whose ID number is 119.

Sometimes this is a very silly way to store information. When the information naturally has a tabular structure it's fine. When it doesn't, you have to squeeze it into a table, and some of the techniques for doing that are more successful than others. Nevertheless, tables are simple and are easy to understand, and most of the high-performance database systems you can buy today operate under this 1960's model.

About SQL

SQL stands for Structured Query Language . It was invented at IBM in the 1970's. It's a language for describing searches and modifications to a relational database.

SQL was a huge success, probably because it's incredibly simple and anyone can pick it up in ten minutes. As a result, all the important database systems support it in some fashion or another. This includes the big players, like Oracle and Sybase, high-quality free or inexpensive database systems like MySQL, and funny hacks like Perl's DBD::CSV module, which we'll see later.

There are four important things one can do with a table:

SELECT
Find all the records that have a certain property

INSERT
Add new records
DELETE
Remove old records

UPDATE
Modify records that are already there

Those are the four most important SQL commands, also called queries . Suppose that the example table above is named people . Here are examples of each of the four important kinds of queries:

 SELECT firstname FROM people WHERE lastname = 'Smith'

(Locate the first names of all the Smiths.)

 DELETE FROM people WHERE id = 3

(Delete Mark Smith from the table)

 UPDATE people SET age = age+1 WHERE id = 247

(William Hamilton just had a birthday.)

 INSERT INTO people (lastname, firstname, id, postal_code, age, sex)
        VALUES ('Euler', 'Leonhard', 289, '19107', 31, 'M')

(Add Leonhard Euler to the table.)

There are a bunch of other SQL commands for creating and discarding tables, for granting and revoking access permissions, for committing and abandoning transactions, and so forth. But these four are the important ones. Congratulations; you are now a SQL programmer. For the details, go to any reasonable bookstore and pick up a SQL quick reference.

About Databases

Every database system is a little different. You talk to some databases over the network and make requests of the database engine; other databases you talk to through files or something else.

Typically when you buy a commercial database, you get a library with it. The vendor has written some functions for talking to the database in some language like C, compiled the functions, and the compiled code is the library. You can write a C program that calls the functions in the library when it wants to talk to the database.

There's a saying that any software problem can be solved by adding a layer of indirection. That's what Perl's DBI (`Database Interface') module is all about. It was written by Tim Bunce.

DBI is designed to protect you from the details of the vendor libraries. It has a very simple interface for saying what SQL queries you want to make, and for getting the results back. DBI doesn't know how to talk to any particular database, but it does know how to locate and load DBD modules. The DBD modules have the vendor libraries in them and know how to talk to the real databases; there is one DBD module for every different database.

When you ask DBI to perform a query, it hands the request to the appropriate DBD module, which spins around three times or drinks out of its sneaker or whatever is necessary to communicate with the real database. When it gets the results back, it passes them to DBI. Then DBI gives you the results. Since your program only has to deal with DBI, and not with the real database, you don't have to worry about barking like a chicken.

Here's your program talking to the DBI library. You are using two databases at once. One is an Oracle database server on some other machine, and another is a DBD::CSV database that stores the data in a bunch of plain text files on the local disk.

Your program sends a query to DBI , which forwards it to the appropriate DBD module; let's say it's DBD::Oracle . DBD::Oracle knows how to translate what it gets from DBI into the format demanded by the Oracle library, which is built into it. The library forwards the request across the network, gets the results back, and returns them to DBD::Oracle . DBD::Oracle returns the results to DBI as a Perl data structure. Finally, your program can get the results from DBI .

On the other hand, suppose that your program was querying the text files. It would prepare the same sort of query in exactly the same way, and send it to DBI in exactly the same way. DBI would see that you were trying to talk to the DBD::CSV database and forward the request to the DBD::CSV module. The DBD::CSV module has Perl functions in it that tell it how to parse SQL and how to hunt around in the text files to find the information you asked for. It then returns the results to DBI as a Perl data structure. Finally, your program gets the results from DBI in exactly the same way that it would have if you were talking to Oracle instead.

There are two big wins that result from this organization. First, you don't have to worry about the details of hunting around in text files or talking on the network to the Oracle server or dealing with Oracle's library. You just have to know how to talk to DBI .

Second, if you build your program to use Oracle, and then the following week upper management signs a new Strategic Partnership with Sybase, it's easy to convert your code to use Sybase instead of Oracle. You change exactly one line in your program, the line that tells DBI to talk to DBD::Oracle, and have it use DBD::Sybase instead. Or you might build your program to talk to a cheap, crappy database like MS Access, and then next year when the application is doing well and getting more use than you expected, you can upgrade to a better database without changing any of your code.

There are DBD modules for talking to every important kind of SQL database. DBD::Oracle will talk to Oracle, and DBD::Sybase will talk to Sybase. DBD::ODBC will talk to any ODBC database, including Microsoft Access. (ODBC is a Microsoft invention that is analogous to DBI itself. There is no DBD module for talking to Access directly.) DBD::CSV allows SQL queries on plain text files. DBD::mysql talks to the excellent MySQL database from TCX DataKonsultAB in Sweden. (MySQL is a tremendous bargain: It's $200 for commercial use, and free for noncommercial use.)

Example of How to Use DBI

Here's a typical program. When you run it, it waits for you to type a last name. Then it searches the database for people with that last name and prints out the full name and ID number for each person it finds. For example:

        Enter name> Noether
                118: Emmy Noether

        Enter name> Smith
                3: Mark Smith
                28: Jeff Smith

        Enter name> Snonkopus
                No names matched `Snonkopus'.

        Enter name> ^D

Here is the code:

 use DBI;

        my $dbh = DBI->connect('DBI:Oracle:payroll')
                or die "Couldn't connect to database: " . DBI->errstr;
        my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;

        print "Enter name> ";
        while ($lastname = <>) {               # Read input from the user
          my @data;
          chomp $lastname;
          $sth->execute($lastname)             # Execute the query
            or die "Couldn't execute statement: " . $sth->errstr;

          # Read the matching records and print them out         
          while (@data = $sth->fetchrow_array()) {
            my $firstname = $data[1];
            my $id = $data[2];
            print "\t$id: $firstname $lastname\n";
          }

          if ($sth->rows == 0) {
            print "No names matched `$lastname'.\n\n";
          }

          $sth->finish;
          print "\n";
          print "Enter name> ";
        }
         
        $dbh->disconnect;

Explanation of the Example

 use DBI;

This loads in the DBI module. Notice that we don't have to load in any DBD module. DBI will do that for us when it needs to.

 my $dbh = DBI->connect('DBI:Oracle:payroll')
                or die "Couldn't connect to database: " . DBI->errstr;

The connect call tries to connect to a database. The first argument, DBI:Oracle:payroll , tells DBI what kind of database it is connecting to. The Oracle part tells it to load DBD::Oracle and to use that to communicate with the database. If we had to switch to Sybase next week, this is the one line of the program that we would change. We would have to change Oracle to Sybase .

payroll is the name of the database we will be searching. If we were going to supply a username and password to the database, we would do it in the connect call:

 my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password')
                or die "Couldn't connect to database: " . DBI->errstr;

If DBI connects to the database, it returns a database handle object, which we store into $dbh . This object represents the database connection. We can be connected to many databases at once and have many such database connection objects.

If DBI can't connect, it returns an undefined value. In this case, we use die to abort the program with an error message. DBI->errstr returns the reason why we couldn't connect -- "Bad password", for example.

 my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;

The prepare call prepares a query to be executed by the database. The argument is any SQL at all. On high-end databases, prepare will send the SQL to the database server, which will compile it. If prepare is successful, it returns a statement handle object which represents the statement; otherwise it returns an undefined value and we abort the program. $dbh->errstr will return the reason for failure, which might be ``Syntax error in SQL''. It gets this reason from the actual database, if possible.

The ? in the SQL will be filled in later. Most databases can handle this. For some databases that don't understand the ? , the DBD module will emulate it for you and will pretend that the database understands how to fill values in later, even though it doesn't.

 print "Enter name> ";

Here we just print a prompt for the user.

 while ($lastname = <>) {               # Read input from the user
          ...
        }

This loop will repeat over and over again as long as the user enters a last name. It exits when there is no more input (for example, when the user types ^D). The Perl <> symbol means to read from the terminal, or from files named on the command line if there were any.

 my @data;

This declares a variable to hold the data that we will get back from the database.

 chomp $lastname;

This trims the newline character off the end of the user's input.

 $sth->execute($lastname)             # Execute the query
            or die "Couldn't execute statement: " . $sth->errstr;

execute executes the statement that we prepared before. The argument $lastname is substituted into the SQL in place of the ? that we saw earlier. execute returns a true value if it succeeds and a false value otherwise, so we abort if for some reason the execution fails.

 while (@data = $sth->fetchrow_array()) {
            ...
           }

fetchrow_array returns one of the selected rows from the database. You get back an array whose elements contain the data from the selected row. In this case, the array you get back has six elements. The first element is the person's last name; the second element is the first name; the third element is the ID, and then the other elements are the postal code, age, and sex.

Each time we call fetchrow_array , we get back a different record from the database. When there are no more matching records, fetchrow_array returns the empty list and the while loop exits.

 my $firstname = $data[1];
             my $id = $data[2];

These lines extract the first name and the ID number from the record data.

 print "\t$id: $firstname $lastname\n";

This prints out the result.

 if ($sth->rows == 0) {
            print "No names matched `$lastname'.\n\n";
          }

The rows method returns the number of rows of the database that were selected. If no rows were selected, then there is nobody in the database with the last name that the user is looking for. In that case, we print out a message. We have to do this after the while loop that fetches whatever rows were available, because with some databases you don't know how many rows there were until after you've gotten them all.

 $sth->finish;
          print "\n";
          print "Enter name> ";

Once we're done reporting about the result of the query, we print another prompt so that the user can enter another name. finish tells the database that we have finished retrieving all the data for this query and allows it to reinitialize the handle so that we can execute it again for the next query.

 $dbh->disconnect;

When the user has finished querying the database, they signal the end of input (^D) and the main while loop exits. disconnect closes the connection to the database.

Cached Queries

Here's a function which looks up someone in the example table, given their ID number, and returns their age:

 sub age_by_id {
          # Arguments: database handle, person ID number
          my ($dbh, $id) = @_;
          my $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;

 $sth->execute($id)
            or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
          return $age;
        }

It prepares the query, executes it, and retrieves the result.

There's a problem here though. Even though the function works correctly, it's inefficient. Every time it's called, it prepares a new query. Typically, preparing a query is a relatively expensive operation. For example, the database engine may parse and understand the SQL and translate it into an internal format. Since the query is the same every time, it's wasteful to throw away this work when the function returns.

Here's one solution:

 { my $sth;
          sub age_by_id {
            # Arguments: database handle, person ID number
            my ($dbh, $id) = @_;

 if (! defined $sth) {
              $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;
            }

 $sth->execute($id)
              or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
            return $age;
          }
        }

There are two big changes to this function from the previous version. First, the $sth variable has moved outside of the function; this tells Perl that its value should persist even after the function returns. Next time the function is called, $sth will have the same value as before.

Second, the prepare code is in a conditional block. It's only executed if $sth does not yet have a value. The first time the function is called, the prepare code is executed and the statement handle is stored into $sth . This value persists after the function returns, and the next time the function is called, $sth still contains the statement handle and the prepare code is skipped.

Here's another solution:

 sub age_by_id {
          # Arguments: database handle, person ID number
          my ($dbh, $id) = @_;
          my $sth = $dbh->prepare_cached('SELECT age FROM people WHERE id = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;

 $sth->execute($id)
            or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
          return $age;
        }

Here the only change is to replace prepare with prepare_cached. The prepare_cached call is just like prepare, except that it looks to see if the query is the same as last time. If so, it gives you the statement handle that it gave you before.

Transactions

Many databases support transactions . This means that you can make a whole bunch of queries which would modify the databases, but none of the changes are actually made. Then at the end you issue the special SQL query COMMIT , and all the changes are made simultaneously. Alternatively, you can issue the query ROLLBACK , in which case all the queries are thrown away.

As an example of this, consider a function to add a new employee to a database. The database has a table called employees that looks like this:

 FIRSTNAME  LASTNAME   DEPARTMENT_ID
        Gauss      Karl       17
        Smith      Mark       19
        Noether    Emmy       17
        Smith      Jeff       666
        Hamilton   William    17

and a table called departments that looks like this:

 ID   NAME               NUM_MEMBERS
        17   Mathematics        3
        666  Legal              1
        19   Grounds Crew       1

The mathematics department is department #17 and has three members: Karl Gauss, Emmy Noether, and William Hamilton.

Here's our first cut at a function to insert a new employee. It will return true or false depending on whether or not it was successful:

 sub new_employee {
          # Arguments: database handle; first and last names of new employee;
          # department ID number for new employee's work assignment
          my ($dbh, $first, $last, $department) = @_;
          my ($insert_handle, $update_handle);

 $insert_handle =
            $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
          $update_handle =
            $dbh->prepare_cached('UPDATE departments
                                     SET num_members = num_members + 1
                                   WHERE id = ?');

 die "Couldn't prepare queries; aborting"
            unless defined $insert_handle && defined $update_handle;

 $insert_handle->execute($first, $last, $department) or return 0;
          $update_handle->execute($department) or return 0;
          return 1;   # Success
        }

We create two handles, one for an insert query that will insert the new employee's name and department number into the employees table, and an update query that will increment the number of members in the new employee's department in the department table. Then we execute the two queries with the appropriate arguments.

There's a big problem here: Suppose, for some reason, the second query fails. Our function returns a failure code, but it's too late: it has already added the employee to the employees table, and that means that the count in the departments table is wrong. The database now has corrupted data in it.

The solution is to make both updates part of the same transaction. Most databases will do this automatically, but without an explicit instruction about whether or not to commit the changes, some databases will commit the changes when we disconnect from the database, and others will roll them back. We should specify the behavior explicitly.

Typically, no changes will actually be made to the database until we issue a commit . The version of our program with commit looks like this:

 sub new_employee {
          # Arguments: database handle; first and last names of new employee;
          # department ID number for new employee's work assignment
          my ($dbh, $first, $last, $department) = @_;
          my ($insert_handle, $update_handle);

 $insert_handle =
            $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
          $update_handle =
            $dbh->prepare_cached('UPDATE departments
                                     SET num_members = num_members + 1
                                   WHERE id = ?');

 die "Couldn't prepare queries; aborting"
            unless defined $insert_handle && defined $update_handle;

 my $success = 1;
          $success &&= $insert_handle->execute($first, $last, $department);
          $success &&= $update_handle->execute($department);

 my $result = ($success ? $dbh->commit : $dbh->rollback);
          unless ($result) {
            die "Couldn't finish transaction: " . $dbh->errstr
          }
          return $success;
        }

We perform both queries, and record in $success whether they both succeeded. $success will be true if both queries succeeded, false otherwise. If the queries succeeded, we commit the transaction; otherwise, we roll it back, cancelling all our changes.

The problem of concurrent database access is also solved by transactions. Suppose that queries were executed immediately, and that some other program came along and examined the database after our insert but before our update. It would see inconsistent data in the database, even if our update would eventually have succeeded. But with transactions, all the changes happen simultaneously when we do the commit, and the commit is atomic, which means that any other program looking at the database either sees all of them or none.

Miscellaneous

do

If you're doing an UPDATE , INSERT , or DELETE there is no data that comes back from the database, so there is a short cut. You can say

 $dbh->do('DELETE FROM people WHERE age > 65');

for example, and DBI will prepare the statement, execute it, and finish it. do returns a true value if it succeeded, and a false value if it failed. Actually, if it succeeds it returns the number of affected rows. In the example it would return the number of rows that were actually deleted. (DBI plays a magic trick so that the value it returns is true even when it is 0. This is bizarre, because 0 is usually false in Perl. But it's convenient because you can use it either as a number or as a true-or-false success code, and it works both ways.)
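
For example, the return value can be used both as a boolean and as a count (a sketch, reusing the $dbh handle and people table from the examples above):

    my $deleted = $dbh->do('DELETE FROM people WHERE age > 65');
    die "delete failed: " . $dbh->errstr unless defined $deleted;

    # DBI returns the string "0E0" when zero rows were affected: it is true
    # as a boolean (the query succeeded) but 0 when used as a number.
    printf "%d row(s) deleted\n", $deleted;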

AutoCommit

If your transactions are simple, you can save yourself the trouble of having to issue a lot of commits. When you make the connect call, you can specify an AutoCommit option that will perform an automatic commit operation after every successful query. Here's what it looks like:

 my $dbh = DBI->connect('DBI:Oracle:payroll', undef, undef,
                               {AutoCommit => 1})
                or die "Couldn't connect to database: " . DBI->errstr;

Automatic Error Handling

When you make the connect call, you can specify a RaiseError option that handles errors for you automatically. When an error occurs, DBI will abort your program instead of returning a failure code. If all you want is to abort the program on an error, this can be convenient:

 my $dbh = DBI->connect('DBI:Oracle:payroll', undef, undef,
                               {RaiseError => 1})
                or die "Couldn't connect to database: " . DBI->errstr;

Don't do This

People are always writing code like this:

 while ($lastname = <>) {
          my $sth = $dbh->prepare("SELECT * FROM people
                                   WHERE lastname = '$lastname'");
          $sth->execute();
          # and so on ...
        }

Here we interpolated the value of $lastname directly into the SQL in the prepare call.

This is a bad thing to do for three reasons.

First, prepare calls can take a long time. The database server has to compile the SQL and figure out how it is going to run the query. If you have many similar queries, that is a waste of time.

Second, it will not work if $lastname contains a name like O'Malley or D'Amico or some other name with an ' . The ' has a special meaning in SQL, and the database will not understand when you ask it to prepare a statement that looks like

 SELECT * FROM people WHERE lastname = 'O'Malley'

It will see that you have three 's and complain that you don't have a fourth matching ' somewhere else.

Finally, if you're going to be constructing your query based on a user input, as we did in the example program, it's unsafe to simply interpolate the input directly into the query, because the user can construct a strange input in an attempt to trick your program into doing something it didn't expect. For example, suppose the user enters the following bizarre value for $input :

 x' or lastname = lastname or lastname = 'y

Now our query has become something very surprising:

 SELECT * FROM people WHERE lastname = 'x'
         or lastname = lastname or lastname = 'y'

The part of this query that our sneaky user is interested in is the second or clause. This clause selects all the records for which lastname is equal to lastname ; that is, all of them. We thought that the user was only going to be able to see a few records at a time, and now they've found a way to get them all at once. This probably wasn't what we wanted.

People go to all sorts of trouble to get around these problems with interpolation. They write a function that puts the last name in quotes and then backslashes any apostrophes that appear in it. Then it breaks because they forgot to backslash backslashes. Then they make their escape function better. Then their code is a big mess because they are calling the backslashing function every other line. They put a lot of work into the backslashing function, and it was all for nothing, because the whole problem is solved by just putting a ? into the query, like this:

 SELECT * FROM people WHERE lastname = ?

All my examples look like this. It is safer and more convenient and more efficient to do it this way.

References

A complete list of DBD modules is available here
You can download these modules here
DBI modules are available here
You can get MySQL from www.tcx.se

[Jun 28, 2017] Bless My Referents by Damian Conway

September 16, 1999 | www.perl.com
Introduction

Damian Conway is the author of the newly released Object Oriented Perl , the first of a new series of Perl books from Manning.

Object-oriented programming in Perl is easy. Forget the heavy theory and the sesquipedalian jargon: classes in Perl are just regular packages, objects are just variables, methods are just subroutines. The syntax and semantics are a little different from regular Perl, but the basic building blocks are completely familiar.

The one problem most newcomers to object-oriented Perl seem to stumble over is the notion of references and referents, and how the two combine to create objects in Perl. So let's look at how references and referents relate to Perl objects, and see who gets to be blessed and who just gets to point the finger.

Let's start with a short detour down a dark alley...

References and referents

Sometimes it's important to be able to access a variable indirectly -- to be able to use it without specifying its name. There are two obvious motivations: the variable you want may not have a name (it may be an anonymous array or hash), or you may only know which variable you want at run-time (so you don't have a name to offer the compiler).

To handle such cases, Perl provides a special scalar datatype called a reference . A reference is like the traditional Zen idea of the "finger pointing at the moon". It's something that identifies a variable, and allows us to locate it. And that's the stumbling block most people need to get over: the finger (reference) isn't the moon (variable); it's merely a means of working out where the moon is.

Making a reference

When you prefix an existing variable or value with the unary \ operator you get a reference to the original variable or value. That original is then known as the referent to which the reference refers.

For example, if $s is a scalar variable, then \$s is a reference to that scalar variable (i.e. a finger pointing at it) and $s is that finger's referent. Likewise, if @a in an array, then \@a is a reference to it.

In Perl, a reference to any kind of variable can be stored in another scalar variable. For example:

$slr_ref = \$s;     # scalar $slr_ref stores a reference to scalar $s
$arr_ref = \@a;     # scalar $arr_ref stores a reference to array @a
$hsh_ref = \%h;     # scalar $hsh_ref stores a reference to hash %h

Figure 1 shows the relationships produced by those assignments.

Note that the references are separate entities from the referents at which they point. The only time that isn't the case is when a variable happens to contain a reference to itself:

$self_ref = \$self_ref;     # $self_ref stores a reference to itself!

That (highly unusual) situation produces an arrangement shown in Figure 2.

Once you have a reference, you can get back to the original thing it refers to -- its referent -- simply by prefixing the variable containing the reference (optionally in curly braces) with the appropriate variable symbol. Hence to access $s, you could write $$slr_ref or ${$slr_ref}. At first glance, that might look like one too many dollar signs, but it isn't. The $slr_ref tells Perl which variable has the reference; the extra $ tells Perl to follow that reference and treat the referent as a scalar.

Similarly, you could access the array @a as @{$arr_ref} , or the hash %h as %{$hsh_ref} . In each case, the $whatever_ref is the name of the scalar containing the reference, and the leading @ or % indicates what type of variable the referent is. That type is important: if you attempt to prefix a reference with the wrong symbol (for example, @{$slr_ref} or ${$hsh_ref} ), Perl produces a fatal run-time error.

[A series of scalar variables with arrows pointing to
other variables]
Figure 1: References and their referents

[A scalar variable with an arrow pointing back to
itself]
Figure 2: A reference that is its own referent

The "arrow" operator

Accessing the elements of an array or a hash through a reference can be awkward using the syntax shown above. You end up with a confusing tangle of dollar signs and brackets:

${$arr_ref}[0] = ${$hsh_ref}{"first"};  # i.e. $a[0] = $h{"first"}

So Perl provides a little extra syntax to make life just a little less cluttered:

$arr_ref->[0] = $hsh_ref->{"first"};    # i.e. $a[0] = $h{"first"}

The "arrow" operator ( -> ) takes a reference on its left and either an array index (in square brackets) or a hash key (in curly braces) on its right. It locates the array or hash that the reference refers to, and then accesses the appropriate element of it.

Identifying a referent

Because a scalar variable can store a reference to any kind of data, and because dereferencing a reference with the wrong prefix leads to fatal errors, it's sometimes important to be able to determine what type of referent a specific reference refers to. Perl provides a built-in function called ref that takes a scalar and returns a description of the kind of reference it contains. Table 1 summarizes the string that is returned for each type of reference.

If $slr_ref contains...                    then ref($slr_ref) returns...
a value that is not a reference            undef
a reference to a scalar                    "SCALAR"
a reference to an array                    "ARRAY"
a reference to a hash                      "HASH"
a reference to a subroutine                "CODE"
a reference to a filehandle                "IO" or "IO::Handle"
a reference to a typeglob                  "GLOB"
a reference to a precompiled pattern       "Regexp"
a reference to another reference           "REF"

Table 1: What ref returns

As Table 1 indicates, you can create references to many kinds of Perl constructs, apart from variables.
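
For example, a function can use ref to protect itself before dereferencing its argument. A small sketch:

    sub total {
        my $aref = shift;
        die "expected an array reference\n" unless ref($aref) eq 'ARRAY';
        my $sum = 0;
        $sum += $_ for @{$aref};      # safe to dereference as an array now
        return $sum;
    }

    print total([1, 2, 3]), "\n";     # prints 6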

If a reference is used in a context where a string is expected, then the ref function is called automatically to produce the expected string, and a unique hexadecimal value (the internal memory address of the thing being referred to) is appended. That means that printing out a reference:

print $hsh_ref, "\n";
produces something like:

HASH(0x10027588)

since each element of print 's argument list is stringified before printing.

The ref function has a vital additional role in object-oriented Perl, where it can be used to identify the class to which a particular object belongs. More on that in a moment.

References, referents, and objects

References and referents matter because they're both required when you come to build objects in Perl. In fact, Perl objects are just referents (i.e. variables or values) that have a special relationship with a particular package. References come into the picture because Perl objects are always accessed via a reference, using an extension of the "arrow" notation.

But that doesn't mean that Perl's object-oriented features are difficult to use (even if you're still unsure of references and referents). To do real, useful, production-strength, object-oriented programming in Perl you only need to learn about one extra function, one straightforward piece of additional syntax, and three very simple rules. Let's start with the rules...

Rule 1: To create a class, build a package

Perl packages already have a number of class-like features (most notably, each package defines its own namespace).

In Perl, those features are sufficient to allow a package to act like a class.

Suppose you wanted to build an application to track faults in a system. Here's how to declare a class named "Bug" in Perl:

package Bug;
That's it! In Perl, classes are packages. No magic, no extra syntax, just plain, ordinary packages. Of course, a class like the one declared above isn't very interesting or useful, since its objects will have no attributes or behaviour.

That brings us to the second rule...

Rule 2: To create a method, write a subroutine

In object-oriented theory, methods are just subroutines that are associated with a particular class and exist specifically to operate on objects that are instances of that class. In Perl, a subroutine that is declared in a particular package is already associated with that package. So to write a Perl method, you just write a subroutine within the package that is acting as your class.

For example, here's how to provide an object method to print Bug objects:

package Bug;
sub print_me
{
       # The code needed to print the Bug goes here
}

Again, that's it. The subroutine print_me is now associated with the package Bug, so whenever Bug is used as a class, Perl automatically treats Bug::print_me as a method.

Invoking the Bug::print_me method involves that one extra piece of syntax mentioned above: an extension to the existing Perl "arrow" notation. If you have a reference to an object of class Bug, you can access any method of that object by using a -> symbol, followed by the name of the method.

For example, if the variable $nextbug holds a reference to a Bug object, you could call Bug::print_me on that object by writing:

$nextbug->print_me();
Calling a method through an arrow should be very familiar to any C++ programmers; for the rest of us, it's at least consistent with other Perl usages:
$hsh_ref->{"key"};           
# Access the hash referred to by $hashref
$arr_ref->[$index];          
# Access the array referred to by $arrayref
$sub_ref->(@args);           
# Access the sub referred to by $subref

$obj_ref->method(@args);     
# Access the object referred to by $objref


The only difference with the last case is that the referent (i.e. the object) pointed to by $objref has many ways of being accessed (namely, its various methods). So, when you want to access that object, you have to specify which particular way (which method) should be used. Hence, the method name after the arrow.

When a method like Bug::print_me is called, the argument list that it receives begins with the reference through which it was called, followed by any arguments that were explicitly given to the method. That means that calling Bug::print_me("logfile") is not the same as calling $nextbug->print_me("logfile") . In the first case, print_me is treated as a regular subroutine so the argument list passed to Bug::print_me is equivalent to:

( "logfile" )
In the second case, print_me is treated as a method so the argument list is equivalent to:
( $objref, "logfile" )
Having a reference to the object passed as the first parameter is vital, because it means that the method then has access to the object on which it's supposed to operate. Hence you'll find that most methods in Perl start with something equivalent to this:
package Bug;
sub print_me
{
    my $self = shift;

    # The @_ array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self 
    # and the explicit arguments (still in @_)
}
or, better still:
package Bug;
sub print_me
{
    my ($self, @args) = @_;

    # The @args array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self
    # and the explicit arguments (now in @args)
}
This second version is better because it provides a lexically scoped copy of the argument list ( @args ). Remember that the @_ array is "magical"-changing any element of it actually changes the caller's version of the corresponding argument. Copying argument values to a lexical array like @args prevents nasty surprises of this kind, as well as improving the internal documentation of the subroutine (especially if a more meaningful name than @args is chosen).

The only remaining question is: how do you create the invoking object in the first place?

Rule 3: To create an object, bless a referent

Unlike other object-oriented languages, Perl doesn't require that an object be a special kind of record-like data structure. In fact, you can use any existing type of Perl variable (a scalar, an array, a hash, etc.) as an object in Perl.

Hence, the issue isn't how to create the object, because you create them exactly like any other Perl variable: declare them with a my , or generate them anonymously with a [ ... ] or { ... } . The real problem is how to tell Perl that such an object belongs to a particular class. That brings us to the one extra built-in Perl function you need to know about. It's called bless , and its only job is to mark a variable as belonging to a particular class.

The bless function takes two arguments: a reference to the variable to be marked, and a string containing the name of the class. It then sets an internal flag on the variable, indicating that it now belongs to the class.

For example, suppose that $nextbug actually stores a reference to an anonymous hash:

$nextbug = {
                id    => "00001",
                type  => "fatal",
                descr => "application does not compile",
           };
To turn that anonymous hash into an object of class Bug you write:
bless $nextbug, "Bug";
And, once again, that's it! The anonymous hash referred to by $nextbug is now marked as being an object of class Bug. Note that the variable $nextbug itself hasn't been altered in any way; only the nameless hash it refers to has been marked. In other words, bless sanctified the referent, not the reference. Figure 3 illustrates where the new class membership flag is set.

You can check that the blessing succeeded by applying the built-in ref function to $nextbug . As explained above, when ref is applied to a reference, it normally returns the type of that reference. Hence, before $nextbug was blessed, ref($nextbug) would have returned the string 'HASH' .

Once an object is blessed, ref returns the name of its class instead. So after the blessing, ref($nextbug) will return 'Bug' . Of course the object itself still is a hash, but now it's a hash that belongs to the Bug class. The various entries of the hash become the attributes of the newly created Bug object.
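
Here is that before-and-after behaviour in a minimal sketch (not from the article):

    my $thing = { id => "00001" };
    print ref($thing), "\n";      # prints "HASH"
    bless $thing, "Bug";
    print ref($thing), "\n";      # prints "Bug"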

[A picture of an anonymous hash having a flag set within it]
Figure 3: What changes when an object is blessed

Creating a constructor

Given that you're likely to want to create many such Bug objects, it would be convenient to have a subroutine that took care of all the messy, blessy details. You could pass it the necessary information, and it would then wrap it in an anonymous hash, bless the hash, and give you back a reference to the resulting object.

And, of course, you might as well put such a subroutine in the Bug package itself, and call it something that indicates its role. Such a subroutine is known as a constructor, and it generally looks like this:

package Bug;
sub new
{
    my $class = $_[0];
    my $objref = {
                     id    => $_[1],
                     type  => $_[2],
                     descr => $_[3],
                 };
    bless $objref, $class;
    return $objref;
}
Note that the middle bits of the subroutine (in bold) look just like the raw blessing that was handed out to $nextbug in the previous example.

The bless function is set up to make writing constructors like this a little easier. Specifically, it returns the reference that's passed as its first argument (i.e. the reference to whatever referent it just blessed into object-hood). And since Perl subroutines automatically return the value of their last evaluated statement, that means that you could condense the definition of Bug::new to this:

sub Bug::new
{
        bless { id => $_[1], type => $_[2], descr => $_[3] }, $_[0];
}
This version has exactly the same effects: slot the data into an anonymous hash, bless the hash into the class specified by the first argument, and return a reference to the hash.

Regardless of which version you use, now whenever you want to create a new Bug object, you can just call:

$nextbug = Bug::new("Bug", $id, $type, $description);
That's a little redundant, since you have to type "Bug" twice. Fortunately, there's another feature of the "arrow" method-call syntax that solves this problem. If the operand to the left of the arrow is the name of a class (rather than an object reference), then the appropriate method of that class is called. More importantly, if the arrow notation is used, the first argument passed to the method is a string containing the class name. That means that you could rewrite the previous call to Bug::new like this:
$nextbug = Bug->new($id, $type, $description);
There are other benefits to this notation when your class uses inheritance, so you should always call constructors and other class methods this way.

Method enacting

Apart from encapsulating the gory details of object creation within the class itself, using a class method like this to create objects has another big advantage. If you abide by the convention of only ever creating new Bug objects by calling Bug::new , you're guaranteed that all such objects will always be hashes. Of course, there's nothing to prevent us from "manually" blessing arrays, or scalars as Bug objects, but it turns out to make life much easier if you stick to blessing one type of object into each class.

For example, if you can be confident that any Bug object is going to be a blessed hash, you can (finally!) fill in the missing code in the Bug:: print_me method:

package Bug;
sub print_me
{
    my ($self) = @_;
    print "ID: $self->{id}\n";
    print "$self->{descr}\n";
    print "(Note: problem is fatal)\n" if $self->{type} eq "fatal";
}
Now, whenever the print_me method is called via a reference to any hash that's been blessed into the Bug class, the $self variable extracts the reference that was passed as the first argument and then the print statements access the various entries of the blessed hash.

Till death us do part...

Objects sometimes require special attention at the other end of their lifespan too. Most object-oriented languages provide the ability to specify a subroutine that is called automatically when an object ceases to exist. Such subroutines are usually called destructors , and are used to undo any side-effects caused by the previous existence of an object. That may include:

In Perl, you can set up a destructor for a class by defining a subroutine named DESTROY in the class's package. Any such subroutine is automatically called on an object of that class, just before that object's memory is reclaimed. Typically, this happens when the last variable holding a reference to the object goes out of scope, or has another value assigned to it.

For example, you could provide a destructor for the Bug class like this:

package Bug;
# other stuff as before

sub DESTROY
{
        my ($self) = @_;
        print "<< Squashed the bug: $self->{id} >>\n\n";
}

Now, every time an object of class Bug is about to cease to exist, that object will automatically have its DESTROY method called, which will print an epitaph for the object. For example, the following code:
package main;
use Bug;

open BUGDATA, "Bug.dat" or die "Couldn't find Bug data";

while (<BUGDATA>)
{
    my @data = split ',', $_;       # extract comma-separated Bug data
    my $bug = Bug->new(@data);      # create a new Bug object
    $bug->print_me();               # print it out
} 

print "(end of list)\n";
prints out something like this:
ID: HW000761
"Cup holder" broken
(Note: problem is fatal)
<< Squashed the bug HW000761 >>

ID: SW000214
Word processor trashing disk after 20 saves.
<< Squashed the bug SW000214 >> 

ID: OS000633
Can't change background colour (blue) on blue screen of death.
<< Squashed the bug OS000633 >> 

(end of list)
That's because, at the end of each iteration of the while loop, the lexical variable $bug goes out of scope, taking with it the only reference to the Bug object created earlier in the same loop. That object's reference count immediately becomes zero and, because it was blessed, the corresponding DESTROY method (i.e. Bug::DESTROY ) is automatically called on the object.
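
Assigning a new value (or undef) to the last variable holding the reference has the same effect; a minimal sketch (not from the article):

    my $bug = Bug->new("XX000001", "fatal", "demo");
    $bug = undef;   # reference count drops to zero; Bug::DESTROY runs here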

Where to from here?

Of course, these fundamental techniques only scratch the surface of object-oriented programming in Perl. Simple hash-based classes with methods, constructors, and destructors may be enough to let you solve real problems in Perl, but there's a vast array of powerful and labor-saving techniques you can add to those basic components: autoloaded methods, class methods and class attributes, inheritance and multiple inheritance, polymorphism, multiple dispatch, enforced encapsulation, operator overloading, tied objects, genericity, and persistence.

Perl's standard documentation includes plenty of good material: perlref, perlreftut, perlobj, perltoot, perltootc, and perlbot, to get you started. But if you're looking for a comprehensive tutorial on everything you need to know, you may also like to consider my new book, Object Oriented Perl, from which this article has been adapted.

[Jun 28, 2017] Whats Wrong with sort and How to Fix It by Tom Christiansen

Unicode poses some tricky problems... Perl 5.14 introduced the unicode_strings feature. From the Perl.com June 2011 archives.
Aug 31, 2011 | www.perl.com
By now, you may have read Considerations on Using Unicode Properly in Modern Perl Applications . Still think doing things correctly is easy? Tom Christiansen demonstrates that even sorting can be trickier than you think.

NOTE : The following is an excerpt from the draft manuscript of Programming Perl , 4ᵗʰ edition

Calling sort without a comparison function is quite often the wrong thing to do, even on plain text. That's because if you use a bare sort, you can get really strange results. It's not just Perl either: almost all programming languages work this way, even the shell's sort command. You might be surprised to find that with this sort of nonsense sort, uppercase B comes before lowercase a, not after it, accented letters like é land after z rather than next to their unaccented forms, and the ﬀ ligature comes after zz. There's no end to such silliness, either; see the default sort tables at the end of this article to see what I mean.

There are situations when a bare sort is appropriate, but fewer than you think. One scenario is when every string you're sorting contains nothing but the 26 lowercase (or uppercase, but not both) Latin letters from a-z, without any whitespace or punctuation.

Another occasion when a simple, unadorned sort is appropriate is when you have no other goal but to iterate in an order that is merely repeatable, even if that order should happen to be completely arbitrary. In other words, yes, it's garbage, but it's the same garbage this time as it was last time. That's because the default sort resorts to an unmediated cmp operator, which has the "predictable garbage" characteristics I just mentioned.

The last situation is much less frequent than the first two. It requires that the things you're sorting be special‐purpose, dedicated binary keys whose bit sequences have with excruciating care been arranged to sort in some prescribed fashion. This is also the strategy for any reasonable use of the cmp operator.

So what's wrong with sort anyway?

I know, I know. I can hear everyone saying, "But it's called sort , so how could that ever be wrong?" Sure it's called sort , but you still have to know how to use it to get useful results out. Probably the most surprising thing about sort is that it does not by default do an alphabetic, an alphanumeric, or a numeric sort. What it actually does is something else altogether, and that something else is of surprisingly limited usefulness.

Imagine you have an array of records. It does you virtually no good to write:

@sorted_recs = sort @recs;

Because Perl's cmp operator does only a bit comparison not an alphabetic one, it does nearly as little good to write your record sort this way:

@srecs = sort {
    $b->{AGE}      <=>  $a->{AGE}
                   ||
    $a->{SURNAME}  cmp  $b->{SURNAME}
} @recs;

The problem is that that cmp for the record's SURNAME field is not an alphabetic comparison. It's merely a code point comparison. That means it works like C's strcmp function or Java's String.compareTo method. Although commonly referred to as a "lexicographic" comparison, this is a gross misnomer: it's about as far away from the way real lexicographers sort dictionary entries as you can get without flipping a coin.
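
You can see the code point comparison at work in a one-line experiment (a minimal sketch, not from the article):

    # the default sort compares code points, so uppercase sorts before lowercase
    print join(" ", sort qw(apple Banana cherry)), "\n";   # prints "Banana apple cherry"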

Fortunately, you don't have to come up with your own algorithm for dictionary sorting, because Perl provides a standard class to do this for you: Unicode::Collate . Don't let the name throw you, because while it was first invented for Unicode, it works great on regular ASCII text, too, and does a better job at making lexicographers happy than a plain old sort ever manages.

If you have code that purports to sort text that looks like this:

@sorted_lines = sort @lines;

Then all you have to do to get a dictionary sort is write this instead:

use Unicode::Collate;
@sorted_lines = Unicode::Collate::->new->sort(@lines);

For structured records, like those with ages and surnames in them, you have to be a bit fancier. One way to fix it would be to use the class's own cmp operator instead of the built‐in one.

use Unicode::Collate;
my $collator = Unicode::Collate::->new();
@srecs = sort {
    $b->{AGE}  <=>  $a->{AGE}
          ||
    $collator->cmp( $a->{SURNAME}, $b->{SURNAME} )
} @recs;

However, that makes a fairly expensive method call for every possible comparison. Because Perl's adaptive merge sort algorithm usually runs in O(n log n) time given n items, and because each comparison requires two different computed keys, that can be a lot of duplicate effort. Our sorting class therefore provides a convenient getSortKey method that calculates a special binary key, which you can cache and later pass to the normal cmp operator on your own. This trick lets you use cmp yet get a truly alphabetic sort out of it for a change.

Here is a simple but sufficient example of how to do that:

use Unicode::Collate;
my $collator = Unicode::Collate::->new();

# first calculate the magic sort key for each text field, and cache it
for my $rec (@recs) {
    $rec->{SURNAME_key} = $collator->getSortKey( $rec->{SURNAME} );
} 

# now sort the records as before, but for the surname field,
# use the cached sort key instead
@srecs = sort {
    $b->{AGE}          <=>  $a->{AGE}
                      ||
    $a->{SURNAME_key}  cmp  $b->{SURNAME_key}
} @recs;

That's what I meant about very carefully preparing a mediated sort key that contains the precomputed binary key.

English Card Catalogue Sorts

The simple code just demonstrated assumes you want to sort names the same way you do regular text. That isn't a good assumption, however. Many countries, languages, institutions, and sometimes even librarians have their own notions about how a card catalogue or a phonebook ought to be sorted.

For example, in the English language, surnames with Scottish patronymics starting with Mc or Mac, like MacKinley and McKinley , not only count as completely identical synonyms for sorting purposes, they go before any other surname that begins with M, and so precede surnames like Mables or Machado .

Yes, really.

That means that the following names are sorted correctly -- for English:

Lewis, C.S.
McKinley, Bill
MacKinley, Ron
Mables, Martha
Machado, José
Macon, Bacon

Yes, it's true. Check out your local large English‐language bookseller or library -- presuming you can find one. If you do, best make sure to blow the dust off first.

Sorting Spanish Names

It's a good thing those names follow English rules for sorting names. If this were Spanish, we would have to deal with double‐barrelled surnames, where the patronym sorts before the matronym, which in turn sorts before any given names. That means that if Señor Machado's full name were, like the poet's, Antonio Cipriano José María y Francisco de Santa Ana Machado y Ruiz , then you would have to sort him with the other Machados but then consider Ruiz before Antonio if there were any other Machados . Similarly, the poet Federico del Sagrado Corazón de Jesús García Lorca sorts before the writer Gabriel José de la Concordia García Márquez .

On the other hand, if your records are not full multifield hashes but only simple text strings that don't happen to be surnames, your task is a lot simpler, since now all you have to do is get the cmp operator to behave sensibly. That you can do easily enough this way:

use Unicode::Collate;
@sorted_text = Unicode::Collate::->new->sort(@text);

Sorting Text, Not Binary

Imagine you had this list of German‐language authors:

@germans = qw{
    Böll
    Born
    Böhme
    Bodmer
    Brandis
    Böttcher
    Borchert
    Bobrowski
    Brant
};

If you just sorted them with an unmediated sort operator, you would get this utter nonsense:

Bobrowski
Bodmer
Borchert
Born
Brandis
Brant
Böhme
Böll
Böttcher

Or maybe this equally nonsensical answer:

Bobrowski
Bodmer
Borchert
Born
Böll
Brandis
Brant
Böhme
Böttcher

Or even this still completely nonsensical answer:

Bobrowski
Bodmer
Borchert
Born
Böhme
Böll
Brandis
Brant
Böttcher

The crucial point to all that is that it's text, not binary, so not only can you never judge what its bit patterns hold just by eyeballing it, but more importantly, it has special rules to make it sort alphabetically (some might say sanely), an ordering no naïve code‐point sort will ever come close to getting right, especially on Unicode.

The correct ordering is:

Bobrowski
Bodmer
Böhme
Böll
Borchert
Born
Böttcher
Brandis
Brant

And that is precisely what

use Unicode::Collate;
@sorted_germans = Unicode::Collate::->new->sort(@german_names);

gives you: a correctly sorted list of those Germans' names.

Sorting German Names

Hold on, though.

Correct in what language? In English, yes, the order given is now correct. But considering that these authors wrote in the German language, it is quite conceivable that you should be following the rules for ordering German names in German , not in English. That produces this ordering:

Bobrowski
Bodmer
Böhme
Böll
Böttcher
Borchert
Born
Brandis
Brant

How come Böttcher now comes before Borchert ? Because Böttcher is supposed to sort the same as Boettcher . In a German phonebook or other German list of German names, ö and the digraph oe are considered synonyms, which is not at all how it works in English. To get the German phonebook sort, you merely have to modify your constructor this way:

use Unicode::Collate::Locale;
@sorted_germans = Unicode::Collate::Locale::
                      ->new(locale => "de_phonebook")
                      ->sort(@german_names);

Isn't this fun?

Be glad you're not sorting names. Sorting names is hard.

Default Sort Tables

Here are most of the Latin letters, ordered using the default sort :

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j 
k l m n o p q r s t u v w x y z                     
                                    
        Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě 
Ĝ ĝ Ğ ğ Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ 
ŀ Ł ł Ń ń Ņ ņ Ň ň Ŋ ŋ Ō ō Ŏ ŏ Ő ő   Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş   Ţ ţ Ť 
ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ  Ź ź Ż ż   ſ ƀ Ɓ Ƃ ƃ Ƈ ƈ Ɖ Ɗ Ƌ 
ƌ ƍ Ǝ Ə Ɛ Ƒ  Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ƥ ƥ Ʀ ƫ Ƭ ƭ Ʈ Ư ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ 
ƹ ƺ ƾ ƿ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ Ǡ ǡ Ǣ ǣ 
Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ ǰ DZ Dz dz Ǵ ǵ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ 
ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ Ƞ ȡ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ 
ȯ Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ Ⱥ Ȼ ȼ Ƚ Ⱦ ɐ ɑ ɒ ɓ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ ɠ ɡ ɢ ɣ ɤ ɥ ɦ 
ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ ɰ ɱ ɲ ɳ ɴ ɶ ɹ ɺ ɻ ɼ ɽ ɾ ɿ ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ 
ʎ ʏ ʐ ʑ ʒ ʓ ʙ ʚ ʛ ʜ ʝ ʞ ʟ ʠ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ˡ ˢ ˣ ᴀ ᴁ ᴂ ᴃ ᴄ ᴅ ᴆ ᴇ ᴈ ᴉ 
ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴑ ᴓ ᴔ ᴘ ᴙ ᴚ ᴛ ᴜ ᴝ ᴞ ᴟ ᴠ ᴡ ᴢ ᴣ ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ 
ᴻ ᴼ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ ᵐ ᵑ ᵒ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵢ ᵣ ᵤ ᵥ ᵫ ᵬ ᵭ 
ᵮ ᵯ ᵰ ᵱ ᵲ ᵳ ᵴ ᵵ ᵶ Ḁ ḁ Ḃ ḃ Ḅ ḅ Ḇ ḇ Ḉ ḉ Ḋ ḋ Ḍ ḍ Ḏ ḏ Ḑ ḑ Ḓ ḓ Ḕ ḕ Ḗ ḗ Ḙ ḙ Ḛ 
ḛ Ḝ ḝ Ḟ ḟ Ḡ ḡ Ḣ ḣ Ḥ ḥ Ḧ ḧ Ḩ ḩ Ḫ ḫ Ḭ ḭ Ḯ ḯ Ḱ ḱ Ḳ ḳ Ḵ ḵ Ḷ ḷ Ḹ ḹ Ḻ ḻ Ḽ ḽ Ḿ 
ḿ Ṁ ṁ Ṃ ṃ Ṅ ṅ Ṇ ṇ Ṉ ṉ Ṋ ṋ Ṍ ṍ Ṏ ṏ Ṑ ṑ Ṓ ṓ Ṕ ṕ Ṗ ṗ Ṙ ṙ Ṛ ṛ Ṝ ṝ Ṟ ṟ Ṡ ṡ Ṣ 
ṣ Ṥ ṥ Ṧ ṧ Ṩ ṩ Ṫ ṫ Ṭ ṭ Ṯ ṯ Ṱ ṱ Ṳ ṳ Ṵ ṵ Ṷ ṷ Ṹ ṹ Ṻ ṻ Ṽ ṽ Ṿ ṿ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẇ 
ẇ Ẉ ẉ Ẋ ẋ Ẍ ẍ Ẏ ẏ Ẑ ẑ Ẓ ẓ Ẕ ẕ ẖ ẗ ẘ ẙ ẚ ẛ ẞ ẟ Ạ ạ Ả ả Ấ ấ Ầ ầ Ẩ ẩ Ẫ ẫ Ậ 
ậ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ặ ặ Ẹ ẹ Ẻ ẻ Ẽ ẽ Ế ế Ề ề Ể ể Ễ ễ Ệ ệ Ỉ ỉ Ị ị Ọ ọ Ỏ ỏ Ố 
ố Ồ ồ Ổ ổ Ỗ ỗ Ộ ộ Ớ ớ Ờ ờ Ở ở Ỡ ỡ Ợ ợ Ụ ụ Ủ ủ Ứ ứ Ừ ừ Ử ử Ữ ữ Ự ự Ỳ ỳ Ỵ 
ỵ Ỷ ỷ Ỹ ỹ K Å Ⅎ ⅎ Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ ⅰ ⅱ ⅲ ⅳ ⅴ 
ⅵ ⅶ ⅷ ⅸ ⅹ ⅺ ⅻ ⅼ ⅽ ⅾ ⅿ ff fi fl ffi ffl ſt st A B C D E F G H I
J K L M N O P Q R S T U V W X Y Z a b c d e f g h i
j k l m n o p q r s t u v w x y z

As you can see, those letters are scattered all over the place. Sure, it's not completely random, but it's not useful either, because it is full of arbitrary placement that makes no alphabetical sense. That's because it is not an alphabetic sort at all. However, with the special kind of sort I've just shown you above, the kind that calls the sort method from the Unicode::Collate class, you do get an alphabetic sort. Using that method, the Latin letters I just showed you now come out in alphabetical order, which is like this:

a a A A  ᵃ ᴬ     ă Ă ắ Ắ ằ Ằ ẵ Ẵ ẳ Ẳ   ấ Ấ ầ Ầ ẫ Ẫ ẩ Ẩ ǎ Ǎ   
Å ǻ Ǻ   ǟ Ǟ   ȧ Ȧ ǡ Ǡ ą Ą ā Ā ả Ả ȁ Ȁ ȃ Ȃ ạ Ạ ặ Ặ ậ Ậ ḁ Ḁ   ᴭ ǽ Ǽ 
ǣ Ǣ ẚ ᴀ Ⱥ ᴁ ᴂ ᵆ ɐ ᵄ ɑ ᵅ ɒ b b B B ᵇ ᴮ ḃ Ḃ ḅ Ḅ ḇ Ḇ ʙ ƀ ᴯ ᴃ ᵬ ɓ Ɓ ƃ Ƃ c 
c ⅽ C C Ⅽ ć Ć ĉ Ĉ č Č ċ Ċ   ḉ Ḉ ᴄ ȼ Ȼ ƈ Ƈ ɕ d d ⅾ D D Ⅾ ᵈ ᴰ ď Ď ḋ 
Ḋ ḑ Ḑ ḍ Ḍ ḓ Ḓ ḏ Ḏ đ Đ   dz ʣ Dz DZ dž Dž DŽ ʥ ʤ ᴅ ᴆ ᵭ ɖ Ɖ ɗ Ɗ ƌ Ƌ ȡ ẟ e e E 
E ᵉ ᴱ     ĕ Ĕ   ế Ế ề Ề ễ Ễ ể Ể ě Ě   ẽ Ẽ ė Ė ȩ Ȩ ḝ Ḝ ę Ę ē Ē ḗ 
Ḗ ḕ Ḕ ẻ Ẻ ȅ Ȅ ȇ Ȇ ẹ Ẹ ệ Ệ ḙ Ḙ ḛ Ḛ ᴇ ǝ Ǝ ᴲ ə Ə ᵊ ɛ Ɛ ᵋ ɘ ɚ ɜ ᴈ ᵌ ɝ ɞ ʚ ɤ 
f f F F ḟ Ḟ ff ffi ffl fi fl ʩ ᵮ  Ƒ ⅎ Ⅎ g g G G ᵍ ᴳ ǵ Ǵ ğ Ğ ĝ Ĝ ǧ Ǧ ġ Ġ ģ 
Ģ ḡ Ḡ ɡ ɢ ǥ Ǥ ɠ Ɠ ʛ ɣ Ɣ h h H H ᴴ ĥ Ĥ ȟ Ȟ ḧ Ḧ ḣ Ḣ ḩ Ḩ ḥ Ḥ ḫ Ḫ ẖ ħ Ħ ʜ 
ƕ ɦ ɧ i i ⅰ I I Ⅰ ᵢ ᴵ     ĭ Ĭ   ǐ Ǐ   ḯ Ḯ ĩ Ĩ İ į Į ī Ī ỉ Ỉ ȉ 
Ȉ ȋ Ȋ ị Ị ḭ Ḭ ⅱ Ⅱ ⅲ Ⅲ ij IJ ⅳ Ⅳ ⅸ Ⅸ ı ɪ ᴉ ᵎ ɨ Ɨ ɩ Ɩ j j J J ᴶ ĵ Ĵ ǰ ȷ ᴊ 
ʝ ɟ ʄ k k K K K ᵏ ᴷ ḱ Ḱ ǩ Ǩ ķ Ķ ḳ Ḳ ḵ Ḵ ᴋ ƙ Ƙ ʞ l l ⅼ L L Ⅼ ˡ ᴸ ĺ Ĺ 
ľ Ľ ļ Ļ ḷ Ḷ ḹ Ḹ ḽ Ḽ ḻ Ḻ ł Ł ŀ Ŀ lj Lj LJ ʪ ʫ ʟ ᴌ ƚ Ƚ ɫ ɬ ɭ ȴ ɮ ƛ ʎ m m ⅿ M 
M Ⅿ ᵐ ᴹ ḿ Ḿ ṁ Ṁ ṃ Ṃ ᴍ ᵯ ɱ n n N N ᴺ ń Ń ǹ Ǹ ň Ň   ṅ Ṅ ņ Ņ ṇ Ṇ ṋ Ṋ ṉ 
Ṉ nj Nj NJ ɴ ᴻ ᴎ ᵰ ɲ Ɲ ƞ Ƞ ɳ ȵ ŋ Ŋ ᵑ o o O O  ᵒ ᴼ     ŏ Ŏ   ố Ố ồ 
Ồ ỗ Ỗ ổ Ổ ǒ Ǒ   ȫ Ȫ ő Ő   ṍ Ṍ ṏ Ṏ ȭ Ȭ ȯ Ȯ ȱ Ȱ   ǿ Ǿ ǫ Ǫ ǭ Ǭ ō Ō ṓ 
Ṓ ṑ Ṑ ỏ Ỏ ȍ Ȍ ȏ Ȏ ớ Ớ ờ Ờ ỡ Ỡ ở Ở ợ Ợ ọ Ọ ộ Ộ   ᴏ ᴑ ɶ ᴔ ᴓ p p P P ᵖ 
ᴾ ṕ Ṕ ṗ Ṗ ᴘ ᵱ ƥ Ƥ q q Q Q ʠ ĸ r r R R ᵣ ᴿ ŕ Ŕ ř Ř ṙ Ṙ ŗ Ŗ ȑ Ȑ ȓ Ȓ ṛ 
Ṛ ṝ Ṝ ṟ Ṟ ʀ Ʀ ᴙ ᵲ ɹ ᴚ ɺ ɻ ɼ ɽ ɾ ᵳ ɿ ʁ s s S S ˢ ś Ś ṥ Ṥ ŝ Ŝ   ṧ Ṧ ṡ 
Ṡ ş Ş ṣ Ṣ ṩ Ṩ ș Ș ſ ẛ  ẞ st ſt ᵴ ʂ ʃ ʅ ʆ t t T T ᵗ ᵀ ť Ť ẗ ṫ Ṫ ţ Ţ ṭ Ṭ 
ț Ț ṱ Ṱ ṯ Ṯ ʨ ƾ ʦ ʧ ᴛ ŧ Ŧ Ⱦ ᵵ ƫ ƭ Ƭ ʈ Ʈ ȶ ʇ u u U U ᵘ ᵤ ᵁ     ŭ Ŭ 
  ǔ Ǔ ů Ů   ǘ Ǘ ǜ Ǜ ǚ Ǚ ǖ Ǖ ű Ű ũ Ũ ṹ Ṹ ų Ų ū Ū ṻ Ṻ ủ Ủ ȕ Ȕ ȗ Ȗ ư Ư 
ứ Ứ ừ Ừ ữ Ữ ử Ử ự Ự ụ Ụ ṳ Ṳ ṷ Ṷ ṵ Ṵ ᴜ ᴝ ᵙ ᴞ ᵫ ʉ ɥ ɯ Ɯ ᵚ ᴟ ɰ ʊ Ʊ v v ⅴ V 
V Ⅴ ᵛ ᵥ ṽ Ṽ ṿ Ṿ ⅵ Ⅵ ⅶ Ⅶ ⅷ Ⅷ ᴠ ʋ Ʋ ʌ w w W W ᵂ ẃ Ẃ ẁ Ẁ ŵ Ŵ ẘ ẅ Ẅ ẇ Ẇ ẉ 
Ẉ ᴡ ʍ x x ⅹ X X Ⅹ ˣ ẍ Ẍ ẋ Ẋ ⅺ Ⅺ ⅻ Ⅻ y y Y Y   ỳ Ỳ ŷ Ŷ ẙ   ỹ Ỹ ẏ 
Ẏ ȳ Ȳ ỷ Ỷ ỵ Ỵ ʏ ƴ Ƴ z z Z Z ź Ź ẑ Ẑ   ż Ż ẓ Ẓ ẕ Ẕ ƍ ᴢ ƶ Ƶ ᵶ ȥ Ȥ ʐ ʑ 
ʒ Ʒ ǯ Ǯ ᴣ ƹ Ƹ ƺ ʓ ȝ Ȝ   ƿ Ƿ

Isn't that much nicer?

Romani Ite Domum

In case you're wondering what that last row of distinctly un‐Roman Latin letters might possibly be, they're called respectively ezh ʒ, yogh ȝ, thorn þ, and wynn ƿ. They had to go somewhere, so they ended up getting stuck after z.

Some are still used in certain non‐English (but still Latin) alphabets today, such as Icelandic, and even though you probably won't bump into them in contemporary English texts, you might see some if you're reading the original texts of famous medieval English poems like Beowulf , Sir Gawain and the Green Knight , or Brut .

The last of those, Brut , was written by a fellow named Laȝamon , a name whose third letter is a yogh. Famous though he was, I wouldn't suggest changing your name to Laȝamon in his honor, as I doubt the phone company would be amused.

[Jun 18, 2017] Making Perl Reusable with Modules

Jun 18, 2017 | www.perl.com
By Andy Sylvester on August 7, 2007 12:00 AM
Perl software development can occur at several levels. When first developing the idea for an application, a Perl developer may start with a short program to flesh out the necessary algorithms. After that, the next step might be to create a package to support object-oriented development. The final work is often to create a Perl module for the package to make the logic available to all parts of the application. Andy Sylvester explores this topic with a simple mathematical function.

Creating a Perl Subroutine

I am working on ideas for implementing some mathematical concepts for a method of composing music. The ideas come from the work of Joseph Schillinger . At the heart of the method is being able to generate patterns using mathematical operations and using those patterns in music composition. One of the basic operations described by Schillinger is creating a "resultant," or series of numbers, based on two integers (or "generators"). Figure 1 shows a diagram of how to create the resultant of the integers 5 and 3.

Figure 1. Creating the resultant of 5 and 3

Figure 1 shows two line patterns with units of 5 and units of 3. The lines continue until both lines come down (or "close") at the same time. The length of each line corresponds to the product of the two generators (5 x 3 = 15). If you draw dotted lines down from where each of the two generator lines change state, you can create a third line that changes state at each of the dotted line points. The lengths of the segments of the third line make up the resultant of the integers 5 and 3 (3, 2, 1, 3, 1, 2, 3).

Schillinger used graph paper to create resultants in his System of Musical Composition . However, another convenient way of creating a resultant is to calculate the modulus of a counter and then calculate a term in the resultant series based on the state of the counter. An algorithm to create the terms in a resultant might resemble:

Read generators from command line
Determine total number of counts for resultant
   (major_generator * minor_generator)
Initialize resultant counter = 0
For MyCounts from 1 to the total number of counts
   Get the modulus of MyCounts to the major and minor generators
   Increment the resultant counter
   If either modulus = 0
     Save the resultant counter to the resultant array
     Re-initialize resultant counter = 0
   End if
End for

From this design, I wrote a short program using the Perl modulus operator ( % ):

#!/usr/bin/perl
#*******************************************************
#
# FILENAME: result01.pl
#
# USAGE: perl result01.pl major_generator minor_generator
#
# DESCRIPTION:
#    This Perl script will generate a Schillinger resultant
#    based on two integers for the major generator and minor
#    generator.
#
#    In normal usage, the user will input the two integers
#    via the command line. The sequence of numbers representing
#    the resultant will be sent to standard output (the console
#    window).
#
# INPUTS:
#    major_generator - First generator for the resultant, input
#                      as the first calling argument on the
#                      command line.
#
#    minor_generator - Second generator for the resultant, input
#                      as the second calling argument on the
#                      command line.
#
# OUTPUTS:
#    resultant - Sequence of numbers written to the console window
#
#**************************************************************

   use strict;
   use warnings;

   my $major_generator = $ARGV[0];
   my $minor_generator = $ARGV[1];

   my $total_counts   = $major_generator * $minor_generator;
   my $result_counter = 0;
   my $major_mod      = 0;
   my $minor_mod      = 0;
   my $i              = 0;
   my $j              = 0;
   my @resultant;

   print "Generator Total = $total_counts\n";

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $major_generator;
       $minor_mod = $i % $minor_generator;
       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@resultant, $result_counter);
          $result_counter = 0;
       }
       print "$i \n";
       print "Modulus of $major_generator is $major_mod \n";
       print "Modulus of $minor_generator is $minor_mod \n";
   }

   print "\n";
   print "The resultant is @resultant \n";

Run the program with 5 and 3 as the inputs ( perl result01.pl 5 3 ):

Generator Total = 15
1
Modulus of 5 is 1
Modulus of 3 is 1
2
Modulus of 5 is 2
Modulus of 3 is 2
3
Modulus of 5 is 3
Modulus of 3 is 0
4
Modulus of 5 is 4
Modulus of 3 is 1
5
Modulus of 5 is 0
Modulus of 3 is 2
6
Modulus of 5 is 1
Modulus of 3 is 0
7
Modulus of 5 is 2
Modulus of 3 is 1
8
Modulus of 5 is 3
Modulus of 3 is 2
9
Modulus of 5 is 4
Modulus of 3 is 0
10
Modulus of 5 is 0
Modulus of 3 is 1
11
Modulus of 5 is 1
Modulus of 3 is 2
12
Modulus of 5 is 2
Modulus of 3 is 0
13
Modulus of 5 is 3
Modulus of 3 is 1
14
Modulus of 5 is 4
Modulus of 3 is 2
15
Modulus of 5 is 0
Modulus of 3 is 0

The resultant is 3 2 1 3 1 2 3

This result matches the resultant terms as shown in the graph in Figure 1, so it looks like the program generates the correct output.

Creating a Perl Package from a Program

With a working program, you can create a Perl package as a step toward being able to reuse code in a larger application. The initial program has two pieces of input data (the major generator and the minor generator). The single output is the list of numbers that make up the resultant. These three pieces of data could be combined in an object. The program could easily become a subroutine to generate the terms in the resultant. This could be a method in the class contained in the package. Creating a class implies adding a constructor method to create a new object. Finally, there should be some methods to get the major generator and minor generator from the object to use in generating the resultant (see the perlboot and perltoot tutorials for background on object-oriented programming in Perl).

From these requirements, the resulting package might be:

#!/usr/bin/perl
#*******************************************************
#
# Filename: result01a.pl
#
# Description:
#    This Perl script creates a class for a Schillinger resultant
#    based on two integers for the major generator and the
#    minor generator.
#
# Class Name: Resultant
#
# Synopsis:
#
# use Resultant;
#
# Class Methods:
#
#   $seq1 = Resultant ->new(5, 3)
#
#      Creates a new object with a major generator of 5 and
#      a minor generator of 3. These parameters need to be
#      initialized when a new object is created, as there
#      are no methods to set these elements within the object.
#
#   $seq1->generate()
#
#      Generates a resultant and saves it in the ResultList array
#
# Object Data Methods:
#
#   $major_generator = $seq1->get_major()
#
#      Returns the major generator
#
#   $minor_generator = $seq1->get_minor()
#
#      Returns the minor generator
#
#
#**************************************************************

{ package Resultant;
  use strict;
  sub new {
    my $class           = shift;
    my $major_generator = shift;
    my $minor_generator = shift;

    my $self = {Major => $major_generator,
                Minor => $minor_generator,
                ResultList => []};

    bless $self, $class;
    return $self;
  }

  sub get_major {
    my $self = shift;
    return $self->{Major};
  }

  sub get_minor {
    my $self = shift;
    return $self->{Minor};
  }

  sub generate {
    my $self         = shift;
    my $total_counts = $self->get_major * $self->get_minor;
    my $i            = 0;
    my $major_mod;
    my $minor_mod;
    my @result;
    my $result_counter = 0;

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $self->get_major;
       $minor_mod = $i % $self->get_minor;

       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@result, $result_counter);
          $result_counter = 0;
       }
   }

   @{$self->{ResultList}} = @result;
  }
}

#
# Test code to check out class methods
#

# Counter declaration
my $j;

# Create new object and initialize major and minor generators
my $seq1 = Resultant->new(5, 3);

# Print major and minor generators
print "The major generator is ", $seq1->get_major(), "\n";
print "The minor generator is ", $seq1->get_minor(), "\n";

# Generate a resultant
$seq1->generate();

# Print the resultant
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

Execute the file ( perl result01a.pl ):

The major generator is 5
The minor generator is 3
The resultant is 3 2 1 3 1 2 3

This output text shows the same resultant terms as produced by the first program.

Creating a Perl Module

From a package, you can create a Perl module to make the package fully reusable in an application. Also, you can modify our original test code into a series of module tests to show that the module works the same as the standalone package and the original program.

I like to use the Perl module Module::Starter to create a skeleton module for the package code. To start, install the Module::Starter module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Module::Starter module installed, type perldoc Module::Starter in a terminal window. If the man page does not appear, you probably do not have the module installed.

Select a working directory to create the module directory. This can be the same directory that you have been using to develop your Perl program. Type the following command (though with your own name and email address):

$ 
module-starter --module=Music::Resultant --author="John Doe" \
    --email=john@johndoe.com

Perl should respond with:

Created starter directories and files

In the working directory, you should see a folder or directory called Music-Resultant . Change your current directory to Music-Resultant , then type the commands:

$ 
perl Makefile.PL

$ 
make

These commands will create the full directory structure for the module. Now paste the text from the package into the module template at Music-Resultant/lib/Music/Resultant.pm . Open Resultant.pm in a text editor and paste the subroutines from the package after the lines:

=head1 FUNCTIONS

=head2 function1

=cut

When you paste the package source code, remove the opening brace from the package, so that the first lines appear as:

 package Resultant;
  use strict;

  sub new {
    my $class = shift;
and the last lines of the source appear, without the final closing brace, as:

   @{$self->{ResultList}} = @result;
  }

After making the above changes, save Resultant.pm . This is all that you need to do to create a module for your own use. If you eventually release your module to the Perl community or upload it to CPAN , you should do some more work to prepare the module and its documentation (see the perlmod and perlmodlib documentation for more information).

After modifying Resultant.pm , you need to install the module to make it available for other Perl applications. To avoid configuration issues, install the module in your home directory, separate from your main Perl installation.

  1. In your home directory, create a lib/ directory, then create a perl/ directory within the lib/ directory. The result should resemble:
    /home/myname/lib/perl
    
    
  2. Go to your module directory ( Music-Resultant ) and re-run the build process with a directory path to tell Perl where to install the module:
    $ 
    perl Makefile.PL LIB=/home/myname/lib/perl
     $
    make install
    
    

    Once this is complete, the module will be installed in the directory.

The final step in module development is to add tests to the .t file templates created in the module directory. The Perl distribution includes several built-in test modules, such as Test::Simple and Test::More to help test Perl subroutines and modules.

To test the module, open the file Music-Resultant/t/00-load.t . The initial text in this file is:

#!perl -T

use Test::More tests => 1;

BEGIN {
    use_ok( 'Music::Resultant' );
}

diag( "Testing Music::Resultant $Music::Resultant::VERSION, Perl $], $^X" );

You can run this test file from the t/ directory using the command:

perl -I/home/myname/lib/perl -T 00-load.t

The -I switch tells the Perl interpreter to look for the module Resultant.pm in your alternate installation directory. The directory path must immediately follow the -I switch, or Perl may not search your alternate directory for your module. The -T switch is necessary because there is a -T switch in the first line of the test script, which turns on taint checking. (Taint checking only works when enabled at Perl startup; perl will exit with an error if you try to enable it later.) Your results should resemble the following (your Perl version may be different).

1..1
ok 1 - use Music::Resultant;
# Testing Music::Resultant 0.01, Perl 5.008006, perl

The test code from the second listing is easy to convert to the format used by Test::More . Change the number at the end of the tests line from 1 to 4, as you will be adding three more tests to this file. The template file has an initial test to show that the module exists. Next, add tests after the BEGIN block in the file:

# Test 2:
my $seq1 = Resultant->new(5, 3);   # create an object
isa_ok ($seq1, 'Resultant');       # check object definition

# Test 3: check major generator
my $local_major_generator = $seq1->get_major();
is ($local_major_generator, 5, 'major generator is correct' );

# Test 4: check minor generator
my $local_minor_generator = $seq1->get_minor();
is ($local_minor_generator, 3, 'minor generator is correct' );

To run the tests, retype the earlier command line in the Music-Resultant/ directory:

$ 
perl -I/home/myname/lib/perl -T t/00-load.t

You should see the results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

These tests create a Resultant object with a major generator of 5 and a minor generator of 3 (Test 2), and check to see that the major generator in the object is correct (Test 3), and that the minor generator is correct (Test 4). They do not cover the resultant terms. One way to check the resultant is to add the test code used in the second listing to the .t file:

# Generate a resultant
$seq1->generate();

# Print the resultant
my $j;
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

You should get the following results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
The resultant is 3 2 1 3 1 2 3
# Testing Music::Resultant 0.01, Perl 5.008006, perl

That's not valid test output, so it needs a little bit of manipulation. To check the elements of a list using a testing function, install the Test::Differences module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Test::Differences module installed, type perldoc Test::Differences in a terminal window. If the man page does not appear, you probably do not have the module installed.

Once that module is part of your Perl installation, change the number of tests from 4 to 5 on the Test::More statement line and add the following statement after the use Test::More statement:

use Test::Differences;

Finally, replace the code that prints the resultant with:

# Test 5: (uses Test::Differences and associated modules)
$seq1->generate();
my @result   = @{$seq1->{ResultList}};
my @expected = (3, 2, 1, 3, 1, 2, 3);
eq_or_diff \@result, \@expected, "resultant terms are correct";

Now when the test file runs, you can confirm that the resultant is correct:

1..5
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
ok 5 - resultant terms are correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

Summary

There are multiple levels of Perl software development. Once you start to create modules to enable reuse of your Perl code, you will be able to leverage your effort into larger applications. By using Perl testing modules, you can ensure that your code works the way you expect and provide a way to ensure that the modules continue to work as you add more features.

Resources

Here are some other good resources on creating Perl modules:

Here are some good resources for using Perl testing modules like Test::Simple and Test::More :

[May 28, 2017] ELIZA - Wikipedia

Perl CPAN Module Chatbot::Eliza

[May 16, 2017] Perl - regex - Position of first nonmatching character - Stack Overflow

May 16, 2017 | stackoverflow.com

I think that's exactly what the pos function is for. NOTE: pos only works if you use the /g flag


my $x = 'abcdefghijklmnopqrstuvwxyz';
my $end = 0;

if ( $x =~ /$ARGV[0]/g ) {
    $end = pos($x);
}
print "End of match is: $end\n";

Gives the following output


[@centos5 ~]$ perl x.pl
End of match is: 0
[@centos5 ~]$ perl x.pl def
End of match is: 6
[@centos5 ~]$ perl x.pl xyz
End of match is: 26
[@centos5 ~]$ perl x.pl aaa
End of match is: 0
[@centos5 ~]$ perl x.pl ghi
End of match is: 9

No, it only works when a match was successful. tripleee Oct 10 '11 at 15:24

Sorry, I misread the question. The actual question is very tricky, especially if the regex is more complicated than just /gho/ , and especially if it contains [ or ( . Should I delete my irrelevant answer? Sodved Oct 10 '11 at 15:27

I liked the possibility to see an example of how pos works, as I didn't know about it before - so now I can understand why it also doesn't apply to the question; so thanks for this answer! :) sdaau Jun 8 '12 at 18:26

[May 16, 2017] Function pos() and finding the position at which the pattern matched the string

May 16, 2017 | perldoc.perl.org
The pos function returns the offset at which the last successful match ended. For example,

$x = "cat dog house";   # 3 words
while ( $x =~ /(\w+)/g ) {
    print "Word is $1, ends at position ", pos $x, "\n";
}

prints

Word is cat, ends at position 3
Word is dog, ends at position 7
Word is house, ends at position 13

A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c , as in /regex/gc .
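
A minimal sketch of the difference (not from the perldoc page):

    my $x = "cat dog";
    $x =~ /cat/g;
    print pos($x), "\n";       # 3
    $x =~ /bird/g;             # failed match: position is reset
    print defined pos($x) ? pos($x) : "undef", "\n";   # undef

    $x =~ /cat/g;              # match again, pos is 3
    $x =~ /bird/gc;            # failed match, but /c keeps the position
    print pos($x), "\n";       # still 3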

In Perl, how do you find the position of a match in a string, if forced to use a foreach loop? (pos) - Stack Overflow

I have to find all the positions of matching strings within a larger string using a while loop, and as a second method using a foreach loop. I have figured out the while loop method, but I am stuck on a foreach method. Here is the 'while' method:

....


my $sequence = 'AACAAATTGAAACAATAAACAGAAACAAAAATGGATGCGATCAAGAAAAAGATGC'
             . 'AGGCGATGAAAATCGAGAAGGATAACGCTCTCGATCGAGCCGATGCCGCGGAAGA'
             . 'AAAAGTACGTCAAATGACGGAAAAGTTGGAACGAATCGAGGAAGAACTACGTGAT'
             . 'ACCCAGAAAAAGATGATGCNAACTGAAAATGATTTAGATAAAGCACAGGAAGATT'
             . 'TATCTGTTGCAAATACCAACTTGGAAGATAAGGAAAAGAAAGTTCAAGAGGCGGA'
             . 'GGCTGAGGTAGCANCCCTGAATCGTCGTATGACACTTCTGGAAGAGGAATTGGAA'
             . 'CGAGCTGAGGAACGTTTGAAGATTGCAACGGATAAATTGGAAGAAGCAACACATA'
             . 'CAGCTGATGAATCTGAACGTGTTCGCNAGGTTATGGAAA';
my $string = <STDIN>;
chomp $string;

while ( $sequence =~ /$string/gi ) {
    printf "Sequence found at position: %d\n", pos($sequence) - length($string);
}

Here is my foreach method:


foreach ( $sequence =~ /$string/gi ) {
    printf "Sequence found at position: %d\n", pos($sequence) - length($string);
}

Could someone please give me a clue on why it doesn't work the same way? Thanks!

My Output if I input "aaca":


Part 1 using a while loop
Sequence found at position: 0
Sequence found at position: 10
Sequence found at position: 17
Sequence found at position: 23
Sequence found at position: 377
Part 2 using a foreach loop
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4

asked Jan 31 '11 at 21:38 by user83598
Using raw input $string in a regexp will act weird if somebody types in special characters (accidentally or maliciously). Consider using /\Q$string/gi to avoid treating $string as a regexp. aschepler Jan 31 '11 at 22:15
Your problem here is context. In the while loop, the condition is in scalar context. In scalar context, the match operator in g mode will sequentially match along the string. Thus checking pos within the loop does what you want.

In the foreach loop, the condition is in list context. In list context, the match operator in g mode will return a list of all matches (and it will calculate all of the matches before the loop body is ever entered). foreach is then loading the matches one by one into $_ for you, but you are never using the variable. pos in the body of the loop is not useful as it contains the result after the matches have ended.

The takeaway here is that if you want pos to work, and you are using the g modifier, you should use the while loop which imposes scalar context and makes the regex iterate across the matches in the string.
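
Here is a compact illustration of the two contexts (a minimal sketch, not part of the original answer):

    my $str = "ahoj ahoj ahoj";

    # Scalar context: one match per iteration, pos advances each time
    while ( $str =~ /ahoj/g ) {
        print pos($str), "\n";       # prints 4, then 9, then 14
    }

    # List context: all matches are found before the loop body ever runs,
    # so pos is no longer useful inside the loop
    my @all = $str =~ /(ahoj)/g;     # ("ahoj", "ahoj", "ahoj")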

Sinan inspired me to write a few foreach examples:

  • This one is fairly succinct using split in separator retention mode:
    
my $pos = 0;
foreach ( split /($string)/i => $sequence ) {
    print "Sequence found at position: $pos\n" if lc eq lc $string;
    $pos += length;
}
    
    
  • A regex equivalent of the split solution:
    
my $pos = 0;
foreach ( $sequence =~ /(\Q$string\E|(?:(?!\Q$string\E).)+)/gi ) {
    print "Sequence found at position: $pos\n" if lc eq lc $string;
    $pos += length;
}
    
    
  • But this is clearly the best solution for your problem:
    
{
    package Dumb::Homework;

    sub TIEARRAY {
        bless {
            haystack => $_[1],
            needle   => $_[2],
            size     => 2**31 - 1,
            pos      => [],
        }
    }

    sub FETCH {
        my ($self, $index) = @_;
        my ($pos, $needle) = @$self{qw(pos needle)};

        return $$pos[$index] if $index < @$pos;
        while ($index + 1 >= @$pos) {
            unless ($$self{haystack} =~ /\Q$needle/gi) {
                $$self{size} = @$pos;
                last;
            }
            push @$pos, pos($$self{haystack}) - length $needle;
        }
        $$pos[$index]
    }

    sub FETCHSIZE { $_[0]{size} }
}

tie my @pos, 'Dumb::Homework' => $sequence, $string;

print "Sequence found at position: $_\n" foreach @pos;   # look how clean it is
    
    

The reason it's the best is that the other two solutions have to process the entire global match before you ever see a result. For large inputs (like DNA) that could be a problem. The Dumb::Homework package implements an array that lazily finds the next position each time the foreach iterator asks for it. It even stores the positions so you can get to them again without reprocessing. (In truth, it looks one match past the requested match; this allows it to end properly in the foreach , but it is still much better than processing the whole list.)

  • Actually, the best solution is still to not use foreach as it is not the correct tool for the job.

[May 16, 2017] Perl - positions of regex match in string - Stack Overflow

May 16, 2017 | stackoverflow.com
Perl - positions of regex match in string

if ( my @matches = $input_string =~ /$metadata[$_]{"pattern"}/g ) {
    print $-[1] . "\n";   # this gives me error uninitialized ...
}

print scalar @matches; gives me 4, which is OK, but if I use $-[1] to get the start of the first match, it gives me an error. Where is the problem?

EDIT1: How can I get the positions of each match in the string? If I have the string "ahoj ahoj ahoj" and the regexp /ahoj/g, how can I get the start and end positions of each "ahoj" in the string?

asked Feb 22 '13 at 20:27 by Krab (edited Feb 22 '13 at 20:40)
What error does it give you? user554546 Feb 22 '13 at 20:29
$-[1] is the position of the 1st subpattern (something in parentheses within the regular expression). You're probably looking for $-[0] , the position of the whole pattern? Scott Lamb Feb 22 '13 at 20:32
scott lamb: no, I was thinking that if I have the string "ahoj ahoj ahoj", then I can get positions 0, 5, 10, etc. inside $-[n] if the regex is /ahoj/g Krab Feb 22 '13 at 20:34
The array @- contains the offset of the start of the last successful match (in $-[0] ) and the offset of any captures there may have been in that match (in $-[1] , $-[2] , etc.).

There are no captures in your pattern, so only $-[0] is valid, and (in your case) the last successful match is the fourth one, so it will contain the offset of the fourth instance of the pattern.

The way to get the offsets of individual matches is to write


my @matches;
while ( "ahoj ahoj ahoj" =~ /(ahoj)/g ) {
    push @matches, $1;
    print $-[0], "\n";
}

output


0
5
10

Or if you don't want the individual matched strings, then


my @matches;

push @matches, $-[0] while "ahoj ahoj ahoj" =~ /ahoj/g;

[May 07, 2017] Example Code from Beginning Perl for Bioinformatics

While the examples are specific to genome sequencing, most of the code is a good illustration of string processing in Perl and as such has wider appeal. See also molecularevolution.org
May 07, 2017 | uwf.edu

This page contains an uncompressed copy of example code from your course text, downloaded on January 15, 2003. Please see the official Beginning Perl for Bioinformatics Website under the heading " Examples and Exercises " for any updates to this code.


General files

Chapter 4

NOTE: Examples 4-5 to 4-7 also require the protein sequence data file: NM_021964fragment.pep.txt - To match the example in your book, save the file out with the name: NM_021964fragment.pep

Chapter 5

NOTE: Example 5-3 also requires the protein sequence data file: NM_021964fragment.pep.txt - To match the example in your book, save the file out with the name: NM_021964fragment.pep
NOTE: Example 5-4, 5-6 and 5-7 also require the DNA file: small.dna.txt - To match the example in your book, save the file out with the name: small.dna

Chapter 6

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

Chapter 7

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

Chapter 8

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

NOTE: Example 8-2,8-3 and 8-4 also require the DNA file: sample.dna.txt - To match the example in your book, save the file out with the name: sample.dna

Chapter 9

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

[May 07, 2017] Why is Perl used so extensively in biology research

Jan 15, 2016 | stackoverflow.com

Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article: How Perl Saved the Human Genome Project .

From his analysis:

I think several factors are responsible:
  1. Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language. (A short sketch illustrating this point follows the list.)
  2. Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can be also be a curse. I talk more about the problems with Perl below.
  3. Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
  4. Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
  5. Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language. Sometimes it turns out that Perl is fast enough so that the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/ ).
  6. Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.

I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done quickly (largely with text) and finding libraries on CPAN (a nice central location), it can't be surpassed.

Perl is also excellent glue, so if you have some instrumental records and you need to glue them to data analysis routines, then Perl is your language. Perl is very powerful when it comes to dealing with text, and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but also most of the bioinformatics algorithms will output some kind of text results.

Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for every kind of popular bioinformatics algorithms' results, and for manipulating the different sequence formats used in major sequence databases.

Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.

The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).

Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...

I hope this helps.

Bioinformatics deals primarily in text parsing, and Perl is the best programming language for the job, as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says, "With [Perl]'s highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis." This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take one hour or more to finish, so waiting a few extra minutes for that data processing to finish isn't going to kill them!

Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.

===

People missed out DBI, the Perl abstract database interface that makes it really easy to work with bioinformatics databases.
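As a minimal sketch of what that looks like (assuming DBD::SQLite is installed; the database name, table, and columns here are hypothetical, chosen only to show the shape of the DBI API):

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=genes.db', '', '',
                       { RaiseError => 1 });
my $sth = $dbh->prepare('SELECT name, sequence FROM genes WHERE name LIKE ?');
$sth->execute('BRCA%');                       # placeholders quote values for you
while ( my ($name, $seq) = $sth->fetchrow_array ) {
    printf "%s: %d bp\n", $name, length $seq;
}
$dbh->disconnect;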

There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.

At least you know how fast or slow Perl will be everywhere: slightly slower than C I/O. Don't learn grep, cut, sed, or AWK; just learn Perl as your command line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious, but the mindshare definitely came from CGI and bioinformatics being one of the earliest heavy users of the Internet.
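For instance, here is a rough sketch of how the classic tools map onto Perl one-liners (the Perl versions are not byte-for-byte identical, but close enough for daily work):

# grep pattern file
perl -ne 'print if /pattern/' file

# cut -d: -f1 /etc/passwd
perl -F: -lane 'print $F[0]' /etc/passwd

# sed 's/foo/bar/g' file
perl -pe 's/foo/bar/g' file

# awk '{ print $2 }' file
perl -lane 'print $F[1]' file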

===

Perl is very easy to learn compared to other languages. It is a good fit for biological data, which is becoming big data: it can manipulate large datasets and performs well for data curation and all kinds of DNA processing. Automation in biology has become easy thanks to languages like Perl, Python and Ruby. Perl is very approachable for those who know biology but don't know how to program in other languages.

Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.

At the time I'd been through a few computer science classes, but I didn't really know programming all that well.

Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.

As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.

Anyway, Perl became my go-to for anything I needed to do around the lab. It was awesome, easy to use, super flexible, and other Perl guys in other labs were a lot like me.

So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.

Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.

To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer; the script probably won't work a second time, and you will not understand anything in your own code if you see it six months later. BUT if you need something now, then it is a good choice, even though I mostly use Python now.

I hope that gives you an answer from someone who lived it.

[May 07, 2017] A useful capability of Perl substr function

The Perl substr function can be used as a pseudo-function on the left side of an assignment. That allows you to insert a substring at an arbitrary point of a string.

For example, the code fragment:

$test_string='<cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>';
print "Before: $test_string\n"; 
substr($test_string,length('<cite>xxx'),0)='</cite>';
print "After: $test_string\n"; 
will print
Before: <cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>
After:  <cite>xxx</cite><blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>

Please note that index returns the position of the first character of the found substring, and inserting at that position with substr places the new string immediately before it, so no adjustment of the found position is needed:

$pos=index($test_string,'<blockquote>');
if( $pos > -1 ){
    substr($test_string,$pos,0)='</cite>';
}
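The same insertion can also be written with the four-argument form of substr, which replaces a zero-length slice and avoids the lvalue syntax (both forms are equivalent here):

$pos=index($test_string,'<blockquote>');
substr($test_string,$pos,0,'</cite>') if $pos > -1;   # 4-arg substr: insert at $pos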

[Mar 20, 2017] Cultured Perl Debugging Perl with ease

Mar 20, 2017 | www.ibm.com

Teodor Zlatanov
Published on November 01, 2000

The Perl debugger comes with its own help ('h' or 'h h', for the long and short help screens, respectively). The perldoc perldebug page (type "perldoc perldebug" at your prompt) has a more complete description of the Perl debugger.

So let's start with a buggy program and take a look at how the Perl debugger works. First, it'll attempt to print the first 20 lines in a file.

1  #!/usr/bin/perl -w
2
3  use strict;
4
5  foreach (0..20)
6  {
7      my $line = <>;
8      print "$_ : $line";
9  }

When run by itself, buggy.pl fails with the message: "Use of uninitialized value in concatenation (.) at ./buggy.pl line 8, <> line 9." More mysteriously, it prints "9 :" on a line by itself and waits for user input.

Now what does that mean? You may already have spotted the bugbear that came along when we fired up the Perl debugger.

First let's simply make sure the bug is repeatable. We'll set an action on line 8 to print $line where the error occurred, and run the program.

> perl -d ./buggy.pl buggy.pl

Default die handler restored.
Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(./buggy.pl:5):   foreach (0..20)
main::(./buggy.pl:6):   {
  DB<1> use Data::Dumper

  DB<1> a 8 print 'The line variable is now ', Dumper $line

The Data::Dumper module loads so that the autoaction can use a nice output format. The autoaction is set to do a print statement every time line 8 is reached. Now let's watch the show.

  DB<2> c
The line variable is now $VAR1 = '#!/usr/bin/perl -w
';
0 : #!/usr/bin/perl -w
The line variable is now $VAR1 = '
';
1 : 
The line variable is now $VAR1 = 'use strict;
';
2 : use strict;
The line variable is now $VAR1 = '
';
3 : 
The line variable is now $VAR1 = 'foreach (0..20)
';
4 : foreach (0..20)
The line variable is now $VAR1 = '{
';
5 : {
The line variable is now $VAR1 = '    my $line = <>;
';
6 :     my $line = <>;
The line variable is now $VAR1 = '    print "$_ : $line";
';
7 :     print "$_ : $line";
The line variable is now $VAR1 = '}
';
8 : }
The line variable is now $VAR1 = undef;
Use of uninitialized value in concatenation (.) at ./buggy.pl line 8, <> line 9.
9 : 

It's clear now that the problem occurred when the line variable was undefined. Furthermore, the program waited for more input. And pressing the Return key eleven more times created the following output:

The line variable is now $VAR1 = '
';
10 : 
The line variable is now $VAR1 = '
';
11 : 
The line variable is now $VAR1 = '
';
12 : 
The line variable is now $VAR1 = '
';
13 : 
The line variable is now $VAR1 = '
';
14 : 
The line variable is now $VAR1 = '
';
15 : 
The line variable is now $VAR1 = '
';
16 : 
The line variable is now $VAR1 = '
';
17 : 
The line variable is now $VAR1 = '
';
18 : 
The line variable is now $VAR1 = '
';
19 : 
The line variable is now $VAR1 = '
';
20 : 
Debugged program terminated.  Use q to quit or R to restart,
  use O inhibit_exit to avoid stopping after program termination,
  h q, h R or h O to get additional info.
  DB<3>

By now it's obvious that the program is buggy because it unconditionally waits for 20 lines of input, even though there are cases in which the lines will not be there. The fix is to test the $line after reading it from the filehandle:

#!/usr/bin/perl -w

use strict;

foreach (0..20)
{
    my $line = <>;
    last unless defined $line; # exit loop if $line is not defined
    print "$_ : $line";
}

As you see, the fixed program works properly in all cases!

Concluding notes on the Perl debugger

The Emacs editor supports the Perl debugger and makes using it a somewhat better experience. You can read more about the GUD Emacs mode inside Emacs with Info (type M-x info). GUD is a universal debugging mode that works with the Perl debugger (type M-x perldb while editing a Perl program in Emacs).

With a little work, the vi family of editors will also support the Perl debugger. See the perldoc perldebug page for more information. For other editors, consult each editor's documentation.

The Perl built-in debugger is a powerful tool and can do much more than the simple usage we just looked at. It does, however, require a fair amount of Perl expertise, which is why we are now going to look at some simpler tools that will better suit beginning and intermediate Perl programmers.
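Before moving on to those friendlier tools, here is a short reference card of everyday debugger commands (all documented in perldoc perldebug):

b 8              # set a breakpoint on line 8
b mysub          # break on entry to subroutine mysub
c                # continue until the next breakpoint
s                # single-step, descending into subroutine calls
n                # step over the next statement
x \%some_hash    # pretty-print a data structure
w $line          # set a watch-expression on $line
L                # list breakpoints, actions and watch-expressions
q                # quit the debugger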

Devel::ptkdb

To use the Devel::ptkdb debugger you first have to download it from CPAN ( see Related topics below) and install it on your system. (Some of you may also need to install the Tk module, also in CPAN.) On a personal note, Devel::ptkdb works best on UNIX systems like Linux. (Although it's not theoretically limited to UNIX-compatible systems, I have never heard of anyone successfully using Devel::ptkdb on Windows. As the old saying goes, anything is possible except skiing through a revolving door.)

If you can't get your system administrator to perform the installation for you (because, for instance, you are the system administrator), you can try doing the following at your prompt (you may need to run this as root):

perl -MCPAN -e 'install Tk'
perl -MCPAN -e 'install Devel::ptkdb'

After some initial questions (if this is your first time running the CPAN installation routines), this will download and install the appropriate modules automatically.

You can run a program with the ptkdb debugger as follows (using our old buggy.pl example):

perl -d:ptkdb buggy.pl buggy.pl

To read the documentation for the Devel::ptkdb modules, use the command "perldoc Devel::ptkdb". We are using version 1.1071 here. (Although updated versions may come out at any time, they should not look very different from the one we're using.)

A window will come up with the program's source code on the left and a list of watched expressions (initially empty) on the right. Enter the word "$line" in the "Enter Expr:" box. Then click on the "Step Over" button to watch the program execute.

The "Run" button will run the program until it finishes or hits a breakpoint. Clicking on the line number in the source-listing window sets or deletes breakpoints. If you select the "BrkPts" tab on the right, you can edit the list of breakpoints and make them conditional upon a variable or a function. (This is a very easy way to set up conditional breakpoints.)

Ptkdb also has File, Control, Data, Stack, and Bookmarks menus. These menus are all explained in the perldoc documentation. Because it's so easy to use, Ptkdb is an absolute must for beginner and intermediate Perl programmers. It can even be useful for Perl gurus (as long as they don't tell anyone that they're using those new-fangled graphical interfaces).

Writing your own Perl shell

Sometimes using a debugger is overkill. If, for example, you want to test something simple, in isolation from the rest of a large program, a debugger would be too complex for the task. This is where a Perl shell can come in handy.

While other valid approaches to a Perl shell certainly exist, we're going to look at a general solution that works well for most daily work (and I use it all the time). Once you understand the tool, you should feel free to tailor it to your own needs and preferences.

The following code requires the Term::ReadLine module. You can download it from CPAN and install it almost the same way as you did Devel::ptkdb.

#!/usr/bin/perl -w

use Term::ReadLine;
use Data::Dumper;

my $historyfile = $ENV{HOME} . '/.phistory';
my $term = new Term::ReadLine 'Perl Shell';

sub save_list
{
    my $f = shift;
    my $l = shift;
    open F, ">$f";                     # write the command list out
    print F "$_\n" foreach @$l;
}

if (open H, $historyfile)
{
    @h = <H>;
    chomp @h;
    close H;
    $h{$_} = 1 foreach @h;             # weed out duplicate commands
    $term->addhistory($_) foreach keys %h;
}

while ( defined ($_ = $term->readline("My Perl Shell> ")) )
{
    my $res = eval($_);
    warn $@ if $@;
    unless ($@)
    {
        open H, ">>$historyfile";
        print H "$_\n";
        close H;
        print "\n", Data::Dumper->Dump([$res], ['Result']);
    }
    $term->addhistory($_) if /\S/;
}

This Perl shell does several things well, and some things decently.

First of all, it keeps a unique history of commands already entered in your home directory in a file called ".phistory". If you enter a command twice, only one copy will remain (see the function that opens $historyfile and reads history lines from it).

With each entry of a new command, the command list is saved to the .phistory file. So if you enter a command that crashes the shell, the history of your last session is not lost.

The Term::ReadLine module makes it easy to enter commands for execution. Because commands are limited to only one line at a time, it's possible to write good old buggy.pl as:

My Perl Shell> use strict
$Result = undef;
My Perl Shell> print "$_: " . <> foreach (0..20)
0: ...
1: ...

The problem of course is that the <> input operator ends up eating the shell's own input. So don't use <> or <STDIN> in the Perl shell, because they'll make things more difficult. Try this instead:

My Perl Shell> open F, "buggy.pl"
$Result = 1;
My Perl Shell> foreach (0..20) { last if eof(F); print "$_: " . <F>; }
0: #!/usr/bin/perl -w
1: 
2: use strict;
3: 
4: foreach (0..20)
5: {
6:     my $line = <>;
7:     last unless defined $line; # exit loop if $line is not defined
8:     print "$_ : $line";
9: }
$Result = undef;

As you can see, the shell works for cases where you can easily condense statements into one line. It's also a surprisingly common solution to isolating bugs and provides a great learning environment. Do a little exercise and see if you can write a Perl shell for debugging on your own, and see how much you learn!
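If you want a starting point for that exercise, here is a minimal sketch: no history and no Term::ReadLine, just the read-eval-print loop at the heart of any Perl shell:

#!/usr/bin/perl -w

$| = 1;                            # flush the prompt right away

while (1) {
    print "perl> ";
    my $code = <STDIN>;
    last unless defined $code;     # exit on Ctrl-D
    my $result = eval $code;
    if ($@) { warn $@ }
    else    { print defined $result ? "=> $result\n" : "=> undef\n" }
}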

Building an arsenal of tools

We've only covered the very basics here of the built-in Perl debugger, Devel::ptkdb, and related tools. There are many more ways to debug Perl. What's important is that you gain an understanding of the debugging process: how a bug is observed, solved, and then fixed. Of course the single most important thing is to make sure you have a comprehensive understanding of your program's requirements.

The Perl built-in debugger is very powerful, but it's not good for beginner or intermediate Perl programmers. (With the exception of Emacs, where it can be a useful tool even for beginners as long as they understand debugging under Emacs.)

The Devel::ptkdb module and debugger are (because of power and usability) by far the best choice for beginning and intermediate programmers. Perl shells, on the other hand, are personalized debugging solutions for isolated problems with small pieces of code.

Every software tester builds his own arsenal of debugging tools, whether it's the Emacs editor with GUD, or a Perl shell, or print statements throughout the code. Hopefully the tools we've looked at here will make your debugging experience a little easier.

[Mar 20, 2017] Cultured Perl One-liners 102

Mar 20, 2017 | www.ibm.com
One-liners 102

More one-line Perl scripts

Teodor Zlatanov
Published on March 12, 2003

This article, as regular readers may have guessed, is the sequel to "One-liners 101," which appeared in a previous installment of "Cultured Perl". The earlier article is an absolute requirement for understanding the material here, so please take a look at it before you continue.

The goal of this article, as with its predecessor, is to show legible and reusable code, not necessarily the shortest or most efficient version of a program. With that in mind, let's get to the code!

Tom Christiansen's list

Tom Christiansen posted a list of one-liners on Usenet years ago, and that list is still interesting and useful for any Perl programmer. We will look at the more complex one-liners from the list; the full list is available in the file tomc.txt (see Related topics to download this file). The list overlaps slightly with the "One-liners 101" article, and I will try to point out those intersections.

Awk is commonly used for basic tasks such as breaking up text into fields; Perl excels at text manipulation by design. Thus, we come to our first one-liner, intended to add two columns in the text input to the script.

Listing 1. Like awk?
# add first and penultimate columns
# NOTE the equivalent awk script:
# awk '{i = NF - 1; print $1 + $i}'
perl -lane 'print $F[0] + $F[-2]'

So what does it do? The magic is in the switches. The -n and -a switches make the script a wrapper around input that splits the input on whitespace into the @F array; the -e switch adds an extra statement into the wrapper. The code of interest actually produced is:

Listing 2: The full Monty
while (<>) {
    @F = split(' ');
    print $F[0] + $F[-2]; # offset -2 means "2nd to last element of the array"
}
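If you ever want to see exactly which wrapper a given set of switches produces, the B::Deparse module that ships with Perl will print it for you (the output is essentially Listing 2, modulo formatting):

perl -MO=Deparse -lane 'print $F[0] + $F[-2]'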

Another common task is to print the contents of a file between two markers or between two line numbers.

Listing 3: Printing a range of lines
# 1. just lines 15 to 17
perl -ne 'print if 15 .. 17'

# 2. just lines NOT between line 10 and 20
perl -ne 'print unless 10 .. 20'

# 3. lines between START and END
perl -ne 'print if /^START$/ .. /^END$/'

# 4. lines NOT between START and END
perl -ne 'print unless /^START$/ .. /^END$/'

A problem with the first one-liner in Listing 3 is that it will go through the whole file, even if the necessary range has already been covered. The third one-liner does not have that problem, because its job is to print all the lines between the START and END markers, wherever they appear: if there are eight sets of START/END markers, the third one-liner will print the lines inside all eight sets.

Preventing the inefficiency of the first one-liner is easy: just use the $. variable, which tells you the current line. Start printing if $. is over 15 and exit if $. is greater than 17.

Listing 4: Printing a numeric range of lines more efficiently
# just lines 15 to 17, efficiently
perl -ne 'print if $. >= 15; exit if $. >= 17;'

Enough printing, let's do some editing. Needless to say, if you are experimenting with one-liners, especially ones intended to modify data, you should keep backups. You wouldn't be the first programmer to think a minor modification couldn't possibly make a difference to a one-liner program; just don't make that assumption while editing the Sendmail configuration or your mailbox.

Listing 5: In-place editing
# 1. in-place edit of *.c files changing all foo to bar
perl -p -i.bak -e 's/\bfoo\b/bar/g' *.c

# 2. delete first 10 lines
perl -i.old -ne 'print unless 1 .. 10' foo.txt

# 3. change all the isolated oldvar occurrences to newvar
perl -i.old -pe 's{\boldvar\b}{newvar}g' *.[chy]

# 4. increment all numbers found in these files
perl -i.tiny -pe 's/(\d+)/ 1 + $1 /ge' file1 file2 ....

# 5. delete all lines between START and END
perl -i.old -ne 'print unless /^START$/ .. /^END$/' foo.txt

# 6. binary edit (careful!)
perl -i.bak -pe 's/Mozilla/Slopoke/g' /usr/local/bin/netscape

Why does 1 .. 10 specify line numbers 1 through 10? Read the "perldoc perlop" manual page. Basically, the .. operator iterates through a range. Thus, the script does not count 10 lines; it counts 10 iterations of the loop generated by the -n switch (see "perldoc perlrun" and Listing 2 for an example of that loop).

The magic of the -i switch is that it replaces each file in @ARGV with the version produced by the script's output on that file. Thus, the -i switch makes Perl into an editing text filter. Do not forget to use the backup option to the -i switch. Following the i with an extension will make a backup of the edited file using that extension.

Note how the -p and -n switches are used. The -n switch is used when you want to print data explicitly. The -p switch implicitly inserts a print $_ statement in the loop produced by the -n switch. Thus, the -p switch is better for full processing of a file, while the -n switch is better for selective file processing, where only specific data needs to be printed.
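To see the difference in the smallest possible setting, compare these two sketches:

# -p prints $_ after every iteration: a minimal "cat"
perl -pe '' file

# -n prints nothing by itself, so you decide what gets printed: a minimal "grep"
perl -ne 'print if /pattern/' file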

Examples of in-place editing can also be found in the "One-liners 101" article.

Reversing the contents of a file is not a common task, but the following one-liners show that the -n and -p switches are not always the best choice when processing an entire file.

Listing 6: Reversal of files' fortunes
# 1. command-line that reverses the whole input by lines
# (printing each line in reverse order)
perl -e 'print reverse <>' file1 file2 file3 ....

# 2. command-line that shows each line with its characters backwards
perl -nle 'print scalar reverse $_' file1 file2 file3 ....

# 3. find palindromes in the /usr/dict/words dictionary file
perl -lne '$_ = lc $_; print if $_ eq reverse' /usr/dict/words

# 4. command-line that reverses all the bytes in a file
perl -0777e 'print scalar reverse <>' f1 f2 f3 ...

# 5. command-line that reverses each paragraph in the file but prints
# them in order
perl -00 -e 'print reverse <>' file1 file2 file3 ....

The -0 (zero) flag is very useful if you want to read a full paragraph or a full file into a single string. (It also works with any character number, so you can use a special character as a marker.) Be careful when reading a full file in one command (-0777), because a large file will use up all your memory. If you need to read the contents of a file backwards (for instance, to analyze a log in reverse order), use the CPAN module File::ReadBackwards. Also see "One-liners 101," which shows an example of log analysis with File::ReadBackwards.
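A minimal File::ReadBackwards sketch (the log file name here is just an example):

use File::ReadBackwards;

my $bw = File::ReadBackwards->new('/var/log/messages')
    or die "Can't open file: $!";
while ( defined( my $line = $bw->readline ) ) {
    print $line;                   # lines come out last-to-first
}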

Note the similarity between the first and second scripts in Listing 6. The first one, however, is completely different from the second one. The difference lies in using <> in scalar context (as -n does in the second script) or list context (as the first script does).

The third script, the palindrome detector, did not originally have the $_ = lc $_; segment. I added that to catch those palindromes like "Bob" that are not the same backwards.

My addition can be written as $_ = lc; as well, but explicitly stating the subject of the lc() function makes the one-liner more legible, in my opinion.

Paul Joslin's list

Paul Joslin was kind enough to send me some of his one-liners for this article.

Listing 7: Rewrite with a random number
# replace string XYZ with a random number less than 611 in these files
perl -i.bak -pe "s/XYZ/int rand(611)/e" f1 f2 f3

This is a filter that replaces XYZ with a random number less than 611 (that number is arbitrarily chosen). Remember the rand() function returns a random number between 0 and its argument.

Note that XYZ will be replaced by a different random number every time, because the substitution evaluates "int rand(611)" every time.

Listing 8: Revealing the files' base nature
# 1. Run basename on contents of file
perl -pe "s@.*/@@gio" INDEX

# 2. Run dirname on contents of file
perl -pe 's@^(.*/)[^/]+@$1\n@' INDEX

# 3. Run basename on contents of file
perl -MFile::Basename -ne 'print basename $_' INDEX

# 4. Run dirname on contents of file
perl -MFile::Basename -ne 'print dirname $_' INDEX

One-liners 1 and 2 came from Paul, while 3 and 4 were my rewrites of them with the File::Basename module. Their purpose is simple, but any system administrator will find these one-liners useful.

Listing 9: Moving or renaming, it's all the same in UNIX
# 1. write command to mv dirs XYZ_asd to Asd
# (you may have to preface each '!' with a '\' depending on your shell)
ls | perl -pe 's!([^_]+)_(.)(.*)!mv $1_$2$3 \u$2\E$3!gio'

# 2. Write a shell script to move input from xyz to Xyz
ls | perl -ne 'chop; printf "mv $_ %s\n", ucfirst $_;'

For regular users or system administrators, renaming files based on a pattern is a very common task. The scripts above will do two kinds of jobs: either remove the file name portion up to the _ character, or change each filename so that its first letter is uppercased according to the Perl ucfirst() function.

There is a UNIX utility called "mmv" by Vladimir Lanin that may also be of interest. It allows you to rename files based on simple patterns, and it's surprisingly powerful. See the Related topics section for a link to this utility.

Some of mine

The following is not a one-liner, but it's a pretty useful script that started as a one-liner. It is similar to Listing 7 in that it replaces a fixed string, but the trick is that the replacement becomes the fixed string the next time the script is run.

The idea came from a newsgroup posting a long time ago, but I haven't been able to find the original version. The script is useful in case you need to replace one IP address with another in all your system files -- for instance, if your default router has changed. The script includes $0 (in UNIX, usually the name of the script) in the list of files to rewrite.

As a one-liner it ultimately proved too complex, and the messages regarding what is about to be executed are necessary when system files are going to be modified.

Listing 10: Replace one IP address with another one
#!/usr/bin/perl -w

use Regexp::Common qw/net/;    # provides the regular expressions for IP matching

my $replacement = shift @ARGV; # get the new IP address

die "You must provide $0 with a replacement string for the IP 111.111.111.111"
 unless $replacement;

# we require that $replacement be JUST a valid IP address
die "Invalid IP address provided: [$replacement]"
 unless $replacement =~ m/^$RE{net}{IPv4}$/;

# replace the string in each file
foreach my $file ($0, qw[/etc/hosts /etc/defaultrouter /etc/ethers], @ARGV)
{
 # note that we know $replacement is a valid IP address, so this is
 # not a dangerous invocation
 my $command = "perl -p -i.bak -e 's/111.111.111.111/$replacement/g' $file";

 print "Executing [$command]\n";
 system($command);
}

Note the use of the Regexp::Common module, an indispensable resource for any Perl programmer today. Without Regexp::Common, you will be wasting a lot of time trying to match a number or other common patterns manually, and you're likely to get it wrong.
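As a small taste of what the module buys you, here is a sketch using a few of its documented patterns:

use Regexp::Common qw/number net/;

print "integer\n" if '42'       =~ /^$RE{num}{int}$/;
print "real\n"    if '-3.14'    =~ /^$RE{num}{real}$/;
print "IPv4\n"    if '10.0.0.1' =~ /^$RE{net}{IPv4}$/;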

Conclusion

Thanks to Paul Joslin for sending me his list of one-liners. And in the spirit of conciseness that one-liners inspire, I'll refer you to "One-liners 101" for some closing thoughts on one-line Perl scripts.

[Dec 27, 2016] Perl is a great choice for a variety of industries

Dec 27, 2016 | opensource.com

Opensource.com

Earlier this year, ActiveState conducted a survey of users who had downloaded our distribution of Perl over the prior year and a half. We received 356 responses: 99 commercial users and 257 individual users. I've been using Perl for a long time, and I expected that lengthy experience would be typical of the Perl community. Our survey results, however, tell a different story.

Almost one-third of the respondents have three or fewer years of experience. Nearly half of all respondents reported using Perl for fewer than five years, a statistic that could be attributed to Perl's outstanding, inclusive community. The powerful and pragmatic nature of Perl and its supportive community make it a great choice for a wide array of uses across a variety of industries.

For a deeper dive, check out this video of my talk at YAPC North America this year.

Perl careers

Right now you can search online and find Perl jobs related to Amazon and BBC, not to mention several positions at Boeing. A quick search on Dice.com, an IT and engineering career website, yielded 3,575 listings containing the word Perl at companies like Amazon, Athena Health, and Northrop Grumman. Perl is also found in the finance industry, where it's primarily used to pull data from databases and process it.

Perl benefits

Perl's consistent utilization is the result of myriad factors, but its open source background is a powerful attribute.

Projects using Perl reduce upfront costs and downstream risks, and when you factor in how clean and powerful Perl is, it becomes quite a compelling option. Add to this that Perl sees yearly releases (more than that, even, as Perl has seen seven releases since 2012), and you can begin to understand why Perl still runs large parts of the web.

Mojolicious, Dancer, and Catalyst are just a few of the powerful web frameworks built for Perl. Designed for simplicity and scalability, these frameworks provide aspiring Perl developers an easy entry point to the language, which might explain some of the numbers from the survey I mentioned above. The inclusive nature of the Perl community draws developers, as well. It's hard to find a more welcoming or active community, and you can see evidence of that in the online groups, open source projects, and regular worldwide conferences and workshops.

Perl modules

Perl also has a mature installation tool chain and a strong testing culture. Anyone who wants to create automated test suites for Perl projects has the assistance of the over 400 testing and quality modules available on CPAN (Comprehensive Perl Archive Network). They won't have to sort through all 400 to choose the best, though: Test::Most is a one-stop shop for the most commonly used test modules. CPAN is one of Perl's biggest advantages over other programming languages. The archive hosts tens of thousands of ready-to-use modules for Perl, and the breadth and variety of those modules is astounding.

Even with a quick search you can find hardcore numerical modules, ODE (ordinary differential equations) solvers, and countless other types of modules written over the last 20 years by thousands of contributors. This contribution-based archive network helps keep Perl fresh and relevant, proliferating modules like pollen that will blow around to the incredible number of Perl projects out in the world.

You might think that community modules aren't the most reliable, but every distribution of modules on CPAN has been tested on myriad platforms and Perl configurations. As a testament to the determination of Perl users, the community has constructed a testing network and they spend time to make sure each Perl module works well on every available platform. They also maintain extensively-checked libraries that help Perl developers with big data projects.

What we're seeing today is a significant, dedicated community of Perl developers. This is not only because the language is pragmatic, effective, and powerful, but also because of the incredible community that these developers compose. The Perl community doesn't appear to be going anywhere, which means neither is Perl.

[Dec 26, 2016] Perl Advent Calendar Enters Its 17th Year

Dec 26, 2016 | developers.slashdot.org
(perladvent.org)

Posted by EditorDavid on Saturday December 03, 2016 @10:34AM

An anonymous reader writes: Thursday brought this year's first new posts on the Perl Advent Calendar, a geeky tradition first started back in 2000.

Friday's post described Santa's need for fast, efficient code, and the day that a Christmas miracle occurred during Santa's annual code review (involving the is_hashref subroutine from Perl's reference utility library). And for the last five years, the calendar has also had its own Twitter feed.

But in another corner of the North Pole, you can also unwrap the Perl 6 Advent Calendar, which this year celebrates the one-year anniversary of the official launch of Perl 6. Friday's post was by brian d foy, a writer of the classic Perl textbooks Learning Perl and Intermediate Perl (who's now also crowdfunding his next O'Reilly book, Learning Perl 6).

foy's post talked about Perl 6's object hashes, while the calendar kicked off its new season Thursday with a discussion about creating Docker images using webhooks triggered by GitHub commits as an example of Perl 6's "whipupitude".

[Nov 18, 2015] Beginning Perl

**** The author tried to cover way too much for an introductory book. If you skip some chapters this might be a good introductory book; otherwise it is tilted toward intermediate. Most material is well written, and it is clear that the author is knowledgeable in the subject he is trying to cover.
Sept 19, 2012 | Amazon.com
Athelbert Z. Athelstan, on July 31, 2014

Nice attempt; flawed implementation

Utterly inadequate editing. For example, in the references chapter, where a backslash is essential to the description at hand, the backslashes don't show. There are numerous other, less critical editing failures.

The result makes the book useless as a training aid.

Craig Treptow, June 13, 2013

A Great Book to Learn About Perl

Preface
I have been dabbling in Perl on and off since about 1993. For a decade or so, it was mostly "off", and then I took a position programming Perl full time about a year ago. We currently use perl 5.8.9, and I spend part of my time teaching Perl to old school mainframe COBOL programmers. Dare I say, I am the target market for this book?

Chapter 1
The author takes the time to explain that you should never use `PERL', since it's not an acronym. I find it funny that the section headings utilize an "all caps" font, so the author does end up using `PERL'. That's not even a quibble, I just chuckle at such things.

The author covers the perlbrew utility. Fantastic! What about all of us schmucks that are stuck with Windows at work, or elsewhere? Throw us a bone!! Ok, I don't think there is a bone to throw us, but the author does a great job of covering the options for Windows.

He covers the community! Amazing! Wonderful! Of all things a beginner should know, this is one of them, and it's great that the author has taken some time to describe what's out there.

One other note are the...notes. I love the fact that the author has left little breadcrumbs in the book (each starts with "NOTE" in a grey box), warning you about things that could ultimately hurt you. Case in point, the warning on page 13 regarding the old OO docs that came with 5.8 and 5.10. Wonderful.

Chapter 2
An entire chapter on CPAN? Yes!!! CPAN is a great resource, and part of what makes Perl so great. The author even has some advice regarding how to evaluate a module. Odd, though, there is no mention of the wonderful http://metacpan.org site. That is quickly becoming the favorite of a lot of people.

It is great that the author covers the various cpan clients. However, if you end up in a shop like mine, that ends up being useless as you have to beg some sysadmin for every module you want installed.

Chapter 3
The basics of Perl are covered here in a very thorough way. The author takes you from "What is programming?" to package variables and some of the Perl built-in variables in short order.

Chapter 4
Much more useful stuff is contained in this chapter. I mean I wish pack() and unpack() were made known to me when I first saw Perl, but hey, Perl is huge and I can understand leaving such things out, but I'm happy the author left a lot of them in.

Herein lies another one of those wonderful grey boxes. On page 106 you'll find the box labeled `What is "TRUTH"?' So many seem to stumble over this, so it is great that it's in the book and your attention is drawn to it.

Chapter 5
Here you'll find the usual assortment of control-flow discussion including the experimental given/when, which most will know as a "switch" or "case" statement. The author even has a section to warn you against your temptation to use the "Switch" module. That's good stuff.

Chapter 6
Wow references so early in the book!?!? Upon reflecting a bit, I think this is a good move. They allow so much flexibility with Perl, that I'm happy the author has explored them so early.

Chapter 7
I do find it odd that a chapter on subroutines comes after a chapter on references, though. It seems like subroutines are the obvious choice to get a beginning programmer to start organizing their code. Hence, it should have come earlier.

Having said that, I love the author's technique of "Named Arguments" and calling the hash passed in "%arg_for". It reads so well! I'm a fan and now tend to use this. Of course, it is obvious now that references needed to be discussed first, or this technique would just be "black magic" to a new Perl person.

There are so many other good things in this chapter: Carp, Try::Tiny, wantarray, Closures, recursion, etc. This is definitely a good chapter to read a couple of times and experiment with the code.

Chapter 8
As the author points out, an entire book has been written on the topic of regular expressions (perhaps even more than one book). The author does a good job of pulling out the stuff you're most likely to use and run across in code.

Chapter 9
Here's one that sort of depends on what you do. It's good to know, but if you spend your days writing web apps that never interact with the file system, you'll never use this stuff. Of course thinking that will mean that you'll use it tomorrow, so read the chapter today anyway. :)

Chapter 10
A chapter on just sort, map, and grep? Yes, yes there is, and it is well worth reading. This kind of stuff is usually left for some sort of "intermediate" level book, but it's good to read about it now and try to use them to see how they can help.

Chapter 11
Ah, yes, a good chapter for when you've gotten past a single file with 100 subroutines and want to organize that in a more manageable way. I find it a bit odd that POD comes up in this chapter, rather than somewhere else. I guess it makes sense here, but would you really not document until you got to this point? Perhaps, but hey, at least you're documenting now. :)

Chapter 12 and 13
I like the author's presentation of OO. I think you get a good feel for the "old school" version that you are likely to see in old code bases with a good comparison of how that can be easier by using Moose. These two chapters are worth reading a few times and playing with some code.

Chapter 14
Unit testing for the win! I loved seeing this chapter. I walked into a shop with zero unit tests and have started the effort. Testing has been part of the Perl culture since the beginning. Embrace it. We can't live in a world without unit tests. I've been doing that and it hurts, don't do that to yourself.

Chapter 15
"The Interwebs", really? I don't know what I would have called this chapter, but I'm happy it exists. Plack is covered, yay!!! Actually, this is a good overview of "web programming", and just "how the web works". Good stuff.

Chapter 16
A chapter on DBI? Yes! This is useful. If you work in almost any shop, data will be in a database and you'll need to get to it.

Chapter 17
"Plays well with others"...hmmm....another odd title, yet I can't think of a more appropriate one. How about "The chapter about STDIN, STDOUT, and STDERR". That's pretty catchy, right?

Chapter 18
A chapter on common tasks, yet I've only had to do one of those things ( parsing and manipulating dates). I think my shop is weird, or I just haven't gotten involved with projects that required any of the other activities, such as reading/writing XML.

Including the debugger and a profiler is good. However, how do you use the debugger with a web app? I don't know. Perhaps one day I'll figure it out. That's a section I wish was in the book. The author doesn't mention modulinos, but I think that's the way to use the debugger for stepping through a module. I could be wrong. In any case, a little more on debugger scenarios would have been helpful. A lot of those comments also apply to profiling. I hope I just missed that stuff in this chapter. :)

Chapter 19
Wow, the sort of "leftover" chapter, yet still useful. It is good to know about ORMs for instance, even if you are like me and can't use them at work (yet).

Quick coverage of templates and web frameworks? Yes, and Yes! I love a book that doesn't mention CGI.pm, since it is defunct now. Having said that, there are probably tons of shops that use it (like mine) until their employees demand that it be deleted from systems without remorse. So, it probably should have been given at least some lip service.

I am an admitted "fanboy" of Ovid. Given that, I can see how you might think I got paid for this or something. I didn't. I just think that he did a great job covering Perl with this book. He gives you stuff here that other authors have separated into multiple books. So much, in fact, that you won't even miss the discussion of what was improved in Perl after v5.10.

All in all, if you buy this book, I think you'll be quite happy with it.

[Nov 16, 2015] undef can be used as a dummy variable in split function

Instead of

($id, $not_used, $credentials, $home_dir, $shell ) = split /:/;

You can write

($id, undef, $credentials, $home_dir, $shell ) = split /:/;

In Perl 5.22 they even added some pretty fancy (and generally useless) stuff. Instead of

my(undef, $card_num, undef, undef, undef, $count) = split /:/;

You can write

use v5.22; 
my(undef, $card_num, (undef)x3, $count) = split /:/;
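Another idiom that avoids the placeholders entirely is a list slice on the result of split, picking out fields by index:

# same fields as the examples above, without any undef placeholders
my ($id, $credentials, $home_dir, $shell) = (split /:/)[0, 2, 3, 4];
my ($card_num, $count) = (split /:/)[1, 5];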

[Nov 15, 2015] Web Basics with LWP

Aug 20, 2002 | Perl.com

LWP (short for "Library for WWW in Perl") is a popular group of Perl modules for accessing data on the Web. Like most Perl module-distributions, each of LWP's component modules comes with documentation that is a complete reference to its interface. However, there are so many modules in LWP that it's hard to know where to look for information on doing even the simplest things.

Introducing you to using LWP would require a whole book--a book that just happens to exist, called Perl & LWP. This article offers a sampling of recipes that let you perform common tasks with LWP.

Getting Documents with LWP::Simple

If you just want to access what's at a particular URL, the simplest way to do it is to use LWP::Simple's functions.

In a Perl program, you can call its get($url) function. It will try getting that URL's content. If it works, then it'll return the content; but if there's some error, it'll return undef.

  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
    # Just an example: the URL for the most recent /Fresh Air/ show
  use LWP::Simple;
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;

  # Then go do things with $content, like this:

  if($content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  } else {
    print "Fresh Air is apparently jazzless today.\n";
  }

The handiest variant on get is getprint, which is useful in Perl one-liners. If it can get the page whose URL you provide, it sends it to STDOUT; otherwise it complains to STDERR.


  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"

This is the URL of a plain-text file. It lists new files in CPAN in the past two weeks. You can easily make it part of a tidy little shell command, like this one that mails you the list of new Acme:: modules:


  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"  \
     | grep "/by-module/Acme" | mail -s "New Acme modules! Joy!" $USER

There are other useful functions in LWP::Simple, including one function for running a HEAD request on a URL (useful for checking links, or getting the last-revised time of a URL), and two functions for saving and mirroring a URL to a local file. See the LWP::Simple documentation for the full details, or Chapter 2, "Web Basics" of Perl & LWP for more examples.
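For reference, here is a sketch of those functions (head returns a five-element list in list context; getstore and mirror return HTTP status codes; the local file name is just an example):

  use LWP::Simple;

  my $url = 'http://cpan.org/RECENT';

  # HEAD request: fetches headers only, no body
  my ($content_type, $document_length, $modified_time, $expires, $server)
      = head($url) or die "HEAD request failed";

  # save a URL to a local file
  my $status = getstore($url, 'RECENT.txt');
  die "getstore failed: $status" unless is_success($status);

  # like getstore, but skips the download if the local copy is up to date
  mirror($url, 'RECENT.txt');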

The Basics of the LWP Class Model

LWP::Simple's functions are handy for simple cases, but its functions don't support cookies or authorization; they don't support setting header lines in the HTTP request; and generally, they don't support reading header lines in the HTTP response (most notably the full HTTP error message, in case of an error). To get at all those features, you'll have to use the full LWP class model.

While LWP consists of dozens of classes, the two that you have to understand are LWP::UserAgent and HTTP::Response. LWP::UserAgent is a class for "virtual browsers," which you use for performing requests, and HTTP::Response is a class for the responses (or error messages) that you get back from those requests.

The basic idiom is $response = $browser->get($url), or fully illustrated:


  # Early in your program:
  
  use LWP 5.64; # Loads all important LWP classes, and makes
                #  sure your version is reasonably recent.

  my $browser = LWP::UserAgent->new;
  
  ...
  
  # Then later, whenever you need to make a get request:
  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
  
  my $response = $browser->get( $url );
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  die "Hey, I was expecting HTML, not ", $response->content_type
   unless $response->content_type eq 'text/html';
     # or whatever content-type you're equipped to deal with

  # Otherwise, process the content somehow:
  
  if($response->content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  } else {
    print "Fresh Air is apparently jazzless today.\n";
  }
There are two objects involved: $browser, which holds an object of the class LWP::UserAgent, and then the $response object, which is of the class HTTP::Response. You really need only one browser object per program; but every time you make a request, you get back a new HTTP::Response object, which will have some interesting attributes (such as status_line, is_success, content_type, and content, all used above).

Adding Other HTTP Request Headers

The most commonly used syntax for requests is $response = $browser->get($url), but in truth, you can add extra HTTP header lines to the request by adding a list of key-value pairs after the URL, like so:


  $response = $browser->get( $url, $key1, $value1, $key2, $value2, ... );

For example, here's how to send more Netscape-like headers, in case you're dealing with a site that would otherwise reject your request:


  my @ns_headers = (
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, 
        image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

  ...
  
  $response = $browser->get($url, @ns_headers);

If you weren't reusing that array, you could just go ahead and do this:



  $response = $browser->get($url,
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, 
        image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

If you were only going to change the 'User-Agent' line, you could just change the $browser object's default line from "libwww-perl/5.65" (or the like) to whatever you like, using LWP::UserAgent's agent method:


   $browser->agent('Mozilla/4.76 [en] (Win98; U)');
Enabling Cookies

A default LWP::UserAgent object acts like a browser with its cookies support turned off. There are various ways of turning it on, by setting its cookie_jar attribute. A "cookie jar" is an object representing a little database of all the HTTP cookies that a browser can know about. It can correspond to a file on disk (the way Netscape uses its cookies.txt file), or it can be just an in-memory object that starts out empty, and whose collection of cookies will disappear once the program is finished running.

To give a browser an in-memory empty cookie jar, you set its cookie_jar attribute like so:


  $browser->cookie_jar({});

To give it a copy that will be read from a file on disk, and will be saved to it when the program is finished running, set the cookie_jar attribute like this:


  use HTTP::Cookies;
  $browser->cookie_jar( HTTP::Cookies->new(
    'file' => '/some/where/cookies.lwp',
        # where to read/write cookies
    'autosave' => 1,
        # save it to disk when done
  ));

That file will be an LWP-specific format. If you want to access the cookies in your Netscape cookies file, you can use the HTTP::Cookies::Netscape class:


  use HTTP::Cookies;
    # yes, loads HTTP::Cookies::Netscape too
  
  $browser->cookie_jar( HTTP::Cookies::Netscape->new(
    'file' => 'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt',
        # where to read cookies
  ));

You could add an 'autosave' => 1 line as we did earlier, but at time of writing, it's uncertain whether Netscape might discard some of the cookies you could be writing back to disk.

Posting Form Data

Many HTML forms send data to their server using an HTTP POST request, which you can send with this syntax:


 $response = $browser->post( $url,
   [
     formkey1 => value1, 
     formkey2 => value2, 
     ...
   ],
 );
Or if you need to send HTTP headers:

 $response = $browser->post( $url,
   [
     formkey1 => value1, 
     formkey2 => value2, 
     ...
   ],
   headerkey1 => value1, 
   headerkey2 => value2, 
 );

For example, the following program makes a search request to AltaVista (by sending some form data via an HTTP POST request), and extracts from the HTML the report of the number of matches:


  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;
  
  my $word = 'tarragon';
  
  my $url = 'http://www.altavista.com/sites/search/web';
  my $response = $browser->post( $url,
    [ 'q' => $word,  # the Altavista query string
      'pg' => 'q', 'avkw' => 'tgz', 'kl' => 'XX',
    ]
  );
  die "$url error: ", $response->status_line
   unless $response->is_success;
  die "Weird content type at $url -- ", $response->content_type
   unless $response->content_type eq 'text/html';

  if( $response->content =~ m{AltaVista found ([0-9,]+) results} ) {
    # The substring will be like "AltaVista found 2,345 results"
    print "$word: $1\n";
  } else {
    print "Couldn't find the match-string in the response\n";
  }
Sending GET Form Data

Some HTML forms convey their form data not by sending the data in an HTTP POST request, but by making a normal GET request with the data stuck on the end of the URL. For example, if you went to imdb.com and ran a search on Blade Runner, the URL you'd see in your browser window would be:


  http://us.imdb.com/Tsearch?title=Blade%20Runner&restrict=Movies+and+TV

To run the same search with LWP, you'd use this idiom, which involves the URI class:


  use URI;
  my $url = URI->new( 'http://us.imdb.com/Tsearch' );
    # makes an object representing the URL
  
  $url->query_form(  # And here the form data pairs:
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
  );
  
  my $response = $browser->get($url);

See Chapter 5, "Forms" of Perl & LWP for a longer discussion of HTML forms and of form data, as well as Chapter 6 through Chapter 9 for a longer discussion of extracting data from HTML.

Absolutizing URLs

The URI class that we just mentioned provides all sorts of methods for accessing and modifying parts of URLs (such as asking what sort of URL it is with $url->scheme, and what host it refers to with $url->host, and so on, as described in the docs for the URI class). However, the methods of most immediate interest are the query_form method seen above, and now the new_abs method for taking a probably-relative URL string (like "../foo.html") and getting back an absolute URL (like "http://www.perl.com/stuff/foo.html"), as shown here:


  use URI;
  $abs = URI->new_abs($maybe_relative, $base);

For example, consider this program that matches URLs in the HTML list of new modules in CPAN:


  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;
  
  my $url = 'http://www.cpan.org/RECENT.html';
  my $response = $browser->get($url);
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;
  
  my $html = $response->content;
  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {    
      print "$1\n";  
  }

When run, it emits output that starts out something like this:


  MIRRORING.FROM
  RECENT
  RECENT.html
  authors/00whois.html
  authors/01mailrc.txt.gz
  authors/id/A/AA/AASSAD/CHECKSUMS
  ...

However, if you actually want to have those be absolute URLs, you can use the URI module's new_abs method, by changing the while loop to this:


  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {    
      print URI->new_abs( $1, $response->base ) ,"\n";
  }

(The $response->base method from HTTP::Message is for returning the URL that should be used for resolving relative URLs--it's usually just the same as the URL that you requested.)

That program then emits nicely absolute URLs:


  http://www.cpan.org/MIRRORING.FROM
  http://www.cpan.org/RECENT
  http://www.cpan.org/RECENT.html
  http://www.cpan.org/authors/00whois.html
  http://www.cpan.org/authors/01mailrc.txt.gz
  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
  ...

See Chapter 4, "URLs", of Perl & LWP for a longer discussion of URI objects.

Of course, using a regexp to match hrefs is a bit simplistic, and for more robust programs, you'll probably want to use an HTML-parsing module like HTML::LinkExtor, or HTML::TokeParser, or even maybe HTML::TreeBuilder.
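For instance, here is a sketch of the same link-extracting loop using HTML::TokeParser instead of a regexp (reusing the $html and $response variables from the program above):


  use HTML::TokeParser;
  use URI;
  my $parser = HTML::TokeParser->new( \$html );
  while( my $tag = $parser->get_tag('a') ) {
    my $href = $tag->[1]{'href'};    # attributes live in the second slot
    next unless defined $href;
    print URI->new_abs( $href, $response->base ), "\n";
  }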

Other Browser Attributes

LWP::UserAgent objects have many attributes for controlling how they work. Notable ones include timeout (how long to wait before giving up on a request), protocols_allowed (which URL schemes the browser will handle), and max_size (the most content to read from any one response).
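A brief sketch of setting them (the specific values here are just illustrative):


  $browser->timeout(15);
      # give up on a request if it takes longer than 15 seconds
  $browser->protocols_allowed( [ 'http', 'https' ] );
      # refuse requests for any other URL scheme, like ftp: or mailto:
  $browser->max_size( 512 * 1024 );
      # stop reading a response after half a megabyte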

For more options and information, see the full documentation for LWP::UserAgent.

Writing Polite Robots

If you want to make sure that your LWP-based program respects robots.txt files and doesn't make too many requests too fast, you can use the LWP::RobotUA class instead of the LWP::UserAgent class.

The LWP::RobotUA class is just like LWP::UserAgent, and you can use it like so:


  use LWP::RobotUA;
  my $browser = LWP::RobotUA->new(
    'YourSuperBot/1.34', 'you@yoursite.com');
    # Your bot's name and your email address

  my $response = $browser->get($url);

But LWP::RobotUA adds a few features: before requesting anything from a host, it consults that site's robots.txt file, and it spaces out its requests so that it doesn't hit any one server too often.
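For example, to have your bot wait at least ten seconds between requests to any given server (the delay method is expressed in minutes):


  $browser->delay( 10/60 );
      # be polite: at most one request per ten seconds per host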

For more options and information, see the full documentation for LWP::RobotUA.

Using Proxies

In some cases, you will want to (or will have to) use proxies for accessing certain sites or for using certain protocols. This is most commonly the case when your LWP program is running (or could be running) on a machine that is behind a firewall.

To make a browser object use proxies that are defined in the usual environment variables (such as HTTP_PROXY), just call the env_proxy method on a user-agent object before you go making any requests on it. Specifically:


  use LWP::UserAgent;
  my $browser = LWP::UserAgent->new;
  
  # And before you go making any requests:
  $browser->env_proxy;

For more information on proxy parameters, see the LWP::UserAgent documentation, specifically the proxy, env_proxy, and no_proxy methods.

HTTP Authentication

Many Web sites restrict access to documents by using "HTTP Authentication". This isn't just any form of "enter your password" restriction, but is a specific mechanism where the HTTP server sends the browser an HTTP code that says "That document is part of a protected 'realm', and you can access it only if you re-request it and add some special authorization headers to your request".

For example, the Unicode.org administrators stop email-harvesting bots from harvesting the contents of their mailing list archives by protecting them with HTTP Authentication, and then publicly stating the username and password (at http://www.unicode.org/mail-arch/)--namely username "unicode-ml" and password "unicode".

For example, consider this URL, which is part of the protected area of the Web site:


  http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html

If you access that with a browser, you'll get a prompt like "Enter username and password for 'Unicode-MailList-Archives' at server 'www.unicode.org'".

In LWP, if you just request that URL, like this:


  use LWP 5.64;
  my $browser = LWP::UserAgent->new;

  my $url =
   'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
  my $response = $browser->get($url);

  die "Error: ", $response->header('WWW-Authenticate') || 
    'Error accessing',
    #  ('WWW-Authenticate' is the realm-name)
    "\n ", $response->status_line, "\n at $url\n Aborting"
   unless $response->is_success;

Then you'll get this error:


  Error: Basic realm="Unicode-MailList-Archives"
   401 Authorization Required
   at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
   Aborting at auth1.pl line 9.  [or wherever]

because the $browser doesn't know the username and password for that realm ("Unicode-MailList-Archives") at that host ("www.unicode.org"). The simplest way to let the browser know about this is to use the credentials method to let it know about a username and password that it can try using for that realm at that host. The syntax is:


  $browser->credentials(
    'servername:portnumber',
    'realm-name',
    'username' => 'password'
  );

In most cases, the port number is 80, the default TCP/IP port for HTTP; and you usually call the credentials method before you make any requests. For example:


  $browser->credentials(
    'reports.mybazouki.com:80',
    'web_server_usage_reports',
    'plinky' => 'banjo123'
  );

So if we add the following to the program above, right after the $browser = LWP::UserAgent->new; line:


  $browser->credentials(  # add this to our $browser 's "key ring"
    'www.unicode.org:80',
    'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode'
  );

and then when we run it, the request succeeds, instead of causing the die to be called.

Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL would--if your LWP installation has HTTPS support (via an appropriate Secure Sockets Layer library). For example:


  use LWP 5.64;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked!  I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the response will be unsuccessful, and you'll get this error message:


  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.   [or whatever program and line]

If your LWP installation does have HTTPS support installed, then the response should be successful, and you should be able to consult $response just like with any normal HTTP response.

For information about installing HTTPS support for your LWP installation, see the helpful README.SSL file that comes in the libwww-perl distribution.

Getting Large Documents

When you're requesting a large (or at least potentially large) document, a problem with the normal way of using the request methods (like $response = $browser->get($url)) is that the response object in memory will have to hold the whole document--in memory. If the response is a 30-megabyte file, this is likely to be quite an imposition on this process's memory usage.

A notable alternative is to have LWP save the content to a file on disk, instead of saving it up in memory. This is the syntax to use:


  $response = $ua->get($url,
                         ':content_file' => $filespec,
                      );

For example,


  $response = $ua->get('http://search.cpan.org/',
                         ':content_file' => '/tmp/sco.html'
                      );

When you use this :content_file option, the $response will have all the normal header lines, but $response->content will be empty.

Note that this ":content_file" option isn't supported under older versions of LWP, so you should consider adding use LWP 5.66; to check the LWP version, if you think your program might run on systems with older versions.

If you need to be compatible with older LWP versions, then use this syntax, which does the same thing:


  use HTTP::Request::Common;
  $response = $ua->request( GET($url), $filespec );

Resources

Remember, this article is just the most rudimentary introduction to LWP--to learn more about LWP and LWP-related tasks, you really should read the documentation for the LWP modules themselves and the book Perl & LWP.


Copyright 2002, Sean M. Burke. You can redistribute this document and/or modify it, but only under the same terms as Perl itself.

[Nov 15, 2015] Unescaped left brace in regex is deprecated

Here the maintainers went in the wrong direction. Those guys are playing dangerous games and holding users hostage. I wonder why this warning was introduced, but in any case it is implemented incorrectly. It raises the warning on $zone =~/^(\d{4})\/(\d{1,2})\/(\d{1,2})$/, which breaks compatibility with a huge mass of Perl scripts and Perl books. Isn't this stupid? I think this is a death sentence for version 5.22. Reading the perldelta, it looks like the developers do not have any clear idea of how version 5 of the language should develop, and do not write any documents about it that could be discussed.
Notable quotes:
"... A literal { should now be escaped in a pattern ..."
www.perlmonks.org

in reply to "Unescaped left brace in regex is deprecated"

From the perldelta for Perl v5.22.0:

A literal { should now be escaped in a pattern

If you want a literal left curly bracket (also called a left brace) in a regular expression pattern, you should now escape it by either preceding it with a backslash (\{) or enclosing it within square brackets [{], or by using \Q; otherwise a deprecation warning will be raised. This was first announced as forthcoming in the v5.16 release; it will allow future extensions to the language to happen.
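For example, each of these matches a literal left brace without triggering the warning ($str is just an illustrative variable):


  $str =~ /\{/;     # escaped with a backslash
  $str =~ /[{]/;    # enclosed in a character class
  $str =~ /\Q{/;    # quoted with \Q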

[Nov 15, 2015] Perl LWP

LWP is a better deal than CGI.pm.
June 30, 2002 | Amazon.com

Read sample chapters online...

The LWP (Library for WWW in Perl) suite of modules lets your programs download and extract information from the Web. Perl & LWP shows how to make web requests, submit forms, and even provide authentication information, and it demonstrates using regular expressions, tokens, and trees to parse HTML. This book is a must have for Perl programmers who want to automate and mine the Web.

Gavin

Excellent coverage of LWP, packed full of useful examples, on July 16, 2002

I was definitely interested when I first heard that O'Reilly were publishing a book on LWP. LWP is a definitive collection of perl modules covering everything you could think of doing with URIs, HTML, and HTTP. While 'web services' are the buzzword friendly technology of the day, sometimes you need to roll your sleeves up and get a bit dirty scraping screens and hacking at HTML. For such a deep subject, this book weighs in at a slim 242 pages. This is a very good thing. I'm far too busy to read these massive shelf-destroying tomes that seem to be churned out recently.

It covers everything you need to know with concise examples, which is what makes this book really shine. You start with the basics using LWP::Simple through to more advanced topics using LWP::UserAgent, HTTP::Cookies, and WWW::RobotRules. Sean shows finger-saving tips and shortcuts that take you more than a couple of notches above what you can learn from the lwpcook manpage, with enough depth to satisfy somebody who is an experienced LWP hacker.

This book is a great reference, just flick through and you'll find a relevant chapter with an example to save the day. Chapters include filling in forms and extracting data from HTML using regular expressions, then more advanced topics using HTML::TokeParser, and then my preferred tool, the author's own HTML::TreeBuilder. The book ends with a chapter on spidering, with excellent coverage of design and warnings to get you started on your web trawling.

[Nov 01, 2015] Stupid open() tricks

perltricks.com

Create an anonymous temporary file

If I give open a filename of an explicit undef and the read-write mode (+> or +<), Perl opens an anonymous temporary file:

  open my $fh, '+>', undef;

Perl actually creates a named file and opens it, but immediately unlinks the name. No one else will be able to get to that file because no one else has the name for it. If I had used File::Temp, I might leave the temporary file there, or something else might be able to see it while I'm working with it.
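A short usage sketch: write to the anonymous file, rewind, and read it back:


  open my $fh, '+>', undef or die "Could not open anonymous file: $!";
  print {$fh} "scratch data\n";    # write to the unnamed temporary file
  seek $fh, 0, 0;                  # rewind to the beginning
  print while <$fh>;               # read it back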

Print to a string

If my perl is compiled with PerlIO (it probably is), I can open a filehandle on a scalar variable if the filename argument is a reference to that variable.

  open my $fh, '>', \my $string;

This is handy when I want to capture output for an interface that expects a filehandle:

  something_that_prints( $fh );

Now $string contains whatever was printed by the function. I can inspect it by printing it:

  say "I captured:\n$string";

Read lines from a string

I can also read from a scalar variable by opening a filehandle on it.

  open my $fh, '<', \$string;

Now I can play with the string line-by-line without messing around with regex anchors or line endings:

  while( <$fh> ) { ... }

I write about these sorts of filehandle-on-string tricks in Effective Perl Programming.

Make a pipeline

Most Unix programmers probably already know that they can read the output from a command as the input for another command. I can do that with Perl's open too:

  use v5.10;

  open my $pipe, '-|', 'date';
  while( <$pipe> ) {
      say "$_";
  }

This reads the output of the date system command and prints it. But, I can have more than one command in that pipeline. I have to abandon the three-argument form which purposely prevents this nonsense:

  open my $pipe, qq(cat '$0' | sort |);
  while( <$pipe> ) {
      print "$.: $_";
  }

This captures the text of the current program, sorts each line alphabetically, and prints the output with numbered lines. I might get a Useless Use of cat Award for a program that sorts its own lines, but it's still a feature.

gzip on the fly

In Gzipping data directly from Perl, I showed how I could compress data on the fly by using Perl's gzip IO layer. This is handy when I have limited disk space:

  open my $fh, '>:gzip', $filename
      or die "Could not write to $filename: $!";

  while( $_ = something_interesting() ) {
      print { $fh } $_;
  }

I can go the other direction as well, reading directly from compressed files when I don't have enough space to uncompress them first:

  open my $fh, '<:gzip', $filename
      or die "Could not read from $filename: $!";

  while( <$fh> ) {
      print;
  }

Change STDOUT

I can change the default output filehandle with select if I don't like standard output, but I can do that in another way. I can change STDOUT for the times when the easy way isn't fun enough. David Farrell showed some of this in How to redirect and restore STDOUT.

First I "dupe" the standard output filehandle, using the special & mode:

  use v5.10;

  open my $STDOLD, '>&', STDOUT;

Any of the file modes will work there as long as I append the & to it.

I can then re-open STDOUT:

  open STDOUT, '>>', 'log.txt';
  say 'This should be logged to log.txt.';

When I'm ready to change it back, I do the same thing:

  open STDOUT, '>&', $STDOLD;
  say 'This should show in the terminal';

If I only have the file descriptor, perhaps because I'm working with an old Unix programmer who thinks vi is a crutch, I can use that:

  open my $fh, "<&=$fd"
      or die "Could not open filehandle on $fd\n";

This file-descriptor open has a three-argument form too:

  open my $fh, '<&=', $fd
      or die "Could not open filehandle on $fd\n";

I can have multiple filehandles that go to the same place since they are different names for the same file descriptor:

  use v5.10;

  open my $fh, '>>&=', fileno(STDOUT);

  say        'Going to default';
  say $fh    'Going to duped version. fileno ' . fileno($fh);
  say STDOUT 'Going to STDOUT. fileno '        . fileno($fh);

All of these print to STDOUT.

[Oct 31, 2015] A preview of Perl 5.22 by brian d foy

The release is quite disappointing, not to say worse... The warning about a non-escaped { in a regex is a SNAFU, as it is implemented completely incorrectly and does not distinguish the important cases like \d{3} or .{3} (in which no backslash should ever be required).
April 10, 2015 | perltricks.com

Perl v5.22 is bringing myriad new features and ways of doing things, making its perldelta file much more interesting than most releases. While I normally wait until after the first stable release to go through these features over at The Effective Perler, here's a preview of some of the big news.

A safer ARGV

The line input operator, <>, looks at the @ARGV array for filenames to open and read through the ARGV filehandle. It has the same meta-character problem as the two-argument open. Special characters in the filename might do shell things. To get around this unintended feature (which I think might be useful if that's what you want), there's a new line-input operator, <<>>, that doesn't treat any character as special:

while( <<>> ) {  # new, safe line input operator
	...;
	}
CGI.pm and Module::Build disappear from core

The Perl maintainers have been stripping modules from the Standard Library. Sometimes that's because no one uses (or should use) that module anymore, no one wants to maintain that module, or it's better to get it from CPAN where the maintainer can update it faster than the Perl release cycle. You can still find these modules on CPAN, though.

The CGI.pm module, only one of Lincoln Stein's amazing contributions to the Perl community, is from another era. It was light years ahead of its Perl 4 predecessor, cgi.pl. It did everything, including HTML generation. This was the time before robust templating systems came around, and CGI.pm was good. But, they've laid it to rest.

Somehow, Module::Build fell out of favor. Before it, building and installing Perl modules depended on a non-Perl tool, make. That's a portability problem. However, we already know the user has Perl, so if there were a pure-Perl tool that could do the same thing, we could solve the portability problem. We could also do much fancier things. It was the wave of the future. I didn't really buy into Module::Build, although I had used it for a few distributions, but I'm still a bit sad to see it go. It had some technical limitations and was unmaintained for a bit, and now it's been cut loose. David Golden explains more about that in Paying respect to Module::Build.

This highlights a long-standing and usually undiscovered problem with modules that depend on modules in the Standard Library. For years, most authors did not bother to declare those dependencies because Perl was there and its modules must be there too. When those modules move to a CPAN-only state, they end up with undeclared dependencies. This also shows up in some Linux distributions that violate the Perl license by removing some modules or putting them in a different package. Either way, always declare a dependency on everything you use, regardless of its provenance.
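For example, in a Makefile.PL you would now declare even formerly-core modules explicitly (a minimal sketch; the distribution name and version numbers are illustrative):

  use ExtUtils::MakeMaker;
  WriteMakefile(
      NAME      => 'My::Module',
      PREREQ_PM => {
          'CGI'           => '4.08',    # CPAN-only as of v5.22
          'Module::Build' => '0.42',    # likewise
      },
  );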

Hexadecimal floating point values

Have you always felt too constrained by ten digits, but were also stuck with non-integers? Now your problems are solved with hexadecimal floating point numbers.

We already have exponential notation, which uses the e to note the exponent, as in 1.23e4. But e is a hexadecimal digit, so we can't use it to denote the exponent. Instead, we use p, and the exponent is a power of two:

use v5.22;

my $num = 0x1.8p3;    # 1.5 * 2**3, i.e. 12
Variable aliases

We can now assign to the reference version of a non-reference variable. This creates an alias for the referenced value.

use v5.22;
use feature qw(refaliasing);

\%other_hash = \%hash;

I think we'll discover many interesting uses for this, and probably some dangerous ones, but the use case in the docs looks interesting. We can now assign to something other than a scalar for the foreach control variable:

use v5.22;
use feature qw(refaliasing);

foreach \my %hash ( @array_of_hashes ) { # named hash control variable
	foreach my $key ( keys %hash ) { # named hash now!
		...;
		}
	}

I don't think I'll use that particular pattern since I'm comfortable with references, but if you really hate the dereferencing arrow, this might be for you. Note that v5.12 allows us to write keys $hash_ref without the dereferencing %. See my Effective Perl items Use array references with the array operators, but also Don't use auto-dereferencing with each or keys.

Repetition in list assignment

Perl can assign one list of scalars to another. In Learning Perl we show assigning to undef. I could make dummy variables:

my($name, $card_num, $addr, $home, $work, $count) = split /:/;

But if I don't need all of those variables, I can put placeholder undefs in the assignment list:

my(undef, $card_num, undef, undef, undef, $count) = split /:/;

Those consecutive undefs can be a problem, as well as ugly. I don't have to count out separate undefs now:

use v5.22;

my(undef, $card_num, (undef)x3, $count) = split /:/;
List pipe opens on Win32

The three-argument open can take a pipe mode, which didn't previously work on Windows. Now it does, to the extent that the list form of system works on Win32:

open my $fh, '-|', 'some external command' or die;

I always have to check my notes to remember which side of the mode string the - goes on. Those in the Unix world know - as a special filename for standard input in many commands.
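As a reminder, the two pipe modes look like this (a sketch; sort stands in for any external command):

  open my $from_cmd, '-|', 'sort', 'data.txt'
      or die "Can't read from sort: $!";    # read the command's output
  open my $to_cmd, '|-', 'sort'
      or die "Can't write to sort: $!";     # write to the command's input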

Various small fixes

We also get many smaller fixes that I think are worth a shout-out. Many of these are clean-ups to warts and special cases.

[Oct 31, 2015] Starting with Perl 5.14, local($_) will always strip all magic from $_, to make it possible to safely reuse $_ in a subroutine.
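A sketch of the idiom this makes safe (localizing $_ inside a subroutine even when the caller's $_ was tied or otherwise magical):

  sub count_lines {
      my ($fh) = @_;
      local $_;              # since 5.14, strips any magic from $_
      my $n = 0;
      $n++ while <$fh>;
      return $n;
  }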

[Oct 06, 2015] Larry Wall Unveils Perl 6.0.0

October 06, 2015 | developers.slashdot.org

An anonymous reader writes: Last night Larry Wall unveiled the first development release of Perl 6, joking that now a top priority was fixing bugs that could be mistaken for features. The new language features meta-programming: the ability to define new bits of syntax on your own to extend the language, and even new infix operators. Larry also previewed what one reviewer called "exotic and new" features, including the sequence operator and new control structures like "react" and "gather and take" lists. "We don't want their language to run out of steam," Larry told the audience. "It might be a 30- or 40-year language. I think it's good enough."

Can't find independent verification cruff (171569)
Neither perl6.org nor its mailing lists seem to mention anything about this. The links in TFA are blocked by OpenDNS too.
Re:Can't find independent verification (Score:5, Informative) Tuesday October 06, 2015 @06:33PM (#50674655)
It's a development release (timed to coincide with Larry's birthday in September, according to Wikipedia).

Here's URLs where the event was announced.

http://www.meetup.com/SVPerl/e... [meetup.com]

http://perl6releasetalk.ticket... [ticketleap.com]

mbkennel (97636) on Tuesday October 06, 2015

Bugs mistaken as features? (Score:4, Funny)

Last night Larry Wall unveiled the first development release of Perl 6, joking that now a top priority was fixing bugs that could be mistaken for features.

Sounds good.

Anonymous Coward on Tuesday October 06, 2015 @07:20PM (#50675019)

No Coroutines (Score:5, Insightful)

No coroutines. So sad. That still leaves Lua and Stackless Python as the only languages with symmetric, transparent coroutines without playing games with the C stack.

Neither Lua nor Stackless Python implement recursion on the C stack. Python and apparently Perl6/Moar implement recursion on the C stack, which means that they can't easily create multiple stacks for juggling multiple flows of control. That's why in Python and Perl6 you have the async/await take/gather syntax, whereas in Lua coroutine.resume and coroutine.yield can be called from any function, regardless of where it is in the call stack, without having to adorn the function definition. JavaScript is similarly crippled. All the promise/future/lambda stuff could be made infinitely more elegant with coroutines, but all modern JavaScript implementations assume a single call stack, so the big vendors rejected coroutines.

In Lua a new coroutine has the same allocation cost as a new lambda/empty closure. And switching doesn't involve dumping or restoring CPU registers. So in Lua you can use coroutines to implement great algorithms without thinking twice. Not just as a fancy green-threading replacement, but for all sorts of algorithms where the coroutine will be quickly discarded (just as coroutines' little brothers, generators, are typically short-lived). Kernel threads and "fibers" are comparatively heavyweight, both in terms of performance and memory, compared to VM-level coroutines.

The only other language with something approximating cheap coroutines is Go.

I was looking forward to Perl 6. But I think I'll skip it. The top two language abstractions I would have loved to see were coroutines and lazy evaluation. Perl6 delivered poor approximations of those things. Those approximations are okay for the most-used design patterns, but aren't remotely composable to the same degree. And of course the "most used" patterns are that way because of existing language limitations.

These days I'm mostly a Lua and C guy who implements highly concurrent network services. I was really looking forward to Perl6 (I always liked Perl 5), but it remains the case that the only interesting language alternatives in my space are Go and Rust. But Rust can't handle out-of-memory (OOM). (Impossible to, e.g., catch OOM when creating a Box.) Apparently Rust developers think that it's okay to crash a service because a request failed, unless you want to create 10,000 kernel threads, which is possible but stupid. Too many Rust developers are desktop GUI and game developers with a very narrow, skewed experience of dealing with allocation and other resource failures. Even Lua can handle OOM trivially and cleanly without trashing the whole VM or unwinding the entire call stack. (Using protected calls, which is what Rust _should_ add.) So that basically just leaves Go, which is looking better and better. Not surprising given how similar Go and Lua are.

But the problem with Go is that you basically have to leave the C world behind for large applications (you can't call out to a C library from a random goroutine because it has to switch to a special C stack, which means you don't want 10,000 concurrent goroutines each calling into a third-party C library), whereas Lua is designed to treat C code as a first-class environment. (And you have to meet it half way.)

To make Lua coroutines integrate with C code which yields, you have to implement your own continuation logic, because the C stack isn't preserved when yielding. It's not unlike chaining generators in Python, which requires a little effort. A tolerable issue, and doable in the few cases where it's necessary in C, whereas in Python and now Perl6 it's _always_ an issue and a hindrance.

Greyfox (87712) on Tuesday October 06, 2015 @08:45PM (#50675635) Homepage Journal

Re:Oh no (Score:4, Insightful)

Well you CAN write maintainable code in perl, you just have to use some discipline. Turn "use strict;" on everywhere, break your project up into packages across functional lines, and have unit tests on everything. You know, all that stuff that no companies ever do. Given the choice between having to maintain a perl project and a ruby one, I'd take the perl project every time. At least you'll have some chance that the developers wrote some decent code, if only in self-defense, since they usually end up maintaining it themselves for a few years.

murr (214674) on Tuesday October 06, 2015 @09:13PM (#50675793)

"First Development Release" ? (Score:4, Interesting)

If that was the first development release, what on earth was the 2010 release of Rakudo Star?

The problem with Perl 6 was never a lack of development releases, it's 15 years of NOTHING BUT development releases.

hummassa (157160) on Wednesday October 07, 2015 @12:06AM (#50676619) Homepage Journal

Re:Perl? LOL. (Score:1)


DBI is stable; it just works. I have had lots of headaches with Every Single One of the database middlewares. Except DBI.

Lisandro (799651) on Wednesday October 07, 2015 @03:01AM (#50677185)

Re: Perl? LOL. (Score:3)

Same experience here. Say what you want about Perl 5 but it is still one of the fastest interpreted languages around.

I do a lot of prototyping in Python. But if I want speed, I usually use Perl.

randalware (720317) on Tuesday October 06, 2015 @10:03PM (#50676095) Journal

Perl (Score:5, Insightful)

I used perl a lot over the years.

Comparing it to a compiled language (C, Ada, Fortran, etc.) or a web-centric language (Java, JavaScript, PHP, etc.) is not a good comparison.

When I needed something done (and needed more than the shell) and had to maintain it, I wrote it in Perl: all sorts of sysadmin widgets. Many are still being used today (15+ years later).
I wrote clean, decent code with comments & modules.

  • finding the cpu & disk hogs, by the day, week & month.
  • who was running what when the system crashed.
  • cgi code for low volume web server tasks
  • updating DNS
  • queueing outgoing faxes & saving history
  • rotating log files and saving a limited number of copies.

How much code have you written? And had it stay running for decades?

The people that took over my positions when I changed jobs never had a problem updating the code or using it.

bytesex (112972) on Wednesday October 07, 2015 @05:48AM (#50677687) Homepage

Re:Perl? LOL. (Score:5, Insightful)

The cool kids jumped on the Python bandwagon saying Perl was old, but in all this time they have still failed to:

  • create a language that has libraries like Perl has,
  • create a scripting language that can execute SQL safely like Perl can,
  • create a language that has regular expression support as part of the syntax (so you don't have to enter yet another level of indirection and escape all the whatevers ' " \ / when you're trying to simply match some string easily),
  • create a scripting language that is also fast.

Which are all the reasons I love and use perl.

Persistent variables via state()

perlsub - perldoc.perl.org

Beginning with Perl 5.10.0, you can declare variables with the state keyword in place of my. For that to work, though, you must have enabled that feature beforehand, either by using the feature pragma, or by using -E on one-liners (see feature). Beginning with Perl 5.16, the CORE::state form does not require the feature pragma.

The state keyword creates a lexical variable (following the same scoping rules as my) that persists from one subroutine call to the next. If a state variable resides inside an anonymous subroutine, then each copy of the subroutine has its own copy of the state variable. However, the value of the state variable will still persist between calls to the same copy of the anonymous subroutine. (Don't forget that sub { ... } creates a new subroutine each time it is executed.)

For example, the following code maintains a private counter, incremented each time the gimme_another() function is called:

 
  use feature 'state';
  sub gimme_another { state $x; return ++$x }
 

And this example uses anonymous subroutines to create separate counters:

 
  use feature 'state';
  sub create_counter {
      return sub { state $x; return ++$x }
  }
 

Also, since $x is lexical, it can't be reached or modified by any Perl code outside.

When combined with variable declaration, simple scalar assignment to state variables (as in state $x = 42) is executed only the first time. When such statements are evaluated subsequent times, the assignment is ignored. The behavior of this sort of assignment to non-scalar variables is undefined.
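For example:

  use feature 'state';

  sub next_number {
      state $x = 42;    # the assignment runs only on the first call
      return $x++;
  }
  # next_number() returns 42, then 43, then 44, ...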

Persistent variables with closures

Just because a lexical variable is lexically (also called statically) scoped to its enclosing block, eval, or do FILE, this doesn't mean that within a function it works like a C static. It normally works more like a C auto, but with implicit garbage collection.

Unlike local variables in C or C++, Perl's lexical variables don't necessarily get recycled just because their scope has exited. If something more permanent is still aware of the lexical, it will stick around. So long as something else references a lexical, that lexical won't be freed--which is as it should be. You wouldn't want memory being freed until you were done using it, or kept around once you were done. Automatic garbage collection takes care of this for you.

This means that you can pass back or save away references to lexical variables, whereas to return a pointer to a C auto is a grave error. It also gives us a way to simulate C's function statics. Here's a mechanism for giving a function private variables with both lexical scoping and a static lifetime. If you do want to create something like C's static variables, just enclose the whole function in an extra block, and put the static variable outside the function but in the block.

 
  {
      my $secret_val = 0;
      sub gimme_another {
          return ++$secret_val;
      }
  }
  # $secret_val now becomes unreachable by the outside
  # world, but retains its value between calls to gimme_another
 

If this function is being sourced in from a separate file via require or use, then this is probably just fine. If it's all in the main program, you'll need to arrange for the my to be executed early, either by putting the whole block above your main program, or more likely, placing merely a BEGIN code block around it to make sure it gets executed before your program starts to run:

 
  BEGIN {
      my $secret_val = 0;
      sub gimme_another {
          return ++$secret_val;
      }
  }
 

See "BEGIN, UNITCHECK, CHECK, INIT and END" in perlmod for more about these special triggered code blocks.

If declared at the outermost scope (the file scope), then lexicals work somewhat like C's file statics. They are available to all functions in that same file declared below them, but are inaccessible from outside that file. This strategy is sometimes used in modules to create private variables that the whole module can see.
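A minimal sketch of that strategy (the package and variable names are illustrative):

  package My::Counter;
  my $count = 0;                 # file-scoped lexical: private to this file
  sub next_id  { return ++$count }
  sub reset_id { $count = 0 }
  1;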

[Jun 11, 2015] The Fall Of Perl, The Web's Most Promising Language By Conor Myhrvold

Rumors about Perl's death are greatly exaggerated ;-). Some people, like me, are not attracted to Python -- for many reasons. Perl is more flexible and, in the Unix scripting area, a more powerful, higher-level language. Some aspects of Python I like, but all in all I stay with Perl.
Fast Company Business + Innovation

And the rise of Python. Does Perl have a future?

I first heard of Perl when I was in middle school in the early 2000s. It was one of the world's most versatile programming languages, dubbed the Swiss army knife of the Internet. But compared to its rival Python, Perl has faded from popularity. What happened to the web's most promising language?

Perl's low entry barrier compared to compiled, lower-level language alternatives (namely, C) meant that Perl attracted users without a formal CS background (read: script kiddies and beginners who wrote poor code). It also boasted a small group of power users ("hardcore hackers") who could quickly and flexibly write powerful, dense programs that fueled Perl's popularity with a new generation of programmers.

A central repository (the Comprehensive Perl Archive Network, or CPAN) meant that for every person who wrote code, many more in the Perl community (the Programming Republic of Perl) could employ it. This, along with the witty evangelism by eclectic creator Larry Wall, whose interest in language ensured that Perl led in text parsing, was a formula for success during a time in which lots of text information was spreading over the Internet.

As the 21st century approached, many pearls of wisdom were wrought to move and analyze information on the web. Perl did have a learning curve, often meaning that it was the third or fourth language learned by adopters, but it sat at the top of the stack.

"In the race to the millennium, it looks like C++ will win, Java will place, and Perl will show," Wall said in the third State of Perl address in 1999. "Some of you no doubt will wish we could erase those top two lines, but I don't think you should be unduly concerned. Note that both C++ and Java are systems programming languages. They're the two sports cars out in front of the race. Meanwhile, Perl is the fastest SUV, coming up in front of all the other SUVs. It's the best in its class. Of course, we all know Perl is in a class of its own."

Then came the upset.

The Perl vs. Python Grudge Match

Then Python came along. Compared to Perl's straight-jacketed scripting, Python was a lopsided affair. It even took after its namesake, Monty Python's Flying Circus. Fittingly, most of Wall's early references to Python were lighthearted jokes at its expense.

Well, the millennium passed, computers survived Y2K, and my teenage years came and went. I studied math, science, and humanities but kept myself an arm's distance away from typing computer code. My knowledge of Perl remained like the start of a new text file: cursory, followed by a lot of blank space to fill up.

In college, CS friends at Princeton raved about Python as their favorite language (in spite of popular professor Brian Kernighan on campus, who helped popularize C). I thought Python was new, but I later learned it was around when I grew up as well, just not visible on the charts.

By the late 2000s Python was not only the dominant alternative to Perl for many text parsing tasks typically associated with Perl (i.e. regular expressions in the field of bioinformatics) but it was also the most proclaimed popular language, talked about with elegance and eloquence among my circle of campus friends, who liked being part of an up-and-coming movement.

Side By Side Comparison: Binary Search

Despite Python and Perl's well-documented rivalry and design-decision differences, which persist to this day, they occupy a similar niche in the programming ecosystem. Both are frequently referred to as "scripting languages," even though later versions are retro-fitted with object oriented programming (OOP) capabilities.

Stylistically, Perl and Python have different philosophies. Perl's best-known motto is "There's More Than One Way to Do It". Python is designed to have one obvious way to do it. Python's construction gave an advantage to beginners: a syntax with more rules and stylistic conventions (for example, requiring whitespace indentations for functions) ensured newcomers would see a more consistent set of programming practices; code that accomplished the same task would look more or less the same. Perl's construction favors experienced programmers: a more compact, less verbose language with built-in shortcuts which made programming for the expert a breeze.

During the dotcom era and the tech recovery of the mid to late 2000s, high-profile websites and companies such as Dropbox (Python) and Amazon and Craigslist (Perl), in addition to some of the world's largest news organizations (BBC, Perl), used the languages to accomplish tasks integral to the functioning of doing business on the Internet.

But over the course of the last 15 years, not only has how companies do business changed and grown, but so have the tools they use, unequally to the detriment of Perl. (A growing trend that was identified in the last comparison of the languages, "A Perl Hacker in the Land of Python," as well as, from the Python side, a Pythonista's evangelism aggregator, also done in the year 2000.)

Perl's Slow Decline

Today, Perl's growth has stagnated. At the Orlando Perl Workshop in 2013, one of the talks was titled "Perl is not Dead, It is a Dead End," and claimed that Perl now existed on an island. Once Perl programmers checked out, they always left for good, never to return. Others point out that Perl is left out of the languages to learn first, in an era where Python and Java had grown enormously, and a new entrant from the mid-2000s, Ruby, continues to gain ground by attracting new users in the web application arena (via Rails), followed by the Django framework in Python (PHP has remained stable as the simplest option as well).

In bioinformatics, where Perl's position as the most popular scripting language powered many 1990s breakthroughs like genetic sequencing, Perl has been supplanted by Python and the statistical language R (a variant of S-plus and descendant of S, also developed in the 1980s).

In scientific computing, my present field, Python, not Perl, is the open source overlord, even expanding at Matlab's expense (also a child of the 1980s, and similarly retrofitted with OOP abilities). And upstart PHP grew in size to the point where it is now arguably the most common language for web development (although its position is dynamic, as Ruby and Python have quelled PHP's dominance and are now entrenched as legitimate alternatives).

While Perl is not in danger of disappearing altogether, it is in danger of losing cultural relevance, an ironic fate given Wall's love of language. How has Perl become the underdog, and can this trend be reversed? (And, perhaps more importantly, will Perl 6 be released!?)

How I Grew To Love Python

Why Python, and not Perl? Perhaps an illustrative example of what happened to Perl is my own experience with the language.

In college, I still stuck to the contained environments of Matlab and Mathematica, but my programming perspective changed dramatically in 2012. I realized lacking knowledge of structured computer code outside the "walled garden" of a desktop application prevented me from fully simulating hypotheses about the natural world, let alone analyzing data sets using the web, which was also becoming an increasingly intellectual and financially lucrative skill set.

One year after college, I resolved to learn a "real" programming language in a serious manner: an all-in immersion taking me over the hump of knowledge so that, even if I took a break, I would still retain enough to pick up where I left off. An older alum from my college who shared similar interests, and an experienced programmer since the late 1990s, convinced me of his favorite language to sift and sort through text in just a few lines of code, and "get things done": Perl. Python, he dismissed, was "what academics used to think." I was about to be acquainted formally.

Before making a definitive decision on which language to learn, I took stock of online resources, lurked on PerlMonks, and acquired several used O'Reilly books, the Camel Book and the Llama Book, in addition to other beginner books. Yet once again, Python reared its head, and even Perl forums and sites dedicated to the language were lamenting the digital siege their language was succumbing to. What happened to Perl? I wondered. Ultimately undeterred, I found enough to get started (quality over quantity, I figured!), and began studying the syntax and working through examples.

But it was not to be. In trying to overcome the engineered flexibility of Perl's syntax choices, I hit a wall. I had adopted Perl for text analysis, but upon accepting an engineering graduate program offer, switched to Python to prepare.

By this point, CPAN's enormous advantage had been whittled away by ad hoc, hodgepodge efforts from uncoordinated but overwhelming groups of Pythonistas that now assemble in Meetups, at startups, and on college and corporate campuses to evangelize the Zen of Python. This has created a lot of issues with importing (pointed out by Wall), and package download synchronizations to get scientific computing libraries (as I found), but has also resulted in distributions of Python such as Anaconda that incorporate the most important libraries besides the standard library to ease the time tariff on imports.

As if to capitalize on the zeitgeist, technical book publisher O'Reilly ran this ad, inflaming Perl devotees.

By 2013, Python was the language of choice in academia, where I was to return for a year, and whatever it lacked in OOP classes, it made up for in college classes. Python was like Google, who helped spread Python and employed van Rossum for many years. Meanwhile, its adversary Yahoo (largely developed in Perl) did well, but comparatively fell further behind in defining the future of programming. Python was the favorite and the incumbent; roles had been reversed.

So after six months of Perl-making effort, this straw of reality broke the Perl camel's back and caused a coup that overthrew the programming Republic which had established itself on my laptop. I sheepishly abandoned the llama. Several weeks later, the tantalizing promise of a new MIT edX course teaching general CS principles in Python, in addition to numerous n00b examples, made Perl's syntax all too easy to forget instead of regret.

Measurements of the popularity of programming languages, in addition to friends and fellow programming enthusiasts I have met in the development community in the past year and a half, have confirmed this trend, along with the rise of Ruby in the mid-2000s, which has also eaten away at Perl's ubiquity in stitching together programs written in different languages.

Historically, many arguments could explain away any one of these studies: perhaps Perl programmers do not cheerlead their language as much, since they are too busy productively programming. Job listings or search engine hits could mean that a programming language has many errors and issues with it, or that there is simply a large temporary gap between supply and demand.

The concomitant picture, and one that many in the Perl community now acknowledge, is that Perl is now essentially a second-tier language, one that has its place but will not be the first several languages known outside of the Computer Science domain such as Java, C, or now Python.

The Future Of Perl (Yes, It Has One)

I believe Perl has a future, but it could be one for a limited audience. Present-day Perl is more suitable to users who have worked with the language from its early days, already dressed to impress. Perl's quirky stylistic conventions, such as using $ in front to declare variables, are in contrast with the other declarative symbol $ for practical programmers today: the money that goes into the continued development and feature set of Perl's frenemies such as Python and Ruby. And then there is the high activation cost of learning Perl, instead of implementing a Python solution.

Ironically, much in the same way that Perl jested at other languages, Perl now finds itself at the receiving end. What's wrong with Perl, from my experience? Perl's eventual problem is that if the Perl community cannot attract beginner users the way Python successfully has, it runs the risk of becoming like Children of Men, dwindling away to a standstill; vast repositories of hieroglyphic code looming in sections of the Internet and in data center partitions like the halls of the Mines of Moria. (Awe-inspiring and historical? Yes. Lively? No.)

Perl 6 has been an ongoing development since 2000. Yet after 14 years it is not officially done, making it the equivalent of Chinese Democracy for Guns N' Roses. In Larry Wall's words: "We're not trying to make Perl a better language than C++, or Python, or Java, or JavaScript. We're trying to make Perl a better language than Perl. That's all." Perl may be on the same self-inflicted path to perfection as Axl Rose, underestimating not others but itself. "All" might still be too much.

Absent a game-changing Perl release (which still could be "too little, too late"), people who learn to program in Python have no need to switch if Python can fulfill their needs, even if it is widely regarded as second or third best in some areas. The fact that you have to import a library, or put up with some extra syntax, is significantly easier than the transactional cost of learning a new language and switching to it. So over time, Python's audience stays young through its gateway strategy that van Rossum himself pioneered, Computer Programming for Everybody. (This effort has been a complete success. For example, at MIT, Python replaced Scheme as the first language of instruction for all incoming freshmen in the mid-2000s.)

Python Plows Forward

Python continues to gain footholds one by one in areas of interest, such as visualization (where Python still lags behind other languages' graphics, like Matlab, Mathematica, or the recent d3.js), website creation (the Django framework is now a mainstream choice), scientific computing (including NumPy/SciPy), parallel programming (mpi4py with CUDA), machine learning, and natural language processing (scikit-learn and NLTK), and the list continues.

While none of these efforts are centrally coordinated by van Rossum himself, a continually expanding user base, and getting to CS students first before other languages (such as even Java or C), increases the odds that collaborations in disciplines will emerge to build a Python library for themselves, in the same open source spirit that made Perl a success in the 1990s.

As for me? I'm open to returning to Perl if it can offer me a significantly different experience from Python (but "being frustrating" doesn't count!). Perhaps Perl 6 will be that release. However, in the interim, I have heeded the advice of many others with a similar dilemma on the web. I'll just wait and C.

[Feb 28, 2012] Perl Books for modern Perl programming by Chromatic

February 28, 2012

We've just put letter and A4 sized PDFs of Modern Perl: the Book online. This is the new edition, updated for 5.14 and 2011-2012.

As usual, these electronic versions are free to download. Please do. Please share them with friends, family, colleagues, coworkers, and interested people.

Of course we're always thrilled if you buy a printed copy of Modern Perl: the book. Yet even if you don't, please share a copy with your friends, tell other people about it, and (especially) post kind reviews far and wide.

We'd love to see reviews on places like Slashdot, LWN, any bookseller, and any other popular tech site.

We're working on other forms, like ePub and Kindle. That's been the delay (along with personal business); the previous edition's Kindle formatting didn't have the quality we wanted, so we're doing it ourselves to get things right. I hope to have those available in the next couple of weeks, but that depends on how much more debugging we have to do.

Thanks, as always, for reading.

[Jan 18, 2012] I am looking forward to learn perl

LinkedIn

Q: Hi, I'm looking forward to learning Perl. I'm a systems administrator (Unix). I'm interested in an online course; any recommendations would be highly appreciated. Syed

A: I used to teach sysadmins Perl in a corporate environment, and I can tell you that the main danger in learning Perl for a system administrator is the overcomplexity that many Perl books blatantly sell. In this sense anything written by Randal L. Schwartz is suspect, and Learning Perl is a horrible book to start with. I wonder how many sysadmins dropped Perl after trying to learn from this book.

See http://www.softpanorama.org/Bookshelf/perl.shtml

It might be that the best way is first to try to replace awk in your scripts with Perl, and only then gradually start writing full-blown Perl scripts. For inspiration you can look at collections of Perl one-liners, but please beware that some (many) of them are way too clever to be useful. Useless overcomplexity rules here too.
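For example, a typical awk one-liner translates almost mechanically into Perl (the file name is illustrative):

  awk '{ print $1, $NF }' access.log
  perl -lane 'print "$F[0] $F[-1]"' access.log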

I would also recommend avoiding the OO features of Perl that many books oversell. A lot can be done using regular Algol-style programming with subroutines and by translating awk into Perl. OO has its uses, but like many other programming paradigms it is oversold.

Perl is very well integrated into Unix (better than any of the competitors), and due to this it opens up levels of productivity for sysadmins simply incomparable with those achievable using the shell. You can automate a lot of routine work and enhance existing monitoring systems and such with ease if you know Perl well.

[Jul 27, 2011] PAC

freshmeat.net
PAC provides a GUI to configure SSH and Telnet connections, including usernames, passwords, EXPECT regular expressions, and macros.

It is similar in function to SecureCRT or Putty. It is intended for people who connect to many servers through SSH.

It can automate logins and command executions.

Tags Perl GTK+ SSH Telnet GNOME Ubuntu Expect
Licenses GPLv3
Operating Systems Linux Ubuntu Debian
Implementation Perl GTK+ Expect
Translations English

[Mar 18, 2011] New Perl news site launches by Ranguard

March 17, 2011
http://perlnews.org/ has just launched and will be providing a source for major announcements related to The Perl Programming Language (http://www.perl.org/). Find out more at http://perlnews.org/about/ - or if you have a story, submit it at http://perlnews.org/submit/.

All stories are approved to ensure relevance.

Thanks, The Perl News Team.

[Mar 17, 2011] Stupid "make" Tricks: Workflow Control with "make" by Mark Leighton Fisher

March 16, 2011 | blogs.perl.org

Following up Stupid Unix Tricks: Workflow Control with GNU Make -- this trick works on any platform with a make(1) program, including Windows, QNX, VMS, and z/OS.

It also serves to de-couple dependency checking and the workflow execution engine from the rest of your program (with the caveat that your program may need to interpret the output from make(1).)
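A sketch of the idea from the Perl side: let make(1) do the dependency checking, and drive it from your program (the makefile and target names are illustrative):

  # Rebuild only what is out of date, then consume the result.
  system('make', '-f', 'workflow.mk', 'report.txt') == 0
      or die "make failed: exit status $?";
  open my $fh, '<', 'report.txt'
      or die "Can't read report.txt: $!";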

[Feb 16, 2011] Perl-Critic freshmeat.net

The problem is that Damian Conway's book Perl Best Practices contains a lot of questionable advice ;-)

Perl::Critic is an extensible framework for creating and applying coding standards to Perl source code.

Essentially, it is a static source code analysis engine. It is distributed with a number of Perl::Critic::Policy modules that attempt to enforce various coding guidelines.

Most Policy modules are based on Damian Conway's book Perl Best Practices. However, Perl::Critic is not limited to PBP, and will even support Policies that contradict Conway. You can enable, disable, and customize those Policies through the Perl::Critic interface. You can also create new Policy modules that suit your own tastes.

Tags: Perl Admin Tools, Perl, Programming style, Program Understanding

[Feb 15, 2011] PAC 2.5.5.4

Implementation: Perl, GTK+, Expect. Tags: SSH, Telnet, GNOME.

PAC provides a GUI to configure SSH and Telnet connections, including usernames, passwords, EXPECT regular expressions, and macros. It is similar in function to SecureCRT or Putty.

It is intended for people who connect to many servers through SSH.

It can automate logins and command executions.

Selected Comments

archenroot

Well, PAC is really a nice piece of software, but I experience two issues: there is no support for using ssh keys (but I can live without that), and when I open a terminal and switch to another application on the desktop (GNOME), I can't directly use PAC for some time (about 10-20 seconds); then it refreshes itself and I can continue using the opened terminal. Do you have any suggestion what this could be about?

Thank you .

[Jan 21, 2011] Which language is best for system admin except for shell LinkedIn

While I personally prefer Perl, paradoxically the answer is "it does not matter", if and only if the aspiring sysadmin pays sufficient attention to Unix philosophy. See also Unix philosophy - Wikipedia, the free encyclopedia
Lisa Penland

@Garry - I began my career on sys5 rel 3. I'm rather partial to ksh. Have to say though - I tend to avoid AIX like the plague... and THAT's a personal preference.

@robin - there's only one topic more likely to incite a flame war - and that's "what's the best editor"

If you are really interested in becoming a *nix sysadmin be certain to check out the unix philosophy as well. It's not enough to understand the mechanical how...a good admin understands the whys. Anyone on this list can help you learn the steps to perform x function. It's much harder to teach someone how to think like an admin. http://www.faqs.org/docs/artu/ch01s06.html

Nikolai Bezroukov

Many years ago I wrote a small tutorial "Introduction to Perl for Unix System Administrators"
which covers basic features of the language. Might be useful for some people here.

http://www.softpanorama.org/Scripting/Perlbook/index.shtml

As for Perl vs Python vs Ruby this kind of religious discussion is funny and entertaining but Lisa Penland made a very important point: "If you are really interested in becoming a *nix sysadmin be certain to check out the Unix philosophy as well."

-- Nikolai

P.S. There is one indisputable fact on the ground that we need to remember when discussing this topic in the context of an enterprise Unix environment:

  • Perl is installed by default on all enterprise flavors of Unix (I mean RHEL, Suse, Solaris 9 & 10, HP-UX 10 & 11, AIX 5 & 6).
  • Python is installed by default on Linux (that means RHEL and Suse).
  • Ruby is not installed by default on any enterprise Unix.

Typically in a large corporation sysadmins need to support two or more flavors of Unix. In many organizations installation of an additional scripting language on all Unix boxes is a formidable task that requires political support at a pretty high level of the hierarchy. My God, even making bash available on all boxes is an almost impossible task :-).

[Jan 15, 2011] When

When is an extremely simple personal calendar program, aimed at the Unix geek who wants something minimalistic.

It can keep track of things you need to do on particular dates.

It's a very short and simple program, so you can easily tinker with it yourself.

It doesn't depend on any libraries, so it's easy to install. You should be able to install it on any system where Perl is available, even if you don't have privileges for installing libraries. Its file format is a simple text file, which you can edit in your favorite editor.

[Jan 14, 2011] Brian Kernighan - Random thoughts on scripting languages

PDF. Pretty superficial. I would expect better from the famous author...

Other scripting languages

[Dec 25, 2010] 23 Years of Culture Hacking With Perl -

December 25 | Slashdot

Greg Lindahl:

Blekko's search engine and NoSQL database are written in Perl. We haven't had problems hiring experienced, smart Perl people, and we've also had no trouble getting experienced Python people to learn Perl.

grcumb: Re:Rambling, barely coherent, self-indulgent.

I suppose I learned a lot about the Perl community though.

Larry may sound glib most of the time, but if you took the time to look, you'd see method in his madness. He chooses to make his points lightly, because that's an important part of the message. Perl as a language is designed to reflect the idiosyncrasies of the human brain. It treats dogmatism as damage and routes around it. As Larry wrote, it is trollish in its nature. But its friendly, playful brand of trollishness is what allows it to continue to evolve as a culture.

Strip away the thin veneer of silliness and you'll see that everything I've written has been lifted directly from Larry's missive. Just because he likes to act a little silly doesn't mean he's wrong.

One of the worst things a programmer can do is invest too much ego, pride or seriousness in his work. That is the path to painfully over-engineered, theoretically correct but practically useless software that often can't survive a single revision.

Perl as a language isn't immune to any of these sins, but as a culture, it goes to some lengths to mitigate them.

[Dec 25, 2010] Day 24 Yule the Ancient Troll-tide Carol

Perl 6 Advent Calendar

Perl is not just a technology; it's a culture. Just as Perl is a technology of technological hacking, so too Perl is a culture of cultural hacking. Giving away a free language implementation with community support was the first cultural hack in Perl's history, but there have been many others since, both great and small. You can see some of those hacks in that mirror you are holding. Er... that is holding you.

The second big cultural hack was demonstrating to Unix culture that its reductionistic ideas could be subverted and put to use in a non-reductionistic setting.

Dual licensing was a third cultural hack to make Perl acceptable both to businesses and the FSF.

Yet another well-known hack was writing a computer book that was not just informative but also, gasp, entertaining! But these are all shallow hacks. The deep hack was to bootstrap an entire community that is continually hacking on itself recursively in a constructive way (well, usually).

[Dec 20, 2010] What hurts you the most in Perl

Overcomplexity junkies in Perl: a very important threat that can undermine the language's future.
LinkedIn

Steve Carrobis

Perl is a far better applications-type language than Java/C/C#. Each has its niche. Threads were always an issue in Perl, and like OO, if you don't need it or know it, don't use it. My issue with Perl is when people get Overly Obfuscated with their code because they think that fewer characters and a few pointers make the code faster. Unless you do some real smart OO-esque building, all you are doing is making it harder to figure out what you were thinking about. And please, Perl programmers, don't buy into "self-documenting code". I am an old mainframer, and self-documenting code meant that as you wrote, you added comments to the core parts of the code... I can call my subroutine "apple" to describe it, but is it really an apple? Or is it a tomato or pomegranate? If written properly, Perl is very efficient code, and like all the other languages, if written incorrectly it's HORRIBLE. I have been writing Perl since almost before 3.0 ;-)

That's my 3 cents.. Have a HAPPY and a MERRY!

Nikolai Bezroukov

@steve Thanks for a valuable comment about the threat of overcomplexity junkies in Perl. That's a very important threat that can undermine the language's future.

@Gabor: A well-known fact is that PHP, which is a horrible language both in its general design and in the implementation of most features you mentioned, is very successful and is widely used for large Web applications with a database backend (Mediawiki is one example). Also, if we think about all the dull, stupid and unreliable Java coding of large business applications that we see in the marketplace, the question arises whether we want this type of success ;-)

@Douglas: Mastering Perl requires a slightly higher level of qualification from developers than "Basic-style" development in PHP or commercial Java development (where Java typically plays the role of Cobol), which is mainstream these days. Also many important factors are outside the technical domain: the ecosystem for Java is tremendous and is supported by players with deep pockets. The same is true for Python. Still Perl has unique advantages, is universally deployed on Unix, and as such is and always will be attractive for thinking developers.

I think that for many large business applications, which these days often means a Web application with a database backend, one can use the virtual appliance model and rely on OS facilities for multitasking. Nothing is wrong with this approach on modern hardware. Here Perl provides important advantages due to good integration with Unix.

Also, structuring a large application into modules that use pipes and sockets as the communication mechanism often provides very good maintainability. Pointers are also very helpful and rather unique to Perl: scripting languages typically do not provide pointers, and Perl does, giving the developer unique power and flexibility (with additional risks as an inevitable side effect).
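A minimal sketch of what "pointers" (references) mean in Perl; the data is invented for the example:

use strict;
use warnings;

my @servers     = ('web1', 'web2', 'db1');
my $servers_ref = \@servers;                 # take a reference to the array

print "first: $servers_ref->[0]\n";          # dereference one element: web1
print "count: ", scalar @$servers_ref, "\n"; # dereference the whole array: 3

# References are also what make nested data structures possible:
my %host = ( name => 'web1', ports => [22, 80, 443] );
print "last port: $host{ports}[-1]\n";       # 443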

Another important advantage of Perl is that it is a higher-level language than Python (to say nothing about Java) and stimulates the use of prototyping, which is tremendously important for large projects, as the initial specification is usually incomplete and incorrect. Also, despite the proliferation of overcomplexity junkies in the Perl community, some aspects of Perl prevent an excessive number of layers/classes, a common trap that undermines large projects in Java. Look at IBM's fiasco with Lotus Notes 8.5.

I think that Perl is great in the way it integrates with Unix and promotes thinking of complex applications as virtual appliances. BTW, this approach also permits the use of a second language for those parts of the system for which Perl does not present clear advantages.

Also, Perl provides an important bridge to system administrators, who often know the language and can use a subset of it productively. That makes it preferable for large systems which depend on customization, such as monitoring systems.

The absence of a bytecode compiler hurts development of commercial applications in Perl in more ways than one, but that's just a question of money. I wonder why ActiveState missed this opportunity to increase its revenue stream. I also agree that the quality of many CPAN modules could be improved, but abuse of CPAN, along with the fixation on OO, is a typical trait of overcomplexity junkies, so this has some positive aspect too :-).

I don't think that OO is a problem for Perl, if you use it where it belongs: in GUI interfaces. In many cases OO is used where hierarchical namespaces are sufficient. Perl provides a clean implementation of the concept of namespaces. The problem is that many people are trained in the Java/C++ style of OO, and as we know, for a hammer everything looks like a nail. ;-)
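To make the namespace point concrete, here is a small sketch using plain hierarchical packages with no classes or objects at all (Monitor::Disk and its contents are invented for the example):

use strict;
use warnings;

package Monitor::Disk;       # a hierarchical namespace, not a class

our $threshold = 90;

sub check {
    my ($pct) = @_;
    return $pct > $threshold ? 'ALARM' : 'ok';
}

package main;                # back to the default namespace

print Monitor::Disk::check(95), "\n";    # ALARM
print "$Monitor::Disk::threshold\n";     # 90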

Allan Bowhill:

I think the original question Gabor posed implies there is a problem 'selling' Perl to companies for large projects. Maybe it's a question of narrowing its role.

It seems to me that if you want an angle to sell Perl on, it would make sense to cast it (in a marketing sense) into a narrower role that doesn't pretend to be everything to everyone. Because, despite what some hard-core Perl programmers might say, the language is somewhat dated. It hasn't really changed all that much since the 1990s.

Perl isn't particularly specialized so it has been used historically for almost every kind of application imaginable. Since it was (for a long time in the dot-com era) a mainstay of IT development (remember the 'duct tape' of the internet?) it gained high status among people who were developing new systems in short time-frames. This may in fact be one of the problems in selling it to people nowadays.

The FreeBSD OS even included Perl as part of their main (full) distribution for some time and if I remember correctly, Perl scripts were included to manage the ports/packaging system for all the 3rd party software. It was taken out of the OS shortly after the bust and committee reorganization at FreeBSD, where it was moved into third-party software. The package-management scripts were re-written in C. Other package management utilities were effectively displaced by a Ruby package.

A lot of technologies have come along since the 90s which are more appealing platforms than Perl for web development, which is mainly what it's about now.

If you are going to build modern web sites these days, you'll more than likely use some framework that utilizes object-oriented languages. I suppose the Moose augmentation of Perl would have some appeal with that, but CPAN modules and addons like Moose are not REALLY the Perl language itself. So if we are talking about selling the Perl language alone to potential adopters, you have to be honest in discussing the merits of the language itself without all the extras.

Along those lines I could see Perl having special appeal being cast in a narrower role, as a kind of advanced systems batching language - more capable and effective than say, NT scripting/batch files or UNIX shell scripts, but less suitable than object-oriented languages, which pretty much own the market for web and console utilities development now.

But there is a substantial role for high-level batching languages, particularly in systems that build data for consumption by other systems. These are traditionally implemented in the highest-level batching language possible. Such systems build things like help files, structured (non-relational) databases (often used on high-volume commercial web services), and software. Not to mention the automation of many systems administration tasks.

There are not too many features or advantages to Perl that are unique in itself in the realm of scripting languages, as they were in the 90s. The simplicity of built-in Perl data structures and regular expression capabilities are reflected almost identically in Ruby, and are at least accessible in other strongly-typed languages like Java and C#.

That Perl is easy to learn, stays consistent with the idea that "everything is a string," and does not force you to formalize things into an object-oriented model are a few of its selling points. If it is cast as an advanced batching language, there are almost no other languages that could compete with it in that role.

Dean Hamstead:

@Pascal: bytecode is nasty for the poor sysadmin/devop who has to run your code. He or she can never fix it when bugs arise. There is no advantage to bytecode over interpreted code.

Which in fact leads me to a good point.

The 'selling points' of Java have all failed to be of any real substance.

  • Cross-platform? Vendor applications are rarely supported on more than one platform, and will rarely work on any other platform.
  • Bytecode hasn't proved to provide any performance advantage, but has merely made people's lives more difficult.
  • Object-oriented? It was new and cool, but even Java fails to be a 'pure' OO language.

In truth, Java is popular because it is popular.

Lots of people don't like Perl because it's not popular any more, similar to how lots of people hate Macs but have no logical reason for doing so.

Douglas is almost certainly right, that Python is rapidly becoming the new fad language.

I'm not sure how Perl OO is a 'hack'. When you bless a reference into an object, it becomes an object... I can see that some people are confused by Perl's honesty about what an object is. Other languages attempt to hide away how they have implemented objects in their compiler - who cares? Ultimately the objects are all converted into machine code and executed.
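For readers who have not seen it, a minimal sketch of the bless mechanism being discussed (the Host class is invented for the example):

use strict;
use warnings;

package Host;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} };   # an ordinary hash reference...
    return bless $self, $class;           # ...blessed into the Host package
}

sub name {
    my ($self) = @_;
    return $self->{name};
}

package main;

my $h = Host->new(name => 'web1');
print $h->name, "\n";     # web1
print ref($h), "\n";      # Host -- the object is openly just a blessed ref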

In general, Perl objects are more object-oriented than Java objects. They are certainly more polymorphic.

Perl objects can fully hide their internals if that's something you want to do. It's not even hard, and you don't need to use Moose. But does it afford any real benefit? Not really.

At the end of the day, if you want good software you need to hire good programmers; it has nothing to do with the language. Even though some languages try to force the code to be neat (Python) or try to force certain behaviours (Java?), you can write complete garbage in any of them, then curse that language for allowing the author to do so.

A syntactic argument is pointless, as is anything oriented around OO. The benefits Perl brings to a business are...

- massive centralised website of libraries (CPAN)
- MVCs
- DBI
- POE
- other frameworks, etc.
- automated code review (perlcritic)
- automated code formatting and tidying (perltidy)
- document as you code (POD)
- natural test-driven development (Test::More etc.)
- platform independence
- Perl environments on more platforms than Java
- Perl comes out of the box on every Unix
- excellent canon of printed literature, from beginner to expert
- a common language between sysadmin/devops and traditional developer roles (with source code always available, so you can *fix* the problem quickly rather than having to set up an ant environment and roll a new WAR file)
- rolled-up Perl applications (PAR files)
- Perl can use more than 3.6 GB of RAM (try that in Java)

Brian Martin

Well said Dean.

Personally, I don't really care if a system is written in Perl or Python or some other high level language, I don't get religious about which high level language is used.

There are many high-level languages; any one of them is vastly more productive, and consequently less buggy, than developing in a low-level language like C or Java. Believe me, I have written more vanilla C code in my career than Perl or Python, by a factor of thousands, yet I still prefer Python or Perl as quite simply a more succinct expression of the intended algorithm.

If anyone wants to argue the meaning of "high level", well, basically APL wins, OK? In APL, inverting a matrix is a single operator. If you've never had to implement matrix inversion from scratch, then you've never done serious programming. Meanwhile, Python and Perl are pretty convenient.

What I mean by a "high level language" is basically: how many pages of code does it take to play a decent game of draughts (checkers), or chess?

  • In APL you can write a reasonable draughts player in about 2 pages.
  • In K&R C (not C++) you can write a reasonable Chess player in about 10-20 pages.

[Jun 10, 2010] Deep-protocol analysis of UNIX networks

Jun 08, 2010 | developerWorks
Parsing the raw data to understand the content

Another way to process the content from tcpdump is to save the raw network packet data to a file and then process the file to find and decode the information that you want.

There are a number of modules in different languages that provide functionality for reading and decoding the data captured by tcpdump and snoop. For example, within Perl, there are two modules: Net::SnoopLog (for snoop) and Net::TcpDumpLog (for tcpdump). These will read the raw data content. The basic interfaces for both of these modules are the same.

To start, first you need to create a binary record of the packets going past on the network by writing out the data to a file using either snoop or tcpdump. For this example, we'll use tcpdump and the Net::TcpDumpLog module: $ tcpdump -w packets.raw.

Once you have amassed the network data, you can start to process it to find the information you want. Net::TcpDumpLog parses the raw network data saved by tcpdump. Because the data is in its raw binary format, parsing the information requires processing this binary data. For convenience, another suite of modules, NetPacket::*, provides decoding of the raw data.

For example, Listing 8 shows a simple script that prints out the IP address information for all of the packets.


Listing 8. Simple script that prints out the IP address info for all packets
use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;

my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    my $packet = $log->data($index);

    my $ethernet = NetPacket::Ethernet->decode($packet);

    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        printf("  %s to %s protocol %s \n",
               $ip->{src_ip}, $ip->{dest_ip}, $ip->{proto});
    }
}
The first part is to extract each packet. The Net::TcpDumpLog module serializes each packet, so that we can read each packet by using the packet ID. The data() method then returns the raw data for the entire packet.

As with the output from snoop, we have to extract each of the blocks of data from the raw network packet information. So in this example, we first need to extract the ethernet packet, including the data payload, from the raw network packet. The NetPacket::Ethernet module does this for us.

Since we are looking for IP packets, we can check for IP packets by looking at the Ethernet packet type. IP packets have an ID of 0x0800.

The NetPacket::IP module can then be used to extract the IP information from the data payload of the Ethernet packet. The module provides the source IP, destination IP and protocol information, among others, which we can then print.

Using this basic framework you can perform more complex lookups and decoding that do not rely on the automated solutions provided by tcpdump or snoop. For example, if you suspect that there is HTTP traffic going past on a non-standard port (i.e., not port 80), you could look for the string HTTP on ports other than 80 from the suspected host IP using the script in Listing 9.


Listing 9. Looking for the string HTTP on ports other than 80
use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;
use NetPacket::TCP;

my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    my $packet = $log->data($index);

    my $ethernet = NetPacket::Ethernet->decode($packet);

    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        if ($ip->{src_ip} eq '192.168.0.2')
        {
            if ($ip->{proto} == 6)
            {
                my $tcp = NetPacket::TCP->decode($ip->{data});

                if (($tcp->{src_port} != 80) &&
                    ($tcp->{data} =~ m/HTTP/))
                {
                    print("Found HTTP traffic on non-port 80\n");
                    printf("%s (port: %d) to %s (port: %d)\n%s\n",
                           $ip->{src_ip},
                           $tcp->{src_port},
                           $ip->{dest_ip},
                           $tcp->{dest_port},
                           $tcp->{data});
                }
            }
        }
    }
}

Running the above script on a sample packet set returned the output shown in Listing 10.

Listing 10. Running the script on a sample packet set
$ perl http-non80.pl
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39280) to 168.143.162.100 (port: 80)
GET /statuses/user_timeline.json HTTP/1.1
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39282) to 168.143.162.100 (port: 80)
GET /statuses/friends_timeline.json HTTP/1

In this particular case we're seeing traffic from the host to an external website (Twitter).

Obviously, in this example, we are dumping out the raw data, but you could use the same basic structure to decode data in any format using any public or proprietary protocol structure. If you are using or developing a protocol with this method, and know the protocol format, you can extract and monitor the data being transferred.

[Jun 10, 2010] ack -- better than grep, a power search tool for programmers

Latest version of ack: 1.92, December 11, 2009

ack is a tool like grep, designed for programmers with large trees of heterogeneous source code.

ack is written purely in Perl, and takes advantage of the power of Perl's regular expressions.

How to install ack

It can be installed any number of ways:

Ack in Project for Textmate users

Users of TextMate, the programmer's editor for the Mac, can use the Ack in Project plugin by Trevor Squires:

TextMate users know just how slow its Find in Project can be with large source trees. That's why you need "ack-in-project", a TextMate bundle that uses the super-speedy ack tool to search your code FAST. It gives you beautiful, clickable results just as fast as "ack" can find them. Check it out at: http://github.com/protocool/ack-tmbundle/tree/master

Testimonials

"Whoa, this is *so* much better than grep it's not even funny." -- Jacob Kaplan-Moss, creator of Django.

"Thanks for creating ack and sharing it with the world. It makes my days just a little more pleasant. I'm glad to have it in my toolbox. That installation is as simple as downloading the standalone version and chmodding is a nice touch." -- Alan De Smet

"I came across ack today, and now grep is sleeping outside. It's very much like grep, except it assumes all the little things that you always wanted grep to remember, but that it never did. It actually left the light on for you, and put the toilet seat down." -- Samuel Huckins

"ack is the best tool I have added to my toolbox in the past year, hands down." -- Bill Mill on reddit

"I use it all the time and I can't imagine how I managed with only grep." -- Thomas Thurman

"This has been replacing a Rube Goldberg mess of find/grep/xargs that I've been using to search source files in a fairly large codebase." -- G. Wade Johnson

"You had me at --thpppt." -- John Gruber, Daring Fireball

"Grepping of SVN repositories was driving me crazy until I found ack. It fixes all of my grep annoyances and adds features I didn't even know I wanted." -- Paul Prescod

"I added ack standalone to our internal devtools project at work. People are all over it." -- Jason Gessner

"I just wanted to send you my praise for this wonderful little application. It's in my toolbox now and after one day of use has proven itself invaluable." -- Benjamin W. Smith

"ack has replaced grep for me for 90% of what I used it for. Obsoleted most of my 'grep is crippled' wrapper scripts, too." -- Randall Hansen

"ack's powerful search facilities are an invaluable tool for searching large repositories like Parrot. The ability to control the search domain by filetype--and to do so independent of platform--has made one-liners out of many complex queries previously done with custom scripts. Parrot developers are hooked on ack." -- Jerry Gay

"That thing is awesome. People see me using it and ask what the heck it is." -- Andrew Moore

Top 10 reasons to use ack instead of grep.

  1. It's blazingly fast because it only searches the stuff you want searched.
  2. ack is pure Perl, so it runs on Windows just fine.
  3. The standalone version uses no non-standard modules, so you can put it in your ~/bin without fear.
  4. Searches recursively through directories by default, while ignoring .svn, CVS and other VCS directories.
    • Which would you rather type?
      $ grep pattern $(find . -type f | grep -v '\.svn')
      $ ack pattern
  5. ack ignores most of the crap you don't want to search
    • VCS directories
    • blib, the Perl build directory
    • backup files like foo~ and #foo#
    • binary files, core dumps, etc
  6. Ignoring .svn directories means that ack is faster than grep for searching through trees.
  7. Lets you specify file types to search, as in --perl or --nohtml.
    • Which would you rather type?
      $ grep pattern $(find . -name '*.pl' -or -name '*.pm' -or -name '*.pod' | grep -v .svn)
      $ ack --perl pattern
    Note that ack's --perl also checks the shebang lines of files without suffixes, which the find command will not.
  8. File-filtering capabilities usable without searching with ack -f. This lets you create lists of files of a given type.
    $ ack -f --perl > all-perl-files
  9. Color highlighting of search results.
  10. Uses real Perl regular expressions, not a GNU subset.
  11. Allows you to specify output using Perl's special variables
    • Example: ack '(Mr|Mr?s)\. (Smith|Jones)' --output='$&'
  12. Many command-line switches are the same as in GNU grep:
    -w does word-only searching
    -c shows counts per file of matches
    -l gives the filename instead of matching lines
    etc.
  13. Command name is 25% fewer characters to type! Save days of free-time! Heck, it's 50% shorter compared to grep -r.

ack's command flags

$ ack --help
Usage: ack [OPTION]... PATTERN [FILE]

Search for PATTERN in each source file in the tree from cwd on down.
If [FILES] is specified, then only those files/directories are checked.
ack may also search STDIN, but only if no FILE are specified, or if
one of FILES is "-".

Default switches may be specified in ACK_OPTIONS environment variable or
an .ackrc file. If you want no dependency on the environment, turn it
off with --noenv.

Example: ack -i select

Searching:
  -i, --ignore-case     Ignore case distinctions in PATTERN
  --[no]smart-case      Ignore case distinctions in PATTERN,
                        only if PATTERN contains no upper case
                        Ignored if -i is specified
  -v, --invert-match    Invert match: select non-matching lines
  -w, --word-regexp     Force PATTERN to match only whole words
  -Q, --literal         Quote all metacharacters; PATTERN is literal

Search output:
  --line=NUM            Only print line(s) NUM of each file
  -l, --files-with-matches
                        Only print filenames containing matches
  -L, --files-without-matches
                        Only print filenames with no matches
  -o                    Show only the part of a line matching PATTERN
                        (turns off text highlighting)
  --passthru            Print all lines, whether matching or not
  --output=expr         Output the evaluation of expr for each line
                        (turns off text highlighting)
  --match PATTERN       Specify PATTERN explicitly.
  -m, --max-count=NUM   Stop searching in each file after NUM matches
  -1                    Stop searching after one match of any kind
  -H, --with-filename   Print the filename for each match
  -h, --no-filename     Suppress the prefixing filename on output
  -c, --count           Show number of lines matching per file
  --column              Show the column number of the first match

  -A NUM, --after-context=NUM
                        Print NUM lines of trailing context after matching
                        lines.
  -B NUM, --before-context=NUM
                        Print NUM lines of leading context before matching
                        lines.
  -C [NUM], --context[=NUM]
                        Print NUM lines (default 2) of output context.

  --print0              Print null byte as separator between filenames,
                        only works with -f, -g, -l, -L or -c.

File presentation:
  --pager=COMMAND       Pipes all ack output through COMMAND.  For example,
                        --pager="less -R".  Ignored if output is redirected.
  --nopager             Do not send output through a pager.  Cancels any
                        setting in ~/.ackrc, ACK_PAGER or ACK_PAGER_COLOR.
  --[no]heading         Print a filename heading above each file's results.
                        (default: on when used interactively)
  --[no]break           Print a break between results from different files.
                        (default: on when used interactively)
  --group               Same as --heading --break
  --nogroup             Same as --noheading --nobreak
  --[no]color           Highlight the matching text (default: on unless
                        output is redirected, or on Windows)
  --[no]colour          Same as --[no]color
  --color-filename=COLOR
  --color-match=COLOR   Set the color for matches and filenames.
  --flush               Flush output immediately, even when ack is used
                        non-interactively (when output goes to a pipe or
                        file).

File finding:
  -f                    Only print the files found, without searching.
                        The PATTERN must not be specified.
  -g REGEX              Same as -f, but only print files matching REGEX.
  --sort-files          Sort the found files lexically.

File inclusion/exclusion:
  -a, --all-types       All file types searched;
                        Ignores CVS, .svn and other ignored directories
  -u, --unrestricted    All files and directories searched
  --[no]ignore-dir=name Add/Remove directory from the list of ignored dirs
  -r, -R, --recurse     Recurse into subdirectories (ack's default behavior)
  -n, --no-recurse      No descending into subdirectories
  -G REGEX              Only search files that match REGEX

  --perl                Include only Perl files.
  --type=perl           Include only Perl files.
  --noperl              Exclude Perl files.
  --type=noperl         Exclude Perl files.
                        See "ack --help type" for supported filetypes.

  --type-set TYPE=.EXTENSION[,.EXT2[,...]]
                        Files with the given EXTENSION(s) are recognized as
                        being of type TYPE. This replaces an existing
                        definition for type TYPE.
  --type-add TYPE=.EXTENSION[,.EXT2[,...]]
                        Files with the given EXTENSION(s) are recognized as
                        being of (the existing) type TYPE

  --[no]follow          Follow symlinks.  Default is off.

  Directories ignored by default:
    autom4te.cache, blib, _build, .bzr, .cdv, cover_db, CVS, _darcs, ~.dep,
    ~.dot, .git, .hg, ~.nib, nytprof, .pc, ~.plst, RCS, SCCS, _sgbak and
    .svn

  Files not checked for type:
    /~$/           - Unix backup files
    /#.+#$/        - Emacs swap files
    /[._].*\.swp$/ - Vi(m) swap files
    /core\.\d+$/   - core dumps

Miscellaneous:
  --noenv               Ignore environment variables and ~/.ackrc
  --help                This help
  --man                 Man page
  --version             Display version & copyright
  --thpppt              Bill the Cat

Exit status is 0 if match, 1 if no match.

This is version 1.92 of ack.

[Apr 25, 2010] The Perl Review Archives

Download Volume 0 Issue 0 (February 2002) (394k PDF)
Download Volume 0 Issue 1 (March 2002) (429k PDF)
Download Volume 0 Issue 2 (April 2002) (485k PDF)
Download Volume 0 Issue 3 (May 2002) (433k PDF)
Download Volume 0 Issue 4 (July 2002) (332k PDF)
Download Volume 0 Issue 5 (September 2002) (363k PDF)
Download Volume 0 Issue 6 (November 2002) (352k PDF)
Download Volume 0 Issue 7 (January 2003) (263k PDF)

[Apr 25, 2010] What Perl got right (Jan 03)

... one of the many things that I think Perl got right: Perl's easy access to low-level operating system functionality.

Let's take a look at what this means. Perl gives you unlink() and rename() to remove and rename files. These calls pass nearly directly to the underlying "section 2" Unix system calls, without hiding the call behind a confusing abstraction layer. In fact, the name "unlink" is a direct reflection of that. Many beginners look for a "file delete" operation, without stumbling across "unlink" because of its peculiar name.

But the matchup doesn't stop there. Perl's file and directory operations include such entries as chdir(), chmod(), chown(), chroot(), fcntl(), ioctl(), link(), mkdir(), readlink(), rmdir(), stat(), symlink(), umask(), and utime(). All of these are mapped nearly directly to the corresponding system call. This means that file-manipulating programs don't have to call out to a shell just to perform the heavy lifting.

And if you want process control, Perl gives you alarm(), exec(), fork(), get/setpgrp(), getppid(), get/setpriority(), kill(), pipe(), sleep(), wait(), and waitpid(). With fork and pipe, you can create any feasible piping configuration, again not limited to a particular process abstraction provided by a more limited scripting language. And you can manage and modify those processes directly as well.
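As a hedged illustration of the fork()/pipe() point, here is one way to wire a parent to a child by hand (the messages are invented for the example):

use strict;
use warnings;

# Create a pipe: whatever the child writes, the parent can read.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                   # child process
    close $reader;
    print {$writer} "hello from child $$\n";
    close $writer;
    exit 0;
}

close $writer;                     # parent process
print "parent read: ", <$reader>;
close $reader;
waitpid($pid, 0);                  # reap the child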

Let's not forget those socket functions, like accept(), bind(), connect(), getpeername(), getsockname(), get/setsockopt(), listen(), recv(), send(), shutdown(), socket(), and socketpair(). Although most people usually end up using the higher level modules that wrap around these calls (like LWP or Net::SMTP), they in turn can call these operations to set up the interprocess communication. And if a protocol isn't provided by a readily accessible library, you can get down near the metal and tweak to your heart's content.

Speaking of interprocess communication, you've also got the "System V" interprocess communications, like msgctl(), msgget(), msgrcv(), msgsnd(), semctl(), semget(), semop(), shmctl(), shmget(), shmread() and shmwrite(). Again, each of these calls maps nearly directly to the underlying system call, making existing C-based literature a ready source of examples and explanation, rather than providing a higher-level abstraction layer. Then again, if you don't want to deal with the low-level interfaces, common CPAN modules hide away the details if you wish.

And then there's the user and group info (getpwuid() and friends), network info (like gethostbyname()). Even opening a file can be modified using all of the flags directly available to the open system call, like O_NONBLOCK, O_CREAT or O_EXCL.

Hopefully, you can see from these lists that Perl provides a rich set of interfaces to low-level operating system details. Why is this "what Perl got right"?

It means that while Perl provides a decent high-level language for text wrangling and object-oriented programming, we can still get "down in the dirt" to precisely control, create, modify, manage, and maintain our systems and data. For example, if our application requires a "write to temp file, then close and rename atomically" approach to keep other applications from seeing a partially written file, we can spell it out as if we were in a systems implementation language like C:

        open TMP, ">ourfile.$$" or die "...";
        print TMP @our_new_data;
        close TMP;
        chmod 0444, "ourfile.$$" or die "...";
        rename "ourfile.$$", "ourfile" or die "...";

By keeping the system call names the same (or similar), we can leverage off existing examples, documentation, and knowledge.

In a scripting language without these low-level operations, we're forced to accept a world as presented by the language designer, not the world in which we live as a practicality. Eric Raymond gave as examples an old LISP system which provided many layers of abstraction (sometimes buggy) before you got to actual file input/output system calls, and the classic Smalltalk image, which provides a world unto itself, but very few hooks out into the real world. As a modern example, Java seems to be somewhat painful about "real world'' connections, preferring instead to have its users implement the ideal world for it rather than it adapting to its world.

[Apr 25, 2010] Perl Programming (DTP-250)

Topics: The Perl Programming Language; Scalars; Control Structures; Arrays; Hashes; Basic I/O and Regular Expressions; Filehandles and Files; Subroutines and Modules; File and Directory Operations; Overview of CGI Programming.

[Apr 24, 2010] Free Perl Books - freeprogrammingresources.com

[Nov 8, 2009] Perl far from dead, more popular than you think

Perl is a mature language, and as such the level of press coverage does not reflect the real use of the language. It is now included in all Unix and Linux distributions, so it has become a real alternative to shell for complex scripts. It takes many years and man-hours to reach this status. Neither Python nor Ruby are close (though Python is more or less common in all Linux distributions).
November 6, 2009 | Royal Pingdom

... Here are some of the more popular sites that use Perl extensively today:

More sites (and apps) using Perl

When the subject of Perl was brought up here at the Pingdom office, we were not sure how widely used it is now in 2009, especially on the Web. That's why we decided to dig around a bit, which in turn led to this article. The above websites are just the tip of the iceberg, though. Here are even more examples of sites making extensive use of Perl:

Add to this all blogs using the Movable Type blogging software from Six Apart, which uses Perl. Prominent examples include The Huffington Post, Kottke.org, Boing Boing and ReadWriteWeb. And of course all blogs on the Typepad blogging service, which uses a special version of Movable Type.

Comments

Colin

I'm admittedly biased, but Perl is not dead for new development. It's being used all over the place. Here are a few of my favorite Perl-based websites:

http://www.thegamecrafter.com
Print on demand board game creation

http://www.hiveminder.com
Online TODO list tracker

Don't believe the hype, Perl is alive and kicking.

Alexandr Ciornii

Slideshow about real Perl popularity: http://www.slideshare.net/Tim.Bunce/perl-myths-200909

prakash

Add Multiply (http://multiply.com), the third fastest growing social network in the US (http://www.wired.com/geekdad/2009/05/multiply-is-there-room-for-another-player-in-the-social-media-space/), to the list. It's all Perl there.

Me

http://www.optuszoo.com.au - all Perl.

Ask Bjørn Hansen

http://www.weblocal.ca/ (on the http://yellowbot.com/ platform) is the 3rd largest local search site in Canada and is all Perl (and monitored by Pingdom, of course 100% uptime last month). :-)

ask

Nilson Santos F. Jr.

I think most people think Perl is dying because those who use it don't usually make such a fuss about it as *those* other dynamic language folks.

I've been a Perl developer, mostly doing new development, for several years (new modules for existing products or completely brand new products).

Inside the Perl community, people know it is widely used. At least one Alexa top 100 website is fully coded in Perl and several other top 100 sites partially use it.

Lee Doolan

To the list of sites using Perl, you can add

http://www.sfgate.com

Steve

There are fashions in computer languages in much the same way as there are fashions in clothing. Programmers don't like to acknowledge that. Just look at the example code included in the comments here. Not much to choose between them really (save your proselytizing). There is little that can be done in Python or Ruby which cannot be done in Perl, and there is such a vast library of Perl modules available in CPAN that, to be honest, I wonder how many times Python and Ruby coders have reinvented the wheel (let's face it, so have Perl coders).

Next year you will see the first bundled releases of Perl6 in the mainstream distributions. Where there was once ridicule aimed at a still unreleased Perl6, there will be surprise, relief, interest and adopters. mod_perl6 is already being ported. Books will be published and take up shelf space in the computer departments of book shops. It will be noticed, and newbies will want to give it a try. Old Perl hands will take a look at it, and some Python and Ruby diehards will also consider their options (though most will stay put, because that is how people are). Next year interest in Perl will grow. Just take a look at Google Trends; the interest is already building.

Next year, and increasingly, it will be Perl6 which is the new kid on the block, and Python and Ruby will be the oldies who struggle to keep up.

The author of the article is right. Perl is far from dead.

Could be wrong though. You can never tell :-)

Shawn

Not only is Perl still popular for web sites, it is gaining in the field of bio-analysis. Not only is it not dead, it's growing.

[Apr 21, 2009] Why you should upgrade to Perl 5.10

External links

Articles

Perl Tips on Perl 5.10

First Look Perl 5.10 is a Pearl Compiler from Wired.com

By Scott Gilbertson January 02, 2008 | 10:56:58 AM

As most Perl fans are no doubt aware, the Perl Foundation released version 5.10 last month and introduced a number of significant upgrades for the popular programming language. Perl 5.10 is the first significant feature upgrade since the 5.8 release back in 2002.

First the good news, AKA why you should go ahead and upgrade: the major new language features are turned off by default, which means you can upgrade without breaking existing scripts and still take advantage of the new features for new scripts. Even cooler is the ability to progressively upgrade scripts using the use feature syntax.

For instance, add the line use feature 'switch'; prior to a block of code where you'd like to take advantage of the new switch statement in Perl 5.10, and then turn it off after the upgraded block using the statement no feature 'switch';. New features can be enabled by name or as a collective group using the statement use feature ':5.10';.
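A minimal sketch of that per-block toggling ($mode and the messages are invented for the example):

use feature 'switch';        # enable the new construct for the block below

my $mode = 'list';
given ($mode) {
    when ('list')   { print "listing\n" }
    when ('delete') { print "deleting\n" }
    default         { print "unknown mode\n" }
}

no feature 'switch';         # the rest of the file stays plain pre-5.10 Perl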

In addition to the switch statement, there's a new say statement, which acts like print() but adds a newline character, and a state feature, which enables a new class of variables with very explicit scope control.
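And a short sketch of say and state together (the next_id counter is invented for the example):

use feature qw(say state);

sub next_id {
    state $n = 0;    # initialized once, persists across calls
    return ++$n;
}

say next_id();       # 1 -- say() is print() plus a trailing newline
say next_id();       # 2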

But perhaps the most interesting of 5.10's new features is the new "or" operator, //, which is a "defined-or" construct. For instance, the following statements are syntactically equivalent:

$foo // $bar
defined $foo ? $foo : $bar

Obviously the first line is much more compact and (I would argue) more readable: is $foo defined? If not, use the value of $bar. You can also add an equals sign, like so:

$bar //= $foo; 

Which is the same as writing:

$bar = $foo unless defined $bar; 

Another noteworthy new feature is the smart match operator, which the Perl Foundation explains as a new kind of comparison whose specifics are contextual, based on the inputs to the operator. For example, to find out if scalar $needle is in array @haystack, simply use the new ~~ operator:

if ( $needle ~~ @haystack ) ... 

Perl 5.10 also finally gains support for named regex captures, which means you can avoid the dreaded lines of $1, $2, etc. that often make Perl regexes hard to decipher. Finally I might be able to understand what's going on in complex regex scripts like Markdown.

Other improvements include a faster interpreter with a smaller memory footprint, better error messages and more. For full details on the new release check out the notes.

I'll confess I abandoned Perl for Python some time ago, but after playing with 5.10 I may have to rethink that decision. Perl 5.10's new features are definitely worth the upgrade and a must-have for anyone who uses Perl on a daily basis.

A Beginner's Introduction to Perl 5.10, part three By chromatic

The first two articles in this series (A Beginner's Introduction to Perl 5.10 and A Beginner's Introduction to Files and Strings in Perl 5.10) covered flow control, math and string operations, and files. (A Beginner's Introduction to Perl Web Programming demonstrates how to write secure web programs.) This third part covers regular expressions.
June 26, 2008 | O'Reilly News

Simple matching

The simplest regular expressions are matching expressions. They perform tests using keywords like if, while and unless. If you want to be really clever, you can use them with and and or. A matching regexp will return a true value if whatever you try to match occurs inside a string. To match a regular expression against a string, use the special =~ operator:

use 5.010;

my $user_location = "I see thirteen black cats under a ladder.";
say "Eek, bad luck!" if $user_location =~ /thirteen/;

Notice the syntax of a regular expression: a string within a pair of slashes. The code $user_location =~ /thirteen/ asks whether the literal string thirteen occurs anywhere inside $user_location. If it does, then the test evaluates true; otherwise, it evaluates false.

Metacharacters

A metacharacter is a character or sequence of characters that has special meaning. You may remember metacharacters in the context of double-quoted strings, where the sequence \n means the newline character, not a backslash and the character n, and where \t means the tab character.

Regular expressions have a rich vocabulary of metacharacters that let you ask interesting questions such as, "Does this expression occur at the end of a string?" or "Does this string contain a series of numbers?"

The two simplest metacharacters are ^ and $. These indicate "beginning of string" and "end of string," respectively. For example, the regexp /^Bob/ will match "Bob was here," "Bob", and "Bobby." It won't match "It's Bob and David," because Bob doesn't occur at the beginning of the string. The $ character, on the other hand, matches at the end of a string. The regexp /David$/ will match "Bob and David," but not "David and Bob." Here's a simple routine that will take lines from a file and only print URLs that seem to indicate HTML files:

for my $line (<$urllist>) {
    # "If the line starts with http: and ends with html...."
    print $line if $line =~ /^http:/ and $line =~ /html$/;
}

Another useful set of metacharacters is called wildcards. If you've ever used a Unix shell or the Windows DOS prompt, you're familiar with wildcard characters such as * and ?. For example, when you type ls a*.txt, you see all filenames that begin with the letter a and end with .txt. Perl is a bit more complex, but works on the same general principle.

In Perl, the generic wildcard character is the period (.). A period inside a regular expression will match any character, except a newline. For example, the regexp /a.b/ will match anything that contains a, another character that's not a newline, followed by b -- "aab," "a3b," "a b," and so forth.

To match a literal metacharacter, escape it with a backslash. The regex /Mr./ matches anything that contains "Mr" followed by another character. If you only want to match a string that actually contains "Mr.," use /Mr\./.

On its own, the . metacharacter isn't very useful, which is why Perl provides three wildcard quantifiers: +, ? and *. Each quantifier means something different.

The + quantifier is the easiest to understand: It means to match the immediately preceding character or metacharacter one or more times. The regular expression /ab+c/ will match "abc," "abbc," "abbbc", and so on.

The * quantifier matches the immediately preceding character or metacharacter zero or more times. This is different from the + quantifier! /ab*c/ will match "abc," "abbc," and so on, just like /ab+c/ did, but it'll also match "ac," because there are zero occurrences of b in that string.

Finally, the ? quantifier will match the preceding character zero or one times. The regex /ab?c/ will match "ac" (zero occurrences of b) and "abc" (one occurrence of b). It won't match "abbc," "abbbc", and so on.

The URL-matching code can be more concise with these metacharacters. Instead of using two separate regular expressions (/^http:/ and /html$/), combine them into one regular expression: /^http:.+html$/. To understand what this does, read from left to right: this regex will match any string that starts with "http:", followed by one or more occurrences of any character, and ends with "html". Now the routine is:

for my $line (<$urllist>) {
    print $line if $line =~ /^http:.+html$/;
}

Remember the /^something$/ construction -- it's very useful!

Character classes

The special metacharacter, ., matches any character except a newline. It's common to want to match only specific types of characters. Perl provides several metacharacters for this. \d matches a single digit, \w will match any single "word" character (a letter, digit or underscore), and \s matches a whitespace character (space and tab, as well as the \n and \r characters).

These metacharacters work like any other character: You can match against them, or you can use quantifiers like + and *. The regex /^\s+/ will match any string that begins with whitespace, and /\w+/ will match a string that contains at least one word. (Though remember that Perl's definition of "word" characters includes digits and the underscore, so whether you think _ or 25 are words, Perl does!)

One good use for \d is testing strings to see whether they contain numbers. For example, you might need to verify that a string contains an American-style phone number, which has the form 555-1212. You could use code like this:

use 5.010;

say "Not a phone number!" unless $phone =~ /\d\d\d-\d\d\d\d/;

All those \d metacharacters make the regex hard to read. Fortunately, Perl can do better. Use numbers inside curly braces to indicate a quantity you want to match:

use 5.010;

say "Not a phone number!" unless $phone =~ /\d{3}-\d{4}/;

The string \d{3} means to match exactly three numbers, and \d{4} matches exactly four digits. To use a range of numbers, you can separate them with a comma; leaving out the second number makes the range open-ended. \d{2,5} will match two to five digits, and \w{3,} will match a word that's at least three characters long.

You can also invert the \d, \s and \w metacharacters to refer to anything but that type of character. \D matches nondigits; \W matches any character that isn't a letter, digit, or underscore; and \S matches anything that isn't whitespace.

If these metacharacters won't do what you want, you can define your own. You define a character class by enclosing a list of the allowable characters in square brackets. For example, a class containing only the lowercase vowels is [aeiou]. /b[aeiou]g/ will match any string that contains "bag," "beg," "big," "bog", or "bug". Use dashes to indicate a range of characters, like [a-f]. (If Perl didn't give us the \d metacharacter, we could do the same thing with [0-9].) You can combine character classes with quantifiers:

use 5.010;
 say "This string contains at least two vowels in a row."
    if $string =~ /[aeiou]{2}/;

You can also invert character classes by beginning them with the ^ character. An inverted character class will match anything you don't list. [^aeiou] matches every character except the lowercase vowels. (Yes, ^ can also mean "beginning of string," so be careful.)

Flags

By default, regular expression matches are case-sensitive (that is, /bob/ doesn't match "Bob"). You can place flags after a regexp to modify their behaviour. The most commonly used flag is i, which makes a match case-insensitive:

use 5.010;

my $greet = "Hey everybody, it's Bob and David!";
    say "Hi, Bob!" if $greet =~ /bob/i;

Subexpressions

You might want to check for more than one thing at a time. For example, you're writing a "mood meter" that you use to scan outgoing e-mail for potentially damaging phrases. Use the pipe character | to separate different things you are looking for:

use 5.010;

# In reality, @email_lines would come from your email text,
# but here we'll just provide some convenient filler.
my @email_lines = ("Dear idiot:",
                   "I hate you, you twit.  You're a dope.",
                   "I bet you mistreat your llama.",
                   "Signed, Doug");

for my $check_line (@email_lines) {
   if ($check_line =~ /idiot|dope|twit|llama/) {
       say "Be careful!  This line might contain something offensive:\n$check_line";
   }
}

The matching expression /idiot|dope|twit|llama/ will be true if "idiot," "dope," "twit" or "llama" show up anywhere in the string.

One of the more interesting things you can do with regular expressions is subexpression matching, or grouping. A subexpression is another, smaller regex buried inside your larger regexp within matching parentheses. The string that caused the subexpression to match will be stored in the special variable $1. This can make your mood meter more explicit about the problems with your e-mail:

for my $check_line (@email_lines) {
   if ($check_line =~ /(idiot|dope|twit|llama)/) {
       say "Be careful!  This line contains the offensive word '$1':\n$check_line";
   }
}

Of course, you can put matching expressions in your subexpression. Your mood watch program can be extended to prevent you from sending e-mail that contains more than three exclamation points in a row. The special {3,} quantifier will make sure to get all the exclamation points.

for my $check_line (@email_lines) {
    if ($check_line =~ /(!{3,})/) {
        say "Using punctuation like '$1' is the sign of a sick mind:\n$check_line";
    }
}

If your regex contains more than one subexpression, the results will be stored in variables named $1, $2, $3 and so on. Here's some code that will change names in "lastname, firstname" format back to normal:

my $name = 'Wall, Larry';
$name =~ /(\w+), (\w+)/;
# $1 contains last name, $2 contains first name

$name = "$2 $1";
# $name now contains "Larry Wall"

You can even nest subexpressions inside one another -- they're ordered as they open, from left to right. Here's an example of how to retrieve the full time, hours, minutes and seconds separately from a string that contains a timestamp in hh:mm:ss format. (Notice the use of the {1,2} quantifier to match a timestamp like "9:30:50".)

my $string = "The time is 12:25:30 and I'm hungry.";
if ($string =~ /((\d{1,2}):(\d{2}):(\d{2}))/) {
    my @time = ($1, $2, $3, $4);
}

Here's a hint that you might find useful: You can assign to a list of scalar values whenever you're assigning from a list. If you prefer to have readable variable names instead of an array, try using this line instead:

my ($time, $hours, $minutes, $seconds) = ($1, $2, $3, $4);

Assigning to a list of variables when you're using subexpressions happens often enough that Perl gives you a handy shortcut. In list context, a successful regular expression match returns its captured variables in the order in which they appear within the regexp:

my ($time, $hours, $minutes, $seconds) = $string =~ /((\d{1,2}):(\d{2}):(\d{2}))/;

Counting parentheses to see where one group begins and another group ends is troublesome though. Perl 5.10 added a new feature, lovingly borrowed from other languages, where you can give names to capture groups and access the captured values through the special hash %+. This is most obvious by example:

my $name = 'Wall, Larry';
$name =~ /(?<last>\w+), (?<first>\w+)/;
# %+ contains all named captures

$name = "$+{last} $+{first}";
# $name now contains "Larry Wall"

There's a common mistake related to captures, namely assuming that $1 and %+ et al will hold meaningful values if the match failed:

my $name = "Damian Conway";
# no comma, so the match will fail!
$name =~ /(?<last>\w+), (?<first>\w+)/;

# and there's nothing in the capture buffers
$name = "$+{last} $+{first}";

# $name now contains a blank space

Always check the success or failure of your regular expression when working with captures!

my $name = "Damian Conway";
$name = "$+{last} $+{first}" if $name =~ /(?<last>\w+), (?<first>\w+)/;

Watch out!

Regular expressions have two other traps that generate bugs in your Perl programs: matching always starts over at the beginning of the string, and quantifiers always match as much of the string as possible.

Here's some simple code for counting all the numbers in a string and showing them to the user. It uses while to loop over the string, matching over and over until it has counted all the numbers.

use 5.010;
my $number       = "Look, 200 5-sided, 4-colored pentagon maps.";
my $number_count = 0;

while ($number =~ /(\d+)/) {
    say "I found the number $1.\n";
    $number_count++;
}

say "There are $number_count numbers here.\n";

This code is actually so simple it doesn't work! When you run it, Perl will print I found the number 200 over and over again. Perl always begins matching at the beginning of the string, so it will always find the 200, and never get to the following numbers.

You can avoid this by using the g flag with your regex. This flag will tell Perl to remember where it was in the string when it returns to it (due to a while loop). When you insert the g flag, the code becomes:

use 5.010;
my $number       = "Look, 200 5-sided, 4-colored pentagon maps.";
my $number_count = 0;

while ($number =~ /(\d+)/g) {
    say "I found the number $1.\n";
    $number_count++;
}

say "There are $number_count numbers here.\n";

Now you get the expected results:

I found the number 200.
I found the number 5.
I found the number 4.
There are 3 numbers here.
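
As an aside, the g flag also behaves usefully in list context: rather than looping, a match returns every captured value at once. A minimal sketch using the same string:

use 5.010;
my $number  = "Look, 200 5-sided, 4-colored pentagon maps.";
my @numbers = $number =~ /(\d+)/g;   # ("200", "5", "4")
say "There are " . @numbers . " numbers here.";   # an array in scalar context yields its count: 3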

The second trap is that a quantifier will always match as many characters as it can. Look at this example code, but don't run it yet:

use 5.010;
my $book_pref = "The cat in the hat is where it's at.\n";
say $+{match} if $book_pref =~ /(?<match>cat.*at)/;

Take a guess: What's in $+{match} right now? Now run the code. Does this seem counterintuitive?

The matching expression cat.*at is greedy. $+{match} contains cat in the hat is where it's at because that's the longest string that matches. Remember, read left to right: "cat," followed by any number of characters, followed by "at." If you want to match the string cat in the hat, you have to rewrite your regexp so it isn't as greedy. There are two ways to do this: make the quantifier non-greedy by appending a ? to it (so .*? matches as few characters as possible), or be more specific about what the middle of the pattern is allowed to skip over.
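
Here's a minimal sketch of the non-greedy fix, reusing the example above:

use 5.010;
my $book_pref = "The cat in the hat is where it's at.\n";
say $+{match} if $book_pref =~ /(?<match>cat.*?at)/;
# now prints "cat in the hat"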

Search and replace

Regular expressions can do something else for you: replacing.

If you've ever used a text editor or word processor, you've probably used its search-and-replace function. Perl's regexp facilities include something similar, the s/// operator: s/regex/replacement string/. If the string you're testing matches regex, then whatever matched is replaced with the contents of replacement string. For instance, this code will change a cat into a dog:

use 5.010;
my $pet = "I love my cat.";
$pet =~ s/cat/dog/;
say $pet;

You can also use subexpressions in your matching expression, and use the variables $1, $2 and so on, that they create. The replacement string interpolates these, and any other variables, as if it were a double-quoted string. Remember the code for changing Wall, Larry into Larry Wall? It makes a fine single s/// statement!

my $name = 'Wall, Larry';
$name =~ s/(\w+), (\w+)/$2 $1/;
# "Larry Wall"

You don't have to worry about using captures if the match fails; the substitution won't take place. Of course, named captures work equally well:

my $name = 'Wall, Larry';
$name =~ s/(?<last>\w+), (?<first>\w+)/$+{first} $+{last}/;
# "Larry Wall"

s/// can take flags, just like matching expressions. The two most important flags are g (global) and i (case-insensitive). Normally, a substitution will only happen once, but specifying the g flag will make it happen as long as the regex matches the string. Try this code with and without the g flag:

use 5.010;

my $pet = "I love my cat Sylvester, and my other cat Bill.\n";
$pet =~ s/cat/dog/g;
say $pet;

Notice that without the g flag, Bill avoids substitution-related polymorphism.

The i flag works just as it does in matching expressions: It forces your matching search to be case-insensitive.
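
As a quick illustration, here's a minimal sketch combining both flags:

use 5.010;
my $pet = "I love my cat Sylvester, and my other CAT Bill.\n";
$pet =~ s/cat/dog/gi;   # replaces both "cat" and "CAT"
say $pet;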

Maintainability

Once you start to see how patterns describe text, everything so far is reasonably simple. Regexps may start simple, but they often grow into larger beasts. There are two good techniques for making regexps more readable: adding comments and factoring them into smaller pieces.

The x flag allows you to use whitespace and comments within regexps, without it being significant to the pattern:

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(                 # capture entire match
                    (\d{1,2})     # one or two digits for the hour
                    :
                    (\d{2})       # two digits for the minutes
                    :
                    (\d{2})       # two digits for the seconds
                )
    /x;

That may be a slight improvement over the previous version of this regexp, but this technique works even better for complex regexps. Be aware that if you do need to match whitespace within the pattern, you must use \s or an equivalent.
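
For example, here's a minimal sketch: the literal spaces inside the /x pattern are ignored, so the space in the data must be matched with \s:

use 5.010;
say "matched" if "Larry Wall" =~ / Larry \s+ Wall /x;   # \s+ matches the real space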

Adding comments is helpful, but sometimes giving a name to a particular piece of a pattern is sufficient clarification. The qr// operator compiles but does not execute a regexp, producing a regexp object that you can use inside a match or substitution:

my $two_digits = qr/\d{2}/;

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(                 # capture entire match
                    (\d{1,2})     # one or two digits for the hour
                    :
                    ($two_digits) # minutes
                    :
                    ($two_digits) # seconds
                )
    /x;

Of course, you can use all of the previous techniques as well:

use 5.010;

my $two_digits        = qr/\d{2}/;
my $one_or_two_digits = qr/\d{1,2}/;

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(?<time>
                    (?<hours> $one_or_two_digits)
                    :
                    (?<minutes> $two_digits)
                    :
                    (?<seconds> $two_digits)
                )
    /x;

Note that the captures are available through %+ as well as in the list of values returned from the match.

Putting it all together

Regular expressions have many practical uses. Consider an httpd log analyzer, for example. One of the play-around items in the previous article was to write a simple log analyzer. You can make it more interesting: how about a log analyzer that will break down your log results by file type and give you a list of total requests by hour?


Here's a sample line from a httpd log:

127.12.20.59 - - [01/Nov/2000:00:00:37 -0500] "GET /gfx2/page/home.gif HTTP/1.1" 200 2285

The first task is to split this line into fields. Remember that the split() function takes a regular expression as its first argument. Use /\s/ to split the line at each whitespace character:

my @fields = split /\s/, $line;

This gives 10 fields. The interesting fields are the fourth (time and date of the request), the seventh (the URL), and the ninth and tenth (HTTP status code and size in bytes of the server response).

Step one is canonicalization: turning any request for a URL that ends in a slash (like /about/) into a request for the index page from that directory (/about/index.html). Remember to escape the slashes so that Perl doesn't consider them the terminating characters of the match or substitution:

$fields[6] =~ s/\/$/\/index.html/;

This line is difficult to read; it suffers from leaning-toothpick syndrome. Here's a useful trick for avoiding it: replace the slashes that mark regular expressions and s/// statements with any other matching pair of characters, such as { and }. This allows you to write a more legible regex where you don't need to escape the slashes:

$fields[6] =~ s{/$}{/index.html};

(To use this syntax with a matching expression, put a m in front of it. /foo/ becomes m{foo}.)
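
For instance, both of these lines test the same pattern against the canonicalized URL field from above; the second is arguably easier to read (a minimal sketch):

say "Request for an index page" if $fields[6] =~ /\/index\.html$/;   # leaning toothpicks
say "Request for an index page" if $fields[6] =~ m{/index\.html$};   # same match, no escaping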

Step two is to assume that any URL request that returns a status code of 200 (a successful request) is a request for the file type of the URL's extension (a request for /gfx/page/home.gif returns a GIF image). Any URL request without an extension returns a plain-text file. Remember that the period is a metacharacter, so escape it!

if ($fields[8] eq '200') {
    if ($fields[6] =~ /\.([a-z]+)$/i) {
        $type_requests{$1}++;
    } else {
        $type_requests{txt}++;
    }
}

Next, retrieve the hour when each request took place. The hour is the first two-digit string surrounded by colons in $fields[3], so all you need to do is look for that. Remember that Perl will stop when it finds the first match in a string:

# Log the hour of this request
$fields[3] =~ /:(\d{2}):/;
$hour_requests{$1}++;
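
Strictly speaking, this snippet breaks the earlier rule about checking whether a match succeeded before using $1. A slightly safer sketch, if you don't trust every log line to be well formed:

# Only count the hour if the match actually succeeded
$hour_requests{$1}++ if $fields[3] =~ /:(\d{2}):/;

That way a malformed line can't bump a counter using a stale $1 left over from a previous match.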

Finally, rewrite the original report() sub. We're doing the same thing over and over (printing a section header and the contents of that section), so we'll break that out into a new sub. We'll call the new sub report_section():

sub report {
    print "Total bytes requested: ", $bytes, "\n";
    print "\n";
    report_section("URL requests:", %url_requests);
    report_section("Status code results:", %status_requests);
    report_section("Requests by hour:", %hour_requests);
    report_section("Requests by file type:", %type_requests);
}

The new report_section() sub is very simple:

sub report_section {
    my ($header, %types) = @_;

    say $header;

    for my $type (sort keys %types) {
        say "$type: $types{$type}";
    }

    print "\n";
}

The keys operator returns a list of the keys in the %types hash, and the sort operator puts them in alphabetic order. The next article will explain sort in more detail.
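
Here's a minimal sketch of what that loop does, with hash contents made up for the example:

my %types = (gif => 10, html => 7, css => 2);
for my $type (sort keys %types) {
    print "$type\n";   # prints css, gif, html, in alphabetic order
}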

Perl 5.10 Advanced Regular Expressions

Presentation by Yves Orton (demerphq)

Perl 5.10 Advanced Regular Expressions

Everybody stand back!

Intro

Topics Covered

Recursion Eliminated

Pluggable Interface

use re 'debug';

re 'debug' is lexically scoped

Anatomy of the debug regex compilation output

Anatomy of the debug regex execution output

Quantifier Combinatorial Explosion

Call the bomb squad!

The Good, the Bad, and the Ugly

Possessive Quantifiers

Without possessive quantifiers

With Possessive Quantifiers

Capture Buffers

Introducing Named Capture Buffers

Named Capture Buffers

Getting results from named captures

Named capture at work

Original Backref Syntax

Problems with Numeric Backrefs

New Backref Syntax

Relative backreferences and named capture

Matching Balanced Constructs

Recursive Patterns In Older Perls

Old way commented

Old way compiled

Old way executing

Recursive Patterns In Blead

Pattern recursion in more detail

A grammar

About the (?(DEFINE)...) predicate

New way commented

New way compiled

Recursion Implies Subroutines

Oh good!

Trie and Aho-Corasick matching

Umm, so what's a trie?

More about tries

Other Optimisations

Optimization debugged

What the trie does

What happens without the trie

Backtracking

A bit about backtracking

Backtracking Control Verbs

New Backtracking Control Verbs

(*FAIL)

Exhaustive Matching with (*FAIL)

(*ACCEPT)

(*ACCEPT) and (*FAIL) in action

Verbs With Arguments

(*PRUNE)

Using (*PRUNE)

(*MARK)

Using (*MARK)

(*SKIP)

More backtracking verbs in action

(*THEN)

(*COMMIT)

Recent Additions

The "preserve" modifier /p and ${^MATCH}

Keep pattern \K

Using \K is much faster!

Branch Reset Pattern

Branch Reset Pattern II

Further Reading

More optimisations

Regular Expressions in Perl 5.10

There are many new features in the regular expression engine of Perl 5.10. Here are some of them.

Named captures

I am trying to match a phone number and save the values in variables.

One way to do it is:

    if ($str =~ /^(\d+)-(\d+)-(\d+)$/) {
        $num{country} = $1;
        $num{area}    = $2;
        $num{phone}   = $3;
    }

The new way is

    if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
        %num = %+;
    }
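
A minimal usage sketch, assuming a number in country-area-phone format:

    my $str = '1-212-5551212';
    if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
        my %num = %+;
        print "area code: $num{area}\n";   # prints "area code: 212"
    }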

perldelta - what is new for perl 5.10.0 - search.cpan.org

perl -d
The Perl debugger can now save all debugger commands for sourcing later; notably, it can now emulate stepping backwards, by restarting and rerunning all bar the last command from a saved command history.

It can also display the parent inheritance tree of a given class, with the i command.

Use of uninitialized value

Perl will now try to tell you the name of the variable (if any) that was undefined.

  1. The feature pragma

    The feature pragma is used to enable new syntax that would break Perl's backwards-compatibility with older releases of the language. It's a lexical pragma, like strict or warnings.

    Currently the following new features are available: switch (adds a switch statement), say (adds a say built-in function), and state (adds a state keyword for declaring "static" variables). Those features are described in their own sections of this document, and a short sketch combining them appears after this list.

    The feature pragma is also implicitly loaded when you require a minimal perl version (with the use VERSION construct) greater than, or equal to, 5.9.5. See feature for details.

  2. say()

    say() is a new built-in, only available when use feature 'say' is in effect, that is similar to print(), but that implicitly appends a newline to the printed string. See "say" in perlfunc. (Robin Houston)

  3. Switch and Smart Match operator

    Perl 5 now has a switch statement. It's available when use feature 'switch' is in effect. This feature introduces three new keywords, given, when, and default:

        given ($foo) {
            when (/^abc/) { $abc = 1; }
            when (/^def/) { $def = 1; }
            when (/^xyz/) { $xyz = 1; }
            default { $nothing = 1; }
        }

    A more complete description of how Perl matches the switch variable against the when conditions is given in "Switch statements" in perlsyn.

    This kind of match is called smart match, and it's also possible to use it outside of switch statements, via the new ~~ operator. See "Smart matching in detail" in perlsyn.

  4. state() variables

    A new class of variables has been introduced. State variables are similar to my variables, but are declared with the state keyword in place of my. They're visible only in their lexical scope, but their value is persistent: unlike my variables, they're not undefined at scope entry, but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)

    To use state variables, one needs to enable them by using

        use feature 'state';

    or by using the -E command-line switch in one-liners. See "Persistent variables via state()" in perlsub.

  5. Lexical $_

    The default variable $_ can now be lexicalized, by declaring it like any other lexical variable, with a simple

        my $_;

    The operations that default on $_ will use the lexically-scoped version of $_ when it exists, instead of the global $_.

    In a map or a grep block, if $_ was previously my'ed, then the $_ inside the block is lexical as well (and scoped to the block).

    In a scope where $_ has been lexicalized, you can still have access to the global version of $_ by using $::_, or, more simply, by overriding the lexical declaration with our $_. (Rafael Garcia-Suarez)
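
Here is a minimal sketch combining say, state, and given/when from the items above (my own example, assuming Perl 5.10):

    use feature qw(say switch state);

    sub next_id {
        state $count = 0;          # persists between calls, unlike my
        return ++$count;
    }

    say next_id();                 # prints 1
    say next_id();                 # prints 2

    given ("abcde") {
        when (/^abc/) { say "starts with abc"; }
        when (/^def/) { say "starts with def"; }
        default       { say "something else"; }
    }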

[Feb 25, 2009] Perl 5.10 highlights

See also slides at Perl 5.10 for People Who Aren't Totally Insane. ActiveState has precompiled versions for many platforms; see Perl 5.10. CPAN has some details: Rafael Garcia-Suarez - perl-5.10.0 - search.cpan.org

[Dec 12, 2008] The A-Z of Programming Languages Perl

What new elements does Perl 5.10.0 bring to the language? In what way is it preparing for Perl 6?

Perl 5.10.0 involves backporting some ideas from Perl 6, like switch statements and named pattern matches. One of the most popular things is the use of say instead of print.

This is an explicit programming design in Perl: easy things should be easy and hard things should be possible. It's optimised for the common case. Similar things should look similar but different things should look different, and how you trade those things off is an interesting design principle.

Huffman Coding is one of those principles that makes similar things look different.

In your opinion, what lasting legacy has Perl brought to computer development?

An increased awareness of the interplay between technology and culture. Ruby has borrowed a few ideas from Perl and so has PHP. I don't think PHP understands the use of signals, but all languages borrow from other languages, otherwise they risk being single-purpose languages. Competition is good.

It's interesting to see PHP follow along with the same mistakes Perl made over time and recover from them. But Perl 6 also borrows back from other languages too, like Ruby. My ego may be big, but it's not that big.

Where do you envisage Perl's future lying?

My vision of Perl's future is that I hope I don't recognise it in 20 years.

Where do you see computer programming languages heading in the future, particularly in the next 5 to 20 years?

Don't design everything you will need in the next 100 years, but design the ability to create things we will need in 20 or 100 years. The heart of the Perl 6 effort is the extensibility we have built into the parser, introducing language changes as non-destructively as possible.

Linux Today's comments

> Given the horrible mess that is Perl (and, BTW,
> I derive 90% of my income from programming in Perl),
.
Did the thought that the 'horrible mess' you produce with $language 'for an income' could be YOUR horrible mess ever cross your mind? The language itself doesn't write any code.

> You just said something against his beloved
> Perl and compounded your heinous crime by
> saying something nice about Python...in his
> narrow view you are the antithesis of all that is
> right in the world. He will respond with his many
> years of Perl == good and everything else == bad
> but just let it go...
.
That's a pretty pointless insult. Languages don't write code. People do. A statement like 'I think that code written in Perl looks very ugly because of the large amount of non-alphanumeric characters' would make sense. Trying to elevate entirely subjective, aesthetic preferences into 'general principles' doesn't. 'a mess' is something inherently chaotic, hence, this is not a sensible description for a regularly structured program of any kind. It is obviously possible to write (or not write) regularly structured programs in any language providing the necessary abstractions for that. This set includes Perl.
.
I had the displeasure of having to deal with messes created by people both in Perl and Python (and a couple of other languages) in the past. You've probably heard the saying that "real programmers can write FORTRAN in any language" already. It is even true that the most horrible code mess I have seen so far had been written in Perl. But this just means that a fairly chaotic person happened to use this particular programming language.

[Nov 7, 2008] Perl Express A Free Perl IDE-Editor for Windows.

Perl Express is a unique and powerful integrated development environment (IDE) for Windows 98/Me/2000/XP/2003. It includes multiple tools for writing and debugging your Perl programs.

Perl Express is intended both for experienced, professional Perl developers and for beginners.

Since version 2.5, Perl Express has been free software without any limitations; registration is not required.

General Features

Multiple scripts for editing, running and debugging
Full server simulation
Completely integrated debugging with breakpoints, stepping, displaying variable values, etc.
Queries may be created from internal Web browser or Query editor
Test MySQL, MS Access... scripts for Windows
Interactive I/O
Multiple input files
Allows you to set environment variables used for running and debugging script
Customizable code editor with syntax highlighting, unlimited text size, printing, line numbering, bookmarks, column selection, powerful search and replace engine, multilevel undo/redo operations, margin and gutter, etc.
Highlighting of matching braces
Windows/Unix/Mac line endings support
OfficeXP-styled menus and toolbars
HTML, RTF export
Live preview of the scripts in the internal web browser
Directory Window
Code Library
Operation with the projects
Code Templates
Help on functions
Perl printer, pod viewer, table of characters and HTML symbols, and others

[Sep 21, 2008] Using Inline in Perl by Michael Roberts (michael@vivtek.com), Owner, Vitek

Jun 01, 2001 | developerworks

The new Inline module for Perl allows you to write code in other languages (like C, Python, Tcl, or Java) and toss it into Perl scripts with wild abandon. Unlike previous ways of interfacing C code with Perl, Inline is very easy to use, and very much in keeping with the Perl philosophy. One extremely useful application of Inline is to write quick wrapper code around a C-language library to use it from Perl, thus turning Perl into (as far as I'm concerned) the best testing platform on the planet.

Perl has always been pathologically eclectic, but until now it hasn't been terribly easy to make it work with other languages or with libraries that weren't constructed specifically for it. You had to write interface code in the XS language (or get SWIG to do that for you), build an organized module, and generally keep track of a whole lot of details.

But now things have changed. The Inline module, written and actively (very actively) maintained by Brian Ingerson, provides facilities to bind other languages to Perl. In addition its sub-modules (Inline::C, Inline::Python, Inline::Tcl, Inline::Java, Inline::Foo, etc.) allow you to embed those languages directly in Perl files, where they will be found, built, and dynaloaded into Perl in a completely transparent manner. The user of your script will never know the difference, except that the first invocation of Inline-enabled code takes a little time to complete the compilation of the embedded code.

The world's simplest Inline::C program

Just to show you what I mean, let's look at the simplest possible Inline program; this uses an embedded C function, but you can do substantially the same thing with any other language that has Inline support.


Listing 1. Inline "Hello, world"
use Inline C => <<'END_C';

void greet() {
  printf("Hello, world!\n");
}
END_C

greet;

Naturally, what the code does is obvious. It defines a C-language function to do the expected action, and then it treats it as a Perl function thereafter. In other words, Inline does exactly what an extension module should do. The question that may be uppermost in your mind is, "How does it do that?". The answer is pretty much what you'd expect: it takes your C code, builds an XS file around it in the same way that a human extension module writer would, builds that module, then loads it. Subsequent invocations of the code will simply find the pre-built module already there, and load it directly.
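
Argument and return-value conversion for simple C types happens automatically as well. Here's a minimal sketch (my own example, not from the article) of a C function that takes and returns integers:

use Inline C => <<'END_C';
int add(int x, int y) {
    return x + y;
}
END_C

print add(2, 3), "\n";   # prints 5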

You can even invoke Inline at runtime by using the Inline->bind function. I don't want to do anything more than dangle that tantalizing fact before you, because there's nothing special about it besides the point that you can do it if you want to.

[May 06, 2008] ack! - Perl-based grep replacement

There are some tools that look like you will never replace them. One of those (for me) is grep. It does what it does very well (remarks about the shortcomings of regexen in general aside). It works reasonably well with Unicode/UTF-8 (a great opportunity to Fail Miserably for any tool, viz. a2ps).

Yet, the other day I read about ack, which claims to be "better than grep, a search tool for programmers". Woo. Better than grep? In what way?

The ack homepage lists the top ten reasons why one should use it instead of grep. Actually, it's thirteen reasons but then some are dupes. So I'd say "about ten reasons". Let's look at them in order.

  1. It's blazingly fast because it only searches the stuff you want searched.

    Wait, how does it know what I want? A DWIM-Interface at last? Not quite. First off, ack is faster than grep for simple searches. Here's an example:

    $ time ack 1Jsztn-000647-SL exim_main.log >/dev/null
    real    0m3.463s
    user    0m3.280s
    sys     0m0.180s
    $ time grep -F 1Jsztn-000647-SL exim_main.log >/dev/null
    real    0m14.957s
    user    0m14.770s
    sys     0m0.160s
    

    Two notes: first, yes, the file was in the page cache before I ran ack; second, I even made it easy for grep by telling it explicitly I was looking for a fixed string (not that it helped much, the same command without -F was faster by about 0.1s). Oh and for completeness, the exim logfile I searched has about two million lines and is 250M. I've run those tests ten times for each, the times shown above are typical.

    So yes, for simple searches, ack is faster than grep. Let's try with a more complicated pattern, then. This time, let's use the pattern (klausman|gentoo) on the same file. Note that we have to use -E for grep to use extended regexen, which ack in turn does not need, since it (almost) always uses them. Here, grep takes its sweet time: 3:56, nearly four minutes. In contrast, ack accomplished the same task in 49 seconds (all times averaged over ten runs, then rounded to integer seconds).

    As for the "being clever" side of speed, see below, points 5 and 6

  2. ack is pure Perl, so it runs on Windows just fine.

    This isn't relevant to me, since I don't use windows for anything where I might need grep. That said, it might be a killer feature for others.

  3. The standalone version uses no non-standard modules, so you can put it in your ~/bin without fear.

    Ok, this is not so much a feature as a hard criterion. If I needed extra modules for the whole thing to run, that'd be a deal breaker. I already have tons of libraries, I don't need more undergrowth around my dependency tree.

  4. Searches recursively through directories by default, while ignoring .svn, CVS and other VCS directories.

    This is a feature, yet one that wouldn't pry me away from grep: -r is there (though it distinctly feels like an afterthought). Since ack ignores a certain set of files and directories, its recursive capabilities were there from the start, making it feel more seamless.

  5. ack ignores most of the crap you don't want to search

    To be precise:

    • VCS directories
    • blib, the Perl build directory
    • backup files like foo~ and #foo#
    • binary files, core dumps, etc.

    Most of the time, I don't want to search those (and have to exclude them with grep -v from find results). Of course, this ignore-mode can be switched off with ack (-u). All that said, it sure makes command lines shorter (and easier to read and construct). Also, this is the first spot where ack's Perl-centricism shows. I don't mind, even though I prefer that other language with P.

  6. Ignoring .svn directories means that ack is faster than grep for searching through trees.

    Dupe. See Point 5

  7. Lets you specify file types to search, as in --perl or --nohtml.

    While at first glance, this may seem limited, ack comes with a plethora of definitions (45 if I counted correctly), so it's not as perl-centric as it may seem from the example. This feature saves command-line space (if there's such a thing), since it avoids wild find-constructs. The docs mention that --perl also checks the shebang line of files that don't have a suffix, but make no mention of the other "shipped" file type recognizers doing so.

  8. File-filtering capabilities usable without searching with ack -f. This lets you create lists of files of a given type.

    This mostly is a consequence of the feature above. Even if it weren't there, you could simply search for "."

  9. Color highlighting of search results.

    While I've looked upon color in shells as kinda childish for a while, I wouldn't want to miss syntax highlighting in vim, colors for ls (if they're not as sucky as the defaults we had for years) or match highlighting for grep. It's really neat to see that yes, the pattern you grepped for indeed matches what you think it does. Especially during evolutionary construction of command lines and shell scripts.

  10. Uses real Perl regular expressions, not a GNU subset

    Again, this doesn't bother me much. I use egrep/grep -E all the time, anyway. And I'm no Perl programmer, so I don't get withdrawal symptoms every time I use another regex engine.

  11. Allows you to specify output using Perl's special variables

    This sounds neat, yet I don't really have a use case for it. Also, my perl-fu is weak, so I probably won't use it anyway. Still, might be a killer feature for you.

    The docs have an example:

    ack '(Mr|Mr?s)\. (Smith|Jones)' --output='$&'
  12. Many command-line switches are the same as in GNU grep:

    Specifically mentioned are -w, -c and -l. It's always nice if you don't have to look up all the flags every time.

  13. Command name is 25% fewer characters to type! Save days of free-time! Heck, it's 50% shorter compared to grep -r

    Okay, now we have proof that not only the ack webmaster can't count, he's also making up reasons for fun. Works for me.

Bottom line: yes, ack is an exciting new tool which partly replaces grep. That said, a drop-in replacement it ain't. While the standalone version of ack needs nothing but a perl interpreter and its standard modules, for embedded systems that may not work out (vs. the binary with no deps beside a libc). This might also be an issue if you need grep early on during boot and /usr (where your perl resides) isn't mounted yet. Also, default behaviour is divergent enough that it might yield nasty surprises if you just drop in ack instead of grep. Still, I recommend giving ack a try if you ever use grep on the command line. If you're a coder who often needs to search through working copies/checkouts, even more so.

Update

I've written a followup on this, including some tips for day-to-day usage (and an explanation of grep's sucky performance).

Comments

Ren "Necoro" Neumann writes (in German, translation by me):

Stumbled across your blog entry about "ack" today. I tried it and found it to be cool :). So I created two ebuilds for it:

Just wanted to let you know (there is no comment function on your blog).

[Mar 11, 2008] Perl Tutorial 19: Functions lc, uc, lcfirst, ucfirst

Youtube has educational potential


[Mar 5, 2008] The New York Times Perl Profiler By Adam Kaplan

Tags: nytprof, open projects, Perl

I work in the NYTimes.com feeds team. We handle retrieving, parsing and transforming incoming feeds from whatever strange proprietary format our partners choose to give us into something that our CMS can digest. As you can imagine, we deal with a huge amount of text processing. To handle all of these transformations as efficiently as possible we rely heavily on the magic of Perl. Recently, as feeds have become more and more important, we have begun to feel pains caused by past impromptu segments of inefficient code written to meet quick, episodic deadlines, a situation that we are especially prone to as a fast-moving news organization.

I am a relatively new employee here at NYTimes.com and one of my responsibilities is to create tools to help ensure the integrity and scalability of our code. To this end, I would like to introduce you to The New York Times Perl Profiler, or Devel::NYTProf. The purpose of this tool is to allow developers to easily profile Perl code line-by-line with minimal computational overhead and highly visual output. With only one additional command, developers can generate robust color-coded HTML reports that include some useful statistics about their Perl program. Here is the typical usage:

perl -d:NYTProf myslowcode.pl
nytprofhtml

See? It's easy! nytprofhtml is an implementation of the included reporting interface (Devel::NYTProf::Reader). If you don't want HTML reports, you can implement your own format with relative ease. If you create something cool, be sure to let me know via CPAN patch request or open@nytimes.com. Detailed instructions can be found in the documentation and source code on CPAN.

You can see sample screen shots of the html reports index page and a single module report.

Similar tools exist to profile Perl code. Devel::DProf is the ubiquitous profiler, but it only collects information about subroutine calls. Because of this limitation, it's not all that helpful in finding that elusive broken regex in a 75-line subroutine of regex transforms. Devel::FastProf is another per-line profiler; however, I found its output difficult to coerce into HTML. It also doesn't support non-Linux systems (we need at least Solaris and Ubuntu/Linux support).

Devel::NYTProf is available as a distribution on the CPAN. You may install by typing install Devel::NYTProf in the cpan command-line application, or manually by downloading the tarball from CPAN.

We were able to reduce the long runtime on one particular application by 20% (about a minute) after the very first test run of our profiler. We hope that you will find our tool as useful as we have. Of course, any comments and suggestions are welcome!

[Feb 21, 2008] Free Perl Books - freeprogrammingresources.com

  1. Perl 5 by Example Online Perl Book. 22 chapters with appendixes.
  2. Beginning Perl Very complete (and completely free) Perl Beginners book, both HTML and downloadable (PDF).
  3. Practical mod_perl This free perl book is available in html or pdf versions, so you can view the perl book online or download this free book.
  4. Extreme Perl Extreme Perl is a book about Extreme Programming using the programming language Perl. This free Perl ebook is available in HTML, PDF, or A4 PDF.
  5. Learning Perl the Hard Way Learning Perl the Hard Way is a free book available under the GNU Free Documentation License. This free perl ebook can be downloaded in pdf or gzipped postscript format.
  6. Web Client Programming with Perl Free Online Perl Book
  7. The Perl Reference Guide The guide contains a concise description of all Perl 5 statements, functions, variables and lots of other useful information.
  8. Perl Reference Guide & Perl Pocket Reference (PDF Link) Short Perl reference book in pdf form.
  9. CGI Programming on the World Wide Web This is an out-of-print book from 1996 that is available from O'Reilly.
  10. Beginning Perl for Bioinformatics (Sample Chapter) GenBank (Genetic Sequence Data Bank) is a rapidly growing international repository of known genetic sequences from a variety of organisms. Its use is central to modern biology and to bioinformatics.
  11. CGI Programming with Perl, 2nd Edition (Sample Chapter) Security.
  12. Advanced Perl Programming (Sample Chapter) Chapter 1: Data References and Anonymous Storage
  13. Programming Web Services with Perl (Sample Chapter) One chapter on Soap.
  14. O'Reilly Sample Chapters Quite a few sample chapters from Perl books are indexed here (some have already been linked to individually).

[Jan 6, 2008] freshmeat.net Project details for Wendy Site Engine

Wendy is a Perl framework for Web site and service development. It works with mod_perl 2 and PostgreSQL. Built with security and performance in mind, Wendy supports DB server clustering, separate read and write DB back-ends, data caching with memcached, template caching, etc.

Release focus: Initial freshmeat announcement

[Dec 19, 2007] No Comments

My favorite (so far) programming language was born 20 years ago. It's been loved and hated. It's been praised and damned. It's been complimented and criticized. But all that doesn't matter. What matters is that it has been helping people all over the world to solve problems. Tricky, boring, annoying problems. It provided enough power to build enterprise-grade applications, while still being easy and flexible enough to be the super-glue of many systems.

I'm sure Perl will still be with us in another 20 years. I wish it to be as useful then as it is now.

Thanks, respect, and best wishes to everyone who created and supported Perl, its community and tools all these years. Happy birthday!

freshmeat.net Project details for pixconv.pl

pixconv.pl is a Perl script to rename (yyyymmdd_nnn.ext), (auto-)rotate, resize, scale, grayscale, watermark, borderize, and optimize digital images.

Release focus: Major feature enhancements

Changes:
The -b/-B border and -C border color options were added, along with a -m option to match image orientation (landscape or portrait). EXIF manipulation was fixed. A -R resize option was added for correctly resizing portrait images. Handling of images with whitespace in their filenames was fixed.

Author:
Iain Lea [contact developer]

Perl Resource Center Perl eBooks

Three free Perl e-books:
"Learning Perl the Hard Way"
http://www.greenteapress.com/perl/
Free eBook: "Learning Perl the Hard Way" by Allen B. Downey, designed for programmers who do not know Perl. Open source book available under the GNU Free Documentation License. Users can distribute, copy and modify the content.
"Extreme Perl"
http://www.extremeperl.org/bk/home
Free eBook: "Extreme Perl" by Robert Nagler. Covers extreme programming (an approach to software development that emphasizes business results and involves rapid iteration, code writing and continuous testing), release planning, iteration planning, pair programming, tracking, acceptance testing, coding style, logistics, test-driven design, continuous design, unit testing, refactoring and SMOP.
"Beginning Perl"
http://learn.perl.org/library/beginning_perl/
Free eBook: "Beginning Perl" by Simon Cozens. Fourteen chapter book covers simple values, lists and hashes, loops and decisions, regular expressions, files and data, references, subroutines, running and debugging in Perl, modules, object-oriented Perl, CGI, databases and more.

[Dec 17, 2007] Kazi 1.0 by Luka Novsak

An indexer of a file tree, written in Perl. It looks limited to HTML files but can probably be extended to other types.

About: Kazi is a simple content management system. It takes a directory tree populated with HTML files, and builds a menu of it. It can be extended with modules and customized with templates.

[Dec 9, 2007] freshmeat.net Project details for Host Grapher

Host Grapher is a very simple collection of Perl scripts that provide graphical display of CPU, memory, process, disk, and network information for a system. There are clients for Windows, Linux, FreeBSD, SunOS, AIX and Tru64. No socket will be opened on the client, nor will SNMP be used for obtaining the data.

[Dec 7, 2007] freshmeat.net Project details for perltidy

Perltidy is a Perl script indenter and beautifier. By default it approximately follows the suggestions in perlstyle(1), but the style can be adjusted with command line parameters. Perltidy can also write syntax-colored HTML output.

Release focus: Minor feature enhancements

[Dec 7, 2007] freshmeat.net Project details for XHTML Family Tree Generator

XHTML Family Tree Generator is a CGI Perl script together with some Perl modules that will create views of a family tree. Data can be stored in a database or in a data file. The data file is either a simple text (CSV), an Excel, or GEDCOM file listing the family members, parents, and other details. It is possible to show a tree of ancestors and descendants for any person, showing any number of generations. Other facilities are provided for showing email directories, birthday reminders, facehall, and more. It has a simple configuration, makes heavy use of CGI (and other CPAN modules), generates valid XHTML, and has support for Unicode and multiple languages.

Release focus: N/A

Changes:
Romanian language support has been added, and the code has been cleaned up.

[Dec 6, 2007] freshmeat.net Project details for Sman

Sman is "The Searcher for Man Pages", an enhanced version of "apropos" and "man -k". Sman adds several key abilities over its predecessors, including stemming and support for complex boolean text searches such as "(linux and kernel) or (mach and microkernel)". It shows results in a ranked order, optionally with a summary of the manpage with the searched text highlighted. Searches may be applied to the manpage section, title, body, or filename. The complete contents of the man page are indexed. A prebuilt index is used to perform fast searches.

[Dec 2, 2007] freshmeat.net Project details for PodBrowser

PodBrowser is a documentation browser for Perl. It can be used to view the documentation for Perl's builtin functions, its "perldoc" pages, pragmatic modules, and the default and user-installed modules. It supports bookmarks, printing, and integration with the CPAN search site.

[Dec 1, 2007] freshmeat.net Project details for ConfigGeneral

With Config::General you can read and write config files and access the parsed contents from a hash structure. The format of config files supported by Config::General is inspired by the Apache config format (and is 100% compatible with Apache configs). It also supports some enhancements such as here-documents, C-style comments, and multiline options. A short usage sketch appears after the changelog below.

Release focus: Major bugfixes

Changes:
The variable interpolation code has been rewritten. This fixes two bugs. More checks were added for invalid structures. More tests for variable interpolation were added to "make test".
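
As a quick illustration of the module's basic read API, here is a minimal sketch; the file name and the option name are made up for the example:

use Config::General;

# Parse an Apache-style config file into a flat hash
my $conf   = Config::General->new('app.conf');   # hypothetical file
my %config = $conf->getall;

print $config{ServerName}, "\n";   # hypothetical option from app.conf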

[Nov 30, 2007] BBC - Radio Labs - Perl on Rails by Tom Scott

| www.bbc.co.uk/blogs/radiolabs

Like most organisations the BBC has its own technical ecosystem; the BBC's is pretty much restricted to Perl and static files. This means that the vast majority of the BBC's website is statically published - in other words HTML is created internally and FTP'ed to the web servers. There are then a range of Perl scripts that are used to provide additional functionality and interactivity.

While there are some advantages to this ecosystem, there are also some obvious disadvantages, and a couple of implications: an effective hard limit on the number of files you can save in a single directory (many older, but still commonly used, filesystems just scan through every file in a directory to find a particular filename, so performance rapidly degrades with thousands, or tens of thousands, of files in one directory); the inherent complexity of keeping the links between pages up to date and valid; and the sheer number of static files that would need to be generated to deliver the sort of aggregation pages we wanted to create when we launched /programmes, let alone our plans for /music and personalisation.

What we wanted was a dynamic publishing solution - in other words the ability to render webpages on the fly, when a user requests them. Now obviously there are already a number of existing frameworks out there that provide the sort of functionality that we needed, however none that provided the functionality and that could be run on the BBC servers. So we (the Audio and Music bit of Future Media and Technology - but more specifically Paul, Duncan, Michael and Jamie) embarked on building a Model-view-controller (MVC) framework in Perl.

For applications that run internally we use Ruby on Rails. Because we enjoy using it, it's fast to develop with and straightforward to use, and because we already use it (i.e. to reduce knowledge transfer and training requirements), we decided to follow the same design patterns and coding conventions used in Rails when we built our MVC framework. Yes, that's right: we've built Perl on Rails.

This isn't quite as insane as it might appear. Remember that we have some rather specific non-functional requirements. We need to use Perl, there are restrictions on which libraries can and can't be installed on the live environment and we needed a framework that could handle significant load. What we've built ticks all those boxes. Our benchmarking figures point to significantly better performance than Ruby on Rails (at least for the applications we are building), it can live in the BBC technical ecosystem and it provides a familiar API to our web development and software engineering teams with a nice clean separation of duties with rendering completely separated from models and controllers.

Using this framework we have launched /programmes. And because the pages are generated dynamically we can aggregate and slice and dice the content in interesting ways. Nor do we have to subdivide our pages into arbitrary directories on the web server - the BBC broadcasts about 1,400 programmes a day, which means if we created a single static file for each episode we would start to run into performance problems within a couple of weeks.

Now since we've gone to the effort of building this framework and because it can be used to deliver great, modern web products we want to use it elsewhere. As I've written about elsewhere we are working on building an enhanced music site built around a MusicBrainz spine. But that's just my department - what about the rest of the BBC?

In general the BBC's Software Engineering community is pretty good at sharing code. If one team has something that might be useful elsewhere then there's no problem in installing it and using it elsewhere. What we're not so good at is coordinating our effort so that we can all contribute to the same code base - in short we don't really have an open source mentality between teams - we're more cathedral and less bazaar even if we freely let each other into our cathedrals.

With the Perl on Rails framework I was keen to adopt a much more open source model - and actively encouraged other teams around the BBC to contribute code - and that's pretty much what we've done. In the few weeks since the programmes beta launch, JSON and YAML views have been written - due to go live next month. Integration with the BBC's centrally managed controlled vocabulary - to provide accurate term extraction and therefore programme aggregation by subject, person or place - is well underway and should be with us in the new year. And finally the iPlayer team are building the next generation iPlayer browse using the framework. All this activity is great news. With multiple teams contributing code (rather than forking it) everyone benefits from faster development cycles, less bureaucracy and enhanced functionality.

Comments

======

  1. At 12:37 AM on 01 Dec 2007, Anonymous Perl Lover wrote:

    Any reason U didn't use Catalyst, Maypole, Combust, CGI::Application, CGI::Prototype, or any of the dozens of other perl MVC frameworks?

    Catalyst was around long before Ruby on Rails (possibly before the Ruby language for that matter), but never made the kind of headlines RoR gets. The Ruby community seems to be much better at mobilizing.

    Actually, I think it's the Perl community's TMTOWTDI lifestyle. In Ruby, for small things *maybe* U use Camping, but you'll probably use Rails and for everything else you'll definitely use Rails. There are some others, but only the developers of them use them. In Perl, literally everyone writes their own.

    Inferior languages like Java and C# rose up real quick and stayed there--keep getting bigger even--because they limit their users' choices. Perl stayed in the background and is now dying because it believes in offering as many choices as possible. That's why Perl 666 is going to be more limiting. As U can tell from my subtle gibe that the next version of Perl is evil, I prefer choices. But developers like me are a dying breed.

    Developers now-a-days need cookie cutter, copy&paste code. When's the Perl on Rails book going to be released? Probably around the time the Catalyst one is. Or the CGI::Application one.

    Bleh. I wrote way too much. U can't even put this up now, it's too long. I didn't realize I was so annoyed by the one jillion perl MVC web frameworks and how they're just one tiny example of why perl is dead.

  2. At 01:20 AM on 01 Dec 2007, Anon wrote:

    > The Ruby community seems to be much better at
    > mobilizing.

    I really think that first video demo of RoR using Textmate is what had a large effect. Before that, I don't remember seeing hardly any videos of development happening right in front of your eyes.

    You watched the video thinking, "wow! it's so fast and easy! I'm gonna get in on that!". When, in reality, any good programmer using any good environment can make a software look good like that (if they practice a bit beforehand).

    As an aside, anyone know of a video demo podcast for Catalyst?

  3. At 07:14 AM on 01 Dec 2007, Dave Cross wrote:

    Others have already commented that you seem to be reinventing the wheel here. No-one seems to have mentioned the Perl MVC framework which (in my opinion) is most like Rails - it's called Jifty (http://jifty.org/).

    But there are already parts of the BBC using Catalyst with great success. Why didn't you ask around a bit before embarking on what was presumably not a trivial task?

  4. At 09:52 AM on 01 Dec 2007, Raips wrote:


    How about the BBC doing the same as the New York Times did?

    http://www.linux.com/feature/120359
    http://open.nytimes.com/


[Oct 27, 2007] UNIX System Administration Tools

rshall
Runs commands on multiple remote hosts simultaneously. (Perl)
View the README
Download version 11.0 - gzipped tarball, 9 KB
Last update: November 2005

[Oct 27, 2007] UNIX System Administration Tools

autosync
Copies files to remote hosts based on a configuration file. (Perl)
View the README
Download version 1.4 - gzipped tarball, 5 KB
Last update: April 2007

[Sep 6, 2007] Komodo Spawns New Open Source IDE Project

"In February, ActiveState released a free version of its flagship Komodo IDE called Komodo Edit, and that release was a prelude to going open source. Open Komodo is only a subset of Edit, though. "

September 6, 2007
Komodo Spawns New Open Source IDE Project
By Sean Michael Kerner

Development tools vendor ActiveState is opening up parts of its Komodo IDE in a new effort called Open Komodo.

Komodo is a Mozilla Framework-based application that uses Mozilla's XUL (XML-based User Interface Language), which is Mozilla's language for creating its user interface.

The Open Komodo effort will take code from ActiveState's freely available, but not open source, Komodo Edit product and use it as a base for the new open source IDE. The aim is to create a community and a project that will help Web developers to more easily create modern Web-based applications.

"This is our first entry into managing an open source project," Shane Caraveo, Komodo Dev Lead, told Internetnews.com. "We want to start with a tight focus on what we want to accomplish and that focus is supporting the Open Web with a development environment."

Caraveo explained that back in February, ActiveState released a free version of its flagship Komodo IDE called Komodo Edit, and that release was a prelude to going open source. Open Komodo is only a subset of Edit, though.

"We're focusing first strictly on Web development," Caraveo said. "So some of the language support for backend dynamic languages will not be available as open source. They will still be available for free in Edit and possibly as extensions to Open Komodo."

The idea behind creating a fully open source IDE for Web development has been percolating for over a year at ActiveState, according to Caraveo. He said there are also a lot of people in the Mozilla community that have been discussing the creation of an IDE.

"I feel there is no need for them to start from nothing, which is a large investment," Caraveo said. "Since we were a couple months from having everything done, I felt it was a good time to announce, so we can start to talk with people in the community about Komodo from a standpoint that they are willing to work with."

A build of the Open Komodo code base that actually works is expected by late October or early November. That build, according to Caraveo, will look and work much like Komodo Edit does now.

"We want to be sure that people have something they can play with and actually use immediately, even if it is not the product we want in the end," Caraveo said.

The longer-term project is something called Komodo Snapdragon. The intention of Snapdragon is to provide a top-quality IDE for Web development that focuses on open technologies, such as AJAX, HTML/XML, JavaScript and more.

"We want to provide tight integration into other Firefox-based development tools as well," Caraveo explained. "This would target Web 2.0 applications, and next-generation Rich Internet Applications."

With many IDEs already out in a crowded marketplace for development tools, Open Komodo's use of Mozilla's XUL (pronounced "zule") may well be its key differentiator.

"A XUL-based application uses all the same technologies that you would use to develop an advanced Web site today," Caraveo said.

"This includes XML, CSS and JavaScript. This type of platform allows people who can develop Web site to develop applications. So, I would say that this is an IDE that Web developers can easily modify, hack, build, extend, without having to learn new languages and technologies."

Being open and accessible are critical to the success of Open Komodo; in fact Caraveo noted that the No. 1 success factor is community involvement.

"If Snapdragon is only an ActiveState project, then it has not succeeded in the way we want it to."

Picking Up Perl by Bradley M. Kuhn

[Jul 12, 2007] Minimal Perl

See also Manning Minimal Perl

Only on author site

[May 3, 2007] Python, Tcl and Perl, oh my! (was Re tcl vs. perl) - comp.lang.perl.tk Google Groups

Jun 26 1996 (Dan Connolly)

Sorry for the length, but I felt inspired tonight...

In article <TMB.96Jun17182...@best.best.com> t...@best.com (.) writes:

>
> In article <Pine.SUN.3.93.960617173341.9643A-100...@blackhole.dimensional.com> Kirk Haines <oshcn...@dimensional.com> writes:
>
> > Well it's probably just my stupidity
> > (and that of everyone else who works here) but I've got about 50 Perl
> > scripts that do god knows what, and the people who wrote them left,
> > and we are experiencing excruciating pain.
>
> And that is not a situation in the least bit related to Perl. That is the
> fault of whoever wrote those scripts [...]

> Of course it is _related_ to Perl. Yes, you can write better or worse
> Perl code.

> In fact, one way management can bring about good coding styles without
> examining each and every line of code is by choosing tools and
> languages that enforce some aspects of good coding styles. Perl isn't
> one of those languages.

Short form: (1) there's a tension between early detection of faults and rapid prototyping, and perl and python are at very different points on the spectrum. (2) it's more the community around a language than the language itself that influences code quality. (3) For my purposes, perl will continue to be a work-horse tool, but I'll be using Java more for things that I would have used python or Modula-3 for, and I hope the industry uses Java for things that it has been using C++ for.

Long form:

(1) Traditional perl programming is a black art, but a darned useful craft as well. The semantics are very powerful, and the syntactic features combine in amazingly powerful ways. But you definitely have enough rope to hang yourself; not enough to hang the machine or crash all the time, like the way you can corrupt the runtime in C by writing past the end of an array or calling free() twice. But like C, you can introduce subtle logic bugs by using = where you meant ==. And failing to check return values results in a program that nearly always reports successful completion, whether it really succeeded or not.

I like studying and learning programming languages, and I found it more difficult to build the necessary intuitions to read and write traditional perl programs than to build intuitions for any language I learned previously, and nearly every language I learned since.

I learned perl "from a master" -- Tom Christiansen was in the next office, and he painstakingly (if not patiently :-) answered my many frustrated questions. Previous to learning perl, I had learned a dozen or so languages without much difficulty (here in roughly the order I learned them):

Extended BASIC (Radio Shack Color Computer): learned from a book, disassembled the interpreter
6809 assembler: learned from a book with a friend, and from disassembling LOTS of stuff
Basic09: learned from the manuals, with help from BBS folks
Logo: learned in a store one afternoon, reading a book
Pascal: learned one summer from a college professor
C: read a book one weekend
COBOL: learned at a summer job
shell: misc. hacking in school
LISP: read some books, hacked on TI lisp machine in a class
prolog: programming languages course
Modula-2: programming languages course
Ada: programming languages course

Learning assembly after basic was tough: "Where are the variables? Geez.. rebooting the machine all the time is a pain. I wish this thing had automatic string handling." And I'm not sure I ever grokked Ada's rendezvous stuff completely. And I learned COBOL in a strictly monkey-see-monkey-do manner. It was months before I found a manual.

None of those were particularly unexpected difficulties. But after an initial taste of perl, it looked really easy and powerful, and I was frustrated when the first few real programs I tried to write had bugs that I just could not figure out at all.

Really learning to use regexps was well worth the effort, but things like "surprise! <FILE> works completely differently in an array context!" made for an experience I don't care to repeat. I can't remember the exact program that drove me batty, but it was related to:

$x = <FILE>;   # reads one line
@x = <FILE>;   # reads whole file, split into lines

but the idiom I used that created an array context wasn't as transparent as @x -- it was something like grep() or chop(). Ah yes, I think it was chop(). Who would have guessed that

chop(<XXX>);

would read the whole file?

Perl is full of short-hand idioms that are so useful that knowledgeable perl programmers would feel awkward writing them out long-hand, and yet they can throw newbies for a loop. For example, the work-horse idiom:

while(<>){ ...; }

is short for:

while( ($_ = <STDIN>) gt ''){ ...; }

roughly speaking; that is, ignoring the tremendously useful feature of <> which processes files mentioned on the command line (aka @ARGV) ala traditional unix filters, which would take me about 10 or 20 lines to write out longhand, and about an hour to get just right. Ah! and I forgot to mention that <XXX> is an idiom for reading one line from a file... and lines are delimited by the magic $/, and ...
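Spelled out, the <> machinery described above looks roughly like the following. This is an editorial sketch, not Perl's actual implementation (it ignores details such as $. continuity across files and the magic ARGV filehandle):

@ARGV = ('-') unless @ARGV;   # no files named: read standard input
foreach $file (@ARGV) {
    open(FH, $file) || do { warn "Can't open $file: $!\n"; next; };
    while (defined($_ = <FH>)) {
        # loop body goes here, one line at a time in $_
    }
    close(FH);
}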

The point is that even as of several years ago, perl is a highly-evolved, highly idiomatic language and tool, based on zillions of person-years of use in unix system administration. The vast majority of text-processing/system management tasks that folks might want to hack up a script to tackle can be developed quickly, expressed succinctly, and run efficiently in perl.

The first crack usually looks like:

while(<>){ if(/X-Diagnostic: (.*)/){ print "diagnostic: $1\n"; } }

and it usually works great the first time you try it. Then you add a few wrinkles, and before you know it, the task you set out to do is solved.

Taking that piece of code that solves a particular problem, and software-engineering it usually takes about 10x longer than it took to develop in the first place (as these tasks are often personal and transient, it's rarely worth the trouble anyway).

The author of the hack is generally in a position to restrict the inputs to reasonable stuff (eliminating the need to deal with corner cases) and check the output by hand (eliminating the need to document and report errors in typical engineering fashion).

This is very much in contrast with other languages, where the cost of solving the immediate problem may be significantly higher, but the result is much more likely to have good software engineering characteristics, such that it's useful to other folks or other projects with little added effort.

For example, Olin Shivers described his experience writing ML programs: they are a royal pain to get through the type checker, but once they compile, they are often bug-free.

Python isn't that far along the quick-and-dirty vs. slow-and-clean spectrum, but it's in that direction.

Contrast the work-horse example above with a loose translation to python:

import sys

while 1:
    line = sys.stdin.readline()
    if not line: break
    ...

incorporating the @ARGV parts of <> would expand it to something like:

import sys

for f in sys.argv[1:]:
    fh = open(f)
    while 1:
        line = fh.readline()
        if not line: break
        ...

Python doesn't have special syntax for this sort of thing. So the python code is more verbose and less idiomatic -- easier to grok for the newbie, but harder to "pattern match," or recognize as a common idiom for the seasoned programmer.

For an example of the stylistic slants of the two languages, consider error/exception conditions. As a rule, in perl, errors are reported as particular return values, whereas in python, they signal exceptions. So in the error case, a perl code fragment will run merrily along, while a python code fragment will trap out. In many text-processing tasks, running merrily along is just what you want. But when you hand that code to your friend, and he presents it with some input that you never considered, python is a lot more likely to let your friend know that the program needs to be enhanced to handle the new situation.

I've seen exception idioms in perl, but they involve die and eval. The runtime libraries don't die on errors, as a rule, and eval is a pretty hairy way to do something as mundane as error handling.
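The two styles side by side, as a minimal sketch (the file name is invented):

# return-value style: nothing happens unless the caller checks
open(CONF, "app.conf") || warn "can't open app.conf: $!\n";

# exception style: die() raises, eval {} traps, $@ holds the message
eval {
    open(CONF, "app.conf") || die "can't open app.conf: $!\n";
    # ... more code that may die() ...
};
if ($@) {
    print STDERR "trapped: $@";
}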

Next, consider naming and scope. By default, perl variables are global, so you almost never have to declare them. Local variables have dynamic scope by default (ala early lisp systems) and traditional statically scoped variables are a perl5 innovation.

On the flip side, python variables are local by default, so you almost never have to worry about the variable clobbering problem. (python has some semantic gotchas of its own here for the folks who have intuitions about traditional static scoping)
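A minimal sketch of the dynamic-vs-lexical difference in perl (hypothetical variables, runnable as-is):

$x = "global";
sub show { print "$x\n"; }   # looks $x up at call time

sub dynamic {
    local $x = "dynamic";    # temporarily replaces the global $x
    show();                  # prints "dynamic"
}
sub lexical {
    my $x = "lexical";       # a new lexical variable, invisible to show()
    show();                  # prints "global"
}
dynamic();
lexical();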

So far, I have discussed mostly the intrinsic aspects of a language that vary along the quick-and-dirty vs. slow-and-clean spectrum.

But the point of this article is that:

(2) The community around a language -- i.e. the conventional wisdom, history, documentation, and available source code -- has a lot more influence on the quality of code developed in a given language than the intrinsic aspects of the language itself.

For example, it's perfectly possible to write clean, well structured programs in Fortran. But the bulk of traditional fortran has no comments or indentation, and lots of GOTOs, global variables, and aliased variables. The mindset behind fortran was that hand-optimization was superior to machine-optimization -- a mindset left over from assembler, and popularized by bad compilers.

COBOL has some really bad features (e.g. lack of local variables) that make writing good programs hard, but they don't come close to explaining the astoundingly uninspired programming techniques employed in some business/database apps I've seen. Stuff like writing 12 paragraphs (subroutines, or functions to the modern world) -- one for each month of the year, with 12 sets of variables jan-X, jan-Y, feb-X, feb-Y, etc., rather than using loops and arrays, which DO exist in COBOL.

Perl, as a language, is evolving faster than the perl development community. Perl5 in strict mode is a reasonable modern object-oriented programming language. But there are ZILLIONs of perl programmers, and from what I can see, about 2% of them bought into the new facilities. The rest of them are still happily getting their jobs done writing perl4 code -- myself included.

Perl was useful and widely deployed before the OOP "paradigm-shift" hit the industry. And a community with that much momentum doesn't turn on a dime.

In contrast, python started from scratch after some earlier languages, and had the benefit of looking back at REXX, icon, and perl, as well as C++ and -- most importantly -- Modula-3. So documentation encouraged some pretty modern concepts like objects and modules while the python development community was still young.

As a result, consider the namespace of functions in the two systems: the languages have roughly equivalent support: python has modules, and perl has packages. But you might not know that from looking at most of the code you see on the net: traditional perl folks rarely use the $package'var stuff, while python folks use modules routinely. The perl5 movement is quickly changing this, but until recently, perl programmers used the vast majority of perl's facilities without ever considering packages, while python programmers run into the concept of modules in the early tutorials.

For me, the bottom line is that I do a lot of quick-and-dirty stuff, and I'm comfortable with perl4's idioms, so I use it a lot. I have dabbled in perl5, but I'm not yet comfortable with its OOP idioms.

I prefer the feel and syntax of python, but the "strictness" often gets in the way, and I end up switching to perl in order to finish the task before leaving for the day.

When I want to write "correct" programs, neither is good enough. I want lots more help from the machine, like static typechecking. And sad to say, when I want to write code that other folks will use, I choose C.

As much as the industry adopted C++, I find it frightening. It requires all the priestly knowledge and incantations of perl with none of the rapid-prototyping benefits, gives no more safety guarantees than C, and has never been specified to my satisfaction.

Modula-3 was more fun to learn than anything I had learned in years. The precision, simplicity, and discipline employed in the design of the language and libraries is refreshing and results in a system with amazing complexity management characteristics.

I have high hopes for Java. I will miss a few of Modula-3's really novel features. The way interfaces, generics, exceptions, partial revelations, structural typing + brands come together is fantastic. But Java has threads, exceptions, and garbage collection, combined with more hype than C++ ever had.

I'm afraid that the portion of the space of problems for which I might have looked to python and Modula-3 has been covered -- by perl for quick-and-dirty tasks, and by Java for more engineered stuff. And both perl and Java seem more economical than python and Modula-3.

Dan --
Daniel W. Connolly "We believe in the interconnectedness of all things"
Research Scientist, MIT/W3C PGP: EDF8 A8E4 F3BB 0F3C FD1B 7BE0 716C FF21
<conno...@w3.org> http://www.w3.org/pub/WWW/People/Connolly/

[Apr 28, 2007] freshmeat.net Project details for DocPerl

DocPerl provides a Web-based interface to Perl's Plain Old Documentation (POD). It is a graphical, easy-to-use interface to POD, automatically listing all installed modules on the local host, and any other nominated directories containing Perl files. DocPerl can also display a summary of the APIs defined by files and the code of those files. It can search the POD documentation for module names and for functions defined in modules.

Release focus: Minor bugfixes

Changes:
This release includes fixes for many minor bugs, including the restoration of a configuration option that should not have been removed, and many JavaScript issues. The code has been tidied up.

[Mar 26, 2007] freshmeat.net Project details for Perl Dev Kit

Perl Dev Kit 7.0 released...
The Perl Dev Kit (PDK) provides essential tools for building self-contained, easily deployable executables for Windows, Mac OS X, Linux, Solaris, AIX, and HP-UX. The comprehensive feature set includes a graphical debugger and code coverage and hotspot analyzer, as well as tools for building sophisticated Perl-based filters and easily converting useful VBScript code to Perl.

Release focus: Major feature enhancements

Changes:
A coverage and hotspot analyzer tool, PerlCov, was added for better code performance and reliability. PerlApp was improved with more sophisticated module wrapping to improve executable performance. By popular demand, PDK support has been extended to Mac OS X. New native 64-bit support was added for Windows (x64), Linux (x64), and Solaris (Sparc). New Solaris and AIX GUIs were added.

Author:
Activator

[Mar 13, 2007] Programming in Perl - Debugging

On this page, I will post aids and tools that Perl provides which allow you to debug your Perl code more efficiently. I will post updates as we cover material necessary for understanding the tools mentioned.
CGI::Dump
Dump is one of the functions exported in CGI.pm's :standard set. Its functionality is similar to that of Data::Dumper. Rather than pretty-printing a complex data structure, however, this module pretty-prints all of the parameters passed to your CGI script. That is to say that when called, it generates an HTML list of each parameter's name and value, so that you can see exactly what parameters were passed to your script. Don't forget that you must print the return value of this function - it doesn't do any printing on its own.
use CGI qw/:standard/;
print Dump;
Benchmark
As you know by now, one of Perl's mottos is "There's More Than One Way To Do It" (TMTOWTDI). This is usually a Good Thing, but can occasionally lead to confusion. One of the most common forms of confusion that Perl's versatility causes is wondering which of multiple ways one should use to get the job done most quickly.

Analyzing two or more chunks of code to see how they compare time-wise is known as "Benchmarking". Perl provides a standard module that will Benchmark your code for you. It is named, unsurprisingly, Benchmark. Benchmark provides several helpful subroutines, but the most common is called cmpthese(). This subroutine takes two arguments: the number of iterations to run each method, and a hashref containing the code blocks (subroutines) you want to compare, keyed by a label for each block. It will run each subroutine the number of times specified, and then print out statistics telling you how they compare.

For example, my solution to ICA5 contained three different ways of creating a two dimensional array. Which one of these ways is "best"? Let's have Benchmark tell us:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

sub explicit {
    my @two_d = ([ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ]);
}

sub new_per_loop {
    my @two_d;
    for (0..4){
        my @inner = ('x') x 10;
        push @two_d, \@inner;
    }
}

sub anon_ref_per_loop {
    my @two_d;
    for (0..4){
        push @two_d, [ ('x') x 10 ];
    }
}

sub nested {
    my @two_d;
    for my $i (0..4){
        for my $j (0..9){
            $two_d[$i][$j] = 'x';
        }
    }
}
cmpthese (10_000, {
                 'Explicit'           => \&explicit,
                 'New Array Per Loop' => \&new_per_loop,
                 'Anon. Ref Per Loop' => \&anon_ref_per_loop,
                 'Nested Loops'       => \&nested,
             }
      );
The above code will print out the following statistics (numbers may be slightly off, of course):
Benchmark: timing 10000 iterations of Anon. Ref Per Loop, Explicit, Nested Loops, New Array Per Loop...
Anon. Ref Per Loop:  2 wallclock secs ( 1.53 usr +  0.00 sys =  1.53 CPU) @ 6535.95/s (n=10000)
Explicit:  1 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 8064.52/s (n=10000)
Nested Loops:  4 wallclock secs ( 4.01 usr +  0.00 sys =  4.01 CPU) @ 2493.77/s (n=10000)
New Array Per Loop:  2 wallclock secs ( 1.76 usr +  0.00 sys =  1.76 CPU) @ 5681.82/s (n=10000)
                     Rate Nested Loops New Array Per Loop Anon. Ref Per Loop Explicit
Nested Loops       2494/s           --               -56%               -62%     -69%
New Array Per Loop 5682/s         128%                 --               -13%     -30%
Anon. Ref Per Loop 6536/s         162%                15%                 --     -19%
Explicit           8065/s         223%                42%                23%       --

The benchmark first tells us how many iterations of which subroutines it's running. It then tells us how long each method took to run the given number of iterations. Finally, it prints out the statistics table, sorted from slowest to fastest. The Rate column tells us how many iterations each subroutine was able to perform per second. The remaining columns tell us how fast each method was in comparison to each of the other methods. (For example, 'Explicit' was 223% faster than 'Nested Loops', while 'New Array Per Loop' is 13% slower than 'Anon. Ref Per Loop'.) From the above, we can see that 'Explicit' is by far the fastest of the four methods. It is, however, only 23% faster than 'Anon. Ref Per Loop', which requires far less typing and is much more easily maintainable (if your boss suddenly tells you he'd rather have the two-d array be 20x17, and each cell init'ed to 'X' rather than 'x', which of the two would you rather had been used?).

You can, of course, read more about this module, and see its other options, by reading: perldoc Benchmark

Command-line options
Perl provides several command-line options which make it possible to write very quick and very useful "one-liners". For more information on all the options available, refer to perldoc perlrun
-e
This option takes a string and evaluates the Perl code within. This is the primary means of executing a one-liner
perl -e'print qq{Hello World\n};'
(In Windows, you may have to use double-quotes rather than single. Either way, it's probably better to use q// and qq// within your one-liner, rather than remembering to escape the quotes).
-l
This option has two distinct effects that work in conjunction. First, it sets $\ (the output record separator) to the current value of $/ (the input record separator). In effect, this means that every print statement will automatically have a newline appended. Secondly, it auto-chomps any input read via the <> operator, saving you the typing necessary to do it.
perl -le 'while (<>){ $_ .= q{testing};  print; }'
The above would automatically chomp $_, and then add the newline back on at the print statement, so that "testing" appears on the same line as the entered string.
-w
This is the standard way to enable warnings in your one liners. This saves you from having to type use warnings;
-M
This option auto-uses a given module.
perl -MData::Dumper -le'my @foo=(1..10); print Dumper(\@foo);'
-n
This disturbingly powerful option wraps your entire one-liner in a while (<>) { ... } loop. That is, your one-liner will be executed once for each line of each file specified on the command line, each time setting $_ to the current line and $. to the current line number.
perl -ne 'print if /^\d/' foo.txt beta.txt
The above one line of code would loop through foo.txt and beta.txt, printing out all the lines that start with a digit. ($_ is assigned via the implicit while (<>) loop, and both print and m// operate on $_ if an explicit argument isn't given).
-p
This is essentially the same thing as -n, except that it places a continue { print; } block after the while (<>) { ... } loop in which your code is wrapped. This is useful for reading through a list of files, making some sort of modification, and printing the results.
perl -pe 's/Paul/John/' email.txt
Open the file email.txt, loop through each line, replacing any instance of "Paul" with "John", and print every line (modified or not) to STDOUT
-i
People are sometimes astounded that such a thing is possible with so little typing. -i is used in conjunction with either -n or -p. It causes the files specified on the command line to be edited "in-place", meaning that while you're looping through the lines of the files, all print statements are directed back to the original files. (That goes for both explicit prints, as well as the print in the continue block added by -p.)
If you give -i a string, this string will be used to create a back-up copy of the original file. Like so:
perl -pi.bkp -e's/Paul/John/' email.txt msg.txt
The above opens email.txt, replaces each line's instance of "Paul" with "John", and prints the results back to email.txt. The original email.txt is saved as email.txt.bkp. The same is then done for msg.txt

Remember that any of the command-line options listed here can also be given at the end of the shebang in non-oneliners. (But please do not start using -w in your real programs - use warnings; is still preferred because of its lexical scope and configurability).

Data::Dumper
The standard Data::Dumper module is very useful for examining exactly what is contained in your data structure (be it hash, array, or object, when we come to them). When you use this module, it exports one function, named Dumper. This function takes a reference to a data structure and returns a nicely formatted description of what that structure contains.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @foo = (5..10);
#add one element to the end of the array
#do you see the error?
$foo[@foo+1] = 'last';

print Dumper(\@foo);

When run, this program shows you exactly what is inside @foo:

$VAR1 = [
          5,
          6,
          7,
          8,
          9,
          10,
          undef,
          'last'
        ];

(I know we haven't covered references yet. For now, just accept my assertion that you create a reference by prepending the variable name with a backslash...)

__DATA__ & <DATA>
Perl uses the __DATA__ marker as a pseudo-datafile. You can use this marker to write quick tests which would involve finding a file name, opening that file, and reading from that file. If you just want to test a piece of code that requires a file to be read (but don't want to test the actual file opening and reading), place the data that would be in the input file under the __DATA__ marker. You can then read from this pseudo-file using <DATA>, without bothering to open an actual file:
#!/usr/bin/env perl
use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;
  print "Size of line $.:  ", length $line, "\n";
}

__DATA__
hello world
42
abcde

The above program would print:

Size of line 1: 11
Size of line 2: 2
Size of line 3: 5
$.
The $. variable keeps track of the line numbers of the file currently being processed via a while (<$fh>) { ... } loop. More explicitly, it is the number of the last line read of the last file read.
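For instance, a quick way to number the lines of a file from the command line (the file name is invented):

perl -ne 'print "$.: $_"' notes.txt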
__FILE__ & __LINE__
These are two special markers that return, respectively, the name of the file Perl is currently executing, and the line number where the marker resides. These can be used in your own debugging statements, to remind yourself where your outputs were in the source code:
  print "On line " . __LINE__ . " of file " . __FILE__ . ", \$foo = $foo\n";
    

Note that neither of these markers is a variable, so they cannot be interpolated into a double-quoted string

warn() & die()
These are the most basic of all debugging techniques. warn() takes a list of strings, and prints them to STDERR. If the last element of the list does not end in a newline, warn() will also print the current filename and line number on which the warning occurred. Execution then proceeds as normal.

die() is identical to warn(), with one major exception - the program exits after printing the list of strings.

All debugging statements should make use of either warn() or die() rather than print(). This will ensure you see your debugging output even if STDOUT has been redirected, and will give you the helpful clues of exactly where in your code the warning occurred.
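A minimal illustration (the messages and the missing file are made up):

warn "unexpected empty field";     # no trailing newline: Perl appends " at FILE line N."
warn "unexpected empty field\n";   # trailing newline: printed exactly as given

open(my $fh, '<', '/no/such/file')
    or die "can't open /no/such/file: $!";   # message goes to STDERR, then the program exits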

[Mar 11, 2007] Sys Admin v16, i03 The Replacements

OK, this is starting to look ugly. Like a regex match, we can pull that apart with a trailing x:
s/
  (
    ^        # either beginning of line
    |        # or
    (?<=,)   # a single comma to the left
  )
  .*?        # as few characters as possible
  (
    (?=,)    # a single comma to the right
    |        # or
    $        # end of string
  )
/XXX/gx;
That's much easier to read (relatively speaking).

Like a regular expression match, we can use an alternate delimiter for the left and right sides of the substitution:

$_ = "hello";
s%ell%ipp%; # $_ is now "hippo"
The rules are a bit complicated, but it works precisely the way Larry Wall wanted it to work. If the delimiter chosen is not one of the special characters that begins a pair, then we use the character twice more to both separate the pattern from the replacement and to terminate the replacement, as the example above showed.

However, if we use the beginning character of a paired character set (parentheses, curly braces, square brackets, or even less-than and greater-than), we close off the pattern with the corresponding closing character. Then, we get to pick another delimiter all over again, using the same rules. For example, these all do the same thing:

s/ell/ipp/;
s%ell%ipp%;
s;ell;ipp;; # don't do this!
s#ell#ipp#; # one of my favorites
s[ell]#ipp#; # [] for pattern, # for replacement
s[ell][ipp]; # [] for both pattern and replacement
s<ell><ipp>; # <> for both pattern and replacement
s{ell}(ipp); # {} for pattern, () for replacement
No matter what the closing delimiter might be for either the pattern or the replacement, we can include the character literally by preceding it with a backslash:
$_ = "hello";
s/ell/i\/n/; # $_ is now "hi/no";
s/\/no/res/; # $_ is now "hires";
To avoid backslashing, pick a distinct delimiter:
$_ = "hello";
s%ell%i/n%; # $_ is now "hi/no";
s%/no%res%; # $_ is now "hires";
Conveniently, if a paired character is used, the pairs may be nested without invoking any backslashes:
$_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
s((^|(?<=,)).*?((?=,)|$))(XXX)g; # replace all fields with XXX
Note that even though the pattern contains closing parentheses, they are all paired with opening parentheses, so the pattern ends at the right place.

The right side of the substitution operation is generally treated as if it were a double-quoted string: variable interpolation and backslash interpretation are performed directly:

$replacement = "ipp";
$_ = "hello";
s/ell/$replacement/; # $_ is now "hippo"
The left side of a substitution is also treated as if it were a double-quoted string (with a few exceptions), and this interpolation happens before the result is evaluated as a regular expression:
$pattern = "ell";
$replacement = "ipp";
$_ = "hello";
s/$pattern/$replacement/; # $_ is now "hippo"
Using this form of pattern, Perl is forced to compile the regular expression at runtime. If this happens in a loop, Perl may need to recompile the regular expression repeatedly, causing a slowdown. We can give Perl a hint that the pattern is really a regular expression by using a regular expression literal:
$pattern = qr/ell/;
$replacement = "ipp";
$_ = "hello";
s/$pattern/$replacement/; # $_ is now "hippo"
The qr operation creates a Regexp object, which interpolates into the pattern with minimal fuss and maximal speed.

[Feb 23, 2007]Submitted Tech Tip How to Send Email Without Using sendmail by Ross Moffatt

A useful tip, especially the part about sending attachments.
BigAdmin

If you need to send emails from a host but don't want to run sendmail, this tech tip explains how to use Perl to send emails. This procedure can be used on a host such as a Sun Fire V120 server running the Solaris 9 OS.
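The tip's exact code is not reproduced here; a minimal sketch of the approach using the standard Net::SMTP module (the mail host and addresses are placeholders):

use Net::SMTP;

my $smtp = Net::SMTP->new('mailhost.example.com')
    or die "cannot connect to mail host";

$smtp->mail('root@myhost.example.com');   # envelope sender
$smtp->to('admin@example.com');           # envelope recipient
$smtp->data();
$smtp->datasend("To: admin\@example.com\n");
$smtp->datasend("Subject: disk report\n");
$smtp->datasend("\n");
$smtp->datasend("All filesystems below 80%.\n");
$smtp->dataend();
$smtp->quit;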

[Feb 20, 2007] Dakshina`s Blog Weblog

CGI/Perl script for uploading files

Here's a small Perl script that I have used for uploading files to a webserver. The location can be changed; right now it saves the files to /tmp/upload1.

#!/usr/bin/perl
use CGI;
my $query = new CGI;
print $query->header();

# Expects the client to send the name of the file to be uploaded in an input field "file"
my $filename = $query->param("file");

# Strip any client-supplied directory path *before* building the target path,
# so a name like "..\..\something" cannot escape the upload directory
$filename =~ s/.*[\/\\](.*)/$1/;
my $fpath1 = "/tmp/upload1/$filename";

open(UPLOADFILE, ">$fpath1") || die "Cannot open file $fpath1: $!";
binmode UPLOADFILE;   # needed on Windows for binary files; harmless elsewhere

my $upload_filehandle = $query->upload("file");

my $buf;
while (read($upload_filehandle, $buf, 1024)) {
    print UPLOADFILE $buf;
}

close UPLOADFILE;

# This has been tested on Solaris only

# Can be used to transfer binary files also

Manning Data Munging with Perl

The table of contents, two sample chapters, and the index from Data Munging with Perl are available in PDF format. You need Adobe's free Acrobat Reader software to view it. You may download Acrobat Reader here.

Download the Table of Contents

Download Chapter 2

Download Chapter 3

Download the Index

... ... ...

Source code from Data Munging with Perl is contained in either a single ZIP file, or a Unix gzipped and tarred file archive.

Free unzip programs for most platforms are available at Info-Zip.

Download the source code:

cross_src.zip (44 Kb)

or

cross_src.tar.gz (19 Kb)

How to write slow algorithms quickly in Perl (Playing Chomp)

Playing Chomp
by Gábor Szabó

Abstract
Though some of us might think so, chomp is not only a Perl function. It is also the name of a NIM-like Combinatorial Game that was unsolved until recently. It has a solution and implementation in Maple and I am writing an implementation in Perl for educational and research purposes.

Introduction
When I went to high school in the early 1980s in Budapest, Hungary, I used to play a game with a classmate that we called eating chocolate. We actually did not really play it, as we knew that there was a winning strategy for the player that moved first, but we tried to find a mathematical description for that winning strategy. For that I wrote several programs that would compute the winning positions, but we did not have any results.

A few years later I bought a book called "Dienes Professzor Játékai" [DIENES] in Hungarian translation, but actually I have looked only at a couple of pages in the book until recently.

Then about a year ago I decided it was time to learn how to create and upload a module to CPAN. As the explanation regarding how to get accepted in PAUSE was rather discouraging, I decided to play safe and start with a module that probably no one else wants to develop but which can be nice to have on CPAN: Games::NIM. I planned to develop the module to play the game and to calculate the winning positions for NIM, and later to extend it to Chocolate. To my surprise I got the access and uploaded version 0.01 in December 2001, and then it got stuck at that version.

Now when I thought about attending YAPC::Europe::2002 I decided to renew the work around Games::NIM and proposed a talk about complexity in algorithms in connection to that module and another module called Array::Unique.
When the proposal got accepted I suddenly discovered that I did not have much to say about the subject and would have to work really hard in order to give you something worthwhile. So I started to work on Games::NIM again and read the book of Dienes [DIENES] about games and another very useful one called "Mastering Algorithms with Perl" [ALGORITHMS]. I suddenly discovered that the game I knew as the chocolate eating game is actually known as Chomp and that it is still basically unsolved. It all sounded very encouraging.

[Apr 10, 2006] log4perl - log4j for Perl

sourceforge.net

Welcome to the log4perl project page. Log::Log4perl is a Perl port of the widely popular log4j logging package.

Logging beats a debugger if you want to know what's going on in your code during runtime. However, traditional logging packages are too static and generate a flood of log messages in your log files that won't help you.

Log::Log4perl is different. It allows you to control the amount of logging messages generated very effectively. You can bump up the logging level of certain components in your software, using powerful inheritance techniques. You can redirect the additional logging messages to an entirely different output (append to a file, send by email etc.) -- and everything without modifying a single line of source code.
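The smallest possible setup uses the module's :easy mode; a sketch (the messages are invented):

use Log::Log4perl qw(:easy);

Log::Log4perl->easy_init($DEBUG);   # everything at DEBUG level or above, to the screen

DEBUG "starting run";
INFO  "processed 42 records";
ERROR "could not open config file";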

Further reading

[Mar 25, 2006] Beginning Perl now available in eBook form from Perl.com. This is a very good intro book!

[Mar 24, 2006] Project details for Perl-Linux

This is a great idea that might change the way UNIX is perceived (a somewhat archaic system written in C, with a non-uniform set of obscure command-line utilities) and used.
freshmeat.net

Perl/Linux is a Linux distribution where all programs are written in Perl. The only compiled code in this Perl/Linux system is the Linux Kernel (not currently built with this project), Perl, and uClibc.

[Mar 24, 2006] Project details for Ryan's In-Out Board

freshmeat.net

About: Ryan's In/Out Board (formerly known as Whosin) is a simple and quick Perl-driven Web-based in/out board for use on intranets and extranets. Users can change their status by clicking their name or calling the script with a name parameter, allowing for desktop shortcuts which give single click "check-in/out" links. Custom and/or default comments can be added to their status. No database system is required, you just need a Web server and Perl. A script to check all staff out is also provided, which is handy if called as an overnight cron job. It uses the Date::EzDate Perl module.

Changes: A few people were having problems with data files not being written to. This version will print read/write errors to the browser if it encounters them. It does not fix any read/write issues similar to the ones people were experiencing, because there's nothing to fix as such. Those errors were related to filesystem permissions and thus beyond the realm of the script.

[Mar 24, 2006] Project details for otl

freshmeat.net

About: otl is intended to convert a text file to an HTML or XHTML file. It is different from many other text-to-HTML programs in that the input format (by default a simple, highly readable plain text format) can be customized by the user, and the output format (by default XHTML) can be user-defined. It can process complex structures such as ordered and unordered lists (nested or not), and add custom "headers" and "footers" to documents. The conversion utilizes Perl regex, adding quite a bit of flexibility and power to the conversion process. Since both the syntax of the source file and of the output can be readily customized, otl in theory can be used for many types of conversions. The package also includes tag-remove, a script for stripping HTML/XHTML-ish tags from documents.

Changes: The "chempretty" script has been removed and replaced with a more general script, "otlsub". With otlsub, you can perform a set of search/replace operations on a set of files using a Perl regex for matching. otlsub supports recursion, allowing you to descend through a directory tree and process all files matching a filename pattern. otlsub automatically adjusts references to local files in hyperlinks depending on directory depth. New otl features include a --descend option (recursive descent through all subdirectories) and various other minor modifications.

[Feb 28, 2006] Visual Python (Python), and Visual Perl (Perl) integrate with Visual Studio 2005

[Feb 14, 2006] Logic Programming with Perl and Prolog

Perl isn't the last, best programming language you'll ever use for every task. (Perl itself is a C program, you know.) Sometimes other languages do things better. Take logic programming--Prolog handles relationships and rules amazingly well, if you take the time to learn it. Robert Pratte shows how to take advantage of this from Perl. [Perl.com]

[Feb 14, 2006] Analyzing HTML with Perl

Kendrew Lau taught HTML development to business students. Grading web pages by hand was tedious--but Perl came to the rescue. Here's how Perl and HTML parsing modules helped make teaching fun again. [Perl.com]

Acky.net Tutorials Perl

Section 2 - Flat Files:

Programmers often use flat files when storing small amounts of data, such as small caching information. For example, for one project I was working on I needed to store the unique IP address of each visitor and the time the entry occurred. I used flat files for this task because it was not very data intensive, and the information was cleared every 15 minutes.

When doing something like this, you can take 2 different approaches. You can create a file for each visitor (what I had done, as I needed to store extra information), something that I like to call flat-files, or you can have the same file for all entries.

When creating many different files you will need to be able to ensure that you have a unique filename for each file; otherwise files will start to overlap after some time. You can use the Digest::SHA1 module to generate a 160-bit signature from random data (only in incredibly rare cases will the signature be the same), though there are a number of different ways to do this. Once you generate the unique name, as sketched below, you can start to create the flat file.
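A minimal sketch of generating such a name with Digest::SHA1 (the directory and the particular mix of inputs are only an illustration):

use Digest::SHA1 qw(sha1_hex);

# Hash several volatile values into a 160-bit hex string;
# a collision is astronomically unlikely.
my $unique_filename = "/tmp/cache/" . sha1_hex($ENV{REMOTE_ADDR} . $$ . time() . rand());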

# Open file for write only or die.
open(FH, "> $unique_filename") or die("Error: $!");
# Lock the file (2 == LOCK_EX, an exclusive lock).
flock(FH, 2);
# Save the remote ip address, a null, and then the time.
print FH $ENV{REMOTE_ADDR}, "\0", time;
# Close the file and release lock or die.
close(FH) or die("Error: $!");

Now this takes care of saving the data in flat-files. Retrieving data from a simple structure like this is very simple.
# We open the file for reading only or die.
open(FH, "$unique_filename") or die("Error: $!");
# Read the first line from open file.
$line = <FH>;
# Close the file or die.
close(FH) or die("Error: $!");
# Separate the data using split.
($remote_addr, $create_time) = split(/\0/, $line);

In this example, the $ENV{REMOTE_ADDR} and the time since epoch are saved in the $unique_filename file. Be careful to watch for security risks when using a variable in an open (for more information read the perlsec man page or view it online at http://www.perl.com/pub/doc/manual/html/pod/perlsec.html). Using the same fundamental ideas you can create much more complex data structures within flat-files.

As I mentioned earlier, the other way of using flat files is to create one larger file for all entries. Retrieving data from this kind of flat file database can be slower as data increases, so only use this if it presents something beneficial to your programs. You've been warned! The basic ideas for using this type of flat file database are virtually the same as for flat-files.

Rather than opening the file for writing as we did in the flat-files example, we have to open the file for appending, because overwriting data will not help us in this example. We must also separate each entry with a delimiter (I will use the newline character), and we no longer need to use $unique_filename in open because the filename will be static.
# Open file for append or die.
open(FH, ">> ./cache.db") or die("Error: $!");
# Lock the file (2 == LOCK_EX, an exclusive lock).
flock(FH, 2);
# Save the unique id, a null, remote ip address, a null, and then the time since epoch.
print FH $unique_id, "\0", $ENV{REMOTE_ADDR}, "\0", time, "\n";
# Close the file and release lock or die.
close(FH) or die("Error: $!");


For retrieving data from the file we still need the $unique_id, because in order for the program to be able to pick out a certain entry it needs something to search for. You could use the remote IP address or the time, but I personally prefer a unique id for each visitor (which I save as a cookie and retrieve any time a script is run by the user).

Once you know what the unique id is that you want to retrieve from the flat file database, you can do the following.

# Open the file for read only.
open(FH, "< ./cache.db") or die("Error: $!");
# Loop through each entry in the flat file and look for the one we need.
while ($line = <FH>) {
    # Remove the newline character at the end of the line.
    chomp($line);
    # Separate the data on the line using split.
    ($unique_id, $remote_addr, $create_time) = split(/\0/, $line);
    # Check if the unique id that we saved earlier matches the one
    # that we are looking for this time, where $our_id is the id that
    # we are looking for. If the two ids match, we break out of the loop.
    if ($unique_id eq $our_id) {
        $found = 1;
        last;
    }
}
# Close the file or die.
close(FH) or die("Error: $!");
unless ($found) {
    die("Error: Could not find entry $our_id in the flat file database.");
}
In this example the $unique_id, $remote_addr, and $create_time will be retrieved from the cache.db file if they match the $our_id variable; otherwise the script dies. You can adapt this for your own programs with minimal effort. Let me mention this again: this can be very inefficient when dealing with large amounts of data, as the program must loop through every line until the entry is found. Another deficiency in this small example is that the program will only retrieve the first matching entry in the cache.db file and exit. This is what most people would want, but if you want to retrieve all entries, or the most recent one, a little more work will be required. (There are different ways of sorting and matching data which can speed this process up significantly; one of them is sketched below.)
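For instance, when the same script needs to look up many ids, it is cheaper to read the file once into a hash index and do hash lookups afterwards. A sketch along those lines, assuming the same cache.db layout and variables as above (it keeps only the first entry per id, matching the loop above):

# Read the whole file once, indexing entries by unique id.
my %entry;
open(FH, "< ./cache.db") or die("Error: $!");
while ($line = <FH>) {
    chomp($line);
    ($unique_id, $remote_addr, $create_time) = split(/\0/, $line);
    $entry{$unique_id} = [ $remote_addr, $create_time ]
        unless exists $entry{$unique_id};
}
close(FH) or die("Error: $!");

# Later lookups are constant-time hash probes instead of linear scans.
unless (exists $entry{$our_id}) {
    die("Error: Could not find entry $our_id in the flat file database.");
}
($remote_addr, $create_time) = @{ $entry{$our_id} };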

I will mention some other ways of storing data in flat files as well as other storing data methods, in the following pages.

Interview with Tim Maher of Consultix

TeachMePerl.Com

I'll be happy to tell you, but first let me put a few things in historical perspective.

Way back in 1976, as a graduate student at the University of Toronto, I was using C, grep, sed, expr (yuck!) and the Mashey shell (the Bourne shell's predecessor) on UNIX to simulate neurophysiological experiments on a virtual cat (in Prof. Ron Baecker's Interactive Computer Graphics class).

I became pretty adept with all these tools, but I had some reservations about UNIX's ''tinkertoy'' approach to utility programs, which struck me as an example of a fundamentally good idea taken to an undesirable extreme.

As a case in point, in the Bourne shell you have to use the external expr command to do simple arithmetic. The variable-incrementing idiom was (and still is):

value=`expr $value + $inc_val`

Just imagine how efficient that approach is, at the cost of one extra (synchronous) process per calculation, when you have to total a series of numbers. It's pathetic!

So when AWK came out in 1977, I was intrigued by its potential for improving the state of UNIX programming, with features such as:

  1. Program simplification through an implicit input-reading loop,
  2. Automatic parsing of input into fields (forever ending sed's monopoly on a manual approach, based on cumbersome "\(.*\)" -based techniques),
  3. The Pattern/Action model of programming, that links pattern-matches to code blocks, and
  4. Built-in support for basic mathematical functions, including floating point calculations.
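
All four of those AWK features survive, nearly unchanged, in Perl's command-line switches; a throwaway illustration (the log file name and format are invented):

# -n supplies the implicit input loop, -a autosplits each line into @F,
# and "EXPR if EXPR" gives the pattern/action pairing:
perl -lane '$sum += $F[1] if $F[0] eq "error"; END { print $sum }' daemon.log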

I rapidly became a dedicated AWKaholic, promoting its use wherever I went. And if there had been a Nobel Prize for Artificial Languages, I would have nominated Aho, Weinberger, and Kernighan for it!

The AWK approach is just so good that I'm convinced modern programmers would currently be using languages with names like Turbo-AWK, AWK++, Visual AWK, Objective AWK, and perhaps even JAWKA, PythAWK, and AWK#, if not for an egregious travesty of high-tech justice.

Which is simply that this ingenious 1977 language was not properly documented until 1988, when Prentice-Hall's AWK Programming Language book came out. What a tragedy! But on the other hand, perhaps Larry Wall would have missed his chance with Perl if things had been otherwise. I guess that's the silver lining.

But getting back to my story, I wasn't really affected by the AWK documentational snafu. That's because I got the chance to make a career change from a university ''CS Professorship'' to a ''UNIX Course Developer and Instructor'' position with Western Electric (the branch of the Bell System that owned UNIX). They hired me in 1982 to develop and teach classes on UNIX topics, providing me access to internal documentation and bona fide UNIX ''Subject Matter Experts''. So I rapidly became an accomplished AWK programmer, and developed lots of nifty examples of its use for the training materials I created.

One especially useful program I wrote was a shell syntax checker and beautifier. I wrote this out of necessity, after a huge shell script stopped working due to a misplaced single quote that I just couldn't find. It saved many programming projects for me over the years, and then sadly, it was lost forever in a disk crash.

You have a very interesting background, Tim. But where does Perl fit into all of this?

Believe it or not, I was getting to that. I began dabbling with Perl in the early 1990s, but frankly had a hard time feeling comfortable with some of its more unconventional features.

I objected to what I saw as superfluous deviations from UNIX standards (like tagging all scalars with $), an overabundance of syntactically equivalent ways of writing the same thing (e.g., forwards vs. backwards loops and conditionals), and the unnecessary inclusion of radical new concepts (esp. LIST vs. SCALAR contexts).

For me, learning Perl was like watching a movie where I found the initial developments sufficiently disjointed and deranged that I had serious doubts that the writer would ever be able to make sense of it all for me, and ultimately reward me for my attention.

The bottom line is I just wasn't confident that Larry's programming mentality was compatible with mine, and without that faith, I wasn't willing to make the considerable effort to learn a new, and rather peculiar, programming language.

Moreover, as a C, Shell, and AWK guy since the mid-70s, I figured I could do everything I needed with those tools already -- given a sufficient number of User Processes and Development Time! So I didn't really feel the need for a One Language Does Everything solution.

But by 1997 Perl usage was growing by leaps and bounds, and many were waxing poetic about what a joy it was to write in a language that freed them from the micro-management of minutiae and just ''did the right thing'' most of the time.

And, on top of that, Perl offered the capability of doing UNIX-style network programming, which was rapidly escalating in importance, without resorting to the travails of C.

So suddenly, I came to see Perl as my dream language. It was like AWK with sockets! What more could one ask for?

You received a White Camel for developing and starting SPUG, the Seattle Perl Users Group. What were your reasons for creating this users group?

When I finally decided to get serious about learning Perl, I realized that what I needed most was to improve my capacity for PerlThink. (That's Larry's term for using Perl's features judiciously, and then getting out of the way so it can do its magic.)

I figured the best way to achieve this goal was to hang out with people who were already PerlThinking, so in late 1997 I started looking for a Perl SIG in Seattle. But I quickly learned there wasn't a group, just a web page dedicated to the proposition that there should be a group, and it had been sitting there for a long time, collecting comments from would-be members!

Many months later, while cooking breakfast in an escaping steam vent atop a smoke-spewing volcano in Indonesia (no kidding!), I gave this situation some more thought, and decided that, if necessary, I'd step forward to start the group myself.

Hmm ... how can I convey to you just how excited I was about taking on this role? I'm reminded of a play by Woody Allen in which a distraught woman makes a moving soliloquy about her desperate need for intimate contact with a Man. Just when she's on the verge of descending into a deep depression, an actor planted in the audience shouts out:

I'll sleep with that girl, if nobody else will!
That's exactly how excited I was about starting SPUG!

I had never created an organization before, so I found that proposition itself rather daunting. And on top of that I was concerned that such unpleasant activities as begging, pleading, imploring, beseeching, and ultimately arm-twisting would be required of me to sign up prospective speakers -- and, unfortunately, I was right!

(I later learned they'd invariably thank me afterwards for pressuring them into giving talks, once they realized how much the exercise helped solidify their knowledge, and how much fun they had sharing it.)

[July 1, 2005] O'Reilly Is Perl Still Relevant

Subject: Is Perl relevant any longer?

With the emergence of .NET, J2EE, Python, PHP, et al., has Perl lost its niche as a scripting glue language? The buzz is all around PHP these days and also around Python. The complaints about Perl 6's complexity are only getting louder. Besides, Perl does not occupy the central position in O'Reilly's offerings that it once did.

Is Perl on its way out?

Jag


Hi Jag,

While I agree that the long wait for Perl 6 has harmed Perl, and many Perl programmers do in fact find what they've seen to be unnecessarily complex (one well-known Perl programmer of my acquaintance referred to it as "performance art"), I've learned never to count Perl out. There was a similar slowdown in Perl in the mid-90s, and it saw a huge resurgence as "the duct tape of the internet." Perl is so useful that there may yet come another new market for which it is uniquely suited. It's a powerful, adaptable language, and the folks creating Perl 6 have a history of "seeing around corners" and developing features that turn out to be just right for some emerging market. So when Perl 6 comes out, we certainly won't be on the publishing sidelines. We'd love to be in the position to do some substantial updates to our bestselling Perl books!

That being said, there has always been an element of snobbery in the Perl market--I remember trying to persuade the authors of the second edition of Programming Perl, back in 1996, to pay more attention to the web. I was told that web programming was "trivial" and didn't require any special treatment. Of course, languages like PHP, which considered the web to be central, eventually came to occupy that niche. If book sales are any indicator, PHP is twice as popular as Perl.

I've always believed that one of the most important things about scripting languages is that they (potentially) make a new class of applications more accessible to people who didn't previously think of themselves as programmers. Languages then grow up, get computer-science envy, and forget their working-class roots.

In terms of the competitive landscape among programming languages, in addition to PHP, Python has long been gaining on Perl. From about 1/6 the size of the Perl market when I first began tracking it, it's now about 2/3 the size of the Perl book market. The other scripting language (in addition to Perl, Python, and PHP) that we're paying a lot more attention to these days is Ruby. The Ruby On Rails framework is taking the world by storm, and has gone one up on PHP in terms of making database backed application programming a piece of cake.

And while JavaScript is not generally thought of as an alternative to these fuller-featured languages, the conjunction of JavaScript and XML that has so meme-felicitously been named AJAX is driving a new surge of interest. The JavaScript book market is now slightly larger than the Perl book market--quite a bit larger if you consider JavaScript variants such as Macromedia's ActionScript.

I recently wrote about the relative market share of programming languages in my O'Reilly Radar blog. The posting focuses on the rise of open source Java books, but includes a graph showing the relative share of all programming language books, in terms of sell-through data from Nielsen BookScan. (See also this blog entry for a description of BookScan and our technology trend tracking tools.)

Tim O'Reilly



Random Findings

perl.com Critique of the Perl 6 RFC Process [Oct. 31, 2000]

Conferences

Second Perl conference [added November 4, 1998]




Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Haters Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least


Copyright 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only, in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law, according to which such material can be distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting the development of this site and speeding up access. In case softpanorama.org is down, you can use the mirror at softpanorama.info.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP, or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last updated: June 28, 2017