Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Perl for Unix System Administrators


This is a very limited effort to help Unix sysadmins learn Perl. It is based on my FDU lectures to CS students. We discuss mainly "simple Perl", and the site tries to counter the drive toward "excessive complexity" that is dominant in many Perl-related sites and publications. It can also be used to prepare for the Certified Internet Web Professional Exam. See also Introduction to Perl for Unix system administrators.

System administrators need to deal with many repetitive tasks in a complex, changing environment that often includes several different flavors of Unix. Perl is the only scripting language that is now included in all major Unix flavors. That means it provides the simplest way to automate recurring tasks on multiple platforms. Among the typical tasks that sysadmins need to deal with:

IMHO the main advantage of using a powerful, complex language like Perl is the ability to write simple programs. Perhaps the world has gone overboard on this object-oriented thing. You do not need many of the tricks used in lower-level languages, as Perl itself provides high-level primitives for the task. This page is linked to several sub-pages. The most important among them are:

All languages have quirks, and all inflict a lot of pain before one can adapt to them. Once learned, the quirks become incorporated into your understanding of the language. But there is no royal road to mastering a language. The more different one's background is, the more one needs to suffer. Generally any user of a new programming language needs to suffer a lot ;-)

When mastering a new language, you first face a level of "cognitive overload" until the quirks of the language become easily handled by your unconscious mind. At that point, all of a sudden the quirky interaction becomes a "standard" way of performing the task. For example, regular expression syntax seems to be a weird base for serious programs: fraught with pitfalls, a big semantic mess as a result of outgrowing its primary purpose. On the other hand, in skilled hands it's a very powerful tool.

One early sign of adaptation to Perl idiosyncrasies is when you start to put $ on all scalar variables automatically. The next step is to overcome the notational difficulty of using two different operators ("==" and eq) for comparison. This is the source of many subtle errors for novices, much like the accidental use of assignment instead of comparison in a C if statement (as in if (a=1)...). Before that happens, please be wary of using complex constructs: diagnostics in Perl are really bad.

I would call them complexity junkies. A classic example of this "killing readers with obscurity" approach is the article Understanding and Using Iterators. Actually Perl has weak support for iterators, as it lacks co-routine support. But you can never learn that from the paper, where trivial examples were presented using obscure, overcomplicated code that reminds me of The International Obfuscated C Code Contest. Please don't follow them. Try to write simple, transparent code.

There is also an OO-enthusiast flavor of complexity junkies, with Damian Conway as the most prominent representative. Their advice should not be taken at face value. Please remember the KISS principle and try to write simple Perl scripts without overly complex regular expressions or fancy idioms.

Some Perl gurus' pathological preoccupation with idioms is not healthy. Although definitely gifted authors, Randal L. Schwartz and Tom Christiansen are a little too preoccupied with this fancy art. Fancy idioms are bad for novices and can contain subtle limitations or side effects that can bite even seasoned Perl programmers.

Generally the problems mentioned above are more fundamental than the trivial "abstraction is the enemy of convenience". It is more that a badly chosen notational abstraction at one level can inhibit innovative notational abstraction at others.

All in all Perl is a great language. But even the sun has dark spots...

Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contains some broken links, as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate it if you could mail us a correct link.


Old News ;-)

2007 marked Perl's 20th anniversary

Happy Birthday, Perl!


[Apr 21, 2009] Why you should upgrade to Perl 5.10

External links

Articles

Perl Tips on Perl 5.10

First Look Perl 5.10 is a Pearl Compiler from Wired.com

By Scott Gilbertson January 02, 2008 | 10:56:58 AM

As most Perl fans are no doubt aware, the Perl Foundation released version 5.10 last month and introduced a number of significant upgrades for the popular programming language. Perl 5.10 is the first significant feature upgrade since the 5.8 release back in 2002.

First the good news, AKA why you should go ahead and upgrade: the major new language features are turned off by default, which means you can upgrade without breaking existing scripts and take advantage of the new features for new scripts. Even cooler is the ability to progressively upgrade scripts using the “use” syntax.

For instance, add the line use feature 'switch'; prior to a block of code where you’d like to take advantage of the new switch statement in Perl 5.10 and then turn it off after upgrading that block of code using the statement no feature 'switch';. New features can be enabled by name or as a collective group using the statement use feature ':5.10';.

In addition to the switch statement, there’s a new say statement which acts like print() but adds a newline character and a state feature, which enables a new class of variables with very explicit scope control.
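As a minimal sketch of the two features just mentioned, the snippet below uses say and a state variable together. The next_id() helper is hypothetical and exists only for illustration; use 5.010 is assumed to be an acceptable way to switch the 5.10 features on.

```perl
use 5.010;      # enables say, state and the other 5.10 features
use strict;
use warnings;

# next_id() is a hypothetical helper: its state variable is initialised
# once and keeps its value between calls.
sub next_id {
    state $id = 0;
    return ++$id;
}

my @ids = map { next_id() } 1 .. 3;
say "ids: @ids";    # say appends the newline itself; prints "ids: 1 2 3"
```

With my instead of state, $id would be reset on every call and the sub would always return 1.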

But perhaps the most interesting of 5.10’s new features is the new ‘or’ operator, //, which is a “defined or” construct. For instance, the following expressions are equivalent:

$foo // $bar

defined $foo ? $foo : $bar

Obviously the first line is much more compact and (I would argue) readable, i.e. “is $foo defined? If so, use it; if not, use the value of $bar.” You can also add an equal sign like so:

$bar //= $foo;

Which is the same as writing:

$bar = $foo unless defined $bar;
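The practical difference from the older || idiom shows up when a value is defined but false. The sketch below assumes a hypothetical %config hash; the key names are made up for illustration.

```perl
use 5.010;
use strict;
use warnings;

my %config = (retries => 0);    # hypothetical settings hash

# || treats 0 as false and replaces it; // only replaces undef.
my $with_or         = $config{retries} || 3;
my $with_defined_or = $config{retries} // 3;
my $missing         = $config{timeout} // 30;

say "||: $with_or  //: $with_defined_or  missing: $missing";
# prints "||: 3  //: 0  missing: 30"
```

This is why // is usually the safer choice for default values when 0 or the empty string are legitimate settings.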

Another noteworthy new feature is the smart match operator, which the Perl Foundation explains as “a new kind of comparison, the specifics of which are contextual based on the inputs to the operator.” For example, to find if scalar $needle is in array @haystack, simply use the new ~~ operator

if ( $needle ~~ @haystack ) ...

Perl 5.10 also finally gains support for named captures in regular expressions, which means you can avoid the dreaded lines of $1, $2, etc. that often make Perl regexes hard to decipher. Finally I might be able to understand what’s going on in complex regex scripts like Markdown.

Other improvements include a faster interpreter with a smaller memory footprint, better error messages and more. For full details on the new release check out the notes.

I’ll confess I abandoned Perl for Python some time ago, but after playing with 5.10 I may have to rethink that decision. Perl 5.10’s new features are definitely worth the upgrade and a must-have for anyone who uses Perl on a daily basis.

 

A Beginner's Introduction to Perl 5.10, part three - O'Reilly News By chromatic

The first two articles in this series (A Beginner's Introduction to Perl 5.10 and A Beginner's Introduction to Files and Strings in Perl 5.10) covered flow control, math and string operations, and files. (A Beginner's Introduction to Perl Web Programming demonstrates how to write secure web programs.) This is part three of the series.
June 26, 2008

Simple matching

The simplest regular expressions are matching expressions. They perform tests using keywords like if, while and unless. If you want to be really clever, you can use them with and and or. A matching regexp will return a true value if whatever you try to match occurs inside a string. To match a regular expression against a string, use the special =~ operator:

use 5.010;

my $user_location = "I see thirteen black cats under a ladder.";
say "Eek, bad luck!" if $user_location =~ /thirteen/;

Notice the syntax of a regular expression: a string within a pair of slashes. The code $user_location =~ /thirteen/ asks whether the literal string thirteen occurs anywhere inside $user_location. If it does, then the test evaluates true; otherwise, it evaluates false.

Metacharacters

A metacharacter is a character or sequence of characters that has special meaning. You may remember metacharacters in the context of double-quoted strings, where the sequence \n means the newline character, not a backslash and the character n, and where \t means the tab character.

Regular expressions have a rich vocabulary of metacharacters that let you ask interesting questions such as, "Does this expression occur at the end of a string?" or "Does this string contain a series of numbers?"

The two simplest metacharacters are ^ and $. These indicate "beginning of string" and "end of string," respectively. For example, the regexp /^Bob/ will match "Bob was here," "Bob", and "Bobby." It won't match "It's Bob and David," because Bob doesn't occur at the beginning of the string. The $ character, on the other hand, matches at the end of a string. The regexp /David$/ will match "Bob and David," but not "David and Bob." Here's a simple routine that will take lines from a file and only print URLs that seem to indicate HTML files:

for my $line (<$urllist>) {
    # "If the line starts with http: and ends with html...."
    print $line if $line =~ /^http:/ and $line =~ /html$/;
}

Another useful set of metacharacters is called wildcards. If you've ever used a Unix shell or the Windows DOS prompt, you're familiar with wildcard characters such as * and ?. For example, when you type ls a*.txt, you see all filenames that begin with the letter a and end with .txt. Perl is a bit more complex, but works on the same general principle.

In Perl, the generic wildcard character is the period (.). A period inside a regular expression will match any character, except a newline. For example, the regexp /a.b/ will match anything that contains a, another character that's not a newline, followed by b -- "aab," "a3b," "a b," and so forth.

To match a literal metacharacter, escape it with a backslash. The regex /Mr./ matches anything that contains "Mr" followed by another character. If you only want to match a string that actually contains "Mr.," use /Mr\./.

On its own, the . metacharacter isn't very useful, which is why Perl provides three wildcard quantifiers: +, ? and *. Each quantifier means something different.

The + quantifier is the easiest to understand: It means to match the immediately preceding character or metacharacter one or more times. The regular expression /ab+c/ will match "abc," "abbc," "abbbc", and so on.

The * quantifier matches the immediately preceding character or metacharacter zero or more times. This is different from the + quantifier! /ab*c/ will match "abc," "abbc," and so on, just like /ab+c/ did, but it'll also match "ac," because there are zero occurrences of b in that string.

Finally, the ? quantifier will match the preceding character zero or one times. The regex /ab?c/ will match "ac" (zero occurrences of b) and "abc" (one occurrence of b). It won't match "abbc," "abbbc", and so on.
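The three quantifiers can be compared side by side. The following sketch runs each pattern from the text against the same three sample strings; the %patterns and %matched names are just illustrative.

```perl
use 5.010;
use strict;
use warnings;

my @samples  = qw(ac abc abbc);
my %patterns = ('+' => qr/ab+c/, '*' => qr/ab*c/, '?' => qr/ab?c/);
my %matched;

for my $q (sort keys %patterns) {
    # keep the samples that the pattern matches
    $matched{$q} = [ grep { $_ =~ $patterns{$q} } @samples ];
    say "ab${q}c matches: @{ $matched{$q} }";
}
# ab*c matches: ac abc abbc
# ab+c matches: abc abbc
# ab?c matches: ac abc
```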

The URL-matching code can be more concise with these metacharacters. Instead of using two separate regular expressions (/^http:/ and /html$/), combine them into one regular expression: /^http:.+html$/. To understand what this does, read from left to right: This regex will match any string that starts with "http:" followed by one or more occurrences of any character, and ends with "html". Now the routine is:

for my $line (<$urllist>) {
    print $line if $line =~ /^http:.+html$/;
}

Remember the /^something$/ construction -- it's very useful!
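The routine above assumes that $urllist is an already opened filehandle. A self-contained sketch might look like the following; the sample URLs are made up, and an in-memory filehandle stands in for a real file so the snippet can be run as is. It also reads line by line with while rather than loading the whole file into a list.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of a URL list file.
my $data = join '', map { "$_\n" } (
    'http://example.com/index.html',
    'ftp://example.com/file.html',
    'http://example.com/logo.gif',
);

# Opening a reference to a scalar gives an in-memory filehandle.
open my $urllist, '<', \$data or die "Cannot open in-memory file: $!";

my @html_urls;
while (my $line = <$urllist>) {
    chomp $line;
    push @html_urls, $line if $line =~ /^http:.+html$/;
}
close $urllist;

print "$_\n" for @html_urls;    # prints only http://example.com/index.html
```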

Character classes

The special metacharacter, ., matches any character except a newline. It's common to want to match only specific types of characters. Perl provides several metacharacters for this. \d matches a single digit, \w will match any single "word" character (a letter, digit or underscore), and \s matches a whitespace character (space and tab, as well as the \n and \r characters).

These metacharacters work like any other character: You can match against them, or you can use quantifiers like + and *. The regex /^\s+/ will match any string that begins with whitespace, and /\w+/ will match a string that contains at least one word. (Though remember that Perl's definition of "word" characters includes digits and the underscore, so whether you think _ or 25 are words, Perl does!)

One good use for \d is testing strings to see whether they contain numbers. For example, you might need to verify that a string contains an American-style phone number, which has the form 555-1212. You could use code like this:

use 5.010;

say "Not a phone number!" unless $phone =~ /\d\d\d-\d\d\d\d/;

All those \d metacharacters make the regex hard to read. Fortunately, Perl can do better. Use numbers inside curly braces to indicate a quantity you want to match:

use 5.010;

say "Not a phone number!" unless $phone =~ /\d{3}-\d{4}/;

The string \d{3} means to match exactly three digits, and \d{4} matches exactly four digits. To use a range of numbers, you can separate them with a comma; leaving out the second number makes the range open-ended. \d{2,5} will match two to five digits, and \w{3,} will match a word that's at least three characters long.
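One caveat worth illustrating: the phone-number pattern is not anchored, so it accepts any string that merely contains something shaped like a phone number. A rough sketch of the difference, using made-up candidate strings and the ^ and $ anchors introduced earlier:

```perl
use 5.010;
use strict;
use warnings;

my @candidates = ('555-1212', '5555-12123', 'call 555-1212 now');

# Unanchored: the number may appear anywhere inside the string.
my @contains = grep { /\d{3}-\d{4}/ } @candidates;

# Anchored: the whole string must be the number and nothing else.
my @exact = grep { /^\d{3}-\d{4}$/ } @candidates;

say scalar(@contains), " contain a phone number";   # 3
say scalar(@exact),    " are exactly a phone number";   # 1
```

Which form is right depends on whether you are validating a field or searching free text.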

You can also invert the \d, \s and \w metacharacters to refer to anything but that type of character. \D matches nondigits; \W matches any character that isn't a letter, digit, or underscore; and \S matches anything that isn't whitespace.

If these metacharacters won't do what you want, you can define your own. You define a character class by enclosing a list of the allowable characters in square brackets. For example, a class containing only the lowercase vowels is [aeiou]. /b[aeiou]g/ will match any string that contains "bag," "beg," "big," "bog", or "bug". Use dashes to indicate a range of characters, like [a-f]. (If Perl didn't give us the \d metacharacter, we could do the same thing with [0-9].) You can combine character classes with quantifiers:

use 5.010;

say "This string contains at least two vowels in a row."
    if $string =~ /[aeiou]{2}/;

You can also invert character classes by beginning them with the ^ character. An inverted character class will match anything you don't list. [^aeiou] matches every character except the lowercase vowels. (Yes, ^ can also mean "beginning of string," so be careful.)

Flags

By default, regular expression matches are case-sensitive (that is, /bob/ doesn't match "Bob"). You can place flags after a regexp to modify their behaviour. The most commonly used flag is i, which makes a match case-insensitive:

use 5.010;

my $greet = "Hey everybody, it's Bob and David!";
say "Hi, Bob!" if $greet =~ /bob/i;

Subexpressions

You might want to check for more than one thing at a time. For example, you're writing a "mood meter" that you use to scan outgoing e-mail for potentially damaging phrases. Use the pipe character | to separate different things you are looking for:

use 5.010;

# In reality, @email_lines would come from your email text,
# but here we'll just provide some convenient filler.
my @email_lines = ("Dear idiot:",
                   "I hate you, you twit.  You're a dope.",
                   "I bet you mistreat your llama.",
                   "Signed, Doug");

for my $check_line (@email_lines) {
    if ($check_line =~ /idiot|dope|twit|llama/) {
        say "Be careful!  This line might contain something offensive:\n$check_line";
    }
}

The matching expression /idiot|dope|twit|llama/ will be true if "idiot," "dope," "twit" or "llama" show up anywhere in the string.

One of the more interesting things you can do with regular expressions is subexpression matching, or grouping. A subexpression is another, smaller regex buried inside your larger regexp within matching parentheses. The string that caused the subexpression to match will be stored in the special variable $1. This can make your mood meter more explicit about the problems with your e-mail:

for my $check_line (@email_lines) {
    if ($check_line =~ /(idiot|dope|twit|llama)/) {
        say "Be careful!  This line contains the offensive word '$1':\n$check_line";
    }
}

Of course, you can put matching expressions in your subexpression. Your mood watch program can be extended to prevent you from sending e-mail that contains three or more exclamation points in a row. The special {3,} quantifier will make sure to get all the exclamation points.

for my $check_line (@email_lines) {
    if ($check_line =~ /(!{3,})/) {
        say "Using punctuation like '$1' is the sign of a sick mind:\n$check_line";
    }
}

If your regex contains more than one subexpression, the results will be stored in variables named $1, $2, $3 and so on. Here's some code that will change names in "lastname, firstname" format back to normal:

my $name = 'Wall, Larry';
$name =~ /(\w+), (\w+)/;
# $1 contains last name, $2 contains first name

$name = "$2 $1";
# $name now contains "Larry Wall"

You can even nest subexpressions inside one another -- they're ordered as they open, from left to right. Here's an example of how to retrieve the full time, hours, minutes and seconds separately from a string that contains a timestamp in hh:mm:ss format. (Notice the use of the {1,2} quantifier to match a timestamp like "9:30:50".)

my $string = "The time is 12:25:30 and I'm hungry.";
if ($string =~ /((\d{1,2}):(\d{2}):(\d{2}))/) {
    my @time = ($1, $2, $3, $4);
}

Here's a hint that you might find useful: You can assign to a list of scalar values whenever you're assigning from a list. If you prefer to have readable variable names instead of an array, try using this line instead:

my ($time, $hours, $minutes, $seconds) = ($1, $2, $3, $4);

Assigning to a list of variables when you're using subexpressions happens often enough that Perl gives you a handy shortcut. In list context, a successful regular expression match returns its captured variables in the order in which they appear within the regexp:

my ($time, $hours, $minutes, $seconds) = $string =~ /((\d{1,2}):(\d{2}):(\d{2}))/;

Counting parentheses to see where one group begins and another group ends is troublesome though. Perl 5.10 added a new feature, lovingly borrowed from other languages, where you can give names to capture groups and access the captured values through the special hash %+. This is most obvious by example:

my $name = 'Wall, Larry';
$name =~ /(?<last>\w+), (?<first>\w+)/;
# %+ contains all named captures

$name = "$+{first} $+{last}";
# $name now contains "Larry Wall"

There's a common mistake related to captures, namely assuming that $1 and %+ et al will hold meaningful values if the match failed:

my $name = "Damian Conway";
# no comma, so the match will fail!
$name =~ /(?<last>\w+), (?<first>\w+)/;

# and there's nothing in the capture buffers
$name = "$+{first} $+{last}";

# $name now contains a blank space

Always check the success or failure of your regular expression when working with captures!

my $name = "Damian Conway";
$name = "$+{first} $+{last}" if $name =~ /(?<last>\w+), (?<first>\w+)/;
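One way to make this check hard to forget is to wrap the match in a small subroutine that only touches the captures after a successful match. The normalize_name() helper below is hypothetical, a sketch of the idea rather than code from the article.

```perl
use 5.010;
use strict;
use warnings;

# normalize_name() is a hypothetical helper: it rewrites "Last, First"
# as "First Last" and returns any other input unchanged.
sub normalize_name {
    my ($name) = @_;
    if ($name =~ /(?<last>\w+), (?<first>\w+)/) {
        return "$+{first} $+{last}";    # captures are only read on success
    }
    return $name;
}

my $swapped   = normalize_name('Wall, Larry');
my $unchanged = normalize_name('Damian Conway');

say $swapped;      # Larry Wall
say $unchanged;    # Damian Conway
```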

Watch out!

Regular expressions have two other traps that generate bugs in your Perl programs: matching always starts at the beginning of the string, and quantifiers always match as much of the string as possible.

Here's some simple code for counting all the numbers in a string and showing them to the user. It uses while to loop over the string, matching over and over until it has counted all the numbers.

use 5.010;
my $number       = "Look, 200 5-sided, 4-colored pentagon maps.";
my $number_count = 0;

while ($number =~ /(\d+)/) {
    say "I found the number $1.";
    $number_count++;
}

say "There are $number_count numbers here.";

This code is actually so simple it doesn't work! When you run it, Perl will print I found the number 200 over and over again. Perl always begins matching at the beginning of the string, so it will always find the 200, and never get to the following numbers.

You can avoid this by using the g flag with your regex. This flag will tell Perl to remember where it was in the string when it returns to it (due to a while loop). When you insert the g flag, the code becomes:

use 5.010;
my $number       = "Look, 200 5-sided, 4-colored pentagon maps.";
my $number_count = 0;

while ($number =~ /(\d+)/g) {
    say "I found the number $1.";
    $number_count++;
}

say "There are $number_count numbers here.";

Now you get the expected results:

I found the number 200.
I found the number 5.
I found the number 4.
There are 3 numbers here.

The second trap is that a quantifier will always match as many characters as it can. Look at this example code, but don't run it yet:

use 5.010;
my $book_pref = "The cat in the hat is where it's at.\n";
say $+{match} if $book_pref =~ /(?<match>cat.*at)/;

Take a guess: What's in $+{match} right now? Now run the code. Does this seem counterintuitive?

The matching expression cat.*at is greedy. It contains cat in the hat is where it's at because that's the longest string that matches. Remember, read left to right: "cat," followed by any number of characters, followed by "at." If you want to match the string cat in the hat, you have to rewrite your regexp so it isn't as greedy. There are two ways to do this:
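The list of the two ways is not reproduced on this page. Two approaches commonly used in Perl, offered here as an assumption about what that list covered, are to make the pattern more specific or to make the quantifier non-greedy by appending ? to it:

```perl
use 5.010;
use strict;
use warnings;

my $book_pref = "The cat in the hat is where it's at.";

# Greedy: .* grabs as much as it can before the final "at".
my ($greedy) = $book_pref =~ /(cat.*at)/;

# More specific: ask for "hat" rather than any "at".
my ($specific) = $book_pref =~ /(cat.*hat)/;

# Non-greedy: .*? matches as little as possible.
my ($lazy) = $book_pref =~ /(cat.*?at)/;

say "greedy:   $greedy";      # cat in the hat is where it's at
say "specific: $specific";    # cat in the hat
say "lazy:     $lazy";        # cat in the hat
```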

Search and replace

Regular expressions can do something else for you: replacing.

If you've ever used a text editor or word processor, you've probably used its search-and-replace function. Perl's regexp facilities include something similar, the s/// operator: s/regex/replacement string/. If the string you're testing matches regex, then whatever matched is replaced with the contents of replacement string. For instance, this code will change a cat into a dog:

use 5.010;
my $pet = "I love my cat.";
$pet =~ s/cat/dog/;
say $pet;

You can also use subexpressions in your matching expression, and use the variables $1, $2 and so on, that they create. The replacement string will substitute these, or any other variables, as if it were a double-quoted string. Remember the code for changing Wall, Larry into Larry Wall? It makes a fine single s/// statement!

my $name = 'Wall, Larry';
$name =~ s/(\w+), (\w+)/$2 $1/;
# "Larry Wall"

You don't have to worry about using captures if the match fails; the substitution won't take place. Of course, named captures work equally well:

my $name = 'Wall, Larry';
$name =~ s/(?<last>\w+), (?<first>\w+)/$+{first} $+{last}/;
# "Larry Wall"

s/// can take flags, just like matching expressions. The two most important flags are g (global) and i (case-insensitive). Normally, a substitution will only happen once, but specifying the g flag will make it happen as long as the regex matches the string. Try this code with and without the g flag:

use 5.010;

my $pet = "I love my cat Sylvester, and my other cat Bill.\n";
$pet =~ s/cat/dog/g;
say $pet;

Notice that without the g flag, Bill avoids substitution-related polymorphism.

The i flag works just as it does in matching expressions: It forces your matching search to be case-insensitive.
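A detail the paragraphs above do not mention, but which is standard Perl behaviour, is that s/// returns the number of substitutions it made. The sketch below uses that return value to show the effect of the g flag on the same sentence; the variable names are illustrative.

```perl
use 5.010;
use strict;
use warnings;

my $pet = "I love my cat Sylvester, and my other cat Bill.";

my $once       = $pet;
my $once_count = ($once =~ s/cat/dog/);     # replaces the first match only

my $all       = $pet;
my $all_count = ($all =~ s/cat/dog/g);      # replaces every match

say "$once_count: $once";   # 1: I love my dog Sylvester, and my other cat Bill.
say "$all_count: $all";     # 2: I love my dog Sylvester, and my other dog Bill.
```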

Maintainability

Once you start to see how patterns describe text, everything so far is reasonably simple. Regexps may start simple, but often they grow into larger beasts. There are two good techniques for making regexps more readable: adding comments and factoring them into smaller pieces.

The x flag allows you to use whitespace and comments within regexps, without it being significant to the pattern:

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(                 # capture entire match
                    (\d{1,2})     # one or two digits for the hour
                    :
                    (\d{2})       # two digits for the minutes
                    :
                    (\d{2})       # two digits for the seconds
                )
    /x;

That may be a slight improvement over the previous version of this regexp, but this technique works even better for complex regexps. Be aware that if you do need to match whitespace within the pattern, you must use \s or an equivalent.

Adding comments is helpful, but sometimes giving a name to a particular piece of code is sufficient clarification. The qr// operator compiles but does not execute a regexp, producing a regexp object that you can use inside a match or substitution:

my $two_digits = qr/\d{2}/;

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(                 # capture entire match
                    (\d{1,2})     # one or two digits for the hour
                    :
                    ($two_digits) # minutes
                    :
                    ($two_digits) # seconds
                )
    /x;

Of course, you can use all of the previous techniques as well:

use 5.010;

my $two_digits        = qr/\d{2}/;
my $one_or_two_digits = qr/\d{1,2}/;

my ($time, $hours, $minutes, $seconds) =
    $string =~ /(?<time>
                    (?<hours> $one_or_two_digits)
                    :
                    (?<minutes> $two_digits)
                    :
                    (?<seconds> $two_digits)
                )
    /x;

Note that the captures are available through %+ as well as in the list of values returned from the match.
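The snippets in this section assume that $string is already defined. A self-contained sketch that combines qr//, the x flag and named captures, and reads the results from %+ after checking that the match succeeded, could look like this (the sample sentence is made up):

```perl
use 5.010;
use strict;
use warnings;

my $string = "The time is 9:05:30 and I'm hungry.";

my $two_digits        = qr/\d{2}/;
my $one_or_two_digits = qr/\d{1,2}/;

my %time;
if ($string =~ /(?<hours>   $one_or_two_digits) :
                (?<minutes> $two_digits)        :
                (?<seconds> $two_digits)/x) {
    %time = %+;    # copy the named captures before another match overwrites them
}

say "hours=$time{hours} minutes=$time{minutes} seconds=$time{seconds}";
# prints "hours=9 minutes=05 seconds=30"
```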

Putting it all together

Regular expressions have many practical uses. Consider a httpd log analyzer as an example. One of the play-around items in the previous article was to write a simple log analyzer. You can make it more interesting: how about a log analyzer that will break down your log results by file type and give you a list of total requests by hour?

(Complete source code.)

Here's a sample line from a httpd log:

127.12.20.59 - - [01/Nov/2000:00:00:37 -0500] "GET /gfx2/page/home.gif HTTP/1.1" 200 2285

The first task is to split this into fields. Remember that the split() function takes a regular expression as its first argument. Use /\s/ to split the line at each whitespace character:

my @fields = split /\s/, $line;

This gives 10 fields. The interesting fields are the fourth field (time and date of request), the seventh (the URL), and the ninth and 10th (HTTP status code and size in bytes of the server response).

Step one is canonicalization: turning any request for a URL that ends in a slash (like /about/) into a request for the index page from that directory (/about/index.html). Remember to escape the slashes so that Perl doesn't consider them the terminating characters of the match or substitution:

$fields[6] =~ s/\/$/\/index.html/;

This line is difficult to read; it suffers from leaning-toothpick syndrome. Here's a useful trick for avoiding the leaning-toothpick syndrome: replace the slashes that mark regular expressions and s/// statements with any other matching pair of characters, such as { and }. This allows you to write a more legible regex where you don't need to escape the slashes:

$fields[6] =~ s{/$}{/index.html};

(To use this syntax with a matching expression, put a m in front of it. /foo/ becomes m{foo}.)

Step two is to assume that any URL request that returns a status code of 200 (a successful request) is a request for the file type of the URL's extension (a request for /gfx/page/home.gif returns a GIF image). Any URL request without an extension returns a plain-text file. Remember that the period is a metacharacter, so escape it!

if ($fields[8] eq '200') {
    if ($fields[6] =~ /\.([a-z]+)$/i) {
        $type_requests{$1}++;
    } else {
        $type_requests{txt}++;
    }
}

Next, retrieve the hour when each request took place. The hour is the first string in $fields[3] that will be two digits surrounded by colons, so all you need to do is look for that. Remember that Perl will stop when it finds the first match in a string:

# Log the hour of this request, but only if the pattern actually matched
$hour_requests{$1}++ if $fields[3] =~ /:(\d{2}):/;

Finally, rewrite the original report() sub. We're doing the same thing over and over (printing a section header and the contents of that section), so we'll break that out into a new sub. We'll call the new sub report_section():

sub report {
    print "Total bytes requested: ", $bytes, "\n";
    print "\n";
    report_section("URL requests:", %url_requests);
    report_section("Status code results:", %status_requests);
    report_section("Requests by hour:", %hour_requests);
    report_section("Requests by file type:", %type_requests);
}

The new report_section() sub is very simple:

sub report_section {
    my ($header, %types) = @_;

    say $header;

    for my $type (sort keys %types) {
        say "$type: $types{$type}";
    }

    print "\n";
}

The keys operator returns a list of the keys in the %types hash, and the sort operator puts them in alphabetic order. The next article will explain sort in more detail.
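Since the complete source is not reproduced here, the following is only a condensed sketch of how the pieces described above might fit together. The three log lines are made-up samples in the same format as the one shown earlier, and only the by-hour and by-type counts are kept.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical sample log lines in the format shown earlier.
my @log = (
    '127.12.20.59 - - [01/Nov/2000:00:00:37 -0500] "GET /gfx2/page/home.gif HTTP/1.1" 200 2285',
    '127.12.20.59 - - [01/Nov/2000:00:05:12 -0500] "GET /about/ HTTP/1.1" 200 1024',
    '127.12.20.60 - - [01/Nov/2000:13:45:01 -0500] "GET /missing.html HTTP/1.1" 404 512',
);

my (%type_requests, %hour_requests);

for my $line (@log) {
    my @fields = split /\s/, $line;
    next unless @fields == 10;              # skip lines that do not parse

    $fields[6] =~ s{/$}{/index.html};       # canonicalize directory requests

    if ($fields[8] eq '200') {
        if ($fields[6] =~ /\.([a-z]+)$/i) { $type_requests{$1}++ }
        else                              { $type_requests{txt}++ }
    }

    $hour_requests{$1}++ if $fields[3] =~ /:(\d{2}):/;
}

sub report_section {
    my ($header, %counts) = @_;
    say $header;
    say "$_: $counts{$_}" for sort keys %counts;
    print "\n";
}

report_section("Requests by hour:", %hour_requests);
report_section("Requests by file type:", %type_requests);
```

With these samples the hour section lists 00: 2 and 13: 1, and the file type section lists gif: 1 and html: 1; the 404 request is counted by hour but not by type.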

 

Perl 5.10 Advanced Regular Expressions

Presentation by Yves Orton (demerphq)

 

Regular Expressions in Perl 5.10

There are many new features in the regular expression engine of Perl 5.10. I point out some of them.

Named captures

I am trying to match a phone number and save the values in variables.

One way to do it is:

    if ($str =~ /^(\d+)-(\d+)-(\d+)$/) {
        $num{country} = $1;
        $num{area}    = $2;
        $num{phone}   = $3;
    }

The new way is

    if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
        %num = %+;
    }
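A runnable version of the new way, with a made-up input string, shows what ends up in the hash. Copying %+ into an ordinary hash is a sensible habit, because %+ is reset by the next successful match.

```perl
use 5.010;
use strict;
use warnings;

my $str = '1-212-5551212';    # hypothetical phone number
my %num;

if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
    %num = %+;                # one entry per named capture
}

say join ', ', map { "$_=$num{$_}" } sort keys %num;
# prints "area=212, country=1, phone=5551212"
```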

 

perldelta - what is new for perl 5.10.0 - search.cpan.org

perl -d
The Perl debugger can now save all debugger commands for sourcing later; notably, it can now emulate stepping backwards, by restarting and rerunning all bar the last command from a saved command history.

It can also display the parent inheritance tree of a given class, with the i command.

Use of uninitialized value

Perl will now try to tell you the name of the variable (if any) that was undefined.

  1. The feature pragma

    The feature pragma is used to enable new syntax that would break Perl's backwards-compatibility with older releases of the language. It's a lexical pragma, like strict or warnings.

    Currently the following new features are available: switch (adds a switch statement), say (adds a say built-in function), and state (adds a state keyword for declaring "static" variables). Those features are described in their own sections of this document.

    The feature  pragma is also implicitly loaded when you require a minimal perl version (with the use VERSION  construct) greater than, or equal to, 5.9.5. See feature for details. 
     

  2. say()

    say() is a new built-in, only available when use feature 'say' is in effect, that is similar to print(), but that implicitly appends a newline to the printed string. See "say" in perlfunc. (Robin Houston)
     

  3. Switch and Smart Match operator

    Perl 5 now has a switch statement. It's available when use feature 'switch' is in effect. This feature introduces three new keywords, given, when, and default:

        given ($foo) {
            when (/^abc/) { $abc = 1; }
            when (/^def/) { $def = 1; }
            when (/^xyz/) { $xyz = 1; }
            default { $nothing = 1; }
        }

    A more complete description of how Perl matches the switch variable against the when conditions is given in "Switch statements" in perlsyn.

    This kind of match is called smart match, and it's also possible to use it outside of switch statements, via the new ~~ operator. See "Smart matching in detail" in perlsyn.
     

  4. state() variables

    A new class of variables has been introduced. State variables are similar to my variables, but are declared with the state keyword in place of my. They're visible only in their lexical scope, but their value is persistent: unlike my variables, they're not undefined at scope entry, but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)

    To use state variables, one needs to enable them by using

        use feature 'state';

    or by using the -E command-line switch in one-liners. See "Persistent variables via state()" in perlsub.
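    As an illustration, here is a counter that remembers its value between calls. The sub name next_id is an assumption for this sketch, not something from perldelta.

```perl
use strict;
use warnings;
use feature qw(say state);

# Hypothetical helper: hand out increasing ids.
sub next_id {
    state $id = 0;    # initialized only once, keeps its value between calls
    return ++$id;
}

say next_id() for 1 .. 3;    # prints 1, 2 and 3 on separate lines
```

    With my in place of state, $id would be reset on every call and the sub would always return 1.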
     

  5. Lexical $_

    The default variable $_ can now be lexicalized, by declaring it like any other lexical variable, with a simple

        my $_;

    The operations that default on $_ will use the lexically-scoped version of $_ when it exists, instead of the global $_.

    In a map or a grep block, if $_ was previously my'ed, then the $_ inside the block is lexical as well (and scoped to the block).

    In a scope where $_ has been lexicalized, you can still have access to the global version of $_ by using $::_, or, more simply, by overriding the lexical declaration with our $_. (Rafael Garcia-Suarez)

[Feb 25, 2009] Perl 5.10 highlights

See also the slides at Perl 5.10 for People Who Aren't Totally Insane. ActiveState has precompiled versions for many platforms, see Perl 5.10. CPAN has some details: Rafaël Garcia-Suarez - perl-5.10.0 - search.cpan.org

[Dec 12, 2008] The A-Z of Programming Languages Perl

What new elements does Perl 5.10.0 bring to the language? In what way is it preparing for Perl 6?

Perl 5.10.0 involves backporting some ideas from Perl 6, like switch statements and named pattern matches. One of the most popular things is the use of “say” instead of “print”.

This is an explicit programming design in Perl — easy things should be easy and hard things should be possible. It's optimised for the common case. Similar things should look similar but similar things should also look different, and how you trade those things off is an interesting design principle.

Huffman Coding is one of those principles that makes similar things look different.

In your opinion, what lasting legacy has Perl brought to computer development?

An increased awareness of the interplay between technology and culture. Ruby has borrowed a few ideas from Perl and so has PHP. I don't think PHP understands the use of signals, but all languages borrow from other languages, otherwise they risk being single-purpose languages. Competition is good.

It's interesting to see PHP follow along with the same mistakes Perl made over time and recover from them. But Perl 6 also borrows back from other languages too, like Ruby. My ego may be big, but it's not that big.

Where do you envisage Perl's future lying?

My vision of Perl's future is that I hope I don't recognise it in 20 years.

Where do you see computer programming languages heading in the future, particularly in the next 5 to 20 years?

Don't design everything you will need in the next 100 years, but design the ability to create things we will need in 20 or 100 years. The heart of the Perl 6 effort is the extensibility we have built into the parser and introduced language changes as non-destructively as possible.

Linux Today's comments

> Given the horrible mess that is Perl (and, BTW,
> I derive 90% of my income from programming in Perl),
.
Did the thought that 'horrible mess' you produce with $language 'for an income' could be YOUR horrible mess already cross your mind? The language itself doesn't write any code.

> You just said something against his beloved
> Perl and compounded your heinous crime by
> saying something nice about Python...in his
> narrow view you are the antithesis of all that is
> right in the world. He will respond with his many
> years of Perl == good and everything else == bad
> but just let it go...
.
That's a pretty pointless insult. Languages don't write code. People do. A statement like 'I think that code written in Perl looks very ugly because of the large amount of non-alphanumeric characters' would make sense. Trying to elevate entirely subjective, aesthetic preferences into 'general principles' doesn't. 'a mess' is something inherently chaotic, hence, this is not a sensible description for a regularly structured program of any kind. It is obviously possible to write (or not write) regularly structured programs in any language providing the necessary abstractions for that. This set includes Perl.
.
I had the mispleasure to have to deal with messes created by people both in Perl and Python (and a couple of other languages) in the past. You've probably heard the saying that "real programmers can write FORTRAN in any language" already. It is even true that the most horrible code mess I have seen so far had been written in Perl. But this just means that a fairly chaotic person happened to use this particular programming language.

[Nov 7, 2008] Perl Express A Free Perl IDE-Editor for Windows.

Perl Express is a unique and powerful integrated development environment (IDE) for Windows 98/Me/2000/XP/2003 that includes multiple tools for writing and debugging your Perl programs.

Perl Express is intended both for experienced, professional Perl developers and for beginners.

Since version 2.5, Perl Express is free software without any limitations; registration is not required.

General Features
 
Multiple scripts for editing, running and debugging
Full server simulation
Completely integrated debugging with breakpoints, stepping, displaying variable values, etc.
Queries may be created from internal Web browser or Query editor
Test MySQL, MS Access... scripts for Windows
Interactive I/O
Multiple input files
Allows you to set environment variables used for running and debugging script
Customizable code editor with syntax highlighting, unlimited text size, printing, line numbering, bookmarks, column selection, powerful search and replace engine, multilevel undo/redo operations, margin and gutter, etc.
Highlighting of matching braces
Windows/Unix/Mac line endings support
OfficeXP-styled menus and toolbars
HTML, RTF export
Live preview of the scripts in the internal web browser
Directory Window
Code Library
Operation with the projects
Code Templates
Help on functions
Perl printer, pod viewer, table of characters and HTML symbols, and others

[Sep 21, 2008] Using Inline in Perl by Michael Roberts (michael@vivtek.com), Owner, Vivtek

 Jun 01, 2001 | developerworks

The new Inline module for Perl allows you to write code in other languages (like C, Python, Tcl, or Java) and toss it into Perl scripts with wild abandon. Unlike previous ways of interfacing C code with Perl, Inline is very easy to use, and very much in keeping with the Perl philosophy. One extremely useful application of Inline is to write quick wrapper code around a C-language library to use it from Perl, thus turning Perl into (as far as I'm concerned) the best testing platform on the planet.

Perl has always been pathologically eclectic, but until now it hasn't been terribly easy to make it work with other languages or with libraries that weren't constructed specifically for it. You had to write interface code in the XS language (or get SWIG to do that for you), build an organized module, and generally keep track of a whole lot of details.

But now things have changed. The Inline module, written and actively (very actively) maintained by Brian Ingerson, provides facilities to bind other languages to Perl. In addition its sub-modules (Inline::C, Inline::Python, Inline::Tcl, Inline::Java, Inline::Foo, etc.) allow you to embed those languages directly in Perl files, where they will be found, built, and dynaloaded into Perl in a completely transparent manner. The user of your script will never know the difference, except that the first invocation of Inline-enabled code takes a little time to complete the compilation of the embedded code.

The world's simplest Inline::C program

Just to show you what I mean, let's look at the simplest possible Inline program; this uses an embedded C function, but you can do substantially the same thing with any other language that has Inline support.


Listing 1. Inline "Hello, world"
use Inline C => <<'END_C';

void greet() {
  printf("Hello, world!\n");
}
END_C

greet;

Naturally, what the code does is obvious. It defines a C-language function to do the expected action, and then it treats it as a Perl function thereafter. In other words, Inline does exactly what an extension module should do. The question that may be uppermost in your mind is, "How does it do that?". The answer is pretty much what you'd expect: it takes your C code, builds an XS file around it in the same way that a human extension module writer would, builds that module, then loads it. Subsequent invocations of the code will simply find the pre-built module already there, and load it directly.

You can even invoke Inline at runtime by using the Inline->bind function. I don't want to do anything more than dangle that tantalizing fact before you, because there's nothing special about it besides the point that you can do it if you want to.

[May 06, 2008] ack! - Perl-based grep replacement

There are some tools that look like you will never replace them. One of those (for me) is grep. It does what it does very well (remarks about the shortcomings of regexen in general aside). It works reasonably well with Unicode/UTF-8 (a great opportunity to Fail Miserably for any tool, viz. a2ps).

Yet, the other day I read about ack, which claims to be "better than grep, a search tool for programmers". Woo. Better than grep? In what way?

The ack homepage lists the top ten reasons why one should use it instead of grep. Actually, it's thirteen reasons but then some are dupes. So I'd say "about ten reasons". Let's look at them in order.

  1. It's blazingly fast because it only searches the stuff you want searched.

    Wait, how does it know what I want? A DWIM-Interface at last? Not quite. First off, ack is faster than grep for simple searches. Here's an example:

    $ time ack 1Jsztn-000647-SL exim_main.log >/dev/null
    real    0m3.463s
    user    0m3.280s
    sys     0m0.180s
    $ time grep -F 1Jsztn-000647-SL exim_main.log >/dev/null
    real    0m14.957s
    user    0m14.770s
    sys     0m0.160s
    

    Two notes: first, yes, the file was in the page cache before I ran ack; second, I even made it easy for grep by telling it explicitly I was looking for a fixed string (not that it helped much, the same command without -F was faster by about 0.1s). Oh and for completeness, the exim logfile I searched has about two million lines and is 250M. I've run those tests ten times for each, the times shown above are typical.

    So yes, for simple searches, ack is faster than grep. Let's try with a more complicated pattern, then. This time, let's use the pattern (klausman|gentoo) on the same file. Note that we have to use -E for grep to use extended regexen, which ack in turn does not need, since it (almost) always uses them. Here, grep takes its sweet time: 3:56, nearly four minutes. In contrast, ack accomplished the same task in 49 seconds (all times averaged over ten runs, then rounded to integer seconds).

    As for the "being clever" side of speed, see points 5 and 6 below.

  2. ack is pure Perl, so it runs on Windows just fine.

    This isn't relevant to me, since I don't use Windows for anything where I might need grep. That said, it might be a killer feature for others.

  3. The standalone version uses no non-standard modules, so you can put it in your ~/bin without fear.

    Ok, this is not so much a feature as a hard criterion. If I needed extra modules for the whole thing to run, that'd be a deal breaker. I already have tons of libraries, I don't need more undergrowth around my dependency tree.

  4. Searches recursively through directories by default, while ignoring .svn, CVS and other VCS directories.

    This is a feature, yet one that wouldn't pry me away from grep: -r is there (though it distinctly feels like an afterthought). Since ack ignores a certain set of files and directories, its recursive capabilities were there from the start, making it feel more seamless.

  5. ack ignores most of the crap you don't want to search

    To be precise:

    • VCS directories
    • blib, the Perl build directory
    • backup files like foo~ and #foo#
    • binary files, core dumps, etc.

    Most of the time, I don't want to search those (and have to exclude them with grep -v from find results). Of course, this ignore-mode can be switched off with ack (-u). All that said, it sure makes command lines shorter (and easier to read and construct). Also, this is the first spot where ack's Perl-centricism shows. I don't mind, even though I prefer that other language with P.

  6. Ignoring .svn directories means that ack is faster than grep for searching through trees.

    Dupe. See Point 5

  7. Lets you specify file types to search, as in --perl or --nohtml.

    While at first glance, this may seem limited, ack comes with a plethora of definitions (45 if I counted correctly), so it's not as perl-centric as it may seem from the example. This feature saves command-line space (if there's such a thing), since it avoids wild find-constructs. The docs mention that --perl also checks the shebang line of files that don't have a suffix, but make no mention of the other "shipped" file type recognizers doing so.

  8. File-filtering capabilities usable without searching with ack -f. This lets you create lists of files of a given type.

    This mostly is a consequence of the feature above. Even if it weren't there, you could simply search for "."

  9. Color highlighting of search results.

    While I've looked upon color in shells as kinda childish for a while, I wouldn't want to miss syntax highlighting in vim, colors for ls (if they're not as sucky as the defaults we had for years) or match highlighting for grep. It's really neat to see that yes, the pattern you grepped for indeed matches what you think it does. Especially during evolutionary construction of command lines and shell scripts.

  10. Uses real Perl regular expressions, not a GNU subset

    Again, this doesn't bother me much. I use egrep/grep -E all the time, anyway. And I'm no Perl programmer, so I don't get withdrawal symptoms every time I use another regex engine.

  11. Allows you to specify output using Perl's special variables

    This sounds neat, yet I don't really have a use case for it. Also, my perl-fu is weak, so I probably won't use it anyway. Still, might be a killer feature for you.

    The docs have an example:

    ack '(Mr|Mr?s)\. (Smith|Jones)' --output='$&'

  12. Many command-line switches are the same as in GNU grep:

    Specifically mentioned are -w, -c and -l. It's always nice if you don't have to look up all the flags every time.

  13. Command name is 25% fewer characters to type! Save days of free-time! Heck, it's 50% shorter compared to grep -r

    Okay, now we have proof that not only the ack webmaster can't count, he's also making up reasons for fun. Works for me.
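The directory-skipping behaviour from points 4 to 6 can be sketched in a few lines of core Perl. This is not ack's actual implementation; the sub name search_tree, the skip list, and the output format are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename;

# Hypothetical helper: search a tree for a pattern, skipping VCS and build
# directories, backup files, and anything that does not look like text.
sub search_tree {
    my ($pattern, $top) = @_;
    my %skip_dir = map { $_ => 1 } qw(.svn CVS .git blib);
    my @hits;

    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            my $base = basename($path);
            if (-d $path) {
                # Do not descend into directories on the skip list.
                $File::Find::prune = 1 if $skip_dir{$base};
                return;
            }
            return if $base =~ /~\z/ || $base =~ /\A#.*#\z/;   # foo~ and #foo#
            return unless -f $path && -T $path;                # text files only
            open my $fh, '<', $path or return;
            while (my $line = <$fh>) {
                push @hits, "$path:$.:$line" if $line =~ $pattern;
            }
            close $fh;
        },
    }, $top);

    return @hits;
}

# Usage: perl search_tree.pl PATTERN [DIRECTORY]
print search_tree(qr/$ARGV[0]/, $ARGV[1] // '.') if @ARGV;
```

Pruning whole directories before reading anything is where most of the saving on large checkouts comes from, which is the point made in items 5 and 6.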

Bottom line: yes, ack is an exciting new tool which partly replaces grep. That said, a drop-in replacement it ain't. While the standalone version of ack needs nothing but a perl interpreter and its standard modules, for embedded systems that may not work out (vs. the binary with no deps beside a libc). This might also be an issue if you need grep early on during boot and /usr (where your perl resides) isn't mounted yet. Also, default behaviour is divergent enough that it might yield nasty surprises if you just drop in ack instead of grep. Still, I recommend giving ack a try if you ever use grep on the command line. If you're a coder who often needs to search through working copies/checkouts, even more so.

Update

I've written a followup on this, including some tips for day-to-day usage (and an explanation of grep's sucky performance).

Comments

René "Necoro" Neumann writes (in German, translation by me):

Stumbled across your blog entry about "ack" today. I tried it and found it to be cool :). So I created two ebuilds for it:

Just wanted to let you know (there is no comment function on your blog).

[Mar 11, 2008]  Perl Tutorial 19: Functions lc, uc, lcfirst, ucfirst

Youtube has educational potential
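For readers who prefer text to video, here is a minimal sketch of the four functions named in the title. The helper capitalize and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase everything, then capitalize the first letter.
sub capitalize {
    my ($word) = @_;
    return ucfirst(lc($word));
}

my $word = 'pERL';
print lc($word), "\n";            # perl
print uc($word), "\n";            # PERL
print lcfirst('PERL'), "\n";      # pERL
print ucfirst('perl'), "\n";      # Perl
print capitalize($word), "\n";    # Perl
```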


[Mar 5, 2008] The New York Times Perl Profiler By Adam Kaplan


I work in the NYTimes.com feeds team. We handle retrieving, parsing and transforming incoming feeds from whatever strange proprietary format our partners choose to give us into something that our CMS can digest. As you can imagine, we deal with a huge amount of text processing. To handle all of these transformations as efficiently as possible we rely heavily on the magic of Perl. Recently, as feeds become more and more important, we have begun to feel pains caused by past impromptu segments of inefficient code written to meet quick, episodic deadlines. A situation that we are especially prone to as a fast moving news organization.

I am a relatively new employee here at NYTimes.com and one of my responsibilities is to create tools to help ensure the integrity and scalability of our code. To this end, I would like to introduce you to The New York Times Perl Profiler, or Devel::NYTProf. The purpose of this tool is to allow developers to easily profile Perl code line-by-line with minimal computational overhead and highly visual output. With only one additional command, developers can generate robust color-coded HTML reports that include some useful statistics about their Perl program. Here is the typical usage:

perl -d:NYTProf myslowcode.pl
nytprofhtml

See? It's easy! nytprofhtml is an implementation of the included reporting interface (Devel::NYTProf::Reader). If you don’t want HTML reports, you can implement your own format with relative ease. If you create something cool, be sure to let me know via CPAN patch request or open@nytimes.com. Detailed instructions can be found in the documentation and source code on CPAN.

You can see sample screen shots of the HTML report’s index page and a single module report.

Similar tools exist to profile Perl code. Devel::DProf is the ubiquitous profiler, but it only collects information about subroutine calls. Because of this limitation, it's not all that helpful in finding that elusive broken regex in a 75-line subroutine of regex transforms. Devel::FastProf is another per-line profiler; however, I found its output difficult to coerce into HTML. It also doesn’t support non-Linux systems (we need at least Solaris and Ubuntu/Linux support).

Devel::NYTProf is available as a distribution on the CPAN. You may install by typing “install Devel::NYTProf” in the ‘cpan’ command-line application, or manually by downloading the tarball from CPAN.

We were able to reduce the long runtime on one particular application by 20% (about a minute) after the very first test run of our profiler. We hope that you will find our tool as useful as we have. Of course, any comments and suggestions are welcome!

[Nov 30, 2007] BBC - Radio Labs - Perl on Rails by Tom Scott

| www.bbc.co.uk/blogs/radiolabs

Like most organisations the BBC has its own technical ecosystem; the BBC's is pretty much restricted to Perl and static files. This means that the vast majority of the BBC's website is statically published - in other words HTML is created internally and FTP'ed to the web servers. There are then a range of Perl scripts that are used to provide additional functionality and interactivity.

While there are some advantages to this ecosystem there are also some obvious disadvantages and a couple of implications. These include an effective hard limit on the number of files you can save in a single directory (many older, but still commonly used, filesystems just scan through every file in a directory to find a particular filename, so performance rapidly degrades with thousands, or tens of thousands, of files in one directory), the inherent complexity of keeping the links between pages up to date and valid, and the sheer number of static files that would need to be generated to deliver the sort of aggregation pages we wanted to create when we launched /programmes; let alone our plans for /music and personalisation.

What we wanted was a dynamic publishing solution - in other words the ability to render webpages on the fly, when a user requests them. Now obviously there are already a number of existing frameworks out there that provide the sort of functionality that we needed, however none that provided the functionality and that could be run on the BBC servers. So we (the Audio and Music bit of Future Media and Technology - but more specifically Paul, Duncan, Michael and Jamie) embarked on building a Model-view-controller (MVC) framework in Perl.

For applications that run internally we use Ruby on Rails. Because we enjoy using it, it's fast to develop with and straightforward to use, and because we already use it (i.e. to reduce knowledge transfer and training requirements), we decided to follow the same design patterns and coding conventions used in Rails when we built our MVC framework. Yes, that's right: we've built Perl on Rails.

This isn't quite as insane as it might appear. Remember that we have some rather specific non-functional requirements. We need to use Perl, there are restrictions on which libraries can and can't be installed on the live environment and we needed a framework that could handle significant load. What we've built ticks all those boxes. Our benchmarking figures point to significantly better performance than Ruby on Rails (at least for the applications we are building), it can live in the BBC technical ecosystem and it provides a familiar API to our web development and software engineering teams with a nice clean separation of duties with rendering completely separated from models and controllers.

Using this framework we have launched /programmes. And because the pages are generated dynamically we can aggregate and slice and dice the content in interesting ways. And nor do we have to sub divide our pages into arbitrary directories on the web server - the BBC broadcasts about 1,400 programmes a day which means if we created a single static file for each episode we would start to run into performance problems within a couple of weeks.

Now since we've gone to the effort of building this framework and because it can be used to deliver great, modern web products we want to use it elsewhere. As I've written about elsewhere we are working on building an enhanced music site built around a MusicBrainz spine. But that's just my department - what about the rest of the BBC?

In general the BBC's Software Engineering community is pretty good at sharing code. If one team has something that might be useful elsewhere then there's no problem in installing it and using it elsewhere. What we're not so good at is coordinating our effort so that we can all contribute to the same code base - in short we don't really have an open source mentality between teams - we're more cathedral and less bazaar even if we freely let each other into our cathedrals.

With the Perl on Rails framework I was keen to adopt a much more open source model - and actively encouraged other teams around the BBC to contribute code - and that's pretty much what we've done. In the few weeks since the programmes beta launch JSON and YAML views have been written - due to go live next month. Integration with the BBC's centrally managed controlled vocabulary - to provide accurate term extraction and therefore programme aggregation by subject, person or place - is well underway and should be with us in the new year. And finally the iPlayer team are building the next generation iPlayer browse using the framework. All this activity is great news. With multiple teams contributing code (rather than forking it) everyone benefits from faster development cycles, less bureaucracy and enhanced functionality.

Comments

======

  1. At 12:37 AM on 01 Dec 2007, Anonymous Perl Lover wrote:

    Any reason U didn't use Catalyst, Maypole, Combust, CGI::Application, CGI::Prototype, or any of the dozens of other perl MVC frameworks?

    Catalyst was around long before Ruby on Rails (possibly before the Ruby language for that matter), but never made the kind of headlines RoR gets. The Ruby community seems to be much better at mobilizing.

    Actually, I think it's the Perl community's TMTOWTDI lifestyle. In Ruby, for small things *maybe* U use Camping, but you'll probably use Rails and for everything else you'll definitely use Rails. There are some others, but only the developers of them use them. In Perl, literally everyone writes their own.

    Inferior languages like Java and C# rose up real quick and stayed there--keep getting bigger even--because they limit their users' choices. Perl stayed in the background and is now dying because it believes in offering as many choices as possible. That's why Perl 666 is going to be more limiting. As U can tell from my subtle gibe that the next version of Perl is evil, I prefer choices. But developers like me are a dying breed.

    Developers now-a-days need cookie cutter, copy&paste code. When's the Perl on Rails book going to be released? Probably around the time the Catalyst one is. Or the CGI::Application one.

    Bleh. I wrote way too much. U can't even put this up now, it's too long. I didn't realize I was so annoyed by the one jillion perl MVC web frameworks and how they're just one tiny example of why perl is dead.
     

  2. At 01:20 AM on 01 Dec 2007, Anon wrote:

    > The Ruby community seems to be much better at
    > mobilizing.

    I really think that first video demo of RoR using Textmate is what had a large effect. Before that, I don't remember seeing hardly any videos of development happening right in front of your eyes.

    You watched the video thinking, "wow! it's so fast and easy! I'm gonna get in on that!". When, in reality, any good programmer using any good environment can make a software look good like that (if they practice a bit beforehand).

    As an aside, anyone know of a video demo podcast for Catalyst?
     

  3. At 07:14 AM on 01 Dec 2007, Dave Cross wrote:

    Others have already commented that you seem to be reinventing the wheel here. No-one seems to have mentioned the Perl MVC framework which (in my opinion) is most like Rails - it's called Jifty (http://jifty.org/).

    But there are already parts of the BBC who are using Catalyst with great success. Why didn't you ask around a bit before embarking on what was presumably not a trivial task?
     

  4. At 09:52 AM on 01 Dec 2007, Raips wrote:


    How about BBC doing same as New York Times did?

    http://www.linux.com/feature/120359
    http://open.nytimes.com/
     


[Feb 21, 2008] Free Perl Books - freeprogrammingresources.com

  1. Perl 5 by Example  Online Perl Book. 22 chapters with appendixes.
  2. Beginning Perl  Very complete (and completely free) Perl Beginners book, both HTML and downloadable (PDF).
  3. Practical mod_perl  This free perl book is available in html or pdf versions, so you can view the perl book online or download this free book.
  4. Extreme Perl  Extreme Perl is a book about Extreme Programming using the programming language Perl. This free Perl ebook is available in HTML, PDF, or A4 PDF.
  5. Learning Perl the Hard Way  Learning Perl the Hard Way is a free book available under the GNU Free Documentation License. This free perl ebook can be downloaded in pdf or gzipped postscript format.
  6. Web Client Programming with Perl  Free Online Perl Book
  7. The Perl Reference Guide  The guide contains a concise description of all Perl 5 statements, functions, variables and lots of other useful information.
  8. Perl Reference Guide & Perl Pocket Reference (PDF Link)  Short Perl reference book in pdf form.
  9. CGI Programming on the World Wide Web  This is an out of print book from 1996 that is available from O'Reilly.
  10. Beginning Perl for Bioinformatics (Sample Chapter)  GenBank (Genetic Sequence Data Bank) is a rapidly growing international repository of known genetic sequences from a variety of organisms. Its use is central to modern biology and to bioinformatics.
  11. CGI Programming with Perl, 2nd Edition (Sample Chapter)  Security.
  12. Advanced Perl Programming (Sample Chapter)  Chapter 1: Data References and Anonymous Storage
  13. Programming Web Services with Perl (Sample Chapter)  One chapter on SOAP.
  14. O'Reilly Sample Chapters  Quite a few sample chapters from Perl books are indexed here (some have already been linked to individually).

[Jan 6, 2008] freshmeat.net Project details for Wendy Site Engine

Wendy is a Perl framework for Web site and service development. It works with mod_perl 2 and PostgreSQL. Built with security and performance in mind, Wendy supports DB server clustering, separate read and write DB back-ends, data caching with memcached, template caching, etc.

Release focus: Initial freshmeat announcement

[Dec 19, 2007]  No Comments

My favorite (so far) programming language was born 20 years ago.  It’s been loved and hated.  It’s been praised and damned.  It’s been complimented and criticized.  But all that doesn’t matter.  What matters is that it has been helping people all over the world to solve problems.  Tricky, boring, annoying problems.  It provided enough power to build enterprise grade applications, while still being easy and flexible enough to be the super-glue of many systems.

I’m sure Perl will still be with us in another 20 years.  I wish it to be as useful in that time, as it is now.

Thanks, respect, and best wishes to everyone who created and supported Perl, its community and tools all these years.  Happy birthday!

freshmeat.net Project details for pixconv.pl

pixconv.pl is a Perl script to rename (yyyymmdd_nnn.ext), (auto-)rotate, resize, scale, grayscale, watermark, borderize, and optimize digital images.

Release focus: Major feature enhancements

Changes:
-b/-B border and -C border color options were added, along with a -m option to match image orientation (landscape or portrait). EXIF manipulation was fixed. A -R resize option was added for correctly resizing portrait images. Handling of images with whitespace in their filenames was fixed.

Author:
Iain Lea

Perl Resource Center Perl eBooks

Three free Perl e-books
"Learning Perl the Hard Way" http://www.greenteapress.com/perl/ Free eBook: "Learning Perl the Hard Way" by Allen B. Downey, is designed for programmers who do not know Perl. Open source book available under the GNU Free Documentation License. Users can distribute, copy and modify the content.  
"Extreme Perl"
http://www.extremeperl.org/bk/home
Free eBook: "Extreme Perl" by Robert Nagler. Covers extreme programming (an approach to software development that emphasizes business results and involves rapid iteration, code writing and continuous testing), release planning, iteration planning, pair programming, tracking, acceptance testing, coding style, logistics, test-driven design, continuous design, unit testing, refactoring and SMOP.
 
"Beginning Perl"
http://learn.perl.org/library/beginning_perl/
Free eBook: "Beginning Perl" by Simon Cozens. Fourteen chapter book covers simple values, lists and hashes, loops and decisions, regular expressions, files and data, references, subroutines, running and debugging in Perl, modules, object-oriented Perl, CGI, databases and more.

[Dec 17, 2007] Kazi 1.0  by Luka Novsak

An indexer of a file tree, written in Perl. It appears to be limited to HTML files but can probably be extended to other types.

About: Kazi is a simple content management system. It takes a directory tree populated with HTML files, and builds a menu of it. It can be extended with modules and customized with templates.

[Dec 9, 2007] freshmeat.net Project details for Host Grapher

Host Grapher is a very simple collection of Perl scripts that provide graphical display of CPU, memory, process, disk, and network information for a system. There are clients for Windows, Linux, FreeBSD, SunOS, AIX and Tru64. No socket will be opened on the client, nor will SNMP be used for obtaining the data.

[Dec 7, 2007] freshmeat.net Project details for perltidy

Perltidy is a Perl script indenter and beautifier. By default it approximately follows the suggestions in perlstyle(1), but the style can be adjusted with command line parameters. Perltidy can also write syntax-colored HTML output.

Release focus: Minor feature enhancements

[Dec 7, 2007] freshmeat.net Project details for XHTML Family Tree Generator

XHTML Family Tree Generator is a CGI Perl script together with some Perl modules that will create views of a family tree. Data can be stored in a database or in a data file. The data file is either a simple text (CSV), an Excel, or GEDCOM file listing the family members, parents, and other details. It is possible to show a tree of ancestors and descendants for any person, showing any number of generations. Other facilities are provided for showing email directories, birthday reminders, facehall, and more. It has a simple configuration, makes heavy use of CGI (and other CPAN modules), generates valid XHTML, and has support for Unicode and multiple languages.

Release focus: N/A

Changes:
Romanian language support has been added, and the code has been cleaned up.

[Dec 6, 2007] freshmeat.net Project details for Sman

Sman is "The Searcher for Man Pages", an enhanced version of "apropos" and "man -k". Sman adds several key abilities over its predecessors, including stemming and support for complex boolean text searches such as "(linux and kernel) or (mach and microkernel)". It shows results in a ranked order, optionally with a summary of the manpage with the searched text highlighted. Searches may be applied to the manpage section, title, body, or filename. The complete contents of the man page are indexed. A prebuilt index is used to perform fast searches.

[Dec 2, 2007] freshmeat.net Project details for PodBrowser

PodBrowser is a documentation browser for Perl. It can be used to view the documentation for Perl's builtin functions, its "perldoc" pages, pragmatic modules, and the default and user-installed modules. It supports bookmarks, printing, and integration with the CPAN search site.

[Dec 1, 2007] freshmeat.net Project details for ConfigGeneral

With Config::General you can read and write config files and access the parsed contents from a hash structure. The format of config files supported by Config::General is inspired by the Apache config format (and is 100% compatible with Apache configs). It also supports some enhancements such as here-documents, C-style comments, and multiline options.

Release focus: Major bugfixes

Changes:
The variable interpolation code has been rewritten. This fixes two bugs. More checks were added for invalid structures. More tests for variable interpolation were added to "make test".

[Oct 27, 2007] UNIX System Administration Tools

rshall
Runs commands on multiple remote hosts simultaneously. (Perl)
View the README
Download version 11.0 - gzipped tarball, 9 KB
Last update: November 2005

[Oct 27, 2007] UNIX System Administration Tools

autosync
Copies files to remote hosts based on a configuration file. (Perl)
View the README
Download version 1.4 - gzipped tarball, 5 KB
Last update: April 2007

[Sep 6, 2007]  Komodo Spawns New Open Source IDE Project

"In February, ActiveState released a free version of its flagship Komodo IDE called Komodo Edit, and that release was a prelude to going open source. Open Komodo is only a subset of Edit, though. "

By Sean Michael Kerner

Development tools vendor ActiveState is opening up parts of its Komodo IDE in a new effort called Open Komodo.

Komodo is built on the Mozilla framework and uses XUL (XML User Interface Language), Mozilla's language for building user interfaces.

The Open Komodo effort will take code from ActiveState's freely available, but not open source, Komodo Edit product and use it as a base for the new open source IDE. The aim is to create a community and a project that will help Web developers to more easily create modern Web-based applications.

"This is our first entry into managing an open source project," Shane Caraveo, Komodo Dev Lead, told Internetnews.com. "We want to start with a tight focus on what we want to accomplish and that focus is supporting the Open Web with a development environment."

Caraveo explained that back in February, ActiveState released a free version of its flagship Komodo IDE called Komodo Edit, and that release was a prelude to going open source. Open Komodo is only a subset of Edit, though.

"We're focusing first strictly on Web development," Caraveo said. "So some of the language support for backend dynamic languages will not be available as open source. They will still be available for free in Edit and possibly as extensions to Open Komodo."

The idea behind creating a fully open source IDE for Web development has been percolating for over a year at ActiveState, according to Caraveo. He said there are also a lot of people in the Mozilla community that have been discussing the creation of an IDE.

"I feel there is no need for them to start from nothing, which is a large investment," Caraveo said. "Since we were a couple months from having everything done, I felt it was a good time to announce, so we can start to talk with people in the community about Komodo from a standpoint that they are willing to work with."

A working build of the Open Komodo code base is expected by late October or early November. That build, according to Caraveo, will look and work much like Komodo Edit does now.

"We want to be sure that people have something they can play with and actually use immediately, even if it is not the product we want in the end," Caraveo said.

The longer-term project is something called Komodo Snapdragon. The intention of Snapdragon is to provide a top-quality IDE for Web development that focuses on open technologies, such as AJAX, HTML/XML, JavaScript and more.

"We want to provide tight integration into other Firefox-based development tools as well," Caraveo explained. "This would target Web 2.0 applications, and next-generation Rich Internet Applications."

With many IDEs already out in a crowded marketplace for development tools, Open Komodo's use of Mozilla's XUL (pronounced "zule") may well be its key differentiator.

"A XUL-based application uses all the same technologies that you would use to develop an advanced Web site today," Caraveo said.

"This includes XML, CSS and JavaScript. This type of platform allows people who can develop Web site to develop applications. So, I would say that this is an IDE that Web developers can easily modify, hack, build, extend, without having to learn new languages and technologies."

Being open and accessible are critical to the success of Open Komodo; in fact Caraveo noted that the No. 1 success factor is community involvement.

"If Snapdragon is only an ActiveState project, then it has not succeeded in the way we want it to."

Picking Up Perl by Bradley M. Kuhn

[Jul 12, 2007] Minimal Perl

See also Manning Minimal Perl

Only on author site

[May 3, 2007] Python, Tcl and Perl, oh my! (was Re tcl vs. perl) - comp.lang.perl.tk Google Groups

Jun 26 1996 (Dan Connolly)

Sorry for the length, but I felt inspired tonight...
 

In article <TMB.96Jun17182...@best.best.com> t...@best.com (.) writes:
 
 >
 > In article <Pine.SUN.3.93.960617173341.9643A-100...@blackhole.dimensional.com> Kirk Haines <oshcn...@dimensional.com> writes:
 >
 >    > Well it's probably just my stupidity
 >    > (and that of everyone else who works here) but I've got about 50 Perl
 >    > scripts that do god knows what, and the people who wrote them left,
 >    > and we are experiencing excruciating pain.
 >
 >    And that is not a situation in the least bit related to Perl.  That is the
 >    fault of whoever wrote those scripts [...]
 

 > Of course it is _related_ to Perl.  Yes, you can write better or worse
 > Perl code.
 

 > In fact, one way management can bring about good coding styles without
 > examining each and every line of code is by choosing tools and
 > languages that enforce some aspects of good coding styles.  Perl isn't
 > one of those languages.

Short form: (1) there's a tension between early detection of faults and rapid prototyping, and perl and python are at very different points on the spectrum.  (2) it's more the community around a language than the language itself that influences code quality. (3) For my purposes, perl will continue to be a work-horse tool, but I'll be using Java more for things that I would have used python or Modula-3 for, and I hope the industry uses Java for things that it has been using C++ for. 

Long form:  

(1) Traditional perl programming is a black art, but a darned useful craft as well. The semantics are very powerful, and the syntactic features combine in amazingly powerful ways. But you definitely have enough rope to hang yourself; not enough to hang the machine or crash all the time, like the way you can corrupt the runtime in C by writing past the end of an array or calling free() twice. But like C, you can introduce subtle logic bugs by using = where you meant ==. And failing to check return values results in a program that nearly always reports successful completion, whether it really succeeded or not. 

I like studying and learning programming languages, and I found it more difficult to build the necessary intuitions to read and write traditional perl programs than to build intuitions for any language I learned previously, and nearly every language I learned since. 

I learned perl "from a master" -- Tom Christiansen was in the next office, and he painstakingly (if not patiently :-) answered my many frustrated questions. Previous to learning perl, I had learned a dozen or so languages without much difficulty (here in roughly the order I learned them): 

        Extended BASIC (Radio Shack Color Computer)
                learned from a book, disassembled the interpreter
        6809 assembler
                learned from a book with a friend, and from
                disassembling LOTS of stuff
        Basic09
                learned from the manuals, with help from BBS folks
        Logo
                learned in a store one afternoon, reading a book
        Pascal
                learned one summer from a college professor
        C
                read a book one weekend
        COBOL
                learned at a summer job
        shell
                misc. hacking in school
        LISP
                read some books, hacked on TI lisp machine in a class
        prolog
                programming languages course
        Modula-2
                programming languages course
        Ada
                programming languages course

Learning assembly after basic was tough: "Where are the variables? Geez.. rebooting the machine all the time is a pain. I wish this thing had automatic string handling." And I'm not sure I ever grokked Ada's rendezvous stuff completely. And I learned COBOL in a strictly monkey-see-monkey-do manner. It was months before I found a manual. 

None of those were particularly unexpected difficulties. But after an initial taste of perl, it looked really easy and powerful, and I was frustrated when the first few real programs I tried to write had bugs that I just could not figure out at all. 

Really learning to use regexps was well worth the effort, but things like "surprise!  <FILE> works completely different in an array context!"  was an experience I don't care to repeat. I can't remember the exact program that drove me batty, but it was related to: 

        $x = <FILE>; # reads one line
        @x = <FILE>; # reads whole file, split into lines

but the idiom I used that created an array context wasn't as transparent as @x -- it was something like grep() or chop(). Ah yes, I think it was chop(). Who would have guessed that 

        chop(<XXX>); 

would read the whole file? 

Perl is full of short-hand idioms that are so useful that knowledgeable perl programmers would feel awkward writing them out long-hand, and yet they can throw newbies for a loop. For example, the work-horse idiom: 
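To make the context rule concrete, here is a minimal sketch in present-day Perl. It is an illustration added for this page, not part of the original post: the in-memory filehandle and its contents are assumptions chosen so that the example runs without any external file.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file (the data is made up).
my $text = "alpha\nbeta\ngamma\n";

open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $first = <$fh>;          # scalar context: reads exactly one line
my @rest  = <$fh>;          # list context: reads every remaining line
close($fh);

print "first line: $first";
print "lines left: ", scalar(@rest), "\n";

# scalar() forces a single-line read even inside a list.
open($fh, '<', \$text) or die "cannot open in-memory file: $!";
my @one = (scalar(<$fh>));
close($fh);
print "forced scalar read: ", scalar(@one), " line\n";
```

Wrapping the read in scalar() is one way to avoid slurping a whole file by accident whenever the surrounding expression supplies list context.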

        while(<>){
                ...;
        }

is short for: 

        while( ($_ = <STDIN>) gt ''){
                ...;
        }

roughly speaking; that is, ignoring the tremendously useful feature of <> which processes files mentioned on the command line (aka @ARGV) ala traditional unix filters, which would take me about 10 or 20 lines to write out longhand, and about an hour to get just right. Ah! and I forgot to mention that <XXX> is an idiom for reading one line from a file... and lines are delimited by the magic $/, and ... 
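For readers who want to see what the longhand might look like, the following is a simplified sketch of the file-handling part of <>. The helper name each_line and its callback interface are hypothetical, invented for this illustration; the real operator has additional behavior (for example, treating "-" as standard input) that is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified longhand of `while (<>) { ... }`.
# It calls $callback once per line of each named file, or of STDIN
# when no files are given.
sub each_line {
    my ($callback, @files) = @_;

    if (!@files) {
        while (defined(my $line = <STDIN>)) {
            $callback->($line);
        }
        return;
    }

    for my $file (@files) {
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "Can't open $file: $!\n";   # report and move on to the next file
            next;
        }
        while (defined(my $line = <$fh>)) {
            $callback->($line);
        }
        close($fh);
    }
    return;
}

# Usage as a unix-style filter would be:
#   each_line(sub { print $_[0] }, @ARGV);
```

Note the defined() test: it keeps the loop running on a line that happens to be the string "0", which a plain truth test would treat as end of input.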

The point is that even as of several years ago, perl is a highly-evolved, highly idiomatic language and tool, based on zillions of person-years of use in unix system administration. The vast majority of text-processing/system management tasks that folks might want to hack up a script to tackle can be developed quickly, expressed succinctly, and run efficiently in perl. 

The first crack usually looks like: 

        while(<>){
                if(/X-Diagnostic: (.*)/){
                        print "diagnostic: $1\n";
                }
        }

and it usually works great the first time you try it. Then you add a few wrinkles, and before you know it, the task you set out to do is solved. 

Taking that piece of code that solves a particular problem, and software-engineering it usually takes about 10x longer than it took to develop in the first place (as these tasks are often personal and transient, it's rarely worth the trouble anyway). 

The author of the hack is generally in a position to restrict the inputs to reasonable stuff (eliminating the need to deal with corner cases) and check the output by hand (eliminating the need to document and report errors in typical engineering fashion). 

This is very much in contrast with other languages, where the cost of solving the immediate problem may be significantly higher, but the result is much more likely to have good software engineering characteristics, such that it's useful to other folks or other projects with little added effort. 

For example, Olin Shivers described his experience writing ML programs: they are a royal pain to get through the type checker, but once they compile, they are often bug-free. 

Python isn't that far along the quick-and-dirty vs. slow-and-clean spectrum, but it's in that direction. 

Contrast the work-horse example above with a loose translation to python: 

        import sys
        while 1:
                line = sys.stdin.readline()
                if not line: break
                ...

incorporating the @ARGV parts of <> would expand it to something like: 

        import sys

        for f in sys.argv[1:]:
                infile = open(f)
                while 1:
                        line = infile.readline()
                        if not line: break
                        ...

Python doesn't have special syntax for this sort of thing. So the python code is more verbose and less idiomatic -- easier to grok for the newbie, but harder to "pattern match," or recognize as a common idiom for the seasoned programmer. 

For an example of the stylistic slants of the two languages, consider error/exception conditions. As a rule, in perl, errors are reported as particular return values, whereas in python, they signal exceptions. So in the error case, a perl code fragment will run merrily along, while a python code fragment will trap out. In many text-processing tasks, running merrily along is just what you want. But when you hand that code to your friend, and he presents it with some input that you never considered, python is a lot more likely to let your friend know that the program needs to be enhanced to handle the new situation. 

I've seen exception idioms in perl, but they involve die and eval. The runtime libraries don't die on errors, as a rule, and eval is a pretty hairy way to do something as mundane as error handling. 
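As an illustration of the die and eval idiom mentioned above, here is a small sketch. The routine parse_port and its validation rules are hypothetical, made up for this example; only die, eval BLOCK and $@ are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical routine that signals failure by dying instead of
# returning a special value.
sub parse_port {
    my ($text) = @_;
    die "not a port number: '$text'\n" unless $text =~ /^\d+$/;
    die "port out of range: $text\n"   if $text < 1 || $text > 65535;
    return $text + 0;
}

for my $candidate ('8080', 'http', '70000') {
    # eval BLOCK traps the die; the message is left in $@.
    my $port = eval { parse_port($candidate) };
    if (defined $port) {
        print "ok: $port\n";
    }
    else {
        print "error: $@";
    }
}
```

Checking the value returned by eval, rather than testing $@ alone, is a common precaution, since $@ can be altered by code that runs while the block is being left.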

Next, consider naming and scope. By default, perl variables are global, so you almost never have to declare them. Local variables have dynamic scope by default (ala early lisp systems) and traditional statically scoped variables are a perl5 innovation.

On the flip side, python variables are local by default, so you almost never have to worry about the variable clobbering problem.  (python has some semantic gotchas of its own here for the folks who have intuitions about traditional static scoping)
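The three kinds of perl variable described above can be compared side by side. This is a sketch written for this page under strict mode, so the global has to be declared with our; the variable and subroutine names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

our $level = 'global';          # package (global) variable

sub show { return $level }      # always reads the package variable

sub with_local {
    local $level = 'dynamic';   # temporary value, visible to callees
    return show();
}

sub with_my {
    my $level = 'lexical';      # separate variable, invisible to callees
    return show();
}

print "plain call: ", show(),       "\n";   # global
print "local:      ", with_local(), "\n";   # dynamic
print "my:         ", with_my(),    "\n";   # global
print "afterwards: ", show(),       "\n";   # global
```

The local version changes what show() sees for the duration of the call (dynamic scope), while the my version creates a variable that exists only inside with_my (static, or lexical, scope).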

So far, I have discussed mostly the intrinsic aspects of a language that vary along the quick-and-dirty vs. slow-and-clean spectrum. 

But the point of this article is that: 

(2) The community around a language -- i.e. the conventional wisdom, history, documentation, and available source code -- has a lot more influence on the quality of code developed in a given language than the intrinsic aspects of the language itself. 

For example, it's perfectly possible to write clean, well structured programs in Fortran. But the bulk of traditional fortran has no comments or indentation, and lots of GOTOs, global variables, and aliased variables. The mindset behind fortran was that hand-optimization was superior to machine-optimization -- a mindset left over from assembler, and popularized by bad compilers. 

COBOL has some really bad features (e.g. lack of local variables) that make writing good programs hard, but they don't come close to explaining the astoundingly uninspired programming techniques I've seen employed in some business/database apps. Stuff like writing 12 paragraphs (subroutines, or functions to the modern world) -- one for each month of the year, with 12 sets of variables jan-X, jan-Y, feb-X, feb-Y, etc., rather than using loops and arrays, which DO exist in COBOL. 

Perl, as a language, is evolving faster than the perl development community. Perl5 in strict mode is a reasonable modern object-oriented programming language. But there are ZILLIONs of perl programmers, and from what I can see, about 2% of them bought into the new facilities. The rest of them are still happily getting their jobs done writing perl4 code -- myself included. 

Perl was useful and widely deployed before the OOP "paradigm-shift" hit the industry. And a community with that much momentum doesn't turn on a dime. 

In contrast, python started from scratch after some earlier languages, and had the benefit of looking back at REXX, icon, and perl, as well as C++ and -- most importantly -- Modula-3. So documentation encouraged some pretty modern concepts like objects and modules while the python development community was still young. 

As a result, consider the namespace of functions in the two systems: the languages have roughly equivalent support: python has modules, and perl has packages. But you might not know that from looking at most of the code you see on the net: traditional perl folks rarely use the $package'var stuff, while python folks use modules routinely. The perl5 movement is quickly changing this, but until recently, perl programmers used the vast majority of perl's facilities without ever considering packages, while python programmers run into the concept of modules in the early tutorials. 

For me, the bottom line is that I do a lot of quick-and-dirty stuff, and I'm comfortable with perl4's idioms, so I use it a lot. I have dabbled in perl5, but I'm not yet comfortable with its OOP idioms. 
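For readers who have not met perl packages, here is a minimal sketch of two namespaces holding variables and functions with the same names. It uses the perl5 spelling $Package::var (perl4 wrote the separator as a single quote); the package names Counter::A and Counter::B are hypothetical.

```perl
use strict;
use warnings;

{
    package Counter::A;
    our $count = 0;
    sub bump { return ++$count }
}

{
    package Counter::B;
    our $count = 100;
    sub bump { return $count += 10 }
}

# Back in package main: the two $count variables and the two bump
# functions do not collide, because each lives in its own package.
Counter::A::bump();
Counter::B::bump();

print "A: $Counter::A::count\n";   # A: 1
print "B: $Counter::B::count\n";   # B: 110
```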

I prefer the feel and syntax of python, but the "strictness" often gets in the way, and I end up switching to perl in order to finish the task before leaving for the day. 

When I want to write "correct" programs, neither is good enough. I want lots more help from the machine, like static typechecking. And sad to say, when I want to write code that other folks will use, I choose C.

As much as the industry adopted C++, I find it frightening. It requires all the priestly knowledge and incantations of perl with none of the rapid-prototyping benefits, gives no more safety guarantees than C, and has never been specified to my satisfaction. 

Learning Modula-3 was more fun than I had had in years. The precision, simplicity, and discipline employed in the design of the language and libraries is refreshing and results in a system with amazing complexity management characteristics.

I have high hopes for Java. I will miss a few of Modula-3's really novel features. The way interfaces, generics, exceptions, partial revelations, structural typing + brands come together is fantastic. But Java has threads, exceptions, and garbage collection, combined with more hype than C++ ever had.

I'm afraid that the portion of the space of problems for which I might have looked to python and Modula-3 has been covered -- by perl for quick-and-dirty tasks, and by Java for more engineered stuff. And both perl and Java seem more economical than python and Modula-3. 

Dan --
Daniel W. Connolly        "We believe in the interconnectedness of all things"
Research Scientist, MIT/W3C     PGP: EDF8 A8E4 F3BB 0F3C  FD1B 7BE0 716C FF21
<conno...@w3.org>                  http://www.w3.org/pub/WWW/People/Connolly/

[Apr 28, 2007] freshmeat.net Project details for DocPerl

DocPerl provides a Web-based interface to Perl's Plain Old Documentation (POD). It is a graphical, easy-to-use interface to POD, automatically listing all installed modules on the local host, and any other nominated directories containing Perl files. DocPerl can also display a summary of the APIs defined by files and the code of those files. It can search the POD documentation for module names and for functions defined in modules.

Release focus: Minor bugfixes

Changes:
This release includes fixes for many minor bugs, including the removal of a configuration option that should not have been removed, and many JavaScript issues. The code has been tidied up.

[Mar 26, 2007] freshmeat.net Project details for Perl Dev Kit

Perl Dev Kit 7.0 released...
The Perl Dev Kit (PDK) provides essential tools for building self-contained, easily deployable executables for Windows, Mac OS X, Linux, Solaris, AIX, and HP-UX. The comprehensive feature set includes a graphical debugger and code coverage and hotspot analyzer, as well as tools for building sophisticated Perl-based filters and easily converting useful VBScript code to Perl.

Release focus: Major feature enhancements

Changes:
A coverage and hotspot analyzer tool, PerlCov, was added for better code performance and reliability. PerlApp was improved with more sophisticated module wrapping to improve executable performance. By popular demand, PDK support has been extended to Mac OS X. New native 64-bit support was added for Windows (x64), Linux (x64), and Solaris (Sparc). New Solaris and AIX GUIs were added.

Author:
Activator [contact developer]  

[Mar 13, 2007]  Programming in Perl - Debugging

On this page, I will post aids and tools that Perl provides which allow you to debug your Perl code more efficiently. I will post updates as we cover material necessary for understanding the tools mentioned.
CGI::Dump
Dump is one of the functions exported in CGI.pm's :standard set. Its functionality is similar to that of Data::Dumper. Rather than pretty-printing a complex data structure, however, this function pretty-prints all of the parameters passed to your CGI script. That is to say that when called, it generates an HTML list of each parameter's name and value, so that you can see exactly what parameters were passed to your script. Don't forget that you must print the return value of this function - it doesn't do any printing on its own.
use CGI qw/:standard/;
print Dump;
Benchmark
As you know by now, one of Perl's mottos is "There's More Than One Way To Do It" (TMTOWTDI ©). This is usually a Good Thing, but can occasionally lead to confusion. One of the most common forms of confusion that Perl's versatility causes is wondering which of multiple ways one should use to get the job done most quickly.

Analyzing two or more chunks of code to see how they compare time-wise is known as "Benchmarking". Perl provides a standard module that will Benchmark your code for you. It is named, unsurprisingly, Benchmark. Benchmark provides several helpful subroutines, but the most common is called cmpthese(). This subroutine takes two arguments: The number of iterations to run each method, and a hashref containing the code blocks (subroutines) you want to compare, keyed by a label for each block. It will run each subroutine the number of times specified, and then print out statistics telling you how they compare.

For example, my solution to ICA5 contained several different ways of creating a two dimensional array. Which one of these ways is "best"? Let's have Benchmark tell us:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

sub explicit {
    my @two_d = ([ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ],
                 [ ('x') x 10 ]);
}

sub new_per_loop {
    my @two_d;
    for (0..4){
        my @inner = ('x') x 10;
        push @two_d, \@inner;
    }
}

sub anon_ref_per_loop {
    my @two_d;
    for (0..4){
        push @two_d, [ ('x') x 10 ];
    }
}

sub nested {
    my @two_d;
    for my $i (0..4){
        for my $j (0..9){
            $two_d[$i][$j] = 'x';
        }
    }
}
cmpthese (10_000, {
                 'Explicit'           => \&explicit,
                 'New Array Per Loop' => \&new_per_loop,
                 'Anon. Ref Per Loop' => \&anon_ref_per_loop,
                 'Nested Loops'       => \&nested,
             }
      );
The above code will print out the following statistics (numbers may be slightly off, of course):
Benchmark: timing 10000 iterations of Anon. Ref Per Loop, Explicit, Nested Loops, New Array Per Loop...
Anon. Ref Per Loop:  2 wallclock secs ( 1.53 usr +  0.00 sys =  1.53 CPU) @ 6535.95/s (n=10000)
Explicit:  1 wallclock secs ( 1.24 usr +  0.00 sys =  1.24 CPU) @ 8064.52/s (n=10000)
Nested Loops:  4 wallclock secs ( 4.01 usr +  0.00 sys =  4.01 CPU) @ 2493.77/s (n=10000)
New Array Per Loop:  2 wallclock secs ( 1.76 usr +  0.00 sys =  1.76 CPU) @ 5681.82/s (n=10000)
                     Rate Nested Loops New Array Per Loop Anon. Ref Per Loop Explicit
Nested Loops       2494/s           --               -56%               -62%     -69%
New Array Per Loop 5682/s         128%                 --               -13%     -30%
Anon. Ref Per Loop 6536/s         162%                15%                 --     -19%
Explicit           8065/s         223%                42%                23%       --

The benchmark first tells us how many iterations of which subroutines it's running. It then tells us how long each method took to run the given number of iterations. Finally, it prints out the statistics table, sorted from slowest to fastest. The Rate column tells us how many iterations each subroutine was able to perform per second. The remaining columns tell us how fast each method was in comparison to each of the other methods. (For example, 'Explicit' was 223% faster than 'Nested Loops', while 'New Array Per Loop' was 13% slower than 'Anon. Ref Per Loop'.) From the above, we can see that 'Explicit' is the fastest of the four methods. It is, however, only 23% faster than 'Anon. Ref Per Loop', which requires far less typing and is much more easily maintainable (if your boss suddenly tells you he'd rather have the two-d array be 20x17, and each cell init'ed to 'X' rather than 'x', which of the two would you rather had been used?).

You can, of course, read more about this module, and see its other options, by reading: perldoc Benchmark

Command-line options
Perl provides several command-line options which make it possible to write very quick and very useful "one-liners". For more information on all the options available, refer to perldoc perlrun
-e
This option takes a string and evaluates the Perl code within. This is the primary means of executing a one-liner.
 
perl -e'print qq{Hello World\n};'
(On Windows, you may have to use double quotes rather than single. Either way, it's probably better to use q// and qq// within your one-liner than to remember to escape the quotes.)
-l
This option has two distinct effects that work in conjunction. First, it sets $\ (the output record separator) to the current value of $/ (the input record separator). In effect, this means that every print statement will automatically have a newline appended. Secondly, when used together with -n or -p, it auto-chomps each line of input, saving you the typing necessary to do it.
 
perl -lne '$_ .= q{testing};  print;'
The above would automatically chomp $_, and then add the newline back on at the print statement, so that "testing" appears on the same line as the entered string.
-w
This is the standard way to enable warnings in your one liners. This saves you from having to type use warnings;
-M
This option auto-uses a given module.
 
perl -MData::Dumper -le'my @foo=(1..10); print Dumper(\@foo);'
-n
This disturbingly powerful option wraps your entire one-liner in a while (<>) { ... } loop. That is, your one-liner will be executed once for each line of each file specified on the command line, each time setting $_ to the current line and $. to current line number.
 
perl -ne 'print if /^\d/' foo.txt beta.txt
The above one-liner would loop through foo.txt and beta.txt, printing out all the lines that start with a digit. ($_ is assigned via the implicit while (<>) loop, and both print and m// operate on $_ if an explicit argument isn't given.)
-p
This is essentially the same thing as -n, except that it places a continue { print; } block after the while (<>) { ... } loop in which your code is wrapped. This is useful for reading through a list of files, making some sort of modification, and printing the results.
 
perl -pe 's/Paul/John/' email.txt
Open the file email.txt, loop through each line, replacing the first instance of "Paul" on each line with "John", and print every line (modified or not) to STDOUT. (Add the /g modifier to the substitution to replace every instance on a line.)
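To see what -n and -p add around your code, it can help to write the loop out by hand. The following is a simplified sketch: the wrapper name run_p is hypothetical and exists only so the loop can be pointed at a list of files, and the error check that perl places on the implicit print is omitted.

```perl
use strict;
use warnings;

# Roughly what `perl -pe 's/Paul/John/' email.txt` does.
# run_p is a hypothetical wrapper around the loop.
sub run_p {
    local @ARGV = @_;       # <> reads the files named in @ARGV
    local $_;               # keep the caller's $_ intact
    while (<>) {
        s/Paul/John/;       # the one-liner's own code goes here
    }
    continue {
        print;              # supplied by -p; -n leaves this block out
    }
    return;
}

# Process the files named on the command line, if there are any.
run_p(@ARGV) if @ARGV;
```

Saved as, say, paul2john.pl (an assumed name), it would be run as perl paul2john.pl email.txt and should print the same output as the one-liner.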
-i
People are sometimes astounded that this one is possible with so little typing. -i is used in conjunction with either -n or -p. It causes the files specified on the command line to be edited "in-place", meaning that while you're looping through the lines of the files, all print statements are directed back to the original files. (That goes for both explicit prints and the print in the continue block added by -p.)
If you give -i a string, this string will be used to create a back-up copy of the original file. Like so:
 
perl -pi.bkp -e's/Paul/John/' email.txt msg.txt
The above opens email.txt, replaces the first instance of "Paul" on each line with "John", and prints the results back to email.txt. The original email.txt is saved as email.txt.bkp. The same is then done for msg.txt.

Remember that any of the command-line options listed here can also be given at the end of the shebang in non-oneliners. (But please do not start using -w in your real programs - use warnings; is still preferred because of its lexical scope and configurability).

Data::Dumper
The standard Data::Dumper module is very useful for examining exactly what is contained in your data structure (be it hash, array, or object (when we come to them) ). When you use this module, it exports one function, named Dumper. This function takes a reference to a data structure and returns a nicely formatted description of what that structure contains.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my @foo = (5..10);
#add one element to the end of the array
#do you see the error?
$foo[@foo+1] = 'last';

print Dumper(\@foo);

When run, this program shows you exactly what is inside @foo:

$VAR1 = [
          5,
          6,
          7,
          8,
          9,
          10,
          undef,
          'last'
        ];

(I know we haven't covered references yet. For now, just accept my assertion that you create a reference by prepending the variable name with a backslash...)
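The undef in the dump above is the error the comment asks about: @foo+1 points one slot past the end of the array, so Perl fills the gap with undef. The sketch below shows one way to append without a gap and then dumps a nested structure. The helper append_last and the %host data are made up; $Data::Dumper::Sortkeys and $Data::Dumper::Indent are configuration variables of the module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 1;   # use a more compact indentation style

# Hypothetical helper: append one element without leaving a gap.
sub append_last {
    my ($aref, $value) = @_;
    push @{$aref}, $value;     # same effect as $aref->[ scalar @{$aref} ] = $value
    return $aref;
}

my @foo = (5 .. 10);
append_last(\@foo, 'last');

# A nested structure (invented data) dumps just as easily.
my %host = (name => 'web01', ports => [22, 80], up => 1);
print Dumper(\@foo, \%host);
```

Passing two references makes Dumper label them $VAR1 and $VAR2.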

__DATA__ & <DATA>
Perl uses the __DATA__ marker as a pseudo-datafile. You can use this marker to write quick tests for code that would otherwise involve finding a file name, opening that file, and reading from it. If you just want to test a piece of code that requires a file to be read (but don't want to test the actual file opening and reading), place the data that would be in the input file under the __DATA__ marker. You can then read from this pseudo-file using <DATA>, without bothering to open an actual file:
#!/usr/bin/env perl
use strict;
use warnings;

while (my $line = <DATA>) {
  chomp $line;
  print "Size of line $.: ", length $line, "\n";
}

__DATA__
hello world
42
abcde

The above program would print:

Size of line 1: 11
Size of line 2: 2
Size of line 3: 5
$.
The $. variable keeps track of the line numbers of the file currently being processed via a while (<$fh>) { ... } loop. More explicitly, it is the number of the last line read of the last file read.
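A small sketch of how $. follows the filehandle that was read most recently. The sub name number_lines and the sample text are made up, and in-memory filehandles are used so nothing has to exist on disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: prefix every line read from $fh with its $. value.
sub number_lines {
    my ($fh) = @_;
    my @numbered;
    while (my $line = <$fh>) {
        chomp $line;
        push @numbered, "$.: $line";
    }
    return @numbered;
}

# Two in-memory "files" with invented contents.
my ($first, $second) = ("alpha\nbeta\ngamma\n", "delta\n");
open my $fh1, '<', \$first  or die "Cannot open in-memory file: $!";
open my $fh2, '<', \$second or die "Cannot open in-memory file: $!";

print "$_\n" for number_lines($fh1);   # numbered 1 to 3
print "$_\n" for number_lines($fh2);   # starts again at 1: a different handle

# With while (<>), numbering continues across files unless ARGV is
# closed at the end of each one:   close ARGV if eof;
```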
__FILE__ & __LINE__
These are two special markers that return, respectively, the name of the file Perl is currently executing and the line number on which the marker appears. They can be used in your own debugging statements to remind yourself where in the source code your output came from:
  print "On line " . __LINE__ . " of file " . __FILE__ . ", \$foo = $foo\n";
    

Note that neither of these markers is a variable, so they cannot be interpolated in a double-quoted string.
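Here is a short sketch of the usual ways around the interpolation limit. The helper name here is hypothetical; caller is the built-in that reports the file and line a sub was called from.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical debugging helper: reports where it was called from.
sub here {
    my (undef, $file, $line) = caller;   # caller returns (package, file, line)
    return "$file line $line";
}

my $foo = 42;

# __LINE__ and __FILE__ do not interpolate, so concatenate them ...
print 'At line ' . __LINE__ . ' of ' . __FILE__ . ", \$foo = $foo\n";

# ... or wrap them in @{[ ]}, which interpolates the value of any expression.
print "At line @{[ __LINE__ ]}, \$foo = $foo\n";

# caller gives the same information about the calling code.
print 'Called from ', here(), "\n";
```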

warn() & die()
These are the most basic of all debugging techniques. warn() takes a list of strings, and prints them to STDERR. If the last element of the list does not end in a newline, warn() will also print the current filename and line number on which the warning occurred. Execution then proceeds as normal.

die() is identical to warn(), with one major exception - the program exits after printing the list of strings.

All debugging statements should make use of either warn() or die() rather than print(). This will ensure you see your debugging output even if STDOUT has been redirected, and will give you helpful clues about exactly where in your code the warning occurred.
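The effect of the trailing newline is easiest to see by capturing the messages. In this sketch the helper captured_warning is made up; it relies on the standard $SIG{__WARN__} hook, and eval is used to trap die so the script can carry on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: run a code ref and return whatever it warned.
sub captured_warning {
    my ($code) = @_;
    my $message = '';
    local $SIG{__WARN__} = sub { $message .= $_[0] };
    $code->();
    return $message;
}

# Without a trailing newline, Perl appends " at FILE line N."
print captured_warning(sub { warn "disk almost full" });

# With a trailing newline, the message is passed on exactly as given.
print captured_warning(sub { warn "disk almost full\n" });

# die follows the same newline rule; eval traps it.
my $ok = eval { die "cannot continue\n"; 1 };
print "Trapped: $@" unless $ok;
```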

[Mar 11, 2007] Sys Admin v16, i03 The Replacements

OK, this is starting to look ugly. Like a regex match, we can pull that apart with a trailing x:
s/
  (
    ^        # either beginning of line
    |        # or
    (?<=,)   # a single comma to the left
  )
  .*?        # as few characters as possible
  (
    (?=,)    # a single comma to the right
    |        # or
    $        # end of string
  )
/XXX/gx;
That's much easier to read (relatively speaking).

Like a regular expression match, we can use an alternate delimiter for the left and right sides of the substitution:

$_ = "hello";
s%ell%ipp%; # $_ is now "hippo"
The rules are a bit complicated, but it works precisely the way Larry Wall wanted it to work. If the delimiter chosen is not one of the special characters that begins a pair, then we use the character twice more to both separate the pattern from the replacement and to terminate the replacement, as the example above showed.

However, if we use the beginning character of a paired character set (parentheses, curly braces, square brackets, or even less-than and greater-than), we close off the pattern with the corresponding closing character. Then, we get to pick another delimiter all over again, using the same rules. For example, these all do the same thing:

s/ell/ipp/;
s%ell%ipp%;
s;ell;ipp;; # don't do this!
s#ell#ipp#; # one of my favorites
s[ell]#ipp#; # [] for pattern, # for replacement
s[ell][ipp]; # [] for both pattern and replacement
s<ell><ipp>; # <> for both pattern and replacement
s{ell}(ipp); # {} for pattern, () for replacement
No matter what the closing delimiter might be for either the pattern or the replacement, we can include the character literally by preceding it with a backslash:
$_ = "hello";
s/ell/i\/n/; # $_ is now "hi/no";
s/\/no/res/; # $_ is now "hires";
To avoid backslashing, pick a distinct delimiter:
$_ = "hello";
s%ell%i/n%; # $_ is now "hi/no";
s%/no%res%; # $_ is now "hires";
Conveniently, if a paired character is used, the pairs may be nested without invoking any backslashes:
$_ = "aaa,bbb,ccc,ddd,eee,fff,ggg";
s((^|(?<=,)).*?((?=,)|$))(XXX)g; # replace all fields with XXX
Note that even though the pattern contains closing parentheses, they are all paired with opening parentheses, so the pattern ends at the right place.

The right side of the substitution operation is generally treated as if it were a double-quoted string: variable interpolation and backslash interpretation are performed directly:

$replacement = "ipp";
$_ = "hello";
s/ell/$replacement/; # $_ is now "hippo"
The left side of a substitution is also treated as if it were a double-quoted string (with a few exceptions), and this interpolation happens before the result is evaluated as a regular expression:
$pattern = "ell";
$replacement = "ipp";
$_ = "hello";
s/$pattern/$replacement/; # $_ is now "hippo"
Using this form of pattern, Perl is forced to compile the regular expression at runtime. If this happens in a loop, Perl may need to recompile the regular expression repeatedly, causing a slowdown. We can give Perl a hint that the pattern is really a regular expression by using a regular expression literal:
$pattern = qr/ell/;
$replacement = "ipp";
$_ = "hello";
s/$pattern/$replacement/; # $_ is now "hippo"
The qr operation creates a Regexp object, which interpolates into the pattern with minimal fuss and maximal speed.
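As a sketch of the loop case mentioned above, the pattern can be compiled once with qr and then reused for every string. The sub name replace_in_all is invented for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: apply one precompiled pattern to many strings.
sub replace_in_all {
    my ($pattern, $replacement, @strings) = @_;
    my $re = qr/$pattern/;                 # compiled once, outside the loop
    s/$re/$replacement/ for @strings;      # reuses the compiled Regexp object
    return @strings;
}

my @words = replace_in_all('ell', 'ipp', 'hello', 'bell', 'shell');
print "@words\n";    # prints "hippo bipp shipp"
```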

[Feb 23, 2007]  BigAdmin Submitted Tech Tip How to Send Email Without Using sendmail by Ross Moffatt

A useful topic, especially about attachment sending. 

If you need to send emails from a host but don't want to run sendmail, this tech tip explains how to use Perl to send emails. This procedure can be used on a host such as a Sun Fire V120 server running the Solaris 9 OS.
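The tip's own code is not reproduced here. One common way to do this, assuming a reachable SMTP relay, is the Net::SMTP module that ships with Perl. Everything specific in this sketch is a placeholder: the helper names build_message and send_message, the relay host, and the addresses. It handles only plain text, not attachments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build a minimal plain-text message.
sub build_message {
    my (%arg) = @_;
    return join "\n",
        "From: $arg{from}",
        "To: $arg{to}",
        "Subject: $arg{subject}",
        '',                      # blank line separates headers from body
        $arg{body},
        '';
}

# Hypothetical helper: hand the message to an SMTP relay.
sub send_message {
    my ($relay, %arg) = @_;
    require Net::SMTP;           # loaded only when a message is really sent
    my $smtp = Net::SMTP->new($relay, Timeout => 30)
        or die "Cannot connect to $relay\n";
    $smtp->mail($arg{from});
    $smtp->to($arg{to});
    $smtp->data();
    $smtp->datasend(build_message(%arg));
    $smtp->dataend();
    $smtp->quit;
    return 1;
}

# Placeholder relay and addresses; replace them before use.
# send_message('mailhost.example.com',
#     from    => 'root@host.example.com',
#     to      => 'admin@example.com',
#     subject => 'Disk report',
#     body    => 'All file systems are below 80%.');
```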

[Feb 20, 2007] Dakshina`s Blog Weblog

CGI/Perl script for uploading files

Here's a small Perl script that I have used for uploading files to a web server. The location can be changed. Right now it saves the files to /tmp/upload1.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;

my $query = CGI->new;
print $query->header();

# Expects the client to send the name of the file to be uploaded in an input field "file"
my $filename = $query->param("file");

# Strip any directory part sent by the client BEFORE building the target path
$filename =~ s/.*[\/\\](.*)/$1/;
my $fpath1 = "/tmp/upload1/$filename";

my $upload_filehandle = $query->upload("file");

open(UPLOADFILE, ">", $fpath1) || die "Cannot open file: $!";
binmode UPLOADFILE;    # needed on Windows, harmless elsewhere

my $buf;
while (read($upload_filehandle, $buf, 1024)) {
   print UPLOADFILE $buf;
}

close UPLOADFILE;

# This has been tested on Solaris only
# Can be used to transfer binary files also

Manning Data Munging with Perl

The table of contents, two sample chapters, and the index from Data Munging with Perl are available in PDF format. You need Adobe's free Acrobat Reader software to view it. You may download Acrobat Reader here.

Download the Table of Contents

Download Chapter 2

Download Chapter 3

Download the Index

... ... ...

Source code from Data Munging with Perl is contained in either a single ZIP file, or a Unix gzipped and tarred file archive.

Free unzip programs for most platforms are available at Info-Zip.

Download the source code:

       cross_src.zip (44 Kb)

or

       cross_src.tar.gz (19 Kb)

How to write slow algorithms quickly in Perl (Playing Chomp)

Playing Chomp
by Gábor Szabó


Abstract
Though some of us might think so, chomp is not only a Perl function. It is also the name of a NIM-like Combinatorial Game that was unsolved until recently. It has a solution and implementation in Maple and I am writing an implementation in Perl for educational and research purposes.

Introduction
When I went to high-school in the early 1980s in Budapest, Hungary, I used to play a game with a class mate that we called eating chocolate. We actually did not really play it as we knew that there was a winning strategy for the player that moved first but we tried to find a mathematical description for that winning strategy. For that I wrote several programs that would compute the winning positions but we did not have any results.

A few years later I bought a book called "Dienes Professzor Játékai" [DIENES] in Hungarian translation but actually I have looked only at a couple of pages in the book until recently.

Then about a year ago I decided it is time to learn how to create and upload a module to CPAN and as the explanation regarding how to get accepted in PAUSE was rather discouraging I decided I try to play safe and start with a module that probably no one else wants to develop but which can be nice to have on CPAN: Games::NIM. I planned to develop the module to play the game and to calculate the winning positions for NIM and later to extend to Chocolate. To my surprise I got the access and uploaded version 0.01 in December 2001 and then it got stuck at that version.

Now when I thought about attending YAPC::Europe::2002 I decided to renew the work around Games::NIM and proposed a talk about complexity in algorithms in connection to that module and another module called Array::Unique.
When the proposal got accepted I suddenly discovered that I have not much to say about the subject and have to work really hard in order to give you something worthwhile. So I started to work on Games::NIM again and read the book of Dienes [DIENES] about games and another very useful one called "Mastering Algorithms with Perl" [ALGORITHMS]. I suddenly discovered that the game I knew as chocolate eating game is actually known as Chomp and it is still basically unsolved. It all sounded very encouraging.

[Apr 10, 2006] log4perl - log4j for Perl

Welcome to the log4perl project page. Log::Log4perl is a Perl port of the widely popular log4j logging package.

Logging beats a debugger if you want to know what's going on in your code during runtime. However, traditional logging packages are too static and generate a flood of log messages in your log files that won't help you.

Log::Log4perl is different. It allows you to control the amount of logging messages generated very effectively. You can bump up the logging level of certain components in your software, using powerful inheritance techniques. You can redirect the additional logging messages to an entirely different output (append to a file, send by email etc.) -- and everything without modifying a single line of source code.

Further reading

[Mar 25, 2006] Beginning Perl is now available as an eBook from Perl.com. This is a very good intro book!

[Mar 24, 2006]  freshmeat.net Project details for Perl-Linux

This is a great idea that might change the way UNIX is perceived (C-written somewhat archaic system with non-uniform set of obscure command line utilities) and used.

Perl/Linux is a Linux distribution where all programs are written in Perl. The only compiled code in this Perl/Linux system is the Linux Kernel (not currently built with this project), Perl, and uClibc.

[Mar 24, 2006]  freshmeat.net Project details for Ryan's In-Out Board

About: Ryan's In/Out Board (formerly known as Whosin) is a simple and quick Perl-driven Web-based in/out board for use on intranets and extranets. Users can change their status by clicking their name or calling the script with a name parameter, allowing for desktop shortcuts which give single click "check-in/out" links. Custom and/or default comments can be added to their status. No database system is required, you just need a Web server and Perl. A script to check all staff out is also provided, which is handy if called as an overnight cron job. It uses the Date::EzDate Perl module.

Changes: A few people were having problems with data files not being written to. This version will print read/write errors to the browser if it encounters them. It does not fix any read/write issues similar to the ones people were experiencing, because there's nothing to fix as such. Those errors were related to filesystem permissions and thus beyond the realm of the script.

[Mar 24, 2006] freshmeat.net Project details for otl

About: otl is intended to convert a text file to a HTML or XHTML file. It is different than many other text-to-HTML programs in that the input format (by default a simple highly readable plain text format) can be customized by the user, and the output format (by default XHTML) can be user-defined. It can process complex structures such as ordered and unordered lists (nested or not), and add custom "headers" and "footers" to documents. The conversion utilizes Perl regex, adding quite a bit of flexibility and power to the conversion process. Since both the syntax of the source file and of the output can be readily customized, otl in theory can be used for many types of conversions. The package also includes tag-remove, a script for stripping HTML/XHTML-ish tags from documents.

Changes: The "chempretty" script has been removed and replaced with a more general script, "otlsub". With otlsub, you can perform a set of search/replace operations on a set of files using a Perl regex for matching. otlsub supports recursion, allowing you to descend through a directory tree and process all files matching a filename pattern. otlsub automatically adjusts references to local files in hyperlinks depending on directory depth. New otl features include a --descend option (recursive descent through all subdirectories) and various other minor modifications.

[Feb 28, 2006]  Visual Python (Python), and Visual Perl (Perl) integrate with Visual Studio 2005

[Feb 14, 2006] Logic Programming with Perl and Prolog

Perl isn't the last, best programming language you'll ever use for every task. (Perl itself is a C program, you know.) Sometimes other languages do things better. Take logic programming--Prolog handles relationships and rules amazingly well, if you take the time to learn it. Robert Pratte shows how to take advantage of this from Perl. [Perl.com]

[Feb 14, 2006] Analyzing HTML with Perl

Kendrew Lau taught HTML development to business students. Grading web pages by hand was tedious--but Perl came to the rescue. Here's how Perl and HTML parsing modules helped make teaching fun again. [Perl.com]

Acky.net Tutorials Perl    

Section 2 - Flat Files:
 
Programmers often use flat files to store small amounts of data, such as short-lived cache information. For one project I was working on, I needed to store the unique IP address of each visitor and the time the entry was created. I used flat files for this task because it was not very data intensive, and the information was cleared every 15 minutes.
When doing something like this, you can take two different approaches. You can create a file for each visitor (what I did, as I needed to store extra information), an approach I like to call flat-files, or you can keep all entries in a single file.
When creating many different files, you need to ensure a unique filename for each one; otherwise files will start to overwrite each other after some time. You can use the Digest::SHA1 module to generate a 160-bit signature from random data (only in incredibly rare cases will two signatures be the same), although there are a number of other ways to do this. Once you have generated the unique name, you can create the flat file.
 
# Open file for write only or die.
open(FH, "> $unique_filename") or die("Error: $!");
# Lock the file.
flock(FH, 2);
# Save the remote ip address, a null, and then the time.
print FH $ENV{REMOTE_ADDR}, "\0", time;
# Close the file and release lock or die.
close(FH) or die("Error: $!");

 

Now this takes care of saving the data in flat-files. Retrieving data from a simple structure like this is very simple.
# We open the file for reading only or die.
open(FH, "$unique_filename") or die("Error: $!");
# Read the first line from open file.
$line = <FH>;
# Close the file or die.
close(FH) or die("Error: $!");
# Separate the data using split.
($remote_addr, $create_time) = split(/\0/, $line);

 

In this example, the $ENV{REMOTE_ADDR} and the time since epoch is saved in the $unique_filename file. Be careful to watch for security risks when using a variable in an open (for more information read perlsec man page or view it online at http://www.perl.com/pub/doc/manual/html/pod/perlsec.html). Using the same fundamental ideas you can create much more complex data structures within flat-files.
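Here is a sketch of the same per-visitor idea written more defensively. It uses lexical filehandles and three-argument open (which sidesteps the variable-in-open risk mentioned above), the LOCK_EX constant from Fcntl in place of the bare 2, and Digest::SHA, bundled with current Perls, swapped in for Digest::SHA1. The sub names unique_name, save_entry and load_entry are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: 40 hex characters derived from time, PID and a random number.
sub unique_name {
    return sha1_hex(join ':', time(), $$, rand());
}

# Write "address NUL time" to its own file while holding an exclusive lock.
sub save_entry {
    my ($path, $addr) = @_;
    open my $fh, '>', $path or die "Error: $!";
    flock($fh, LOCK_EX) or die "Error: $!";
    print {$fh} $addr, "\0", time();
    close $fh or die "Error: $!";     # closing also releases the lock
    return;
}

# Read the entry back as (address, time).
sub load_entry {
    my ($path) = @_;
    open my $fh, '<', $path or die "Error: $!";
    my $line = <$fh>;
    close $fh or die "Error: $!";
    return split /\0/, $line;
}

# Usage (the directory is a placeholder):
# my $file = '/tmp/cache/' . unique_name();
# save_entry($file, $ENV{REMOTE_ADDR});
# my ($remote_addr, $create_time) = load_entry($file);
```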
As I mentioned earlier, the other way of using flat files is to create one larger file for all entries. Retrieving data from this kind of flat file database can be slower as data increases, so only use this if it presents something beneficial to your programs. You've been warned! The basic ideas for using this type of flat file database is virtually the same as for flat-files.
Rather than opening the file for writing as we did in the flat-files example, we have to open the file for appending, because overwriting data will not help us in this example. We must also separate each entry by a delimiter, I will use the newline character, and we no longer need to use $unique_filename in open because the filename will be static.
# Open file for append or die.
open(FH, ">> ./cache.db") or die("Error: $!");
# Lock the file.
flock(FH, 2);
# Save the unique id, a null, remote ip address, a null, and then the time since epoch.
print FH $unique_id, "\0", $ENV{REMOTE_ADDR}, "\0", time, "\n";
# Close the file and release lock or die.
close(FH) or die("Error: $!");


 

To retrieve data from the file, the program needs something to search for in order to pick out a certain entry. You could use the remote IP address or the time, but I personally prefer a unique id for each visitor (which I save as a cookie and retrieve any time the user runs a script).
Once you know what the unique id is that you want to retrieve from the flat file database, you can do the following.
# Open the file for read only.
open(FH, "< ./cache.db") or die("Error: $!");
# Loop through each entry in the flat file and look for the one we need.
while ($line = <FH>) {
    # Remove the newline character at the end of the line
    chomp($line);
    # Separate the data on line using split.
    ($unique_id, $remote_addr, $create_time) = split(/\0/, $line);
    # Check if the unique id that we saved earlier matches the one
    # that we are looking for this time, where $our_id is the id that
    # we are looking for. If the two ids match, we break out of the loop.
    if ($unique_id eq $our_id) {
        $found = 1;
        last;
    }
}
# Close the file or die.
close(FH) or die("Error: $!");
unless ($found) {
    die("Error: Could not find entry $our_id in the flat file database.");
}
In this example $unique_id, $remote_addr, and $create_time are retrieved from the cache.db file if an entry matches the $our_id variable; otherwise the program dies. You can adapt this for your own programs with minimal effort. Let me mention this again: this can be very inefficient when dealing with large amounts of data, as the program must loop through every line until the entry is found. Another deficiency of this small example is that the program retrieves only the first matching entry in cache.db and stops. This is what most people would want, but if you want to retrieve all entries, or the most recent one, a little more work is required. (There are different ways of sorting and matching data that can speed this process up significantly.)
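When several lookups are needed, or the most recent entry is the one you want, one option is to read the file once into a hash. This is a sketch under the cache.db layout used above (id, NUL, address, NUL, time, one entry per line); the sub name load_cache is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: read the whole flat file into a hash keyed by unique id.
# Later lines overwrite earlier ones, so the most recent entry wins.
sub load_cache {
    my ($path) = @_;
    my %entry;
    open my $fh, '<', $path or die "Error: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $addr, $time) = split /\0/, $line;
        next unless defined $time;          # skip malformed lines
        $entry{$id} = { addr => $addr, time => $time };
    }
    close $fh or die "Error: $!";
    return \%entry;
}

# Usage:
# my $cache = load_cache('./cache.db');
# my $hit   = $cache->{$our_id}
#     or die "Error: Could not find entry $our_id in the flat file database.\n";
# print "$hit->{addr} at $hit->{time}\n";
```

Each lookup after the initial read is a single hash access, at the cost of holding the whole file in memory.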
I will mention some other ways of storing data in flat files as well as other storing data methods, in the following pages.

TeachMePerl.Com Interview with Tim Maher of Consultix

I'll be happy to tell you, but first let me put a few things in historical perspective.

Way back in 1976, as a graduate student at the University of Toronto, I was using C, grep, sed, expr (yuck!) and the Mashey shell (the Bourne shell's predecessor) on UNIX to simulate neurophysiological experiments on a virtual cat (in Prof. Ron Baecker's Interactive Computer Graphics class).

I became pretty adept with all these tools, but I had some reservations about UNIX's ''tinkertoy'' approach to utility programs, which struck me as an example of a fundamentally good idea taken to an undesirable extreme.

As a case in point, in the Bourne shell you have to use the external expr command to do simple arithmetic. The variable-incrementing idiom was (and still is):

value=`expr $value + $inc_val`

Just imagine how efficient that approach is, at the cost of one extra (synchronous) process per calculation, when you have to total a series of numbers. It's pathetic!

So when AWK came out in 1977, I was intrigued by its potential for improving the state of UNIX programming, with features such as:

  1. Program simplification through an implicit input-reading loop,
  2. Automatic parsing of input into fields (forever ending sed's monopoly on a manual approach, based on cumbersome "\(.*\)" -based techniques),
  3. The Pattern/Action model of programming, that links pattern-matches to code blocks, and
  4. Built-in support for basic mathematical functions, including floating point calculations.

I rapidly became a dedicated AWKaholic, promoting its use wherever I went. And if there had been a Nobel Prize for Artificial Languages, I would have nominated Aho, Weinberger, and Kernighan for it!

The AWK approach is just so good that I'm convinced modern programmers would currently be using languages with names like Turbo-AWK, AWK++, Visual AWK, Objective AWK, and perhaps even JAWKA, PythAWK, and AWK#, if not for an egregious travesty of high-tech justice.

Which is simply that this ingenious 1977 language was not properly documented until 1988, when Prentice-Hall's AWK Programming Language book came out. What a tragedy! But on the other hand, perhaps Larry Wall would have missed his chance with Perl if things had been otherwise. I guess that's the silver lining.

But getting back to my story, I wasn't really affected by the AWK documentational snafu. That's because I got the chance to make a career change from a university ''CS Professorship'' to a ''UNIX Course Developer and Instructor'' position with Western Electric (the branch of the Bell System that owned UNIX). They hired me in 1982 to develop and teach classes on UNIX topics, providing me access to internal documentation and bona fide UNIX ''Subject Matter Experts''. So I rapidly became an accomplished AWK programmer, and developed lots of nifty examples of its use for the training materials I created.

One especially useful program I wrote was a shell syntax checker and beautifier. I wrote this out of necessity, after a huge shell script stopped working due to a misplaced single quote that I just couldn't find. It saved many programming projects for me over the years, and then sadly, it was lost forever in a disk crash.

You have a very interesting background, Tim. But where does Perl fit into all of this?

Believe it or not, I was getting to that. I began dabbling with Perl in the early 1990s, but frankly had a hard time feeling comfortable with some of its more unconventional features.

I objected to what I saw as superfluous deviations from UNIX standards (like tagging all scalars with $), an overabundance of syntactically equivalent ways of writing the same thing (e.g., forwards vs. backwards loops and conditionals), and the unnecessary inclusion of radical new concepts (esp. LIST vs. SCALAR contexts).

For me, learning Perl was like watching a movie where I found the initial developments sufficiently disjointed and deranged that I had serious doubts that the writer would ever be able to make sense of it all for me, and ultimately reward me for my attention.

The bottom line is I just wasn't confident that Larry's programming mentality was compatible with mine, and without that faith, I wasn't willing to make the considerable effort to learn a new, and rather peculiar, programming language.

Moreover, as a C, Shell, and AWK guy since the mid-70s, I figured I could do everything I needed with those tools already -- given a sufficient number of User Processes and Development Time!  So I didn't really feel the need for a One Language Does Everything solution.

But by 1997 Perl usage was growing by leaps and bounds, and many were waxing poetic about what a joy it was to write in a language that freed them from the micro-management of minutiae and just ''did the right thing'' most of the time.

And, on top of that, Perl offered the capability of doing UNIX-style network programming, which was rapidly escalating in importance, without resorting to the travails of C.

So suddenly, I came to see Perl as my dream language. It was like AWK with sockets! What more could one ask for?

You received a White Camel for developing and starting SPUG, the Seattle Perl Users Group. What were your reasons for creating this users group?

When I finally decided to get serious about learning Perl, I realized that what I needed most was to improve my capacity for PerlThink. (That's Larry's term for using Perl's features judiciously, and then getting out of the way so it can do its magic.)

I figured the best way to achieve this goal was to hang out with people who were already PerlThinking, so in late 1997 I started looking for a Perl SIG in Seattle. But I quickly learned there wasn't a group, just a web page dedicated to the proposition that there should be a group, and it had been sitting there for a long time, collecting comments from would-be members!

Many months later, while cooking breakfast in an escaping steam vent atop a smoke-spewing volcano in Indonesia (no kidding!), I gave this situation some more thought, and decided that, if necessary, I'd step forward to start the group myself.

Hmm ... how can I convey to you just how excited I was about taking on this role? I'm reminded of a play by Woody Allen in which a distraught woman makes a moving soliloquy about her desperate need for intimate contact with a Man. Just when she's on the verge of descending into a deep depression, an actor planted in the audience shouts out:

     I'll sleep with that girl, if nobody else will!
That's exactly how excited I was about starting SPUG.

I had never created an organization before, so I found that proposition itself rather daunting. And on top of that I was concerned that such unpleasant activities as begging, pleading, imploring, beseeching, and ultimately arm-twisting would be required of me to sign up prospective speakers -- and, unfortunately, I was right!

(I later learned they'd invariably thank me afterwards for pressuring them into giving talks, once they realized how much the exercise helped solidify their knowledge, and how much fun they had sharing it.)

O'Reilly Is Perl Still Relevant

Subject: Is Perl relevant any longer?

With the emergence of .NET, J2EE, Python, PHP, et. al, has Perl lost its niche as a scripting glue language? The buzz is all around PHP these days and also around Python. The complaints about Perl 6's complexity are only getting louder. Besides, Perl does not occupy the central position in O'Reilly's offerings that it once did.

Is Perl on its way out?

Jag


Hi Jag,

While I agree that the long wait for Perl 6 has harmed Perl, and many Perl programmers do in fact find what they've seen to be unnecessarily complex (one well-known Perl programmer of my acquaintance referred to it as "performance art"), I've learned never to count Perl out. There was a similar slowdown in Perl in the mid-90s, and it saw a huge resurgence as "the duct tape of the internet." Perl is so useful that there may yet come another new market for which it is uniquely suited. It's a powerful, adaptable language, and the folks creating Perl 6 have a history of "seeing around corners" and developing features that turn out to be just right for some emerging market. So when Perl 6 comes out, we certainly won't be on the publishing sidelines. We'd love to be in the position to do some substantial updates to our bestselling Perl books!

That being said, there has always been an element of snobbery in the Perl market--I remember trying to persuade the authors of the second edition of Programming Perl, back in 1996, to pay more attention to the web. I was told that web programming was "trivial" and didn't require any special treatment. Of course, languages like PHP, which considered the web to be central, eventually came to occupy that niche. If book sales are any indicator, PHP is twice as popular as Perl.

I've always believed that one of the most important things about scripting languages is that they (potentially) make a new class of applications more accessible to people who didn't previously think of themselves as programmers. Languages then grow up, get computer-science envy, and forget their working-class roots.

In terms of the competitive landscape among programming languages, in addition to PHP, Python has long been gaining on Perl. From about 1/6 the size of the Perl market when I first began tracking it, it's now about 2/3 the size of the Perl book market. The other scripting language (in addition to Perl, Python, and PHP) that we're paying a lot more attention to these days is Ruby. The Ruby On Rails framework is taking the world by storm, and has gone one up on PHP in terms of making database backed application programming a piece of cake.

And while JavaScript is not generally thought of as an alternative to these fuller-featured languages, the conjunction of JavaScript and XML that has so meme-felicitously been named AJAX is driving a new surge of interest. The JavaScript book market is now slightly larger than the Perl book market--quite a bit larger if you consider JavaScript variants such as Macromedia's ActionScript.

I recently wrote about the relative market share of programming languages in my O'Reilly Radar blog. The posting focuses on the rise of open source Java books, but includes a graph showing the relative share of all programming language books, in terms of sell-through data from Nielsen BookScan. (See also this blog entry for a description of BookScan and our technology trend tracking tools.)

Tim O'Reilly

Continued...

 

Recommended Links



Internal Links

Search engines:

Download

Top sites:

Reference: see below

See also

Top community sites:

Products, projects and resources:

People:

Press and Perl related sites


Reference

Reference cards

 

Annotated man pages

Perl Man Pages on the WEB

Perl Reference Guide & Perl 5 Desktop Guide   see also mirror Perl5 Reference Guide

Perl Recipes

Perl modules file


  • Etc

    perl.com Critique of the Perl 6 RFC Process [Oct. 31, 2000]

    Conferences

    Second Perl conference [added November 4, 1998]



    Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Standard disclaimer:

    Last updated: July 03, 2009