Softpanorama

May the source be with you, but remember the KISS principle ;-)

Perl Tips/Snippets


 

Here are some of the most useful Perl tips and snippets that I have collected:

  1. If you use OFMs (orthodox file managers) it is easy to save keystrokes when checking Perl scripts. Configure the pl extension, or a user menu item, to invoke perl -cw !.! (!.! is a FAR idiom; other OFMs, like Midnight Commander, use different macros).

     

  2. Special variable $^O contains the name of the operating system Perl was built for (e.g. linux or MSWin32), roughly what uname would report.

    No need for something like:

    $OS=`uname`; chomp $OS;

    Perl defines a number of additional scalars like this for you; see perlvar for the full list.
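A minimal sketch of dispatching on $^O (the values shown are common ones; the exact string depends on how Perl was built):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $^O is set at build time to the OS name, e.g. "linux",
# "darwin", "MSWin32" -- no need to shell out to uname
if ($^O eq 'MSWin32') {
    print "Running on Windows\n";
} elsif ($^O eq 'linux') {
    print "Running on Linux\n";
} else {
    print "Running on some other OS: $^O\n";
}
```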

  3. Create a log file and write important messages to the log file
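A minimal sketch of such a logger (the file name script.log and the sub name logme are illustrative, not part of the original tip):

```perl
use strict;
use warnings;

my $LOGFILE = "script.log";   # hypothetical log file name

# Append a timestamped message to the log file
sub logme {
    my ($msg) = @_;
    open my $fh, '>>', $LOGFILE or die "Cannot open $LOGFILE: $!";
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime;
    printf $fh "%04d-%02d-%02d %02d:%02d:%02d %s\n",
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec, $msg;
    close $fh;
}

logme("script started");
```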
     

  4. In a more or less complex script, control the printing of debugging information with a variable (for example $debug). Design and maintain your own system of diagnostic output from the various subroutines of the program.

    For more or less complex programs, diagnostic output via special print statements is the most efficient debugging method. It should be controlled by a special variable, for example $debug, which can hold an integer or a set of bit flags. For example:

    ($debug) && print "text=$test";
    You can also use binary numbers and the & operator, which lets you operate with a small set of debug flags, one for each section of the program. The following code snippet demonstrates this:
    # $debug holds eight bit flags (one byte)
    $debug=0b10110000;

    if ( $debug & 0b10000000 ) {
       print "Some diagnostic output\n";
    } elsif ( $debug & 0b01000000 ) {
       print "Other (possibly more detailed) diagnostic output\n";
    }
  5. To initialize a list of words, use qw (note: no commas inside qw, or they become part of the words):
    @mylist=qw(one two three four);
  6. You can check Perl syntax in VIM on each save
    au BufWritePost *.pl,*.pm !perl -c %

    Every time you save a .pl or .pm file, it executes perl -c and shows you the output.

    -- naChoZ

  7. Dynamic activation of the debugger (from "Perl debugged" book):
    while (<INPUT>) {
       $DB::trace = 1, next if /debug/;
       $DB::trace = 0, next if /nodebug/;
       # more code
    }

    When run under the debugger, this enables tracing when the loop encounters an input line containing "debug" and ceases tracing upon reading one containing "nodebug".

    You can switch to interactive debugging  by using:

    $DB::single = 1

    instead. That also provides a way to debug code in BEGIN blocks (which otherwise are executed before control is given to the debugger).

  8. Sometimes it makes sense to use regular expressions instead of substr. One such task is extracting the components of a date, for example:
    $cur_date='20060325';
    ($year, $month, $day) = $cur_date =~ /(\d{4})(\d\d)(\d\d)/;
  9. Getting Perl cross-reference reports. The B::Xref module can be used to generate cross-reference reports for Perl programs.
    perl -MO=Xref[,OPTIONS] scriptname.plx
  10. Setting a value of parameter to default value:
    # --- process the second parameter
    $msglevel=($ARGV[1]) ? $ARGV[1] : $msglevel; # default is the three-digit constant (see below)
    ($msglevel1, $msglevel2, $testing) = split(//,$msglevel); # get one byte flags
  11. Creating timestamp
    # Timestamp
    #
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst)=localtime(time);
    $year+=1900;
    $mon++;
    for ($mon, $mday, $hour, $min, $sec) {
       if (length($_)==1) {
           $_="0$_";
       }
    } 
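The same zero-padded timestamp can be built in a single sprintf call instead of the padding loop:

```perl
use strict;
use warnings;

my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
# %02d zero-pads each field, so no length() check is needed
my $timestamp = sprintf "%04d%02d%02d%02d%02d%02d",
    $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
print "$timestamp\n";
```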
  12. Move via link/unlink (source and target must be on the same filesystem):
    link($_[0], $target);
    if (-e $target) {
       unlink($_[0]);
    } else {
       logger("Failed to move the file '$_[0]' to '$home/$_[1]/$target'\n");
       return;
    }
  13. Removing duplicates: here the second part (the sprintf assignment) will be executed only if $new{$match} is still zero, i.e. on the first occurrence of $match:
    if ( $new{$match}++ || !( $tags{$match} = sprintf( "%s\t%s\t?^%s\$?\n", $match, $ARGV, $_ ) ) )
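The same seen-hash idiom in a more readable form (a standard Perl pattern, not taken from the original snippet):

```perl
use strict;
use warnings;

# The postfix ++ returns 0 (false) the first time a key is seen,
# so grep lets only first occurrences through
my %seen;
my @uniq = grep { !$seen{$_}++ } qw(apple pear apple cherry pear);
print "@uniq\n";   # prints: apple pear cherry
```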


NEWS CONTENTS

Old News ;-)

[Oct 19, 2017] How can I translate a shell script to Perl

Oct 19, 2017 | stackoverflow.com


I'll answer seriously. I do not know of any program to translate a shell script into Perl, and I doubt any interpreter module would provide the performance benefits. So I'll give an outline of how I would go about it.

Now, you want to reuse your code as much as possible. In that case, I suggest selecting pieces of that code, write a Perl version of that, and then call the Perl script from the main script. That will enable you to do the conversion in small steps, assert that the converted part is working, and improve gradually your Perl knowledge.

As you can call outside programs from a Perl script, you can even replace some bigger logic with Perl, and call smaller shell scripts (or other commands) from Perl to do something you don't feel comfortable yet to convert. So you'll have a shell script calling a perl script calling another shell script. And, in fact, I did exactly that with my own very first Perl script.

Of course, it's important to select well what to convert. I'll explain, below, how many patterns common in shell scripts are written in Perl, so that you can identify them inside your script, and create replacements by as much cut&paste as possible.

First, both Perl scripts and shell scripts are code plus functions; i.e., anything which is not a function declaration is executed in the order it is encountered. You don't need to declare functions before use, though. That means the general layout of the script can be preserved, though the ability to keep things in memory (like a whole file, or a processed form of it) makes it possible to simplify tasks.

A Perl script, in Unix, starts with something like this:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
#other libraries

(rest of the code)

The first line, obviously, points to the command used to run the script, just as in normal shells. The following two "use" lines make the language stricter, which should decrease the number of bugs you encounter because you don't know the language well (or plainly did something wrong). The third use line imports the "Dumper" function of the "Data::Dumper" module. It's useful for debugging purposes: if you want to know the value of an array or hash table, just print Dumper(whatever).

Note also that comments are just like shell's, lines starting with "#".

Now, you call external programs and pipe to or pipe from them. For example:

open THIS, "cat $ARGV[0] |";

That will run cat, passing " $ARGV[0] ", which would be $1 on shell -- the first argument passed to it. The result of that will be piped into your Perl script through "THIS", which you can use to read that from it, as I'll show later.

You can use "|" at the beginning or end of the command string to indicate the mode "pipe to" or "pipe from"; ">" or ">>" at the beginning to open a file for writing with or without truncation; "<" to explicitly open a file for reading (the default); or "+<" and "+>" for reading and writing. Notice that the latter will truncate the file first.

Another syntax for "open", which avoids problems with files whose names contain special characters, is passing the opening mode as a second argument:

open THIS, "-|", "cat $ARGV[0]";

This will do the same thing. The mode "-|" stands for "pipe from" and "|-" stands for "pipe to". The rest of the modes can be used as they were ( >, >>, <, +>, +< ). While there is more than this to open, it should suffice for most things.

But you should avoid calling external programs as much as possible. You could open the file directly, by doing open THIS, "$ARGV[0]"; , for example, and have much better performance.
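A sketch of the direct-file version using the safer three-argument open with a lexical filehandle (the demo file name is made up so the example is self-contained):

```perl
use strict;
use warnings;

# Create a small demo file so the example runs on its own
my $filename = "demo.txt";   # hypothetical file name
open my $out, '>', $filename or die "Could not create $filename: $!";
print $out "line one\nline two\n";
close $out;

# Three-argument open with a lexical filehandle: no global FILE
# to clobber, and the mode cannot leak in through the file name
open my $in, '<', $filename or die "Could not open $filename: $!";
while (my $line = <$in>) {
    print $line;
}
close $in;
unlink $filename;
```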

So, what external programs you could cut out? Well, almost everything. But let's stay with the basics: cat, grep, cut, head, tail, uniq, wc, sort.

CAT

Well, there isn't much to be said about this one. Just remember that, if possible, read the file only once and keep it in memory. If the file is huge you won't do that, of course, but there are almost always ways to avoid reading a file more than once.

Anyway, the basic syntax for cat would be:

my $filename = "whatever";
open FILE, "$filename" or die "Could not open $filename!\n";
while(<FILE>) {
  print $_;
}
close FILE;

This opens a file and prints all its contents (" while(<FILE>) " will loop until EOF, assigning each line to " $_ "), then closes it again.

If I wanted to direct the output to another file, I could do this:

my $filename = "whatever";
my $anotherfile = "another";
open (FILE, "$filename") || die "Could not open $filename!\n";
open OUT, ">", "$anotherfile" or die "Could not open $anotherfile for writing!\n";
while(<FILE>) {
  print OUT $_;
}
close FILE;

This will print the line to the file indicated by " OUT ". You can use STDIN , STDOUT and STDERR in the appropriate places as well, without having to open them first. In fact, " print " defaults to STDOUT , and " die " defaults to " STDERR ".

Notice also the " or die ... " and " || die ... ". The operators or and || execute the following command only if the first one returns false (which means an empty string, a null reference, 0, and the like). The die command stops the script with an error message.

The main difference between " or " and " || " is precedence. If " or " were replaced by " || " in the examples above, it would not work as expected, because the line would be interpreted as:

open FILE, ("$filename" || die "Could not open $filename!\n");

which is not at all what is expected. As " or " has lower precedence, it works. In the line where " || " is used, the parameters to open are enclosed in parentheses, making it possible to use " || ".

Alas, there is something which is pretty much what cat does:

while(<>) {
  print $_;
}

That will print all files in the command line, or anything passed through STDIN.

GREP

So, how would our "grep" script work? I'll assume "grep -E", because that's easier in Perl than simple grep. Anyway:

my $pattern = $ARGV[0];
shift @ARGV;
while(<>) {
        print $_ if /$pattern/o;
}

The "o" modifier on the pattern instructs Perl to compile it only once, thus gaining you speed. Note the style "something if cond": it means "something" will be executed only if the condition is true. Finally, " /$pattern/ ", alone, is the same as " $_ =~ m/$pattern/ ", which matches $_ against the regex pattern indicated. If you want fixed-string grep behavior, i.e. plain substring matching with no regex metacharacters, use index instead:

print $_ if index($_, $pattern) >= 0;

CUT

Usually you do better extracting the exact string with regex groups than with cut -- the kind of thing you would do with "sed", for instance. Anyway, here are two ways of reproducing cut:

while(<>) {
  my @array = split ",";
  print $array[3], "\n";
}

That will get you the fourth column of every line, using "," as the separator. Note @array and $array[3] . The @ sigil means "array" should be treated as, well, an array; it receives an array composed of each column of the currently processed line. The $ sigil in $array[3] means we are asking for a single scalar value: the column you want.

This is not a good implementation, though, as "split" will scan the whole string. I once reduced a process from 30 minutes to 2 seconds just by not using split -- the lines were rather large, though. Anyway, the following performs better when the lines are long and the columns you want come early:

while(<>) {
  my ($column) = /^(?:[^,]*,){3}([^,]*),/;
  print $column, "\n";
}

This leverages regular expressions to get the desired information, and only that.

If you want positional columns, you can use:

while(<>) {
  print substr($_, 5, 10), "\n";
}

Which will print 10 characters starting from the sixth (again, 0 means the first character).

HEAD

This one is pretty simple:

my $printlines = abs(shift);
my $lines = 0;
my $current;
while(<>) {
  if($ARGV ne $current) {
    $lines = 0;
    $current = $ARGV;
  }
  print "$_" if $lines < $printlines;
  $lines++;
}

Things to note here: I use "ne" to compare strings. $ARGV always holds the name of the file currently being read, so I keep track of it to restart my counting once I start reading a new file. Also note the more traditional syntax for "if", right alongside the postfix one.

I also use a simplified syntax to get the number of lines to be printed. When you use "shift" by itself it will assume "shift @ARGV". Also, note that shift, besides modifying @ARGV, will return the element that was shifted out of it.

As with a shell, there is no distinction between a number and a string -- you just use it. Even things like "2"+"2" will work. In fact, Perl is even more lenient, cheerfully treating anything non-number as a 0, so you might want to be careful there.
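A couple of lines showing those coercion rules (the last two would raise "isn't numeric" warnings if use warnings were in effect):

```perl
# Strings numify transparently in numeric context
print "2" + "2", "\n";        # prints 4
print "3 apples" + 1, "\n";   # prints 4 -- leading digits are used
print "apples" + 1, "\n";     # prints 1 -- non-numeric strings become 0
```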

This script is very inefficient, though, as it reads ALL of each file, not only the required lines. Let's improve it, and see a couple of important keywords in the process:

my $printlines = abs(shift);
my @files;
if(scalar(@ARGV) == 0) {
  @files = ("-");
} else {
  @files = @ARGV;
}
for my $file (@files) {
  next unless -f $file && -r $file;
  open FILE, "<", $file or next;
  my $lines = 0;

  while(<FILE>) {
    last if $lines == $printlines;
    print "$_";
    $lines++;
  }

  close FILE;
}

The keywords "next" and "last" are very useful. First, "next" will tell Perl to go back to the loop condition, getting the next element if applicable. Here we use it to skip a file unless it is truly a file (not a directory) and readable. It will also skip if we couldn't open the file even then.

Then "last" is used to immediately jump out of a loop. We use it to stop reading the file once we have reached the required number of lines. It's true we read one line too many, but having "last" in that position shows clearly that the lines after it won't be executed.

There is also "redo", which will go back to the beginning of the loop, but without reevaluating the condition nor getting the next element.

TAIL

I'll do a little trick here.

my $skiplines = abs(shift);
my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
  shift @lines if $#lines == $skiplines;
}
print @lines;

Ok, I'm combining "push", which appends a value to an array, with "shift", which removes one from the beginning of an array. If you want a stack, you can use push/pop or shift/unshift. Mix them, and you have a queue. I keep my queue at no more than $skiplines elements using $#lines, which gives me the index of the last element in the array. You could also get the number of elements in @lines with scalar(@lines) .
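The push/shift queue in isolation:

```perl
use strict;
use warnings;

my @queue;
push @queue, "a", "b", "c";   # enqueue at the end
my $next = shift @queue;      # dequeue from the front: gets "a"
push @queue, "d";
# @queue is now ("b", "c", "d"); $#queue == 2, scalar(@queue) == 3
print "next=$next size=", scalar(@queue), "\n";
```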

UNIQ

Now, uniq only eliminates repeated consecutive lines, which should be easy with what you have seen so far. So I'll eliminate all of them:

my $current = "";
my %lines;
while(<>) {
  if($ARGV ne $current) {
    undef %lines;
    $current = $ARGV;
  }
  print $_ unless defined($lines{$_});
  $lines{$_} = "";
}

Now here I'm keeping the whole file in memory, inside %lines . The % sigil indicates this is a hash table. I'm using the lines as keys, and storing nothing as values -- as I have no interest in the values. I check whether a key has been seen with "defined($lines{$_})", which tests if the value associated with that key is defined; the keyword "unless" works just like "if", but with the opposite effect, so it only prints a line if the line is NOT already recorded.

Note, too, the syntax $lines{$_} = "" as a way to store something in a hash table. Note the use of {} for hash table, as opposed to [] for arrays.

WC

This will actually use a lot of stuff we have seen:

my $current;
my %lines;
my %words;
my %chars;
while(<>) {
  $lines{"$ARGV"}++;
  $chars{"$ARGV"} += length($_);
  $words{"$ARGV"} += scalar(grep {$_ ne ""} split /\s/);
}

for my $file (keys %lines) {
  print "$lines{$file} $words{$file} $chars{$file} $file\n";
}

Three new things. Two are the "+=" operator, which should be obvious, and the "for" expression. Basically, a "for" will assign each element of the array to the variable indicated. The "my" is there to declare the variable, though it's unneeded if declared previously. I could have an @array variable inside those parentheses. The "keys %lines" expression returns as an array the keys (the filenames) which exist in the hash table "%lines". The rest should be obvious.

The third thing, which I actually added only revising the answer, is the "grep". The format here is:

grep { code } array

It will run "code" for each element of the array, passing the element as "$_". Then grep will return all elements for which the code evaluates to "true" (not 0, not "", etc). This avoids counting empty strings resulting from consecutive spaces.

Similar to "grep" there is "map", which I won't demonstrate here. Instead of filtering, it will return an array formed by the results of "code" for each element.
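For completeness, a one-line illustration of "map", which transforms each element rather than filtering:

```perl
use strict;
use warnings;

my @words = ("one", "three", "seventeen");
# map runs the block for each element ($_) and collects the results
my @lengths = map { length $_ } @words;
print "@lengths\n";   # prints: 3 5 9
```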

SORT

Finally, sort. This one is easy too:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort @lines;

Here, "sort" will sort the array. Note that sort can receive a function to define the sorting criteria. For instance, if I wanted to sort numbers I could do this:

my @lines;
my $current = "";
while(<>) {
  if($ARGV ne $current) {
    print sort {$a <=> $b} @lines;
    undef @lines;
    $current = $ARGV;
  }
  push @lines, $_;
}
print sort {$a <=> $b} @lines;

Here " $a " and " $b " receive the elements to be compared. " <=> " returns -1, 0 or 1 depending on whether the number is less than, equal to or greater than the other. For strings, "cmp" does the same thing.

HANDLING FILES, DIRECTORIES & OTHER STUFF

As for the rest, basic mathematical expressions should be easy to understand. You can test certain conditions about files this way:

for my $file (@ARGV) {
  print "$file is a file\n" if -f "$file";
  print "$file is a directory\n" if -d "$file";
  print "I can read $file\n" if -r "$file";
  print "I can write to $file\n" if -w "$file";
}

I'm not trying to be exhaustive here; there are many other such tests. I can also use "glob" patterns, like shell's "*" and "?", like this:

for my $file (glob("*")) {
  print $file;
  print "*" if -x "$file" && ! -d "$file";
  print "/" if -d "$file";
  print "\t";
}

If you combine that with "chdir", you can emulate "find" as well:

sub list_dir($$) {
  my ($dir, $prefix) = @_;
  my $newprefix = $prefix;
  if ($prefix eq "") {
    $newprefix = $dir;
  } else {
    $newprefix .= "/$dir";
  }
  chdir $dir;
  for my $file (glob("*")) {
    print "$prefix/" if $prefix ne "";
    print "$dir/$file\n";
    list_dir($file, $newprefix) if -d "$file";
  }
  chdir "..";
}

list_dir(".", "");

Here we see, finally, a function. A function is declared with the syntax:

sub name (params) { code }

Strictly speaking, "(params)" is optional. The declared prototype I used, " ($$) ", means the function receives two scalar parameters. I could have " @ " or " % " in there as well. The array " @_ " holds all the parameters passed. The line " my ($dir, $prefix) = @_ " is just a simple way of assigning the first two elements of that array to the variables $dir and $prefix .

This function does not return anything (it's a procedure, really), but you can have functions which return values just by adding " return something; " to them.
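A tiny example of a value-returning function (the name and body are illustrative):

```perl
use strict;
use warnings;

# Return the sum of two numbers
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}

print add(2, 3), "\n";   # prints 5
```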

The rest of it should be pretty obvious.

MIXING EVERYTHING

Now I'll present a more involved example. I'll show some bad code to explain what's wrong with it, and then show better code.

For this first example, I have two files: names.txt, which contains names and phone numbers, and systems.txt, with systems and the name of the person responsible for each. Here they are:

names.txt

John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

systems.txt

Sales, Jane Doe
Inventory, John Doe
Payment, That Guy

I want, then, to print the first file, with the system appended to the name of the person, if that person is responsible for that system. The first version might look like this:

#!/usr/bin/perl

use strict;
use warnings;

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open FILE, "systems.txt";

  while(<FILE>) {
    next unless /$name/o;
    ($system) = /([^,]*)/;
  }

  close FILE;

  return $system;
}

This code won't work, though. Perl will complain that the function was used too early for the prototype to be checked, but that's just a warning. It will give an error on line 8 (the first while loop), complaining about a readline on a closed filehandle. What happened here is that " FILE " is global, so the function get_system is changing it. Let's rewrite it, fixing both things:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/o;
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

This won't give any errors or warnings, nor will it work. It returns just the systems, but not the names and phone numbers! What happened? Well, what happened is that we refer to " $_ " after calling get_system , but, by reading the file, get_system has overwritten the value of $_ !

To avoid that, we'll make $_ local inside get_system . This will give it a local scope, and the original value will then be restored once returned from get_system :

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/o;
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  print $_ . ", $system\n";
}

close FILE;

And that still doesn't work! It prints a newline between the name and the system. Well, Perl reads the line including any newline it might have. There is a neat command which will remove newlines from strings, " chomp ", which we'll use to fix this problem. And since not every name has a system, we might, as well, avoid printing the comma when that happens:

#!/usr/bin/perl

use strict;
use warnings;

sub get_system($) {
  my ($name) = @_;
  my $system = "";
  local $_;

  open my $filehandle, "systems.txt";

  while(<$filehandle>) {
    next unless /$name/o;
    ($system) = /([^,]*)/;
  }

  close $filehandle;

  return $system;
}

open FILE, "names.txt";

while(<FILE>) {
  my ($name) = /^([^,]*),/;
  my $system = get_system($name);
  chomp;
  print $_;
  print ", $system" if $system ne "";
  print "\n";
}

close FILE;

That works, but it also happens to be horribly inefficient. We read the whole systems file for every line in the names file. To avoid that, we'll read all data from systems once, and then use that to process names.

Now, sometimes a file is so big you can't read it into memory. When that happens, you should try to read into memory any other file needed to process it, so that you can do everything in a single pass for each file. Anyway, here is the first optimized version of it:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

Unfortunately, it doesn't work. No system ever appears! What has happened? Well, let's look into what " %systems " contains, by using Data::Dumper :

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /([^,]*),(.*)/;
  $systems{$name} = $system;
}
close SYSTEMS;

print Dumper(%systems);

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^([^,]*),/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

The output will be something like this:

$VAR1 = ' Jane Doe';
$VAR2 = 'Sales';
$VAR3 = ' That Guy';
$VAR4 = 'Payment';
$VAR5 = ' John Doe';
$VAR6 = 'Inventory';
John Doe, (555) 1234-4321
Jane Doe, (555) 5555-5555
The Boss, (666) 5555-5555

Those $VAR1/$VAR2/etc. are how Dumper displays a hash table. The odd-numbered variables are the keys, and the succeeding even-numbered ones are the values. Now we can see that each name in %systems has a preceding space! Silly regex mistake; let's fix it:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]*?)\s*,\s*(.*?)\s*$/;
  $systems{$name} = $system;
}
close SYSTEMS;

open NAMES, "names.txt";
while(<NAMES>) {
  my ($name) = /^\s*([^,]*?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined $systems{$name};
  print "\n";
}
close NAMES;

So, here, we are aggressively removing any spaces from the beginning or end of name and system. There are other ways to form that regex, but that's beside the point. There is still one problem with this script, which you'll have seen if your "names.txt" and/or "systems.txt" files have an empty line at the end. The warnings look like this:

Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
Use of uninitialized value in hash element at ./exemplo3e.pl line 10, <SYSTEMS> line 4.
John Doe, (555) 1234-4321, Inventory
Jane Doe, (555) 5555-5555, Sales
The Boss, (666) 5555-5555
Use of uninitialized value in hash element at ./exemplo3e.pl line 19, <NAMES> line 4.

What happened here is that nothing went into the " $name " variable when the empty line was processed. There are many ways around that, but I chose the following:

#!/usr/bin/perl

use strict;
use warnings;

our %systems;
open SYSTEMS, "systems.txt" or die "Could not open systems.txt!";
while(<SYSTEMS>) {
  my ($system, $name) = /^\s*([^,]+?)\s*,\s*(.+?)\s*$/;
  $systems{$name} = $system if defined $name;
}
close SYSTEMS;

open NAMES, "names.txt" or die "Could not open names.txt!";
while(<NAMES>) {
  my ($name) = /^\s*([^,]+?)\s*,/;
  chomp;
  print $_;
  print ", $systems{$name}" if defined($name) && defined($systems{$name});
  print "\n";
}
close NAMES;

The regular expressions now require at least one character for name and system, and we test to see if " $name " is defined before we use it.

CONCLUSION

Well, then, these are the basic tools to translate a shell script. You can do MUCH more with Perl, but that was not your question, and it wouldn't fit here anyway.

Just as a basic overview of some important topics,

There is a free, good, hands-on, hard & fast book about Perl called Learning Perl The Hard Way . Its style is similar to this very answer. It might be a good place to go from here.

I hope this helped.

DISCLAIMER

I'm NOT trying to teach Perl, and you will need to have at least some reference material. There are guidelines to good Perl habits, such as using " use strict; " and " use warnings; " at the beginning of the script, to make it less lenient of badly written code, or using STDOUT and STDERR on the print lines, to indicate the correct output pipe.

This is stuff I agree with, but I decided it would detract from the basic goal of showing patterns for common shell script utilities.

[Oct 16, 2017] Indenting Here Documents

Oct 16, 2017 | docstore.mik.ua
1.11. Indenting Here Documents

Problem

When using the multiline quoting mechanism called a here document, the text must be flush against the margin, which looks out of place in the code. You would like to indent the here document text in the code, but not have the indentation appear in the final string value.

Solution

Use a s/// substitution to strip the leading whitespace from every line:

# all in one
($var = <<HERE_TARGET) =~ s/^\s+//gm;
    your text
    goes here
HERE_TARGET

# or with two steps
$var = <<HERE_TARGET;
    your text
    goes here
HERE_TARGET
$var =~ s/^\s+//gm;
Discussion

The substitution is straightforward. It removes leading whitespace from the text of the here document. The /m modifier lets the ^ character match at the start of each line in the string, and the /g modifier makes the pattern matching engine repeat the substitution as often as it can (i.e., for every line in the here document).

($definition = <<'FINIS') =~ s/^\s+//gm;
    The five varieties of camelids
    are the familiar camel, his friends
    the llama and the alpaca, and the
    rather less well-known guanaco
    and vicuña.
FINIS

Be warned: all the patterns in this recipe use \s, which matches newlines as well as spaces and tabs. If your string may contain newlines that you want to keep, replace \s with [^\S\n] in the patterns.

The substitution makes use of the property that the result of an assignment can be used as the left-hand side of =~ . This lets us do it all in one line, but it only works when you're assigning to a variable. When you're using the here document directly, it would be considered a constant value and you wouldn't be able to modify it. In fact, you can't change a here document's value unless you first put it into a variable.

Not to worry, though, because there's an easy way around this, particularly if you're going to do this a lot in the program. Just write a subroutine to do it:

sub fix {
    my $string = shift;
    $string =~ s/^\s+//gm;
    return $string;
}

print fix(<<"END");
    My stuff goes here
END

# With function predeclaration, you can omit the parens:
print fix <<"END";
    My stuff goes here
END

As with all here documents, you have to place this here document's target (the token that marks its end, END in this case) flush against the left-hand margin. If you want to have the target indented also, you'll have to put the same amount of whitespace in the quoted string as you use to indent the token.

($quote = <<'    FINIS') =~ s/^\s+//gm;
        ...we will have peace, when you and all your works have
        perished--and the works of your dark master to whom you would
        deliver us. You are a liar, Saruman, and a corrupter of men's
        hearts.  --Theoden in /usr/src/perl/taint.c
    FINIS
$quote =~ s/\s+--/\n--/;      #move attribution to line of its own

If you're doing this to strings that contain code you're building up for an eval , or just text to print out, you might not want to blindly strip off all leading whitespace because that would destroy your indentation. Although eval wouldn't care, your reader might.

Another embellishment is to use a special leading string for code that stands out. For example, here we'll prepend each line with @@@ , properly indented:

if ($REMEMBER_THE_MAIN) {
    $perl_main_C = dequote<<'    MAIN_INTERPRETER_LOOP';
        @@@ int
        @@@ runops() {
        @@@     SAVEI32(runlevel);
        @@@     runlevel++;
        @@@     while ( op = (*op->op_ppaddr)() ) ;
        @@@     TAINT_NOT;
        @@@     return 0;
        @@@ }
    MAIN_INTERPRETER_LOOP
    # add more code here if you want
}

Destroying indentation also gets you in trouble with poets.

sub dequote;
$poem = dequote<<EVER_ON_AND_ON;
       Now far ahead the Road has gone,
          And I must follow, if I can,
       Pursuing it with eager feet,
          Until it joins some larger way
       Where many paths and errands meet.
          And whither then? I cannot say.
                --Bilbo in /usr/src/perl/pp_ctl.c
EVER_ON_AND_ON
print "Here's your poem:\n\n$poem\n";

Here is its sample output:

Here's your poem:

Now far ahead the Road has gone,
   And I must follow, if I can,
Pursuing it with eager feet,
   Until it joins some larger way
Where many paths and errands meet.
   And whither then? I cannot say.
         --Bilbo in /usr/src/perl/pp_ctl.c
The following dequote function handles all of these cases:

sub dequote {
    local $_ = shift;
    my ($white, $leader);  # common whitespace and common leading string
    if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {
        ($white, $leader) = ($2, quotemeta($1));
    } else {
        ($white, $leader) = (/^(\s+)/, '');
    }
    s/^\s*?$leader(?:$white)?//gm;
    return $_;
}

If that pattern makes your eyes glaze over, you could always break it up and add comments by adding /x :

    if (m{
            ^                       # start of line
            \s *                    # 0 or more whitespace chars
            (?:                     # begin first non-remembered grouping
                 (                  #   begin save buffer $1
                    [^\w\s]         #     one byte neither space nor word
                    +               #     1 or more of such
                 )                  #   end save buffer $1
                 ( \s* )            #   put 0 or more white in buffer $2
                 .* \n              #   match through the end of first line
             )                      # end of first grouping
             (?:                    # begin second non-remembered grouping
                \s *                #   0 or more whitespace chars
                \1                  #   whatever string is destined for $1
                \2 ?                #   what'll be in $2, but optionally
                .* \n               #   match through the end of the line
             ) +                    # now repeat that group idea 1 or more
             $                      # until the end of the line
          }x
       )
    {
        ($white, $leader) = ($2, quotemeta($1));
    } else {
        ($white, $leader) = (/^(\s+)/, '');
    }
    s{
         ^                          # start of each line (due to /m)
         \s *                       # any amount of leading whitespace
            ?                       #   but minimally matched
         $leader                    # our quoted, saved per-line leader
         (?:                        # begin unremembered grouping
            $white                  #    the same amount
         ) ?                        # optionalize in case EOL after leader
    }{}xgm;

There, isn't that much easier to read? Well, maybe not; sometimes it doesn't help to pepper your code with insipid comments that mirror the code. This may be one of those cases.

See Also

The "Scalar Value Constructors" section of perldata (1) and the "Here Documents" section of Chapter 2 of Programming Perl ; the s/// operator in perlre (1) and perlop (1), and the "Pattern Matching" section of Chapter 2 of Programming Perl

[Oct 16, 2017] HERE documents

Oct 16, 2017 | www.perlmeme.org


Introduction

If you're tempted to write multi-line output with multiple print() statements, because that's what you're used to in some other language, consider using a HERE-document instead.

Inspired by the here-documents in the Unix command line shells, Perl HERE-documents provide a convenient way to handle the quoting of multi-line values.

So you can replace this:

    print "Welcome to the MCG Carpark.\n";
    print "\n";
    print "There are currently 2,506 parking spaces available.\n";
    print "Please drive up to a booth and collect a ticket.\n";

with this:

    print <<'EOT';
    Welcome to the MCG Carpark.

    There are currently 2,506 parking spaces available.
    Please drive up to a booth and collect a ticket.
    EOT

The EOT in this example is an arbitrary string that you provide to indicate the start and end of the text being quoted. The terminating string must appear on a line by itself.

Quoting conventions.

The usual Perl quoting conventions apply, so if you want to interpolate variables in a here-document, use double quotes around your chosen terminating string:

    print <<"EOT";
    Welcome to the MCG Carpark.

    There are currently $available_places parking spaces available.
    Please drive up to a booth and collect a ticket.
    EOT

Note that whilst you can quote your terminator with " or ' , you cannot use the equivalent qq() and q() operators. So this code is invalid:

    # This example will fail
    print <<qq(EOT);
    Welcome to the MCG Carpark.

    There are currently $available_places parking spaces available.
    Please drive up to a booth and collect a ticket.
    EOT
The terminating string

Naturally, all of the text you supply to a here-document is quoted by the starting and ending strings. This means that any indentation you provide becomes part of the text that is used. In this example, each line of the output will contain four leading spaces.

    # Let's indent the text to be displayed. The leading spaces will be
    # preserved in the output.
    print <<"EOT";
        Welcome to the MCG Carpark.

        CAR PARK FULL. 
    EOT

The terminating string must appear on a line by itself, and it must have no whitespace before or after it. In this example, the terminating string EOT is preceded by four spaces, so Perl will not find it:

    # Let's indent the following lines. This introduces an error
        print <<"EOT";
        Welcome to the MCG Carpark.

        CAR PARK FULL. 
        EOT
    Can't find string terminator "EOT" anywhere before EOF at ....
Assignment

The here-document mechanism is just a generalized means of quoting text, so you can just as easily use it in an assignment:

    my $message =  <<"EOT";
    Welcome to the MCG Carpark.

    CAR PARK FULL. 
    EOT

    print $message;

And don't let the samples you've seen so far stop you from considering the full range of possibilities. The terminating tag doesn't have to appear at the end of a statement.

Here is an example from CPAN.pm that conditionally assigns some text to $msg .

    $msg = <<EOF unless $configpm =~ /MyConfig/;

    # This is CPAN.pm's systemwide configuration file. This file provides
    # defaults for users, and the values can be changed in a per-user
    # configuration file. The user-config file is being looked for as
    # ~/.cpan/CPAN/MyConfig.pm.

    EOF

And this example from Module::Build::PPMMaker uses a here-document to construct the format string for sprintf() :

    $ppd .= sprintf(<<'EOF', $perl_version, $^O, $self->_varchname($build->config) );
        <PERLCORE VERSION="%s" />
        <OS NAME="%s" />
        <ARCHITECTURE NAME="%s" />
    EOF
See Also
     perldoc -q "HERE documents"
     perldoc perlop (see the <<EOF section).
     Perl for Newbies - Lecture 4

[Sep 27, 2017] Integer ASCII value to character in BASH using printf

Sep 27, 2017 | stackoverflow.com

user14070 , asked May 20 '09 at 21:07

Character to value works:
$ printf "%d\n" \'A
65
$

I have two questions, the first one is most important:

broaden , answered Nov 18 '09 at 10:10

One line
printf "\x$(printf %x 65)"

Two lines

set $(printf %x 65)
printf "\x$1"

Here is one if you do not mind using awk

awk 'BEGIN{printf "%c", 65}'

mouviciel , answered May 20 '09 at 21:12

For this kind of conversion, I use perl:
perl -e 'printf "%c\n", 65;'

user2350426 , answered Sep 22 '15 at 23:16

This works (with the value in octal):
$ printf '%b' '\101'
A

even for (some: don't go over 7) sequences:

$ printf '%b' '\'{101..107}
ABCDEFG

A general construct that allows (decimal) values in any range is:

$ printf '%b' $(printf '\\%03o' {65..122})
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz

Or you could use the hex values of the characters:

$ printf '%b' $(printf '\\x%x' {65..122})
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz

You also could get the character back with xxd (use hexadecimal values):

$ echo "41" | xxd -p -r
A

That is, one action is the reverse of the other:

$ printf "%x" "'A" | xxd -p -r
A

And also works with several hex values at once:

$ echo "41 42 43 44 45 46 47 48 49 4a" | xxd -p -r
ABCDEFGHIJ

or sequences (printf is used here to get hex values):

$ printf '%x' {65..90} | xxd -r -p 
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Or even use awk:

$ echo 65 | awk '{printf("%c",$1)}'
A

even for sequences:

$ seq 65 90 | awk '{printf("%c",$1)}'
ABCDEFGHIJKLMNOPQRSTUVWXYZ

David Hu , answered Dec 1 '11 at 9:43

For your second question, it seems the leading-quote syntax ( \'A ) is specific to printf :

If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.

From http://pubs.opengroup.org/onlinepubs/009695399/utilities/printf.html

Naaff , answered May 20 '09 at 21:21

One option is to directly input the character you're interested in using hex or octal notation:
printf "\x41\n"
printf "\101\n"

MagicMercury86 , answered Feb 21 '12 at 22:49

If you want to save the ASCII value of the character: (I did this in BASH and it worked)
char="A"

testing=$( printf "%d" "'${char}" )

echo $testing

output: 65

chand , answered Nov 20 '14 at 10:05

Here's yet another way to convert 65 into A (via octal):
help printf  # in Bash
man bash | less -Ip '^[[:blank:]]*printf'

printf "%d\n" '"A'
printf "%d\n" "'A"

printf '%b\n' "$(printf '\%03o' 65)"

To search in man bash for \' use (though futile in this case):

man bash | less -Ip "\\\'"  # press <n> to go through the matches


If you convert 65 to hexadecimal it's 0x41 :

$ echo -e "\x41"
A

[Sep 27, 2017] linux - How to convert DOS-Windows newline (CRLF) to Unix newline in a Bash script

Notable quotes:
"... Technically '1' is your program, b/c awk requires one when given option. ..."

Koran Molovik , asked Apr 10 '10 at 15:03

How can I programmatically (i.e., not using vi ) convert DOS/Windows newlines to Unix?

The dos2unix and unix2dos commands are not available on certain systems. How can I emulate these with commands like sed / awk / tr ?

Jonathan Leffler , answered Apr 10 '10 at 15:13

You can use tr to convert from DOS to Unix; however, you can only do this safely if CR appears in your file only as the first byte of a CRLF byte pair. This is usually the case. You then use:
tr -d '\015' <DOS-file >UNIX-file

Note that the name DOS-file is different from the name UNIX-file ; if you try to use the same name twice, you will end up with no data in the file.

You can't do it the other way round (with standard 'tr').

If you know how to enter carriage return into a script ( control-V , control-M to enter control-M), then:

sed 's/^M$//'     # DOS to Unix
sed 's/$/^M/'     # Unix to DOS

where the '^M' is the control-M character. You can also use the bash ANSI-C Quoting mechanism to specify the carriage return:

sed $'s/\r$//'     # DOS to Unix
sed $'s/$/\r/'     # Unix to DOS

However, if you're going to have to do this very often (more than once, roughly speaking), it is far more sensible to install the conversion programs (e.g. dos2unix and unix2dos , or perhaps dtou and utod ) and use them.

ghostdog74 , answered Apr 10 '10 at 15:21

tr -d "\r" < file

take a look here for examples using sed :

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//'               # assumes that all lines end with CR/LF
sed 's/^M$//'              # in bash/tcsh, press Ctrl-V then Ctrl-M
sed 's/\x0D$//'            # works on ssed, gsed 3.02.80 or higher

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/"            # command line under ksh
sed 's/$'"/`echo \\\r`/"             # command line under bash
sed "s/$/`echo \\\r`/"               # command line under zsh
sed 's/$/\r/'                        # gsed 3.02.80 or higher

Use sed -i for in-place conversion e.g. sed -i 's/..../' file .

Steven Penny , answered Apr 30 '14 at 10:02

Doing this with POSIX is tricky:

To remove carriage returns:

ex -bsc '%!awk "{sub(/\r/,\"\")}1"' -cx file

To add carriage returns:

ex -bsc '%!awk "{sub(/$/,\"\r\")}1"' -cx file

Norman Ramsey , answered Apr 10 '10 at 22:32

This problem can be solved with standard tools, but there are sufficiently many traps for the unwary that I recommend you install the flip command, which was written over 20 years ago by Rahul Dhesi, the author of zoo . It does an excellent job converting file formats while, for example, avoiding the inadvertent destruction of binary files, which is a little too easy if you just race around altering every CRLF you see...

Gordon Davisson , answered Apr 10 '10 at 17:50

The solutions posted so far only deal with part of the problem, converting DOS/Windows' CRLF into Unix's LF; the part they're missing is that DOS uses CRLF as a line separator , while Unix uses LF as a line terminator . The difference is that a DOS file (usually) won't have anything after the last line in the file, while Unix will. To do the conversion properly, you need to add that final LF (unless the file is zero-length, i.e. has no lines in it at all). My favorite incantation for this (with a little added logic to handle Mac-style CR-separated files, and not molest files that're already in unix format) is a bit of perl:
perl -pe 'if ( s/\r\n?/\n/g ) { $f=1 }; if ( $f || ! $m ) { s/([^\n])\z/$1\n/ }; $m=1' PCfile.txt

Note that this sends the Unixified version of the file to stdout. If you want to replace the file with a Unixified version, add perl's -i flag.

codaddict , answered Apr 10 '10 at 15:09

Using AWK you can do:
awk '{ sub("\r$", ""); print }' dos.txt > unix.txt

Using Perl you can do:

perl -pe 's/\r$//' < dos.txt > unix.txt

anatoly techtonik , answered Oct 31 '13 at 9:40

If you don't have access to dos2unix , but can read this page, then you can copy/paste dos2unix.py from here.
#!/usr/bin/env python
"""\
convert dos linefeeds (crlf) to unix (lf)
usage: dos2unix.py <input> <output>
"""
import sys

if len(sys.argv[1:]) != 2:
  sys.exit(__doc__)

content = ''
outsize = 0
with open(sys.argv[1], 'rb') as infile:
  content = infile.read()
with open(sys.argv[2], 'wb') as output:
  for line in content.splitlines():
    outsize += len(line) + 1
    output.write(line + '\n')

print("Done. Saved %s bytes." % (len(content)-outsize))

Cross-posted from superuser .

nawK , answered Sep 4 '14 at 0:16

An even simpler awk solution w/o a program:
awk -v ORS='\r\n' '1' unix.txt > dos.txt

Technically '1' is your program, b/c awk requires one when given option.

UPDATE : After revisiting this page for the first time in a long time I realized that no one has yet posted an internal solution, so here is one:

while IFS= read -r line;
do printf '%s\n' "${line%$'\r'}";
done < dos.txt > unix.txt

Santosh , answered Mar 12 '15 at 22:36

This worked for me
tr "\r" "\n" < sampledata.csv > sampledata2.csv

ThorSummoner , answered Jul 30 '15 at 17:38

Super duper easy with PCRE;

As a script, or replace $@ with your files.

#!/usr/bin/env bash
perl -pi -e 's/\r\n/\n/g' -- "$@"

This will overwrite your files in place!

I recommend only doing this with a backup (version control or otherwise)

Ashley Raiteri , answered May 19 '14 at 23:25

For Mac OS X, if you have Homebrew installed ( http://brew.sh/ ):
brew install dos2unix

for csv in *.csv; do dos2unix -c mac ${csv}; done;

Make sure you have made copies of the files, as this command will modify the files in place. The -c mac option makes the switch to be compatible with osx.

lzc , answered May 31 '16 at 17:15

TIMTOWTDI!
perl -pe 's/\r\n/\n/; s/([^\n])\z/$1\n/ if eof' PCfile.txt

Based on @GordonDavisson

One must consider the possibility of [noeol] ...

kazmer , answered Nov 6 '16 at 23:30

You can use awk. Set the record separator ( RS ) to a regexp that matches all possible newline character, or characters. And set the output record separator ( ORS ) to the unix-style newline character.
awk 'BEGIN{RS="\r|\n|\r\n|\n\r";ORS="\n"}{print}' windows_or_macos.txt > unix.txt

user829755 , answered Jul 21 at 9:21

interestingly in my git-bash on windows sed "" did the trick already:
$ echo -e "abc\r" >tst.txt
$ file tst.txt
tst.txt: ASCII text, with CRLF line terminators
$ sed -i "" tst.txt
$ file tst.txt
tst.txt: ASCII text

My guess is that sed ignores them when reading lines from input and always writes unix line endings on output.

Gannet , answered Jan 24 at 8:38

As an extension to Jonathan Leffler's Unix to DOS solution, to safely convert to DOS when you're unsure of the file's current line endings:
sed '/^M$/! s/$/^M/'

This checks that the line does not already end in CRLF before converting to CRLF.

vmsnomad , answered Jun 23 at 18:37

Had just to ponder that same question (on Windows-side, but equally applicable to linux.) Surprisingly nobody mentioned a very much automated way of doing CRLF<->LF conversion for text-files using good old zip -ll option (Info-ZIP):
zip -ll textfiles-lf.zip files-with-crlf-eol.*
unzip textfiles-lf.zip

NOTE: this would create a zip file preserving the original file names but converting the line endings to LF. Then unzip would extract the files as zip'ed, that is with their original names (but with LF-endings), thus prompting to overwrite the local original files if any.

Relevant excerpt from the zip --help :

zip --help
...
-l   convert LF to CR LF (-ll CR LF to LF)

I tried sed 's/^M$//' file.txt on OSX as well as several other methods ( http://www.thingy-ma-jig.co.uk/blog/25-11-2010/fixing-dos-line-endings or http://hintsforums.macworld.com/archive/index.php/t-125.html ). None worked; the file remained unchanged (btw, Ctrl-V Enter was needed to reproduce ^M). In the end I used TextWrangler. It's not strictly command line, but it works and it doesn't complain.

[Sep 27, 2017] g flag in Perl regular expressions

/g transforms m// and s/// to different commands with different contextual behaviour!
Sep 27, 2017 | www.perlmonks.org

Hello perl-diddler ,

If you look at the documentation for qr// , you'll see that the /g modifier is not supported:

qr/ STRING /msixpodualn
-- perlop#Regexp-Quote-Like-Operators

Which makes sense: qr turns STRING into a regular expression, which may then be used in any number of m{...} and s{...}{...} constructs. The appropriate place to add a /g modifier is at the point of use:

use strict;
use warnings;
use P;

my $re  = qr{ (\w+) }x;
my $dat = "Just another cats meow";
my @matches = $dat =~ /$re/g;
P "#matches=%s, matches=%s", scalar(@matches), \@matches;
exit scalar(@matches);

Output:

12:53 >perl 1645_SoPW.pl
#matches=4, matches=["Just", "another", "cats", "meow"]
12:54 >

Update:

P.S. - I also just noticed that in addition to stripping out the 'g' option, the 'x' option doesn't seem to work in the regex's parens, i.e. - (?x).

I don't understand what you're saying here. Can you give some example code?

Hope that helps,

[Sep 27, 2017] qq qw qr qx

crookedtimber.org

q// is generally the same thing as using single quotes - meaning it doesn't interpolate values inside the delimiters.
qq// is the same as double quoting a string. It interpolates.
qw// returns a list of white space delimited words. @q = qw/this is a test/ is functionally the same as @q = ('this', 'is', 'a', 'test')
qr// quotes and compiles its string as a regular expression, so the result can be stored in a variable and matched against later.
qx// is the same thing as using the backtick operators.

[Sep 17, 2017] Function pos() and finding the position at which the pattern matched the string

Notable quotes:
"... Using raw input $string in a regexp will act weird if somebody types in special characters (accidentally or maliciously). Consider using /\Q$string/gi to avoid treating $string as a regexp. ..."
May 16, 2017 | perldoc.perl.org
The position at which the last successful match ended can be obtained with the pos() function. For example,
$x = "cat dog house"; # 3 words
while ( $x =~ /(\w+)/g ) {
    print "Word is $1, ends at position ", pos $x, "\n";
}

prints

Word is cat, ends at position 3
Word is dog, ends at position 7
Word is house, ends at position 13

A failed match or changing the target string resets the position. If you don't want the position reset after a failed match, add the /c modifier, as in /regex/gc .

In Perl, how do you find the position of a match in a string, if forced to use a foreach loop? (Stack Overflow)

I have to find all the positions of matching strings within a larger string using a while loop, and as a second method using a foreach loop. I have figured out the while loop method, but I am stuck on a foreach method. Here is the 'while' method:

....

my $sequence = 'AACAAATTGAAACAATAAACAGAAACAAAAATGGATGCGATCAAGAAAAAGATGC'
             . 'AGGCGATGAAAATCGAGAAGGATAACGCTCTCGATCGAGCCGATGCCGCGGAAGA'
             . 'AAAAGTACGTCAAATGACGGAAAAGTTGGAACGAATCGAGGAAGAACTACGTGAT'
             . 'ACCCAGAAAAAGATGATGCNAACTGAAAATGATTTAGATAAAGCACAGGAAGATT'
             . 'TATCTGTTGCAAATACCAACTTGGAAGATAAGGAAAAGAAAGTTCAAGAGGCGGA'
             . 'GGCTGAGGTAGCANCCCTGAATCGTCGTATGACACTTCTGGAAGAGGAATTGGAA'
             . 'CGAGCTGAGGAACGTTTGAAGATTGCAACGGATAAATTGGAAGAAGCAACACATA'
             . 'CAGCTGATGAATCTGAACGTGTTCGCNAGGTTATGGAAA';
my $string = <STDIN>;
chomp $string;

while ( $sequence =~ /$string/gi ) {
    printf "Sequence found at position: %d\n",
        pos($sequence) - length($string);
}

Here is my foreach method:


foreach ( $sequence =~ /$string/gi ) {
    printf "Sequence found at position: %d\n",
        pos($sequence) - length($string);
}

Could someone please give me a clue on why it doesn't work the same way? Thanks!

My Output if I input "aaca":


Part 1 using a while loop
Sequence found at position: 0
Sequence found at position: 10
Sequence found at position: 17
Sequence found at position: 23
Sequence found at position: 377
Part 2 using a foreach loop
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4
Sequence found at position: -4

asked Jan 31 '11 at 21:38 by user83598
Using raw input $string in a regexp will act weird if somebody types in special characters (accidentally or maliciously). Consider using /\Q$string/gi to avoid treating $string as a regexp. aschepler Jan 31 '11 at 22:15

Your problem here is context. In the while loop, the condition is in scalar context. In scalar context, the match operator in g mode will sequentially match along the string. Thus checking pos within the loop does what you want.

In the foreach loop, the condition is in list context. In list context, the match operator in g mode will return a list of all matches (and it will calculate all of the matches before the loop body is ever entered). foreach is then loading the matches one by one into $_ for you, but you are never using the variable. pos in the body of the loop is not useful as it contains the result after the matches have ended.

The takeaway here is that if you want pos to work, and you are using the g modifier, you should use the while loop which imposes scalar context and makes the regex iterate across the matches in the string.

Sinan inspired me to write a few foreach examples:

  • This one is fairly succinct using split in separator retention mode:
    
    my $pos = 0;
    foreach (split /($string)/i => $sequence) {
        print "Sequence found at position: $pos\n" if lc eq lc $string;
        $pos += length;
    }
    
    
  • A regex equivalent of the split solution:
    
    my $pos = 0;
    foreach ($sequence =~ /(\Q$string\E|(?:(?!\Q$string\E).)+)/gi) {
        print "Sequence found at position: $pos\n" if lc eq lc $string;
        $pos += length;
    }
    
    
  • But this is clearly the best solution for your problem:
    
    {
        package Dumb::Homework;

        sub TIEARRAY {
            bless {
                haystack => $_[1],
                needle   => $_[2],
                size     => 2**31 - 1,
                pos      => [],
            }
        }

        sub FETCH {
            my ($self, $index) = @_;
            my ($pos, $needle) = @$self{qw(pos needle)};

            return $$pos[$index] if $index < @$pos;
            while ($index + 1 >= @$pos) {
                unless ($$self{haystack} =~ /\Q$needle/gi) {
                    $$self{size} = @$pos;
                    last;
                }
                push @$pos, pos($$self{haystack}) - length $needle;
            }
            $$pos[$index]
        }

        sub FETCHSIZE { $_[0]{size} }
    }

    tie my @pos, 'Dumb::Homework' => $sequence, $string;
    print "Sequence found at position: $_\n" foreach @pos;   # look how clean it is
    
    

    The reason it's the best is because the other two solutions have to process the entire global match first, before you ever see a result. For large inputs (like DNA) that could be a problem. The Dumb::Homework package implements an array that will lazily find the next position each time the foreach iterator asks for it. It will even store the positions so you can get to them again without reprocessing. (In truth it looks one match past the requested match; this allows it to end properly in the foreach , but it is still much better than processing the whole list.)

  • Actually, the best solution is still to not use foreach as it is not the correct tool for the job.

[Jun 28, 2017] A Short Guide to DBI

Notable quotes:
"... Structured Query Language ..."
"... database handle ..."
"... statement handle ..."
Jun 18, 2017 | www.perl.com
By Mark-Jason Dominus on October 22, 1999 12:00 AM Short guide to DBI (The Perl Database Interface Module) General information about relational databases

Relational databases started to get to be a big deal in the 1970's, and they're still a big deal today, which is a little peculiar, because they're a 1960's technology.

A relational database is a bunch of rectangular tables. Each row of a table is a record about one person or thing; the record contains several pieces of information called fields . Here is an example table:

        LASTNAME   FIRSTNAME   ID   POSTAL_CODE   AGE  SEX
        Gauss      Karl        119  19107         30   M
        Smith      Mark        3    T2V 3V4       53   M
        Noether    Emmy        118  19107         31   F
        Smith      Jeff        28   K2G 5J9       19   M
        Hamilton   William     247  10139         2    M

The names of the fields are LASTNAME , FIRSTNAME , ID , POSTAL_CODE , AGE , and SEX . Each line in the table is a record , or sometimes a row or tuple . For example, the first row of the table represents a 30-year-old male whose name is Karl Gauss, who lives at postal code 19107, and whose ID number is 119.

Sometimes this is a very silly way to store information. When the information naturally has a tabular structure it's fine. When it doesn't, you have to squeeze it into a table, and some of the techniques for doing that are more successful than others. Nevertheless, tables are simple and are easy to understand, and most of the high-performance database systems you can buy today operate under this 1960's model.

About SQL

SQL stands for Structured Query Language . It was invented at IBM in the 1970's. It's a language for describing searches and modifications to a relational database.

SQL was a huge success, probably because it's incredibly simple and anyone can pick it up in ten minutes. As a result, all the important database systems support it in some fashion or another. This includes the big players, like Oracle and Sybase, high-quality free or inexpensive database systems like MySQL, and funny hacks like Perl's DBD::CSV module, which we'll see later.

There are four important things one can do with a table:

SELECT
Find all the records that have a certain property

INSERT
Add new records
DELETE
Remove old records

UPDATE
Modify records that are already there

Those are the four most important SQL commands, also called queries . Suppose that the example table above is named people . Here are examples of each of the four important kinds of queries:

 SELECT firstname FROM people WHERE lastname = 'Smith'

(Locate the first names of all the Smiths.)

 DELETE FROM people WHERE id = 3

(Delete Mark Smith from the table)

 UPDATE people SET age = age+1 WHERE id = 247

(William Hamilton just had a birthday.)



   

 INSERT INTO people (lastname, firstname, id, postal_code, age, sex)
        VALUES ('Euler', 'Leonhard', 289, '19107', 55, 'M')

(Add Leonhard Euler to the table.)

There are a bunch of other SQL commands for creating and discarding tables, for granting and revoking access permissions, for committing and abandoning transactions, and so forth. But these four are the important ones. Congratulations; you are now a SQL programmer. For the details, go to any reasonable bookstore and pick up a SQL quick reference.

About Databases

Every database system is a little different. You talk to some databases over the network and make requests of the database engine; other databases you talk to through files or something else.

Typically when you buy a commercial database, you get a library with it. The vendor has written some functions for talking to the database in some language like C, compiled the functions, and the compiled code is the library. You can write a C program that calls the functions in the library when it wants to talk to the database.

There's a saying that any software problem can be solved by adding a layer of indirection. That's what Perl's DBI (`Database Interface') module is all about. It was written by Tim Bunce.

DBI is designed to protect you from the details of the vendor libraries. It has a very simple interface for saying what SQL queries you want to make, and for getting the results back. DBI doesn't know how to talk to any particular database, but it does know how to locate and load in DBD modules. The DBD modules have the vendor libraries in them and know how to talk to the real databases; there is one DBD module for every different database.

When you ask DBI to run a query, it hands your request to the appropriate DBD module, which spins around three times or drinks out of its sneaker or whatever is necessary to communicate with the real database. When it gets the results back, it passes them to DBI . Then DBI gives you the results. Since your program only has to deal with DBI , and not with the real database, you don't have to worry about barking like a chicken.

Here's your program talking to the DBI library. You are using two databases at once. One is an Oracle database server on some other machine, and another is a DBD::CSV database that stores the data in a bunch of plain text files on the local disk.

Your program sends a query to DBI , which forwards it to the appropriate DBD module; let's say it's DBD::Oracle . DBD::Oracle knows how to translate what it gets from DBI into the format demanded by the Oracle library, which is built into it. The library forwards the request across the network, gets the results back, and returns them to DBD::Oracle . DBD::Oracle returns the results to DBI as a Perl data structure. Finally, your program can get the results from DBI .

On the other hand, suppose that your program was querying the text files. It would prepare the same sort of query in exactly the same way, and send it to DBI in exactly the same way. DBI would see that you were trying to talk to the DBD::CSV database and forward the request to the DBD::CSV module. The DBD::CSV module has Perl functions in it that tell it how to parse SQL and how to hunt around in the text files to find the information you asked for. It then returns the results to DBI as a Perl data structure. Finally, your program gets the results from DBI in exactly the same way that it would have if you were talking to Oracle instead.

There are two big wins that result from this organization. First, you don't have to worry about the details of hunting around in text files or talking on the network to the Oracle server or dealing with Oracle's library. You just have to know how to talk to DBI .

Second, if you build your program to use Oracle, and then the following week upper management signs a new Strategic Partnership with Sybase, it's easy to convert your code to use Sybase instead of Oracle. You change exactly one line in your program, the line that tells DBI to talk to DBD::Oracle , and have it use DBD::Sybase instead. Or you might build your program to talk to a cheap, crappy database like MS Access, and then next year, when the application is doing well and getting more use than you expected, you can upgrade to a better database without changing any of your code.

There are DBD modules for talking to every important kind of SQL database. DBD::Oracle will talk to Oracle, and DBD::Sybase will talk to Sybase. DBD::ODBC will talk to any ODBC database, including Microsoft Access. (ODBC is a Microsoft invention that is analogous to DBI itself. There is no DBD module for talking to Access directly.) DBD::CSV allows SQL queries on plain text files. DBD::mysql talks to the excellent MySQL database from TCX DataKonsultAB in Sweden. (MySQL is a tremendous bargain: it's $200 for commercial use, and free for noncommercial use.)

Example of How to Use DBI

Here's a typical program. When you run it, it waits for you to type a last name. Then it searches the database for people with that last name and prints out the full name and ID number for each person it finds. For example:

        Enter name> Noether
                118: Emmy Noether

        Enter name> Smith
                3: Mark Smith
                28: Jeff Smith

        Enter name> Snonkopus
                No names matched `Snonkopus'.
       
        Enter name> ^D

Here is the code:

 use DBI;

        my $dbh = DBI->connect('DBI:Oracle:payroll')
                or die "Couldn't connect to database: " . DBI->errstr;
        my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;

        print "Enter name> ";
        while ($lastname = <>) {               # Read input from the user
          my @data;
          chomp $lastname;
          $sth->execute($lastname)             # Execute the query
            or die "Couldn't execute statement: " . $sth->errstr;

          # Read the matching records and print them out         
          while (@data = $sth->fetchrow_array()) {
            my $firstname = $data[1];
            my $id = $data[2];
            print "\t$id: $firstname $lastname\n";
          }

          if ($sth->rows == 0) {
            print "No names matched `$lastname'.\n\n";
          }

          $sth->finish;
          print "\n";
          print "Enter name> ";
        }
         
        $dbh->disconnect;

Explanation of the Example

 use DBI;

This loads in the DBI module. Notice that we don't have to load in any DBD module. DBI will do that for us when it needs to.

 my $dbh = DBI->connect('DBI:Oracle:payroll')
                or die "Couldn't connect to database: " . DBI->errstr;

The connect call tries to connect to a database. The first argument, DBI:Oracle:payroll , tells DBI what kind of database it is connecting to. The Oracle part tells it to load DBD::Oracle and to use that to communicate with the database. If we had to switch to Sybase next week, this is the one line of the program that we would change. We would have to change Oracle to Sybase .

payroll is the name of the database we will be searching. If we were going to supply a username and password to the database, we would do it in the connect call:

 my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password')
                or die "Couldn't connect to database: " . DBI->errstr;

If DBI connects to the database, it returns a database handle object, which we store into $dbh . This object represents the database connection. We can be connected to many databases at once and have many such database connection objects.

If DBI can't connect, it returns an undefined value. In this case, we use die to abort the program with an error message. DBI->errstr returns the reason why we couldn't connect: ``Bad password'', for example.

 my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;

The prepare call prepares a query to be executed by the database. The argument is any SQL at all. On high-end databases, prepare will send the SQL to the database server, which will compile it. If prepare is successful, it returns a statement handle object which represents the statement; otherwise it returns an undefined value and we abort the program. $dbh->errstr will return the reason for failure, which might be ``Syntax error in SQL''. It gets this reason from the actual database, if possible.

The ? in the SQL will be filled in later. Most databases can handle this. For some databases that don't understand the ? , the DBD module will emulate it for you and will pretend that the database understands how to fill values in later, even though it doesn't.

 print "Enter name> ";

Here we just print a prompt for the user.

 while ($lastname = <>) {               # Read input from the user
          ...
        }

This loop will repeat over and over again as long as the user keeps entering last names. It exits at end of input (when the user types ^D, as in the sample session above). The Perl <> symbol means to read from the terminal, or from files named on the command line if there were any.

 my @data;

This declares a variable to hold the data that we will get back from the database.

 chomp $lastname;

This trims the newline character off the end of the user's input.

 $sth->execute($lastname)             # Execute the query
            or die "Couldn't execute statement: " . $sth->errstr;

execute executes the statement that we prepared before. The argument $lastname is substituted into the SQL in place of the ? that we saw earlier. execute returns a true value if it succeeds and a false value otherwise, so we abort if for some reason the execution fails.

 while (@data = $sth->fetchrow_array()) {
            ...
           }

fetchrow_array returns one of the selected rows from the database. You get back an array whose elements contain the data from the selected row. In this case, the array you get back has six elements. The first element is the person's last name; the second element is the first name; the third element is the ID, and then the other elements are the postal code, age, and sex.

Each time we call fetchrow_array , we get back a different record from the database. When there are no more matching records, fetchrow_array returns the empty list and the while loop exits.
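If you would rather not depend on the order of the columns, DBI also provides fetchrow_hashref , which returns each row as a reference to a hash keyed by column name. This is a sketch only: it assumes the same statement handle as in the example above, and note that the case of the keys can vary from driver to driver:

```perl
# Sketch: assumes $sth has been prepared and executed as in the example.
while (my $row = $sth->fetchrow_hashref()) {
    # Each column is looked up by name instead of by position.
    print "\t$row->{id}: $row->{firstname} $row->{lastname}\n";
}
```

fetchrow_array is slightly faster, but fetchrow_hashref keeps working if someone reorders the columns in the table.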

 my $firstname = $data[1];
             my $id = $data[2];

These lines extract the first name and the ID number from the record data.

 print "\t$id: $firstname $lastname\n";

This prints out the result.

 if ($sth->rows == 0) {
            print "No names matched `$lastname'.\n\n";
          }

The rows method returns the number of rows of the database that were selected. If no rows were selected, then there is nobody in the database with the last name that the user is looking for. In that case, we print out a message. We have to do this after the while loop that fetches whatever rows were available, because with some databases you don't know how many rows there were until after you've gotten them all.

 $sth->finish;
          print "\n";
          print "Enter name> ";

Once we're done reporting about the result of the query, we print another prompt so that the user can enter another name. finish tells the database that we have finished retrieving all the data for this query and allows it to reinitialize the handle so that we can execute it again for the next query.

 $dbh->disconnect;

When the user has finished querying the database, they signal end of input (for example by typing ^D) and the main while loop exits. disconnect closes the connection to the database.

Cached Queries

Here's a function which looks up someone in the example table, given their ID number, and returns their age:

 sub age_by_id {
          # Arguments: database handle, person ID number
          my ($dbh, $id) = @_;
          my $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;

 $sth->execute($id)
            or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
          return $age;
        }

It prepares the query, executes it, and retrieves the result.

There's a problem here though. Even though the function works correctly, it's inefficient. Every time it's called, it prepares a new query. Typically, preparing a query is a relatively expensive operation. For example, the database engine may parse and understand the SQL and translate it into an internal format. Since the query is the same every time, it's wasteful to throw away this work when the function returns.

Here's one solution:

 { my $sth;
          sub age_by_id {
            # Arguments: database handle, person ID number
            my ($dbh, $id) = @_;

 if (! defined $sth) {
              $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
                or die "Couldn't prepare statement: " . $dbh->errstr;
            }

 $sth->execute($id)
              or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
            return $age;
          }
        }

There are two big changes to this function from the previous version. First, the $sth variable has moved outside of the function; this tells Perl that its value should persist even after the function returns. Next time the function is called, $sth will have the same value as before.

Second, the prepare code is in a conditional block. It's only executed if $sth does not yet have a value. The first time the function is called, the prepare code is executed and the statement handle is stored into $sth . This value persists after the function returns, and the next time the function is called, $sth still contains the statement handle and the prepare code is skipped.

Here's another solution:

 sub age_by_id {
          # Arguments: database handle, person ID number
          my ($dbh, $id) = @_;
          my $sth = $dbh->prepare_cached('SELECT age FROM people WHERE id = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;

 $sth->execute($id)
            or die "Couldn't execute statement: " . $sth->errstr;

 my ($age) = $sth->fetchrow_array();
          return $age;
        }

Here the only change is to replace prepare with prepare_cached . The prepare_cached call is just like prepare , except that it looks to see if the query is the same as last time. If so, it gives you the statement handle that it gave you before.

Transactions

Many databases support transactions . This means that you can make a whole bunch of queries which would modify the databases, but none of the changes are actually made. Then at the end you issue the special SQL query COMMIT , and all the changes are made simultaneously. Alternatively, you can issue the query ROLLBACK , in which case all the queries are thrown away.

As an example of this, consider a function to add a new employee to a database. The database has a table called employees that looks like this:

 FIRSTNAME  LASTNAME   DEPARTMENT_ID
        Gauss      Karl       17
        Smith      Mark       19
        Noether    Emmy       17
        Smith      Jeff       666
        Hamilton   William    17

and a table called departments that looks like this:

 ID   NAME               NUM_MEMBERS
        17   Mathematics        3
        666  Legal              1
        19   Grounds Crew       1

The mathematics department is department #17 and has three members: Karl Gauss, Emmy Noether, and William Hamilton.

Here's our first cut at a function to insert a new employee. It will return true or false depending on whether or not it was successful:

 sub new_employee {
          # Arguments: database handle; first and last names of new employee;
          # department ID number for new employee's work assignment
          my ($dbh, $first, $last, $department) = @_;
          my ($insert_handle, $update_handle);

 $insert_handle =
            $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
          $update_handle =
            $dbh->prepare_cached('UPDATE departments
                                     SET num_members = num_members + 1
                                   WHERE id = ?');

 die "Couldn't prepare queries; aborting"
            unless defined $insert_handle && defined $update_handle;

 $insert_handle->execute($first, $last, $department) or return 0;
          $update_handle->execute($department) or return 0;
          return 1;   # Success
        }

We create two handles: one for an insert query that will insert the new employee's name and department number into the employees table, and one for an update query that will increment the number of members in the new employee's department in the departments table. Then we execute the two queries with the appropriate arguments.

There's a big problem here: Suppose, for some reason, the second query fails. Our function returns a failure code, but it's too late, it has already added the employee to the employees table, and that means that the count in the departments table is wrong. The database now has corrupted data in it.

The solution is to make both updates part of the same transaction. Most databases will do this automatically, but without an explicit instruction about whether or not to commit the changes, some databases will commit the changes when we disconnect from the database, and others will roll them back. We should specify the behavior explicitly.

Typically, no changes will actually be made to the database until we issue a commit . The version of our program with commit looks like this:

 sub new_employee {
          # Arguments: database handle; first and last names of new employee;
          # department ID number for new employee's work assignment
          my ($dbh, $first, $last, $department) = @_;
          my ($insert_handle, $update_handle);

 $insert_handle =
            $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
          $update_handle =
            $dbh->prepare_cached('UPDATE departments
                                     SET num_members = num_members + 1
                                   WHERE id = ?');

 die "Couldn't prepare queries; aborting"
            unless defined $insert_handle && defined $update_handle;

 my $success = 1;
          $success &&= $insert_handle->execute($first, $last, $department);
          $success &&= $update_handle->execute($department);

 my $result = ($success ? $dbh->commit : $dbh->rollback);
          unless ($result) {
            die "Couldn't finish transaction: " . $dbh->errstr
          }
          return $success;
        }

We perform both queries, and record in $success whether they both succeeded. $success will be true if both queries succeeded, and false otherwise. If the queries succeeded, we commit the transaction; otherwise, we roll it back, cancelling all our changes.

The problem of concurrent database access is also solved by transactions. Suppose that queries were executed immediately, and that some other program came along and examined the database after our insert but before our update. It would see inconsistent data in the database, even if our update would eventually have succeeded. But with transactions, all the changes happen simultaneously when we do the commit , which means that any other program looking at the database either sees all of them or none.

Miscellaneous

do

If you're doing an UPDATE , INSERT , or DELETE there is no data that comes back from the database, so there is a short cut. You can say

 $dbh->do('DELETE FROM people WHERE age > 65');

for example, and DBI will prepare the statement, execute it, and finish it. do returns a true value if it succeeded, and a false value if it failed. Actually, if it succeeds it returns the number of affected rows. In the example it would return the number of rows that were actually deleted. ( DBI plays a magic trick so that the value it returns is true even when it is 0. This is bizarre, because 0 is usually false in Perl. But it's convenient because you can use it either as a number or as a true-or-false success code, and it works both ways.)
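do also accepts placeholders; the bind values go after an attribute argument, which is usually just undef. A sketch, assuming $dbh is an open database handle like the one in the earlier examples:

```perl
# Sketch: assumes $dbh is a connected database handle.
my $deleted = $dbh->do('DELETE FROM people WHERE age > ?', undef, 65)
    or die "Couldn't delete: " . $dbh->errstr;
# When no rows match, do returns the string "0E0": true as a boolean,
# but 0 when used as a number -- the "magic trick" described above.
print "Deleted $deleted row(s)\n" if $deleted > 0;
```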

AutoCommit

If your transactions are simple, you can save yourself the trouble of having to issue a lot of commits. When you make the connect call, you can specify an AutoCommit option which will perform an automatic commit operation after every successful query. Here's what it looks like:

 my $dbh = DBI->connect('DBI:Oracle:payroll', undef, undef,
                               {AutoCommit => 1},
                              )
                or die "Couldn't connect to database: " . DBI->errstr;

Automatic Error Handling

When you make the connect call, you can specify a RaiseError option that handles errors for you automatically. When an error occurs, DBI will abort your program instead of returning a failure code. If all you want is to abort the program on an error, this can be convenient:

 my $dbh = DBI->connect('DBI:Oracle:payroll', undef, undef,
                               {RaiseError => 1},
                              )
                or die "Couldn't connect to database: " . DBI->errstr;

Don't do This

People are always writing code like this:

 while ($lastname = <>) {
          my $sth = $dbh->prepare("SELECT * FROM people
                                   WHERE lastname = '$lastname'");
          $sth->execute();
          # and so on ...
        }

Here we interpolated the value of $lastname directly into the SQL in the prepare call.

This is a bad thing to do for three reasons.

First, prepare calls can take a long time. The database server has to compile the SQL and figure out how it is going to run the query. If you have many similar queries, that is a waste of time.

Second, it will not work if $lastname contains a name like O'Malley or D'Amico or some other name with an ' . The ' has a special meaning in SQL, and the database will not understand when you ask it to prepare a statement that looks like

 SELECT * FROM people WHERE lastname = 'O'Malley'

It will see that you have three ' characters and complain that you don't have a fourth matching ' somewhere else.

Finally, if you're going to be constructing your query based on a user input, as we did in the example program, it's unsafe to simply interpolate the input directly into the query, because the user can construct a strange input in an attempt to trick your program into doing something it didn't expect. For example, suppose the user enters the following bizarre value for $input :

 x' or lastname = lastname or lastname = 'y

Now our query has become something very surprising:

 SELECT * FROM people WHERE lastname = 'x'
         or lastname = lastname or lastname = 'y'

The part of this query that our sneaky user is interested in is the second or clause. This clause selects all the records for which lastname is equal to lastname ; that is, all of them. We thought that the user was only going to be able to see a few records at a time, and now they've found a way to get them all at once. This probably wasn't what we wanted.

References

A complete list of DBD modules is available on CPAN
The DBI module and the DBD modules can be downloaded from CPAN
You can get MySQL from www.tcx.se

People go to all sorts of trouble to get around these problems with interpolation. They write a function that puts the last name in quotes and then backslashes any apostrophes that appear in it. Then it breaks because they forgot to backslash backslashes. Then they make their escape function better. Then their code is a big mess because they are calling the backslashing function every other line. They put a lot of work into the backslashing function, and it was all for nothing, because the whole problem is solved by just putting a ? into the query, like this

 SELECT * FROM people WHERE lastname = ?

All my examples look like this. It is safer and more convenient and more efficient to do it this way.

[Jun 28, 2017] Bless My Referents by Damian Conway

September 16, 1999 | www.perl.com
Introduction

Damian Conway is the author of the newly released Object Oriented Perl , the first of a new series of Perl books from Manning.

Object-oriented programming in Perl is easy. Forget the heavy theory and the sesquipedalian jargon: classes in Perl are just regular packages, objects are just variables, methods are just subroutines. The syntax and semantics are a little different from regular Perl, but the basic building blocks are completely familiar.

The one problem most newcomers to object-oriented Perl seem to stumble over is the notion of references and referents, and how the two combine to create objects in Perl. So let's look at how references and referents relate to Perl objects, and see who gets to be blessed and who just gets to point the finger.

Let's start with a short detour down a dark alley...

References and referents

Sometimes it's important to be able to access a variable indirectly: to be able to use it without specifying its name. There are two obvious motivations: the variable you want may not have a name (it may be an anonymous array or hash), or you may only know which variable you want at run-time (so you don't have a name to offer the compiler).

To handle such cases, Perl provides a special scalar datatype called a reference . A reference is like the traditional Zen idea of the "finger pointing at the moon". It's something that identifies a variable, and allows us to locate it. And that's the stumbling block most people need to get over: the finger (reference) isn't the moon (variable); it's merely a means of working out where the moon is.

Making a reference

When you prefix an existing variable or value with the unary \ operator you get a reference to the original variable or value. That original is then known as the referent to which the reference refers.

For example, if $s is a scalar variable, then \$s is a reference to that scalar variable (i.e. a finger pointing at it) and $s is that finger's referent. Likewise, if @a is an array, then \@a is a reference to it.

In Perl, a reference to any kind of variable can be stored in another scalar variable. For example:

$slr_ref = \$s;     
# scalar $slr_ref stores a reference to scalar $s
$arr_ref = \@a;     
# scalar $arr_ref stores a reference to array @a
$hsh_ref = \%h;     
# scalar $hsh_ref stores a reference to hash %h

Figure 1 shows the relationships produced by those assignments.

Note that the references are separate entities from the referents at which they point. The only time that isn't the case is when a variable happens to contain a reference to itself:

$self_ref = \$self_ref;
     # $self_ref stores a reference to itself!

That (highly unusual) situation produces an arrangement shown in Figure 2.

Once you have a reference, you can get back to the original thing it refers to (its referent) simply by prefixing the variable containing the reference (optionally in curly braces) with the appropriate variable symbol. Hence to access $s , you could write $$slr_ref or ${$slr_ref} . At first glance, that might look like one too many dollar signs, but it isn't. The $slr_ref tells Perl which variable has the reference; the extra $ tells Perl to follow that reference and treat the referent as a scalar.

Similarly, you could access the array @a as @{$arr_ref} , or the hash %h as %{$hsh_ref} . In each case, the $whatever_ref is the name of the scalar containing the reference, and the leading @ or % indicates what type of variable the referent is. That type is important: if you attempt to prefix a reference with the wrong symbol (for example, @{$slr_ref} or ${$hsh_ref} ), Perl produces a fatal run-time error.

[A series of scalar variables with arrows pointing to
other variables]
Figure 1: References and their referents

[A scalar variable with an arrow pointing back to
itself]
Figure 2: A reference that is its own referent
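The prefix notation can be exercised without any modules at all. Here is a short, self-contained demonstration of taking references with \ and following them back to their referents:

```perl
use strict;
use warnings;

my $s = 42;
my @a = (1, 2, 3);
my %h = (first => 'Emmy');

my $slr_ref = \$s;                # reference to a scalar
my $arr_ref = \@a;                # reference to an array
my $hsh_ref = \%h;                # reference to a hash

print ${$slr_ref}, "\n";          # prints 42 -- follow the finger to $s
print scalar @{$arr_ref}, "\n";   # prints 3  -- @a seen through the reference
print ${$hsh_ref}{first}, "\n";   # prints Emmy

${$slr_ref} = 99;                 # assigning through the reference...
print $s, "\n";                   # ...changes the referent: prints 99
```

The last two lines show the key point: the reference and the referent are separate entities, so a change made through the finger shows up in the moon.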

The "arrow" operator

Accessing the elements of an array or a hash through a reference can be awkward using the syntax shown above. You end up with a confusing tangle of dollar signs and brackets:

${$arr_ref}[0] = ${$hsh_ref}{"first"};  
# i.e. $a[0] = $h{"first"}

So Perl provides a little extra syntax to make life just a little less cluttered:
$arr_ref->[0] = $hsh_ref->{"first"};    
# i.e. $a[0] = $h{"first"}

The "arrow" operator ( -> ) takes a reference on its left and either an array index (in square brackets) or a hash key (in curly braces) on its right. It locates the array or hash that the reference refers to, and then accesses the appropriate element of it.

Identifying a referent

Because a scalar variable can store a reference to any kind of data, and because dereferencing a reference with the wrong prefix leads to fatal errors, it's sometimes important to be able to determine what type of referent a specific reference refers to. Perl provides a built-in function called ref that takes a scalar and returns a description of the kind of reference it contains. Table 1 summarizes the string that is returned for each type of reference.

If $slr_ref contains...                   then ref($slr_ref) returns...
a value that is not a reference           undef
a reference to a scalar                   "SCALAR"
a reference to an array                   "ARRAY"
a reference to a hash                     "HASH"
a reference to a subroutine               "CODE"
a reference to a filehandle               "IO" or "IO::Handle"
a reference to a typeglob                 "GLOB"
a reference to a precompiled pattern      "Regexp"
a reference to another reference          "REF"


Table 1: What ref returns

As Table 1 indicates, you can create references to many kinds of Perl constructs, apart from variables.

If a reference is used in a context where a string is expected, then the ref function is called automatically to produce the expected string, and a unique hexadecimal value (the internal memory address of the thing being referred to) is appended. That means that printing out a reference:

print $hsh_ref, "\n";
produces something like:

HASH(0x10027588)

since each element of print 's argument list is stringified before printing.

The ref function has a vital additional role in object-oriented Perl, where it can be used to identify the class to which a particular object belongs. More on that in a moment.
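The entries in Table 1 are easy to verify directly. A self-contained check (the IO and GLOB rows need an open filehandle and a typeglob, omitted here for brevity):

```perl
use strict;
use warnings;

my $scalar = 1;
my @array;
my %hash;

print ref(\$scalar),  "\n";   # prints SCALAR
print ref(\@array),   "\n";   # prints ARRAY
print ref(\%hash),    "\n";   # prints HASH
print ref(sub {}),    "\n";   # prints CODE
print ref(\\$scalar), "\n";   # prints REF -- a reference to a reference
print ref(qr/x/),     "\n";   # prints Regexp
```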

References, referents, and objects

References and referents matter because they're both required when you come to build objects in Perl. In fact, Perl objects are just referents (i.e. variables or values) that have a special relationship with a particular package. References come into the picture because Perl objects are always accessed via a reference, using an extension of the "arrow" notation.

But that doesn't mean that Perl's object-oriented features are difficult to use (even if you're still unsure of references and referents). To do real, useful, production-strength, object-oriented programming in Perl you only need to learn about one extra function, one straightforward piece of additional syntax, and three very simple rules. Let's start with the rules...

Rule 1: To create a class, build a package

Perl packages already have a number of class-like features: they collect related variables and subroutines together under a single namespace, and they keep code and data belonging to different packages from colliding. In Perl, those features are sufficient to allow a package to act like a class.

Suppose you wanted to build an application to track faults in a system. Here's how to declare a class named "Bug" in Perl:

package Bug;
That's it! In Perl, classes are packages. No magic, no extra syntax, just plain, ordinary packages. Of course, a class like the one declared above isn't very interesting or useful, since its objects will have no attributes or behaviour.

That brings us to the second rule...

Rule 2: To create a method, write a subroutine

In object-oriented theory, methods are just subroutines that are associated with a particular class and exist specifically to operate on objects that are instances of that class. In Perl, a subroutine that is declared in a particular package is already associated with that package. So to write a Perl method, you just write a subroutine within the package that is acting as your class.

For example, here's how to provide an object method to print Bug objects:

package Bug;
sub print_me
{
       # The code needed to print the Bug goes here
}

Again, that's it. The subroutine print_me is now associated with the package Bug, so whenever Bug is used as a class, Perl automatically treats Bug::print_me as a method.

Invoking the Bug::print_me method involves that one extra piece of syntax mentioned above: an extension to the existing Perl "arrow" notation. If you have a reference to an object of class Bug, you can access any method of that object by using a -> symbol, followed by the name of the method.

For example, if the variable $nextbug holds a reference to a Bug object, you could call Bug::print_me on that object by writing:

$nextbug->print_me();
Calling a method through an arrow should be very familiar to any C++ programmers; for the rest of us, it's at least consistent with other Perl usages:
$hsh_ref->{"key"};           
# Access the hash referred to by $hsh_ref
$arr_ref->[$index];          
# Access the array referred to by $arr_ref
$sub_ref->(@args);           
# Access the sub referred to by $sub_ref

$obj_ref->method(@args);     
# Access the object referred to by $obj_ref



The only difference with the last case is that the referent (i.e. the object) pointed to by $objref has many ways of being accessed (namely, its various methods). So, when you want to access that object, you have to specify which particular way-which method-should be used. Hence, the method name after the arrow.

When a method like Bug::print_me is called, the argument list that it receives begins with the reference through which it was called, followed by any arguments that were explicitly given to the method. That means that calling Bug::print_me("logfile") is not the same as calling $nextbug->print_me("logfile") . In the first case, print_me is treated as a regular subroutine so the argument list passed to Bug::print_me is equivalent to:

( "logfile" )
In the second case, print_me is treated as a method so the argument list is equivalent to:
( $objref, "logfile" )
Having a reference to the object passed as the first parameter is vital, because it means that the method then has access to the object on which it's supposed to operate. Hence you'll find that most methods in Perl start with something equivalent to this:
package Bug;
sub print_me
{
    my ($self) = shift;

    # The @_ array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self 
    # and the explicit arguments (still in @_)
}
or, better still:
package Bug;
sub print_me
{
    my ($self, @args) = @_;

    # The @args array now stores the arguments passed to &Bug::print_me
    # The rest of &print_me uses the data referred to by $self
    # and the explicit arguments (now in @args)
}
This second version is better because it provides a lexically scoped copy of the argument list ( @args ). Remember that the @_ array is "magical"-changing any element of it actually changes the caller's version of the corresponding argument. Copying argument values to a lexical array like @args prevents nasty surprises of this kind, as well as improving the internal documentation of the subroutine (especially if a more meaningful name than @args is chosen).
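The aliasing in question is easy to demonstrate:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# @_ aliases the caller's arguments: writing to $_[0] changes the
# caller's own variable.
sub aliased_increment {
    $_[0]++;
}

# Copying into a lexical first protects the caller's data.
sub safe_increment {
    my ($n) = @_;
    $n++;
    return $n;
}

my $x = 5;
aliased_increment($x);
print "$x\n";            # 6 -- caller's variable changed

my $y = 5;
my $result = safe_increment($y);
print "$y $result\n";    # 5 6 -- caller's variable untouched
```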

The only remaining question is: how do you create the invoking object in the first place?

Rule 3: To create an object, bless a referent

Unlike other object-oriented languages, Perl doesn't require that an object be a special kind of record-like data structure. In fact, you can use any existing type of Perl variable-a scalar, an array, a hash, etc.-as an object in Perl.

Hence, the issue isn't how to create the object, because you create them exactly like any other Perl variable: declare them with a my , or generate them anonymously with a [ ... ] or { ... } . The real problem is how to tell Perl that such an object belongs to a particular class. That brings us to the one extra built-in Perl function you need to know about. It's called bless , and its only job is to mark a variable as belonging to a particular class.

The bless function takes two arguments: a reference to the variable to be marked, and a string containing the name of the class. It then sets an internal flag on the variable, indicating that it now belongs to the class.

For example, suppose that $nextbug actually stores a reference to an anonymous hash:

$nextbug = {
                id    => "00001",
                type  => "fatal",
                descr => "application does not compile",
           };
To turn that anonymous hash into an object of class Bug you write:
bless $nextbug, "Bug";
And, once again, that's it! The anonymous hash referred to by $nextbug is now marked as being an object of class Bug. Note that the variable $nextbug itself hasn't been altered in any way; only the nameless hash it refers to has been marked. In other words, bless sanctified the referent, not the reference. Figure 3 illustrates where the new class membership flag is set.

You can check that the blessing succeeded by applying the built-in ref function to $nextbug . As explained above, when ref is applied to a reference, it normally returns the type of that reference. Hence, before $nextbug was blessed, ref($nextbug) would have returned the string 'HASH' .

Once an object is blessed, ref returns the name of its class instead. So after the blessing, ref($nextbug) will return 'Bug' . Of course the object itself still is a hash, but now it's a hash that belongs to the Bug class. The various entries of the hash become the attributes of the newly created Bug object.
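A short sketch of that before-and-after check:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $nextbug = {
    id    => "00001",
    type  => "fatal",
    descr => "application does not compile",
};

print ref($nextbug), "\n";   # HASH -- just a plain hash reference

bless $nextbug, "Bug";

print ref($nextbug), "\n";   # Bug -- now a blessed member of class Bug

# The entries are still accessible as ordinary hash elements:
print $nextbug->{id}, "\n";  # 00001
```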

[A picture of an anonymous hash having a flag set within it]
Figure 3: What changes when an object is blessed

Creating a constructor

Given that you're likely to want to create many such Bug objects, it would be convenient to have a subroutine that took care of all the messy, blessy details. You could pass it the necessary information, and it would then wrap it in an anonymous hash, bless the hash, and give you back a reference to the resulting object.

And, of course, you might as well put such a subroutine in the Bug package itself, and call it something that indicates its role. Such a subroutine is known as a constructor, and it generally looks like this:

package Bug;
sub new
{
    my $class = $_[0];
    my $objref = {
                     id    => $_[1],
                     type  => $_[2],
                     descr => $_[3],
                 };
    bless $objref, $class;
    return $objref;
}
Note that the middle bits of the subroutine (in bold) look just like the raw blessing that was handed out to $nextbug in the previous example.

The bless function is set up to make writing constructors like this a little easier. Specifically, it returns the reference that's passed as its first argument (i.e. the reference to whatever referent it just blessed into object-hood). And since Perl subroutines automatically return the value of their last evaluated statement, that means that you could condense the definition of Bug::new to this:

sub Bug::new
{
        bless { id => $_[1], type => $_[2], descr => $_[3] }, $_[0];
}
This version has exactly the same effects: slot the data into an anonymous hash, bless the hash into the class specified by the first argument, and return a reference to the hash.

Regardless of which version you use, now whenever you want to create a new Bug object, you can just call:

$nextbug = Bug::new("Bug", $id, $type, $description);
That's a little redundant, since you have to type "Bug" twice. Fortunately, there's another feature of the "arrow" method-call syntax that solves this problem. If the operand to the left of the arrow is the name of a class (rather than an object reference), then the appropriate method of that class is called. More importantly, if the arrow notation is used, the first argument passed to the method is a string containing the class name. That means that you could rewrite the previous call to Bug::new like this:
$nextbug = Bug->new($id, $type, $description);
There are other benefits to this notation when your class uses inheritance, so you should always call constructors and other class methods this way.
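A minimal sketch of why this matters for inheritance (the Bug::Fatal subclass is hypothetical, invented here only to show the class name flowing through the constructor):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Bug;
sub new {
    # With Class->new(...), the class name arrives as the first argument,
    # so the same constructor works unchanged for subclasses.
    my ($class, $id, $type, $descr) = @_;
    return bless { id => $id, type => $type, descr => $descr }, $class;
}

package Bug::Fatal;
our @ISA = ("Bug");    # hypothetical subclass, for illustration only

package main;

my $bug = Bug->new("00001", "fatal", "does not compile");
print ref($bug), "\n";      # Bug

my $fatal = Bug::Fatal->new("00002", "fatal", "crashes");
print ref($fatal), "\n";    # Bug::Fatal -- bless used the caller's class
```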

Method enacting

Apart from encapsulating the gory details of object creation within the class itself, using a class method like this to create objects has another big advantage. If you abide by the convention of only ever creating new Bug objects by calling Bug::new , you're guaranteed that all such objects will always be hashes. Of course, there's nothing to prevent us from "manually" blessing arrays or scalars as Bug objects, but it turns out to make life much easier if you stick to blessing one type of object into each class.

For example, if you can be confident that any Bug object is going to be a blessed hash, you can (finally!) fill in the missing code in the Bug::print_me method:

package Bug;
sub print_me
{
    my ($self) = @_;
    print "ID: $self->{id}\n";
    print "$self->{descr}\n";
    print "(Note: problem is fatal)\n" if $self->{type} eq "fatal";
}
Now, whenever the print_me method is called via a reference to any hash that's been blessed into the Bug class, the $self variable extracts the reference that was passed as the first argument and then the print statements access the various entries of the blessed hash.

Till death us do part...

Objects sometimes require special attention at the other end of their lifespan too. Most object-oriented languages provide the ability to specify a subroutine that is called automatically when an object ceases to exist. Such subroutines are usually called destructors , and are used to undo any side-effects caused by the previous existence of an object: closing file or directory handles the object held open, shutting down network connections, disconnecting from databases, and so on.

In Perl, you can set up a destructor for a class by defining a subroutine named DESTROY in the class's package. Any such subroutine is automatically called on an object of that class, just before that object's memory is reclaimed. Typically, this happens when the last variable holding a reference to the object goes out of scope, or has another value assigned to it.

For example, you could provide a destructor for the Bug class like this:

package Bug;
# other stuff as before

sub DESTROY
{
        my ($self) = @_;
        print "<< Squashed the bug: $self->{id} >>\n\n";
}

Now, every time an object of class Bug is about to cease to exist, that object will automatically have its DESTROY method called, which will print an epitaph for the object. For example, the following code:
package main;
use Bug;

open BUGDATA, "Bug.dat" or die "Couldn't find Bug data";

while (<BUGDATA>)
{
    my @data = split ',', $_;       # extract comma-separated Bug data
    my $bug = Bug->new(@data);      # create a new Bug object
    $bug->print_me();               # print it out
}

print "(end of list)\n";
prints out something like this:
ID: HW000761
"Cup holder" broken
(Note: problem is fatal)
<< Squashed the bug: HW000761 >>

ID: SW000214
Word processor trashing disk after 20 saves.
<< Squashed the bug: SW000214 >>

ID: OS000633
Can't change background colour (blue) on blue screen of death.
<< Squashed the bug: OS000633 >>

(end of list)
That's because, at the end of each iteration of the while loop, the lexical variable $bug goes out of scope, taking with it the only reference to the Bug object created earlier in the same loop. That object's reference count immediately becomes zero and, because it was blessed, the corresponding DESTROY method (i.e. Bug::DESTROY ) is automatically called on the object.
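That reference-counting behaviour can be observed directly. In this sketch the $squashed counter is my own addition (not from the article), used only to record that DESTROY really fired when the block was left:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Bug;

our $squashed = 0;    # counts destroyed Bug objects (illustrative only)

sub new     { my ($class, %args) = @_; return bless { %args }, $class; }
sub DESTROY { $squashed++; }

package main;

{
    my $bug = Bug->new(id => "HW000761");
    # $bug holds the only reference to the object...
}
# ...so leaving the block drops its reference count to zero,
# and Perl calls Bug::DESTROY automatically.

print "squashed: $Bug::squashed\n";    # squashed: 1
```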

Where to from here?

Of course, these fundamental techniques only scratch the surface of object-oriented programming in Perl. Simple hash-based classes with methods, constructors, and destructors may be enough to let you solve real problems in Perl, but there's a vast array of powerful and labor-saving techniques you can add to those basic components: autoloaded methods, class methods and class attributes, inheritance and multiple inheritance, polymorphism, multiple dispatch, enforced encapsulation, operator overloading, tied objects, genericity, and persistence.

Perl's standard documentation includes plenty of good material: perlref , perlreftut , perlobj , perltoot , perltootc , and perlbot will get you started. But if you're looking for a comprehensive tutorial on everything you need to know, you may also like to consider my new book, Object Oriented Perl , from which this article has been adapted.

[Jun 28, 2017] Whats Wrong with sort and How to Fix It by Tom Christiansen

Unicode poses some tricky problems... Perl 5.14 introduced the unicode_strings feature.
Aug 31, 2011 | www.perl.com
By now, you may have read Considerations on Using Unicode Properly in Modern Perl Applications . Still think doing things correctly is easy? Tom Christiansen demonstrates that even sorting can be trickier than you think.

NOTE : The following is an excerpt from the draft manuscript of Programming Perl , 4ᵗʰ edition

Calling sort without a comparison function is quite often the wrong thing to do, even on plain text. That's because if you use a bare sort, you can get really strange results. It's not just Perl either: almost all programming languages work this way, even the shell's sort command. You might be surprised to find that with this sort of nonsense sort, B comes before a, not after it; accented letters sort after all unaccented ones; and ligatures like ﬀ come after zz. There's no end to such silliness, either; see the default sort tables at the end of this article to see what I mean.

There are situations when a bare sort is appropriate, but fewer than you think. One scenario is when every string you're sorting contains nothing but the 26 lowercase (or uppercase, but not both) Latin letters from a-z, without any whitespace or punctuation.

Another occasion when a simple, unadorned sort is appropriate is when you have no other goal but to iterate in an order that is merely repeatable, even if that order should happen to be completely arbitrary. In other words, yes, it's garbage, but it's the same garbage this time as it was last time. That's because the default sort resorts to an unmediated cmp operator, which has the "predictable garbage" characteristics I just mentioned.

The last situation is much less frequent than the first two. It requires that the things you're sorting be special‐purpose, dedicated binary keys whose bit sequences have been arranged with excruciating care to sort in some prescribed fashion. This is also the strategy for any reasonable use of the cmp operator.

So what's wrong with sort anyway?

I know, I know. I can hear everyone saying, "But it's called sort , so how could that ever be wrong?" Sure it's called sort , but you still have to know how to use it to get useful results out. Probably the most surprising thing about sort is that it does not by default do an alphabetic, an alphanumeric, or a numeric sort. What it actually does is something else altogether, and that something else is of surprisingly limited usefulness.

Imagine you have an array of records. It does you virtually no good to write:

@sorted_recs = sort @recs;

Because Perl's cmp operator does only a bit comparison, not an alphabetic one, it does nearly as little good to write your record sort this way:

@srecs = sort {
    $b->{AGE}      <=>  $a->{AGE}
                   ||
    $a->{SURNAME}  cmp  $b->{SURNAME}
} @recs;

The problem is that that cmp for the record's SURNAME field is not an alphabetic comparison. It's merely a code point comparison. That means it works like C's strcmp function or Java's String.compareTo method. Although commonly referred to as a "lexicographic" comparison, this is a gross misnomer: it's about as far away from the way real lexicographers sort dictionary entries as you can get without flipping a coin.
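You can see this code-point behaviour with nothing more than ASCII, since every uppercase letter has a smaller code point than every lowercase one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# cmp compares code points, not alphabetical positions, so "B" (0x42)
# sorts before "a" (0x61):
print "B" lt "a" ? "B first\n" : "a first\n";    # B first

# A bare sort therefore splits a word list by case:
my @sorted = sort qw(banana Apple cherry Date);
print "@sorted\n";    # Apple Date banana cherry
```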

Fortunately, you don't have to come up with your own algorithm for dictionary sorting, because Perl provides a standard class to do this for you: Unicode::Collate . Don't let the name throw you, because while it was first invented for Unicode, it works great on regular ASCII text, too, and does a better job at making lexicographers happy than a plain old sort ever manages.

If you have code that purports to sort text that looks like this:

@sorted_lines = sort @lines;

Then all you have to do to get a dictionary sort is write this instead:

use Unicode::Collate;
@sorted_lines = Unicode::Collate::->new->sort(@lines);

For structured records, like those with ages and surnames in them, you have to be a bit fancier. One way to fix it would be to use the class's own cmp operator instead of the built‐in one.

use Unicode::Collate;
my $collator = Unicode::Collate::->new();
@srecs = sort {
    $b->{AGE}  <=>  $a->{AGE}
          ||
    $collator->cmp( $a->{SURNAME}, $b->{SURNAME} )
} @recs;

However, that makes a fairly expensive method call for every possible comparison. Because Perl's adaptive merge sort algorithm usually runs in O(n log n) time given n items, and because each comparison requires two different computed keys, that can be a lot of duplicate effort. Our sorting class therefore provides a convenient getSortKey method that calculates a special binary key which you can cache and later pass to the normal cmp operator on your own. This trick lets you use cmp yet get a truly alphabetic sort out of it for a change.

Here is a simple but sufficient example of how to do that:

use Unicode::Collate;
my $collator = Unicode::Collate::->new();

# first calculate the magic sort key for each text field, and cache it
for my $rec (@recs) {
    $rec->{SURNAME_key} = $collator->getSortKey( $rec->{SURNAME} );
} 

# now sort the records as before, but for the surname field,
# use the cached sort key instead
@srecs = sort {
    $b->{AGE}          <=>  $a->{AGE}
                      ||
    $a->{SURNAME_key}  cmp  $b->{SURNAME_key}
} @recs;

That's what I meant about very carefully preparing a mediated sort key that contains the precomputed binary key.

English Card Catalogue Sorts

The simple code just demonstrated assumes you want to sort names the same way you do regular text. That isn't a good assumption, however. Many countries, languages, institutions, and sometimes even librarians have their own notions about how a card catalogue or a phonebook ought to be sorted.

For example, in the English language, surnames with Scottish patronymics starting with Mc or Mac, like MacKinley and McKinley , not only count as completely identical synonyms for sorting purposes, they go before any other surname that begins with M, and so precede surnames like Mables or Machado .

Yes, really.

That means that the following names are sorted correctly -- for English:

Lewis, C.S.
McKinley, Bill
MacKinley, Ron
Mables, Martha
Machado, José
Macon, Bacon

Yes, it's true. Check out your local large English‐language bookseller or library -- presuming you can find one. If you do, best make sure to blow the dust off first.

Sorting Spanish Names

It's a good thing those names follow English rules for sorting names. If this were Spanish, we would have to deal with double‐barrelled surnames, where the patronym sorts before the matronym, which in turn sorts before any given names. That means that if Señor Machado's full name were, like the poet's, Antonio Cipriano José María y Francisco de Santa Ana Machado y Ruiz , then you would have to sort him with the other Machados but then consider Ruiz before Antonio if there were any other Machados . Similarly, the poet Federico del Sagrado Corazón de Jesús García Lorca sorts before the writer Gabriel José de la Concordia García Márquez .

On the other hand, if your records are not full multifield hashes but only simple text strings that don't happen to be surnames, your task is a lot simpler, since now all you have to do is get the cmp operator to behave sensibly. That you can do easily enough this way:

use Unicode::Collate;
@sorted_text = Unicode::Collate::->new->sort(@text);

Sorting Text, Not Binary

Imagine you had this list of German‐language authors:

@germans = qw{
    Böll
    Born
    Böhme
    Bodmer
    Brandis
    Brant
    Böttcher
    Borchert
    Bobrowski
};

If you just sorted them with an unmediated sort operator, you would get this utter nonsense:

Bobrowski
Bodmer
Borchert
Born
Brandis
Brant
Böhme
Böll
Böttcher

Or maybe this equally nonsensical answer:

Bobrowski
Bodmer
Borchert
Born
Böll
Brandis
Brant
Böhme
Böttcher

Or even this still completely nonsensical answer:

Bobrowski
Bodmer
Borchert
Born
Böhme
Böll
Brandis
Brant
Böttcher

The crucial point to all that is that it's text, not binary , so not only can you never judge what its bit patterns hold just by eyeballing it, more importantly, it has special rules to make it sort alphabetically (some might say sanely), an ordering no naïve code‐point sort will ever come close to getting right, especially on Unicode.

The correct ordering is:

Bobrowski
Bodmer
Böhme
Böll
Borchert
Born
Böttcher
Brandis
Brant

And that is precisely what

use Unicode::Collate;
@sorted_germans = Unicode::Collate::->new->sort(@german_names);

gives you: a correctly sorted list of those Germans' names.

Sorting German Names

Hold on, though.

Correct in what language? In English, yes, the order given is now correct. But considering that these authors wrote in the German language, it is quite conceivable that you should be following the rules for ordering German names in German , not in English. That produces this ordering:

Bobrowski
Bodmer
Böhme
Böll
Böttcher
Borchert
Born
Brandis
Brant

How come Böttcher now comes before Borchert ? Because Böttcher is supposed to sort the same as Boettcher . In a German phonebook or other German list of German names, things like ö and oe are considered synonyms, which is not at all how it works in English. To get the German phonebook sort, you merely have to modify your constructor this way:

use Unicode::Collate::Locale;
@sorted_germans = Unicode::Collate::Locale::
                      ->new(locale => "de_phonebook")
                      ->sort(@german_names);

Isn't this fun?

Be glad you're not sorting names. Sorting names is hard.

Default Sort Tables

Here are most of the Latin letters, ordered using the default sort :

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j 
k l m n o p q r s t u v w x y z                     
                                    
        Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě 
Ĝ ĝ Ğ ğ Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ 
ŀ Ł ł Ń ń Ņ ņ Ň ň Ŋ ŋ Ō ō Ŏ ŏ Ő ő   Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş   Ţ ţ Ť 
ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ  Ź ź Ż ż   ſ ƀ Ɓ Ƃ ƃ Ƈ ƈ Ɖ Ɗ Ƌ 
ƌ ƍ Ǝ Ə Ɛ Ƒ  Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ƥ ƥ Ʀ ƫ Ƭ ƭ Ʈ Ư ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ 
ƹ ƺ ƾ ƿ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ Ǡ ǡ Ǣ ǣ 
Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ ǰ DZ Dz dz Ǵ ǵ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ 
ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ Ƞ ȡ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ 
ȯ Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ Ⱥ Ȼ ȼ Ƚ Ⱦ ɐ ɑ ɒ ɓ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ ɠ ɡ ɢ ɣ ɤ ɥ ɦ 
ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ ɰ ɱ ɲ ɳ ɴ ɶ ɹ ɺ ɻ ɼ ɽ ɾ ɿ ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ 
ʎ ʏ ʐ ʑ ʒ ʓ ʙ ʚ ʛ ʜ ʝ ʞ ʟ ʠ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ˡ ˢ ˣ ᴀ ᴁ ᴂ ᴃ ᴄ ᴅ ᴆ ᴇ ᴈ ᴉ 
ᴊ ᴋ ᴌ ᴍ ᴎ ᴏ ᴑ ᴓ ᴔ ᴘ ᴙ ᴚ ᴛ ᴜ ᴝ ᴞ ᴟ ᴠ ᴡ ᴢ ᴣ ᴬ ᴭ ᴮ ᴯ ᴰ ᴱ ᴲ ᴳ ᴴ ᴵ ᴶ ᴷ ᴸ ᴹ ᴺ 
ᴻ ᴼ ᴾ ᴿ ᵀ ᵁ ᵂ ᵃ ᵄ ᵅ ᵆ ᵇ ᵈ ᵉ ᵊ ᵋ ᵌ ᵍ ᵎ ᵏ ᵐ ᵑ ᵒ ᵖ ᵗ ᵘ ᵙ ᵚ ᵛ ᵢ ᵣ ᵤ ᵥ ᵫ ᵬ ᵭ 
ᵮ ᵯ ᵰ ᵱ ᵲ ᵳ ᵴ ᵵ ᵶ Ḁ ḁ Ḃ ḃ Ḅ ḅ Ḇ ḇ Ḉ ḉ Ḋ ḋ Ḍ ḍ Ḏ ḏ Ḑ ḑ Ḓ ḓ Ḕ ḕ Ḗ ḗ Ḙ ḙ Ḛ 
ḛ Ḝ ḝ Ḟ ḟ Ḡ ḡ Ḣ ḣ Ḥ ḥ Ḧ ḧ Ḩ ḩ Ḫ ḫ Ḭ ḭ Ḯ ḯ Ḱ ḱ Ḳ ḳ Ḵ ḵ Ḷ ḷ Ḹ ḹ Ḻ ḻ Ḽ ḽ Ḿ 
ḿ Ṁ ṁ Ṃ ṃ Ṅ ṅ Ṇ ṇ Ṉ ṉ Ṋ ṋ Ṍ ṍ Ṏ ṏ Ṑ ṑ Ṓ ṓ Ṕ ṕ Ṗ ṗ Ṙ ṙ Ṛ ṛ Ṝ ṝ Ṟ ṟ Ṡ ṡ Ṣ 
ṣ Ṥ ṥ Ṧ ṧ Ṩ ṩ Ṫ ṫ Ṭ ṭ Ṯ ṯ Ṱ ṱ Ṳ ṳ Ṵ ṵ Ṷ ṷ Ṹ ṹ Ṻ ṻ Ṽ ṽ Ṿ ṿ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẇ 
ẇ Ẉ ẉ Ẋ ẋ Ẍ ẍ Ẏ ẏ Ẑ ẑ Ẓ ẓ Ẕ ẕ ẖ ẗ ẘ ẙ ẚ ẛ ẞ ẟ Ạ ạ Ả ả Ấ ấ Ầ ầ Ẩ ẩ Ẫ ẫ Ậ 
ậ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ặ ặ Ẹ ẹ Ẻ ẻ Ẽ ẽ Ế ế Ề ề Ể ể Ễ ễ Ệ ệ Ỉ ỉ Ị ị Ọ ọ Ỏ ỏ Ố 
ố Ồ ồ Ổ ổ Ỗ ỗ Ộ ộ Ớ ớ Ờ ờ Ở ở Ỡ ỡ Ợ ợ Ụ ụ Ủ ủ Ứ ứ Ừ ừ Ử ử Ữ ữ Ự ự Ỳ ỳ Ỵ 
ỵ Ỷ ỷ Ỹ ỹ K Å Ⅎ ⅎ Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ ⅰ ⅱ ⅲ ⅳ ⅴ 
ⅵ ⅶ ⅷ ⅸ ⅹ ⅺ ⅻ ⅼ ⅽ ⅾ ⅿ ff fi fl ffi ffl ſt st A B C D E F G H I
J K L M N O P Q R S T U V W X Y Z a b c d e f g h i
j k l m n o p q r s t u v w x y z

As you can see, those letters are scattered all over the place. Sure, it's not completely random, but it's not useful either, because it is full of arbitrary placement that makes no alphabetical sense. That's because it is not an alphabetic sort at all. However, with the special kind of sort I've just shown you above, the kind that calls the sort method from the Unicode::Collate class, you do get an alphabetic sort. Using that method, the Latin letters I just showed you now come out in alphabetical order, which is like this:

a a A A  ᵃ ᴬ     ă Ă ắ Ắ ằ Ằ ẵ Ẵ ẳ Ẳ   ấ Ấ ầ Ầ ẫ Ẫ ẩ Ẩ ǎ Ǎ   
Å ǻ Ǻ   ǟ Ǟ   ȧ Ȧ ǡ Ǡ ą Ą ā Ā ả Ả ȁ Ȁ ȃ Ȃ ạ Ạ ặ Ặ ậ Ậ ḁ Ḁ   ᴭ ǽ Ǽ 
ǣ Ǣ ẚ ᴀ Ⱥ ᴁ ᴂ ᵆ ɐ ᵄ ɑ ᵅ ɒ b b B B ᵇ ᴮ ḃ Ḃ ḅ Ḅ ḇ Ḇ ʙ ƀ ᴯ ᴃ ᵬ ɓ Ɓ ƃ Ƃ c 
c ⅽ C C Ⅽ ć Ć ĉ Ĉ č Č ċ Ċ   ḉ Ḉ ᴄ ȼ Ȼ ƈ Ƈ ɕ d d ⅾ D D Ⅾ ᵈ ᴰ ď Ď ḋ 
Ḋ ḑ Ḑ ḍ Ḍ ḓ Ḓ ḏ Ḏ đ Đ   dz ʣ Dz DZ dž Dž DŽ ʥ ʤ ᴅ ᴆ ᵭ ɖ Ɖ ɗ Ɗ ƌ Ƌ ȡ ẟ e e E 
E ᵉ ᴱ     ĕ Ĕ   ế Ế ề Ề ễ Ễ ể Ể ě Ě   ẽ Ẽ ė Ė ȩ Ȩ ḝ Ḝ ę Ę ē Ē ḗ 
Ḗ ḕ Ḕ ẻ Ẻ ȅ Ȅ ȇ Ȇ ẹ Ẹ ệ Ệ ḙ Ḙ ḛ Ḛ ᴇ ǝ Ǝ ᴲ ə Ə ᵊ ɛ Ɛ ᵋ ɘ ɚ ɜ ᴈ ᵌ ɝ ɞ ʚ ɤ 
f f F F ḟ Ḟ ff ffi ffl fi fl ʩ ᵮ  Ƒ ⅎ Ⅎ g g G G ᵍ ᴳ ǵ Ǵ ğ Ğ ĝ Ĝ ǧ Ǧ ġ Ġ ģ 
Ģ ḡ Ḡ ɡ ɢ ǥ Ǥ ɠ Ɠ ʛ ɣ Ɣ h h H H ᴴ ĥ Ĥ ȟ Ȟ ḧ Ḧ ḣ Ḣ ḩ Ḩ ḥ Ḥ ḫ Ḫ ẖ ħ Ħ ʜ 
ƕ ɦ ɧ i i ⅰ I I Ⅰ ᵢ ᴵ     ĭ Ĭ   ǐ Ǐ   ḯ Ḯ ĩ Ĩ İ į Į ī Ī ỉ Ỉ ȉ 
Ȉ ȋ Ȋ ị Ị ḭ Ḭ ⅱ Ⅱ ⅲ Ⅲ ij IJ ⅳ Ⅳ ⅸ Ⅸ ı ɪ ᴉ ᵎ ɨ Ɨ ɩ Ɩ j j J J ᴶ ĵ Ĵ ǰ ȷ ᴊ 
ʝ ɟ ʄ k k K K K ᵏ ᴷ ḱ Ḱ ǩ Ǩ ķ Ķ ḳ Ḳ ḵ Ḵ ᴋ ƙ Ƙ ʞ l l ⅼ L L Ⅼ ˡ ᴸ ĺ Ĺ 
ľ Ľ ļ Ļ ḷ Ḷ ḹ Ḹ ḽ Ḽ ḻ Ḻ ł Ł ŀ Ŀ lj Lj LJ ʪ ʫ ʟ ᴌ ƚ Ƚ ɫ ɬ ɭ ȴ ɮ ƛ ʎ m m ⅿ M 
M Ⅿ ᵐ ᴹ ḿ Ḿ ṁ Ṁ ṃ Ṃ ᴍ ᵯ ɱ n n N N ᴺ ń Ń ǹ Ǹ ň Ň   ṅ Ṅ ņ Ņ ṇ Ṇ ṋ Ṋ ṉ 
Ṉ nj Nj NJ ɴ ᴻ ᴎ ᵰ ɲ Ɲ ƞ Ƞ ɳ ȵ ŋ Ŋ ᵑ o o O O  ᵒ ᴼ     ŏ Ŏ   ố Ố ồ 
Ồ ỗ Ỗ ổ Ổ ǒ Ǒ   ȫ Ȫ ő Ő   ṍ Ṍ ṏ Ṏ ȭ Ȭ ȯ Ȯ ȱ Ȱ   ǿ Ǿ ǫ Ǫ ǭ Ǭ ō Ō ṓ 
Ṓ ṑ Ṑ ỏ Ỏ ȍ Ȍ ȏ Ȏ ớ Ớ ờ Ờ ỡ Ỡ ở Ở ợ Ợ ọ Ọ ộ Ộ   ᴏ ᴑ ɶ ᴔ ᴓ p p P P ᵖ 
ᴾ ṕ Ṕ ṗ Ṗ ᴘ ᵱ ƥ Ƥ q q Q Q ʠ ĸ r r R R ᵣ ᴿ ŕ Ŕ ř Ř ṙ Ṙ ŗ Ŗ ȑ Ȑ ȓ Ȓ ṛ 
Ṛ ṝ Ṝ ṟ Ṟ ʀ Ʀ ᴙ ᵲ ɹ ᴚ ɺ ɻ ɼ ɽ ɾ ᵳ ɿ ʁ s s S S ˢ ś Ś ṥ Ṥ ŝ Ŝ   ṧ Ṧ ṡ 
Ṡ ş Ş ṣ Ṣ ṩ Ṩ ș Ș ſ ẛ  ẞ st ſt ᵴ ʂ ʃ ʅ ʆ t t T T ᵗ ᵀ ť Ť ẗ ṫ Ṫ ţ Ţ ṭ Ṭ 
ț Ț ṱ Ṱ ṯ Ṯ ʨ ƾ ʦ ʧ ᴛ ŧ Ŧ Ⱦ ᵵ ƫ ƭ Ƭ ʈ Ʈ ȶ ʇ u u U U ᵘ ᵤ ᵁ     ŭ Ŭ 
  ǔ Ǔ ů Ů   ǘ Ǘ ǜ Ǜ ǚ Ǚ ǖ Ǖ ű Ű ũ Ũ ṹ Ṹ ų Ų ū Ū ṻ Ṻ ủ Ủ ȕ Ȕ ȗ Ȗ ư Ư 
ứ Ứ ừ Ừ ữ Ữ ử Ử ự Ự ụ Ụ ṳ Ṳ ṷ Ṷ ṵ Ṵ ᴜ ᴝ ᵙ ᴞ ᵫ ʉ ɥ ɯ Ɯ ᵚ ᴟ ɰ ʊ Ʊ v v ⅴ V 
V Ⅴ ᵛ ᵥ ṽ Ṽ ṿ Ṿ ⅵ Ⅵ ⅶ Ⅶ ⅷ Ⅷ ᴠ ʋ Ʋ ʌ w w W W ᵂ ẃ Ẃ ẁ Ẁ ŵ Ŵ ẘ ẅ Ẅ ẇ Ẇ ẉ 
Ẉ ᴡ ʍ x x ⅹ X X Ⅹ ˣ ẍ Ẍ ẋ Ẋ ⅺ Ⅺ ⅻ Ⅻ y y Y Y   ỳ Ỳ ŷ Ŷ ẙ   ỹ Ỹ ẏ 
Ẏ ȳ Ȳ ỷ Ỷ ỵ Ỵ ʏ ƴ Ƴ z z Z Z ź Ź ẑ Ẑ   ż Ż ẓ Ẓ ẕ Ẕ ƍ ᴢ ƶ Ƶ ᵶ ȥ Ȥ ʐ ʑ 
ʒ Ʒ ǯ Ǯ ᴣ ƹ Ƹ ƺ ʓ ȝ Ȝ   ƿ Ƿ

Isn't that much nicer?

Romani Ite Domum

In case you're wondering what that last row of distinctly un‐Roman Latin letters might possibly be, they're called respectively ezh ʒ, yogh ȝ, thorn þ, and wynn ƿ. They had to go somewhere, so they ended up getting stuck after z.

Some are still used in certain non‐English (but still Latin) alphabets today, such as Icelandic, and even though you probably won't bump into them in contemporary English texts, you might see some if you're reading the original texts of famous medieval English poems like Beowulf , Sir Gawain and the Green Knight , or Brut .

The last of those, Brut , was written by a fellow named Laȝamon , a name whose third letter is a yogh. Famous though he was, I wouldn't suggest changing your name to Laȝamon in his honor, as I doubt the phone company would be amused.

[Jun 18, 2017] Making Perl Reusable with Modules

Jun 18, 2017 | www.perl.com
By Andy Sylvester on August 7, 2007 12:00 AM
Perl software development can occur at several levels. When first developing the idea for an application, a Perl developer may start with a short program to flesh out the necessary algorithms. After that, the next step might be to create a package to support object-oriented development. The final work is often to create a Perl module for the package to make the logic available to all parts of the application. Andy Sylvester explores this topic with a simple mathematical function.

Creating a Perl Subroutine

I am working on ideas for implementing some mathematical concepts for a method of composing music. The ideas come from the work of Joseph Schillinger . At the heart of the method is being able to generate patterns using mathematical operations and using those patterns in music composition. One of the basic operations described by Schillinger is creating a "resultant," or series of numbers, based on two integers (or "generators"). Figure 1 shows a diagram of how to create the resultant of the integers 5 and 3.

creating the resultant of 5 and 3
Figure 1. Creating the resultant of 5 and 3

Figure 1 shows two line patterns with units of 5 and units of 3. The lines continue until both lines come down (or "close") at the same time. The length of each line corresponds to the product of the two generators (5 x 3 = 15). If you draw dotted lines down from where each of the two generator lines change state, you can create a third line that changes state at each of the dotted line points. The lengths of the segments of the third line make up the resultant of the integers 5 and 3 (3, 2, 1, 3, 1, 2, 3).

Schillinger used graph paper to create resultants in his System of Musical Composition . However, another convenient way of creating a resultant is to calculate the modulus of a counter and then calculate a term in the resultant series based on the state of the counter. An algorithm to create the terms in a resultant might resemble:

Read generators from command line
Determine total number of counts for resultant
   (major_generator * minor_generator)
Initialize resultant counter = 0
For MyCounts from 1 to the total number of counts
   Get the modulus of MyCounts to the major and minor generators
   Increment the resultant counter
   If either modulus = 0
     Save the resultant counter to the resultant array
     Re-initialize resultant counter = 0
   End if
End for

From this design, I wrote a short program using the Perl modulus operator ( % ):

#!/usr/bin/perl
#*******************************************************
#
# FILENAME: result01.pl
#
# USAGE: perl result01.pl major_generator minor_generator
#
# DESCRIPTION:
#    This Perl script will generate a Schillinger resultant
#    based on two integers for the major generator and minor
#    generator.
#
#    In normal usage, the user will input the two integers
#    via the command line. The sequence of numbers representing
#    the resultant will be sent to standard output (the console
#    window).
#
# INPUTS:
#    major_generator - First generator for the resultant, input
#                      as the first calling argument on the
#                      command line.
#
#    minor_generator - Second generator for the resultant, input
#                      as the second calling argument on the
#                      command line.
#
# OUTPUTS:
#    resultant - Sequence of numbers written to the console window
#
#**************************************************************

   use strict;
   use warnings;

   my $major_generator = $ARGV[0];
   my $minor_generator = $ARGV[1];

   my $total_counts   = $major_generator * $minor_generator;
   my $result_counter = 0;
   my $major_mod      = 0;
   my $minor_mod      = 0;
   my $i              = 0;
   my $j              = 0;
   my @resultant;

   print "Generator Total = $total_counts\n";

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $major_generator;
       $minor_mod = $i % $minor_generator;
       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@resultant, $result_counter);
          $result_counter = 0;
       }
       print "$i \n";
       print "Modulus of $major_generator is $major_mod \n";
       print "Modulus of $minor_generator is $minor_mod \n";
   }

   print "\n";
   print "The resultant is @resultant \n";

Run the program with 5 and 3 as the inputs ( perl result01.pl 5 3 ):

Generator Total = 15
1
Modulus of 5 is 1
Modulus of 3 is 1
2
Modulus of 5 is 2
Modulus of 3 is 2
3
Modulus of 5 is 3
Modulus of 3 is 0
4
Modulus of 5 is 4
Modulus of 3 is 1
5
Modulus of 5 is 0
Modulus of 3 is 2
6
Modulus of 5 is 1
Modulus of 3 is 0
7
Modulus of 5 is 2
Modulus of 3 is 1
8
Modulus of 5 is 3
Modulus of 3 is 2
9
Modulus of 5 is 4
Modulus of 3 is 0
10
Modulus of 5 is 0
Modulus of 3 is 1
11
Modulus of 5 is 1
Modulus of 3 is 2
12
Modulus of 5 is 2
Modulus of 3 is 0
13
Modulus of 5 is 3
Modulus of 3 is 1
14
Modulus of 5 is 4
Modulus of 3 is 2
15
Modulus of 5 is 0
Modulus of 3 is 0

The resultant is 3 2 1 3 1 2 3

This result matches the resultant terms as shown in the graph in Figure 1, so it looks like the program generates the correct output.

Creating a Perl Package from a Program

With a working program, you can create a Perl package as a step toward being able to reuse code in a larger application. The initial program has two pieces of input data (the major generator and the minor generator). The single output is the list of numbers that make up the resultant. These three pieces of data could be combined in an object. The program could easily become a subroutine to generate the terms in the resultant. This could be a method in the class contained in the package. Creating a class implies adding a constructor method to create a new object. Finally, there should be some methods to get the major generator and minor generator from the object to use in generating the resultant (see the perlboot and perltoot tutorials for background on object-oriented programming in Perl).

From these requirements, the resulting package might be:

#!/usr/bin/perl
#*******************************************************
#
# Filename: result01a.pl
#
# Description:
#    This Perl script creates a class for a Schillinger resultant
#    based on two integers for the major generator and the
#    minor generator.
#
# Class Name: Resultant
#
# Synopsis:
#
# use Resultant;
#
# Class Methods:
#
#   $seq1 = Resultant->new(5, 3)
#
#      Creates a new object with a major generator of 5 and
#      a minor generator of 3. These parameters need to be
#      initialized when a new object is created, as there
#      are no methods to set these elements within the object.
#
#   $seq1->generate()
#
#      Generates a resultant and saves it in the ResultList array
#
# Object Data Methods:
#
#   $major_generator = $seq1->get_major()
#
#      Returns the major generator
#
#   $minor_generator = $seq1->get_minor()
#
#      Returns the minor generator
#
#
#**************************************************************

{ package Resultant;
  use strict;
  sub new {
    my $class           = shift;
    my $major_generator = shift;
    my $minor_generator = shift;

    my $self = {Major => $major_generator,
                Minor => $minor_generator,
                ResultList => []};

    bless $self, $class;
    return $self;
  }

  sub get_major {
    my $self = shift;
    return $self->{Major};
  }

  sub get_minor {
    my $self = shift;
    return $self->{Minor};
  }

  sub generate {
    my $self         = shift;
    my $total_counts = $self->get_major * $self->get_minor;
    my $i            = 0;
    my $major_mod;
    my $minor_mod;
    my @result;
    my $result_counter = 0;

   while ($i < $total_counts) {
       $i++;
       $result_counter++;
       $major_mod = $i % $self->get_major;
       $minor_mod = $i % $self->get_minor;

       if (($major_mod == 0) || ($minor_mod == 0)) {
          push(@result, $result_counter);
          $result_counter = 0;
       }
   }

   @{$self->{ResultList}} = @result;
  }
}

#
# Test code to check out class methods
#

# Counter declaration
my $j;

# Create new object and initialize major and minor generators
my $seq1 = Resultant->new(5, 3);

# Print major and minor generators
print "The major generator is ", $seq1->get_major(), "\n";
print "The minor generator is ", $seq1->get_minor(), "\n";

# Generate a resultant
$seq1->generate();

# Print the resultant
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

Execute the file ( perl result01a.pl ):

The major generator is 5
The minor generator is 3
The resultant is 3 2 1 3 1 2 3

This output text shows the same resultant terms as produced by the first program.

Creating a Perl Module

From a package, you can create a Perl module to make the package fully reusable in an application. Also, you can modify our original test code into a series of module tests to show that the module works the same as the standalone package and the original program.

I like to use the Perl module Module::Starter to create a skeleton module for the package code. To start, install the Module::Starter module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Module::Starter module installed, type perldoc Module::Starter in a terminal window. If the man page does not appear, you probably do not have the module installed.

Select a working directory to create the module directory. This can be the same directory that you have been using to develop your Perl program. Type the following command (though with your own name and email address):

$ module-starter --module=Music::Resultant --author="John Doe" \
    --email=john@johndoe.com

Perl should respond with:

Created starter directories and files

In the working directory, you should see a folder or directory called Music-Resultant . Change your current directory to Music-Resultant , then type the commands:

$ perl Makefile.PL

$ make

These commands will create the full directory structure for the module. Now paste the text from the package into the module template at Music-Resultant/lib/Music/Resultant.pm . Open Resultant.pm in a text editor and paste the subroutines from the package after the lines:

=head1 FUNCTIONS

=head2 function1

=cut

When you paste the package source code, remove the opening brace from the package, so that the first lines appear as:

 package Resultant;
  sub new {
    use strict;
    my $class = shift;

and the last lines of the source appear without the final closing brace as:

   @{$self->{ResultList}} = @result;
  }

After making the above changes, save Resultant.pm . This is all that you need to do to create a module for your own use. If you eventually release your module to the Perl community or upload it to CPAN , you should do some more work to prepare the module and its documentation (see the perlmod and perlmodlib documentation for more information).

After modifying Resultant.pm , you need to install the module to make it available for other Perl applications. To avoid configuration issues, install the module in your home directory, separate from your main Perl installation.

  1. In your home directory, create a lib/ directory, then create a perl/ directory within the lib/ directory. The result should resemble:
    /home/myname/lib/perl
    
    
  2. Go to your module directory ( Music-Resultant ) and re-run the build process with a directory path to tell Perl where to install the module:
    $ perl Makefile.PL LIB=/home/myname/lib/perl
    $ make install
    
    

    Once this is complete, the module will be installed in that directory.

The final step in module development is to add tests to the .t file templates created in the module directory. The Perl distribution includes several built-in test modules, such as Test::Simple and Test::More to help test Perl subroutines and modules.

To test the module, open the file Music-Resultant/t/00-load.t . The initial text in this file is:

#!perl -T

use Test::More tests => 1;

BEGIN {
    use_ok( 'Music::Resultant' );
}

diag( "Testing Music::Resultant $Music::Resultant::VERSION, Perl $], $^X" );

You can run this test file from the t/ directory using the command:

perl -I/home/myname/lib/perl -T 00-load.t

The -I switch tells the Perl interpreter to look for the module Resultant.pm in your alternate installation directory. The directory path must immediately follow the -I switch, or Perl may not search that directory for your module. The -T switch is necessary because the first line of the test script turns on taint checking. (Taint checking only works when enabled at Perl startup; perl will exit with an error if you try to enable it later.) Your results should resemble the following (your Perl version may be different):

1..1
ok 1 - use Music::Resultant;
# Testing Music::Resultant 0.01, Perl 5.008006, perl

The test code from the second listing is easy to convert to the format used by Test::More . Change the number at the end of the tests line from 1 to 4, as you will be adding three more tests to this file. The template file has an initial test to show that the module exists. Next, add tests after the BEGIN block in the file:

# Test 2:
my $seq1 = Resultant->new(5, 3);   # create an object
isa_ok($seq1, 'Resultant');        # check object definition

# Test 3: check major generator
my $local_major_generator = $seq1->get_major();
is ($local_major_generator, 5, 'major generator is correct' );

# Test 4: check minor generator
my $local_minor_generator = $seq1->get_minor();
is ($local_minor_generator, 3, 'minor generator is correct' );

To run the tests, retype the earlier command line in the Music-Resultant/ directory:

$ perl -I/home/myname/lib/perl -T t/00-load.t

You should see the results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

These tests create a Resultant object with a major generator of 5 and a minor generator of 3 (Test 2), and check to see that the major generator in the object is correct (Test 3), and that the minor generator is correct (Test 4). They do not cover the resultant terms. One way to check the resultant is to add the test code used in the second listing to the .t file:

# Generate a resultant
$seq1->generate();

# Print the resultant
my $j;
print "The resultant is ";
foreach $j (@{$seq1->{ResultList}}) {
  print "$j ";
}
print "\n";

You should get the following results:

1..4
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
The resultant is 3 2 1 3 1 2 3
# Testing Music::Resultant 0.01, Perl 5.008006, perl

That's not valid test output, so it needs a little bit of manipulation. To check the elements of a list using a testing function, install the Test::Differences module and its associated modules from CPAN, using the Perl Package Manager, or some other package manager. To see if you already have the Test::Differences module installed, type perldoc Test::Differences in a terminal window. If the man page does not appear, you probably do not have the module installed.

Once that module is part of your Perl installation, change the number of tests from 4 to 5 on the Test::More statement line and add the following statement after the use Test::More line:

use Test::Differences;

Finally, replace the code that prints the resultant with:

# Test 5: (uses Test::Differences and associated modules)
$seq1->generate();
my @result   = @{$seq1->{ResultList}};
my @expected = (3, 2, 1, 3, 1, 2, 3);
eq_or_diff \@result, \@expected, "resultant terms are correct";

Now when the test file runs, you can confirm that the resultant is correct:

1..5
ok 1 - use Music::Resultant;
ok 2 - The object isa Resultant
ok 3 - major generator is correct
ok 4 - minor generator is correct
ok 5 - resultant terms are correct
# Testing Music::Resultant 0.01, Perl 5.008006, perl

Summary

There are multiple levels of Perl software development. Once you start to create modules to enable reuse of your Perl code, you will be able to leverage your effort into larger applications. By using Perl testing modules, you can ensure that your code works the way you expect and provide a way to ensure that the modules continue to work as you add more features.

Resources

Here are some other good resources on creating Perl modules:

Here are some good resources for using Perl testing modules like Test::Simple and Test::More :

[May 16, 2017] Perl - regex - Position of first nonmatching character

May 16, 2017 | stackoverflow.com

I think that's exactly what the pos function is for.

NOTE: pos only works if you use the /g flag


my $x = 'abcdefghijklmnopqrstuvwxyz';
my $end = 0;
if ( $x =~ /$ARGV[0]/g )
{
    $end = pos($x);
}
print "End of match is: $end\n";

Gives the following output


[@centos5 ~]$ perl x.pl
End of match is: 0
[@centos5 ~]$ perl x.pl def
End of match is: 6
[@centos5 ~]$ perl x.pl xyz
End of match is: 26
[@centos5 ~]$ perl x.pl aaa
End of match is: 0
[@centos5 ~]$ perl x.pl ghi
End of match is: 9

No, it only works when a match was successful. tripleee Oct 10 '11 at 15:24

Sorry, I misread the question. The actual question is very tricky, especially if the regex is more complicated than just /gho/ , especially if it contains [ or ( . Should I delete my irrelevant answer? Sodved Oct 10 '11 at 15:27

I liked the possibility to see an example of how pos works, as I didn't know about it before - so now I can understand why it also doesn't apply to the question; so thanks for this answer! :) sdaau Jun 8 '12 at 18:26

[May 16, 2017] Perl - positions of regex match in string - Stack Overflow

May 16, 2017 | stackoverflow.com
Perl - positions of regex match in string

if ( my @matches = $input_string =~ /$metadata[$_]{"pattern"}/g )
{
    print $-[1] . "\n";    # this gives me error uninitialized ...
}

print scalar @matches; gives me 4, which is OK, but if I use $-[1] to get the start of the first match, it gives me an error. Where is the problem?

EDIT1: How can I get the positions of each match in the string? If I have the string "ahoj ahoj ahoj" and the regexp /ahoj/g, how can I get the start and end positions of each "ahoj" in the string?

What error does it give you? user554546 Feb 22 '13 at 20:29
$-[1] is the position of the 1st subpattern (something in parentheses within the regular expression). You're probably looking for $-[0] , the position of the whole pattern? Scott Lamb Feb 22 '13 at 20:32
scott lamb: no i was thinking if i have string "ahoj ahoj ahoj", then i can get position 0, 5, 10 etc inside $-[n], if regex is /ahoj/g Krab Feb 22 '13 at 20:34
The array @- contains the offset of the start of the last successful match (in $-[0] ) and the offset of any captures there may have been in that match (in $-[1] , $-[2] etc.).

There are no captures in your string, so only $-[0] is valid, and (in your case) the last successful match is the fourth one, so it will contain the offset of the fourth instance of the pattern.
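As a small sketch of the point above (the string here follows the question's example): a list-context //g match returns every match, but afterwards @- describes only the last one.

```perl
# After a list-context //g match, @- and @+ describe only the LAST match.
my $input   = "ahoj ahoj ahoj";
my @matches = $input =~ /ahoj/g;

print scalar(@matches), "\n";   # three matches found
print $-[0], "\n";              # start offset of the last match only (10)
```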

The way to get the offsets of individual matches is to write


my @matches;
while ( "ahoj ahoj ahoj" =~ /(ahoj)/g )
{
  push @matches, $1;
  print $-[0], "\n";
}

output


0
5
10

Or if you don't want the individual matched strings, then


my @matches;

push @matches, $-[0] while "ahoj ahoj ahoj" =~ /ahoj/g;

[May 07, 2017] Example Code from Beginning Perl for Bioinformatics

While the examples are genome-sequencing specific, most of the code is a good illustration of string processing in Perl and as such has a wider appeal. See also molecularevolution.org
May 07, 2017 | uwf.edu

This page contains an uncompressed copy of example code from your course text, downloaded on January 15, 2003. Please see the official Beginning Perl for Bioinformatics Website under the heading " Examples and Exercises " for any updates to this code.


General files
Chapter 4 NOTE: Examples 4-5 to 4-7 also require the protein sequence data file: NM_021964fragment.pep.txt - To match the example in your book, save the file out with the name: NM_021964fragment.pep

Chapter 5

NOTE: Example 5-3 also requires the protein sequence data file: NM_021964fragment.pep.txt - To match the example in your book, save the file out with the name: NM_021964fragment.pep
NOTE: Example 5-4, 5-6 and 5-7 also require the DNA file: small.dna.txt - To match the example in your book, save the file out with the name: small.dna

Chapter 6

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

Chapter 7

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

Chapter 8

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

NOTE: Example 8-2,8-3 and 8-4 also require the DNA file: sample.dna.txt - To match the example in your book, save the file out with the name: sample.dna

Chapter 9

NOTE: BeginPerlBioinfo.pm may be needed to execute some code examples from this chapter. Place this file in the same directory as your .pl files.

[May 07, 2017] Why is Perl used so extensively in biology research

Jan 15, 2016 | stackoverflow.com

Lincoln Stein highlighted some of the saving graces of Perl for bioinformatics in his article: How Perl Saved the Human Genome Project .

From his analysis:

I think several factors are responsible:
  1. Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language.
  2. Perl is forgiving. Biological data is often incomplete, fields can be missing, or a field that is expected to be present once occurs several times (because, for example, an experiment was run in duplicate), or the data was entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to pick up and correct a variety of common errors in data entry. Of course this flexibility can be also be a curse. I talk more about the problems with Perl below.
  3. Perl is component-oriented. Perl encourages people to write their software in small modules, either using Perl library modules or with the classic Unix tool-oriented approach. External programs can easily be incorporated into a Perl script using a pipe, system call or socket. The dynamic loader introduced with Perl5 allows people to extend the Perl language with C routines or to make entire compiled libraries available for the Perl interpreter. An effort is currently under way to gather all the world's collected wisdom about biological data into a set of modules called "bioPerl" (discussed at length in an article to be published later in the Perl Journal).
  4. Perl is easy to write and fast to develop in. The interpreter doesn't require you to declare all your function prototypes and data types in advance, new variables spring into existence as needed, calls to undefined functions only cause an error when the function is needed. The debugger works well with Emacs and allows a comfortable interactive style of development.
  5. Perl is a good prototyping language. Because Perl is quick and dirty, it often makes sense to prototype new algorithms in Perl before moving them to a fast compiled language. Sometimes it turns out that Perl is fast enough so that the algorithm doesn't have to be ported; more frequently one can write a small core of the algorithm in C, compile it as a dynamically loaded module or external executable, and leave the rest of the application in Perl (for an example of a complex genome mapping application implemented in this way, see http://waldo.wi.mit.edu/ftp/distribution/software/rhmapper/ ).
  6. Perl is a good language for Web CGI scripting, and is growing in importance as more labs turn to the Web for publishing their data.
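The text-mangling strength described in point 1 can be sketched in a few lines of Perl; the record format and names below are invented for illustration, not taken from any real data set:

```perl
# Pull the ID and the concatenated sequence out of a tiny FASTA-style record.
my $record = ">clone42 test fragment\nACGTACGT\nTTGACC\n";

my ($id) = $record =~ /^>(\S+)/;                        # capture the ID after '>'
my $seq  = join '', grep { !/^>/ } split /\n/, $record; # drop the header, join the rest

print "$id: $seq\n";   # clone42: ACGTACGTTTGACC
```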

I use lots of Perl for dealing with qualitative and quantitative data in social science research. In terms of getting things done (largely with text) quickly, finding libraries on CPAN (nice central location), and generally just getting things done quickly, it can't be surpassed.

Perl is also excellent glue, so if you have some instrumental records and you need to glue them to data analysis routines, then Perl is your language. Perl is very powerful when it comes to dealing with text, and it's present in almost every Linux/Unix distribution. In bioinformatics, not only are sequence data very easy to manipulate with Perl, but also most of the bioinformatics algorithms will output some kind of text results.

Then, the biggest bioinformatics centers like the EBI had that great guy, Ewan Birney, who was leading the BioPerl project. That library has lots of parsers for every kind of popular bioinformatics algorithms' results, and for manipulating the different sequence formats used in major sequence databases.

Nowadays, however, Perl is not the only language used by bioinformaticians: along with sequence data, labs produce more and more different kinds of data types and other languages are more often used in those areas.

The R statistics programming language for example, is widely used for statistical analysis of microarray and qPCR data (among others). Again, why are we using it so much? Because it has great libraries for that kind of data (see bioconductor project).

Now when it comes to web development, CGI is not really state of the art today, but people who know Perl may stick to it. In my company though it is no longer used...

I hope this helps.

Bioinformatics deals primarily in text parsing, and Perl is the best programming language for the job, as it is made for string parsing. As the O'Reilly book (Beginning Perl for Bioinformatics) says, "With [Perl]'s highly developed capacity to detect patterns in data, Perl has become one of the most popular languages for biological data analysis."

This seems to be a pretty comprehensive response. Perhaps one thing missing, however, is that most biologists (until recently, perhaps) don't have much programming experience at all. The learning curve for Perl is much lower than for compiled languages (like C or Java), and yet Perl still provides a ton of features when it comes to text processing. So what if it takes longer to run? Biologists can definitely handle that. Lab experiments routinely take one hour or more to finish, so waiting a few extra minutes for that data processing isn't going to kill them!

Just note that I am talking here about biologists that program out of necessity. I understand that there are some very skilled programmers and computer scientists out there that use Perl as well, and these comments may not apply to them.

===

People missed out DBI , the Perl abstract database interface that makes it really easy to work with bioinformatic databases.

There is also the one-liner angle. You can write something to reformat data in a single line in Perl and just use the -pe flag to embed that at the command line. Many people using AWK and sed moved to Perl. Even in full programs, file I/O is incredibly easy and quick to write, and text transformation is expressive at a high level compared to any engineering language around. People who use Java or even Python for one-off text transformation are just too lazy to learn another language. Java especially has a high dependence on the JVM implementation and its I/O performance.

At least you know how fast or slow Perl will be everywhere, slightly slower than C I/O. Don't learn grep , cut , sed , or AWK ; just learn Perl as your command line tool, even if you don't produce large programs with it. Regarding CGI, Perl has plenty of better web frameworks such as Catalyst and Mojolicious , but the mindshare definitely came from CGI and bioinformatics being one of the earliest heavy users of the Internet.

===

Perl is very easy to learn compared to other languages, and it copes well with biological data, which is rapidly becoming big data. It is good at manipulating large data sets, at data curation, and at all kinds of DNA programming; automation in biology has become easy thanks to languages like Perl, Python and Ruby . It is very easy for those who know biology but do not know how to program in other programming languages.

Personally, and I know this will date me, but it's because I learned Perl first. I was being asked to take FASTA files and mix with other FASTA files. Perl was the recommended tool when I asked around.

At the time I'd been through a few computer science classes, but I didn't really know programming all that well.

Perl proved fairly easy to learn. Once I'd gotten regular expressions into my head I was parsing and making new FASTA files within a day.

As has been suggested, I was not a programmer. I was a biochemistry graduate working in a lab, and I'd made the mistake of setting up a Linux server where everyone could see me. This was back in the day when that was an all-day project.

Anyway, Perl became my goto for anything I needed to do around the lab. It was awesome, easy to use, super flexible; other Perl guys in other labs were a lot like me.

So, to cut it short, Perl is easy to learn, flexible and forgiving, and it did what I needed.

Once I really got into bioinformatics I picked up R, Python, and even Java. Perl is not that great at helping to create maintainable code, mostly because it is so flexible. Now I just use the language for the job, but Perl is still one of my favorite languages, like a first kiss or something.

To reiterate, most bioinformatics folks learned coding by just kluging stuff together, and most of the time you're just trying to get an answer for the principal investigator (PI), so you can't spend days on code design. Perl is superb at just getting an answer, it probably won't work a second time, and you will not understand anything in your own code if you see it six months later; BUT if you need something now, then it is a good choice even though I mostly use Python now.

I hope that gives you an answer from someone who lived it.

[May 07, 2017] A useful capability of Perl substr function

The Perl substr function can be used as a pseudo-function (an lvalue) on the left side of an assignment. That allows you to insert a substring at an arbitrary point in a string.

For example, the code fragment:

$test_string='<cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>';
print "Before: $test_string\n"; 
substr($test_string,length('<cite>xxx'),0)='</cite>';
print "After: $test_string\n"; 
will print
Before: <cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>
After:  <cite>xxx</cite><blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>

If you use index to find the position of the substring before which you want to insert, you can pass that position to substr directly:

$pos=index($test_string,'<blockquote>');
if( $pos > -1 ){
    substr($test_string,$pos,0)='</cite>';
}
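As an alternative to the lvalue form, the four-argument form of substr performs the same insertion by replacing a zero-length slice at the given offset (using a shortened version of the example string above):

```perl
my $test_string = '<cite>xxx<blockquote>test</blockquote>';

my $pos = index($test_string, '<blockquote>');
substr($test_string, $pos, 0, '</cite>') if $pos > -1;  # insert before '<blockquote>'

print "$test_string\n";   # <cite>xxx</cite><blockquote>test</blockquote>
```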

[Mar 20, 2017] Cultured Perl Debugging Perl with ease

Mar 20, 2017 | www.ibm.com

Teodor Zlatanov
Published on November 01, 2000

The Perl debugger comes with its own help ('h' or 'h h', for the long and short help screens, respectively). The perldoc perldebug page (type "perldoc perldebug" at your prompt) has a more complete description of the Perl debugger.

So let's start with a buggy program and take a look at how the Perl debugger works. First, it'll attempt to print the first 20 lines in a file.

#!/usr/bin/perl -w

use strict;

foreach (0..20)
{
   my $line = <>;
   print "$_ : $line";
}

When run by itself, buggy.pl fails with the message: "Use of uninitialized value in concatenation (.) at ./buggy.pl line 8, <> line 9." More mysteriously, it prints "9 :" on a line by itself and waits for user input.

Now what does that mean? You may already have spotted the bugbear that came along when we fired up the Perl debugger.

First let's simply make sure the bug is repeatable. We'll set an action on line 8 to print $line where the error occurred, and run the program.

> perl -d ./buggy.pl buggy.pl

Default die handler restored.

Loading DB routines from perl5db.pl version 1.07
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(./buggy.pl:5):   foreach (0..20)
main::(./buggy.pl:6):   {
  DB<1> use Data::Dumper
  DB<1> a 8 print 'The line variable is now ', Dumper $line

The Data::Dumper module loads so that the autoaction can use a nice output format. The autoaction is set to do a print statement every time line 8 is reached. Now let's watch the show.

  DB<2> c
The line variable is now $VAR1 = '#!/usr/bin/perl -w
';
0 : #!/usr/bin/perl -w
The line variable is now $VAR1 = '
';
1 : 
The line variable is now $VAR1 = 'use strict;
';
2 : use strict;
The line variable is now $VAR1 = '
';
3 : 
The line variable is now $VAR1 = 'foreach (0..20)
';
4 : foreach (0..20)
The line variable is now $VAR1 = '{
';
5 : {
The line variable is now $VAR1 = '   my $line = <>;
';
6 :    my $line = <>;
The line variable is now $VAR1 = '   print "$_ : $line";
';
7 :    print "$_ : $line";
The line variable is now $VAR1 = '}
';
8 : }
The line variable is now $VAR1 = undef;
Use of uninitialized value in concatenation (.) at ./buggy.pl line 8, <> line 9.
9 : 

It's clear now that the problem occurred when the line variable was undefined. Furthermore, the program waited for more input. And pressing the Return key eleven more times created the following output:

The line variable is now $VAR1 = '
';
10 : 
The line variable is now $VAR1 = '
';
11 : 
The line variable is now $VAR1 = '
';
12 : 
The line variable is now $VAR1 = '
';
13 : 
The line variable is now $VAR1 = '
';
14 : 
The line variable is now $VAR1 = '
';
15 : 
The line variable is now $VAR1 = '
';
16 : 
The line variable is now $VAR1 = '
';
17 : 
The line variable is now $VAR1 = '
';
18 : 
The line variable is now $VAR1 = '
';
19 : 
The line variable is now $VAR1 = '
';
20 : 
Debugged program terminated.  Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.
  DB<3>

By now it's obvious that the program is buggy because it unconditionally waits for 20 lines of input, even though there are cases in which the lines will not be there. The fix is to test the $line after reading it from the filehandle:

#!/usr/bin/perl -w

use strict;

foreach (0..20)
{
   my $line = <>;
   last unless defined $line;   # exit loop if $line is not defined
   print "$_ : $line";
}

As you see, the fixed program works properly in all cases!

Concluding notes on the Perl debugger

The Emacs editor supports the Perl debugger and makes using it a somewhat better experience. You can read more about the GUD Emacs mode inside Emacs with Info (type M-x info). GUD is a universal debugging mode that works with the Perl debugger (type M-x perldb while editing a Perl program in Emacs).

With a little work, the vi family of editors will also support the Perl debugger. See the perldoc perldebug page for more information. For other editors, consult each editor's documentation.

The Perl built-in debugger is a powerful tool and can do much more than the simple usage we just looked at. It does, however, require a fair amount of Perl expertise. Which is why we are now going to look at some simpler tools that will better suit beginning and intermediate Perl programmers.

Devel::ptkdb

To use the Devel::ptkdb debugger you first have to download it from CPAN ( see Related topics below) and install it on your system. (Some of you may also need to install the Tk module, also in CPAN.) On a personal note, Devel::ptkdb works best on UNIX systems like Linux. (Although it's not theoretically limited to UNIX-compatible systems, I have never heard of anyone successfully using Devel::ptkdb on Windows. As the old saying goes, anything is possible except skiing through a revolving door.)

If you can't get your system administrator to perform the installation for you (because, for instance, you are the system administrator), you can try doing the following at your prompt (you may need to run this as root):

perl -MCPAN -e'install Tk'
perl -MCPAN -e'install Devel::ptkdb'

After some initial questions, if this is your first time running the CPAN installation routines, you will download and install the appropriate modules automatically.

You can run a program with the ptkdb debugger as follows (using our old buggy.pl example):

perl -d:ptkdb buggy.pl buggy.pl

To read the documentation for the Devel::ptkdb modules, use the command "perldoc Devel::ptkdb". We are using version 1.1071 here. (Although updated versions may come out at any time, they should not look very different from the one we're using.)

A window will come up with the program's source code on the left and a list of watched expressions (initially empty) on the right. Enter the word "$line" in the "Enter Expr:" box. Then click on the "Step Over" button to watch the program execute.

The "Run" button will run the program until it finishes or hits a breakpoint. Clicking on the line number in the source-listing window sets or deletes breakpoints. If you select the "BrkPts" tab on the right, you can edit the list of breakpoints and make them conditional upon a variable or a function. (This is a very easy way to set up conditional breakpoints.)

Ptkdb also has File, Control, Data, Stack, and Bookmarks menus. These menus are all explained in the perldoc documentation. Because it's so easy to use, Ptkdb is an absolute must for beginner and intermediate Perl programmers. It can even be useful for Perl gurus (as long as they don't tell anyone that they're using those new-fangled graphical interfaces).

Writing your own Perl shell

Sometimes using a debugger is overkill. If, for example, you want to test something simple, in isolation from the rest of a large program, a debugger would be too complex for the task. This is where a Perl shell can come in handy.

While other valid approaches to a Perl shell certainly exist, we're going to look at a general solution that works well for most daily work (and I use it all the time). Once you understand the tool, you should feel free to tailor it to your own needs and preferences.

The following code requires the Term::ReadLine module. You can download it from CPAN and install it almost the same way as you did Devel::ptkdb.

#!/usr/bin/perl -w

use Term::ReadLine;
use Data::Dumper;

my $historyfile = $ENV{HOME} . '/.phistory';
my $term = new Term::ReadLine 'Perl Shell';

# save a list of lines to a file (used for the command history)
sub save_list
{
 my $f = shift;
 my $l = shift;
 open F, ">$f";
 print F "$_\n" foreach @$l;
 close F;
}

# load the unique command history, if one exists
if (open H, $historyfile)
{
 @h = <H>;
 chomp @h;
 close H;
 $h{$_} = 1 foreach @h;
 $term->addhistory($_) foreach keys %h;
}

while ( defined ($_ = $term->readline("My Perl Shell> ")) )
{
 my $res = eval($_);              # evaluate the line as Perl code
 warn $@ if $@;
 unless ($@)
 {
  open H, ">>$historyfile";       # append the command to the history file
  print H "$_\n";
  close H;
  print "\n", Data::Dumper->Dump([$res], ['Result']);
 }
 $term->addhistory($_) if /\S/;
}

This Perl shell does several things well, and some things decently.

First of all, it keeps a unique history of the commands you have entered, stored in a file called ".phistory" in your home directory. If you enter a command twice, only one copy will remain (see the code that opens $historyfile and reads history lines from it).

With each entry of a new command, the command list is saved to the .phistory file. So if you enter a command that crashes the shell, the history of your last session is not lost.

The Term::ReadLine module makes it easy to enter commands for execution. Because commands are limited to only one line at a time, it's possible to write good old buggy.pl as:

Perl Shell> use strict
$Result = undef;

Perl Shell> print "$_: " . <> foreach (0..20)
0: ...
1: ...

The problem, of course, is that the input operator ends up eating the shell's own input. So don't use <> or STDIN in the Perl shell, because they'll make things more difficult. Try this instead:

Perl Shell> open F, "buggy.pl"
$Result = 1;

Perl Shell> foreach (0..20) { last if eof(F); print "$_: " . <F>; }
0: #!/usr/bin/perl -w
1:
2: use strict;
3:
4: foreach (0..20)
5: {
6:  my $line = <STDIN>;
7:  last unless defined $line; # exit loop if $line is not defined
8:  print "$_ : $line";
9: }
$Result = undef;

As you can see, the shell works for cases where you can easily condense statements into one line. It's also a surprisingly common solution to isolating bugs and provides a great learning environment. Do a little exercise and see if you can write a Perl shell for debugging on your own, and see how much you learn!

Building an arsenal of tools

We've only covered the very basics here of the built-in Perl debugger, Devel::ptkdb, and related tools. There are many more ways to debug Perl. What's important is that you gain an understanding of the debugging process: how a bug is observed, solved, and then fixed. Of course the single most important thing is to make sure you have a comprehensive understanding of your program's requirements.

The Perl built-in debugger is very powerful, but it's not good for beginner or intermediate Perl programmers. (With the exception of Emacs, where it can be a useful tool even for beginners as long as they understand debugging under Emacs.)

The Devel::ptkdb module and debugger are (because of power and usability) by far the best choice for beginning and intermediate programmers. Perl shells, on the other hand, are personalized debugging solutions for isolated problems with small pieces of code.

Every software tester builds his own arsenal of debugging tools, whether it's the Emacs editor with GUD, or a Perl shell, or print statements throughout the code. Hopefully the tools we've looked at here will make your debugging experience a little easier.

[Mar 20, 2017] Cultured Perl One-liners 102

Mar 20, 2017 | www.ibm.com
One-liners 102

More one-line Perl scripts

Teodor Zlatanov
Published on March 12, 2003

This article, as regular readers may have guessed, is the sequel to " One-liners 101 ," which appeared in a previous installment of "Cultured Perl". The earlier article is an absolute requirement for understanding the material here, so please take a look at it before you continue.

The goal of this article, as with its predecessor, is to show legible and reusable code, not necessarily the shortest or most efficient version of a program. With that in mind, let's get to the code!

Tom Christiansen's list

Tom Christiansen posted a list of one-liners on Usenet years ago, and that list is still interesting and useful for any Perl programmer. We will look at the more complex one-liners from the list; the full list is available in the file tomc.txt (see Related topics to download this file). The list overlaps slightly with the " One-liners 101 " article, and I will try to point out those intersections.

Awk is commonly used for basic tasks such as breaking up text into fields; Perl excels at text manipulation by design. Thus, we come to our first one-liner, intended to add two columns in the text input to the script.

Listing 1. Like awk?
# add first and penultimate columns
# NOTE the equivalent awk script:
# awk '{i = NF - 1; print $1 + $i}'
perl -lane 'print $F[0] + $F[-2]'

So what does it do? The magic is in the switches. The -n and -a switches make the script a wrapper around input that splits the input on whitespace into the @F array; the -e switch adds an extra statement into the wrapper. The code of interest actually produced is:

Listing 2: The full Monty
while (<>) {
    @F = split(' ');
    print $F[0] + $F[-2];  # offset -2 means "2nd to last element of the array"
}

Another common task is to print the contents of a file between two markers or between two line numbers.

Listing 3: Printing a range of lines
# 1. just lines 15 to 17
perl -ne 'print if 15 .. 17'

# 2. just lines NOT between line 10 and 20
perl -ne 'print unless 10 .. 20'

# 3. lines between START and END
perl -ne 'print if /^START$/ .. /^END$/'

# 4. lines NOT between START and END
perl -ne 'print unless /^START$/ .. /^END$/'

A problem with the first one-liner in Listing 3 is that it will go through the whole file, even if the necessary range has already been covered. The third one-liner does not have that problem, because it will print all the lines between the START and END markers. If there are eight sets of START/END markers, the third one-liner will print the lines inside all eight sets.

Preventing the inefficiency of the first one-liner is easy: just use the $. variable, which holds the current line number. Start printing once $. reaches 15, and exit once it reaches 17 (after printing that line).

Listing 4: Printing a numeric range of lines more efficiently
# just lines 15 to 17, efficiently
perl -ne 'print if $. >= 15; exit if $. >= 17;'

Enough printing, let's do some editing. Needless to say, if you are experimenting with one-liners, especially ones intended to modify data, you should keep backups. You wouldn't be the first programmer to think a minor modification couldn't possibly make a difference to a one-liner program; just don't make that assumption while editing the Sendmail configuration or your mailbox.

Listing 5: In-place editing
# 1. in-place edit of *.c files changing all foo to bar
perl -p -i.bak -e 's/\bfoo\b/bar/g' *.c

# 2. delete first 10 lines
perl -i.old -ne 'print unless 1 .. 10' foo.txt

# 3. change all the isolated oldvar occurrences to newvar
perl -i.old -pe 's{\boldvar\b}{newvar}g' *.[chy]

# 4. increment all numbers found in these files
perl -i.tiny -pe 's/(\d+)/ 1 + $1 /ge' file1 file2 ....

# 5. delete all but lines between START and END
perl -i.old -ne 'print unless /^START$/ .. /^END$/' foo.txt

# 6. binary edit (careful!)
perl -i.bak -pe 's/Mozilla/Slopoke/g' /usr/local/bin/netscape

Why does 1 .. 10 specify line numbers 1 through 10? Read the "perldoc perlop" manual page. Basically, the .. operator iterates through a range. Thus, the script does not count 10 lines; it counts 10 iterations of the loop generated by the -n switch (see "perldoc perlrun" and Listing 2 for an example of that loop).

The magic of the -i switch is that it replaces each file in @ARGV with the version produced by the script's output on that file. Thus, the -i switch makes Perl into an editing text filter. Do not forget to use the backup option to the -i switch. Following the i with an extension will make a backup of the edited file using that extension.
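Here is a safe way to watch the -i backup behavior, using a throwaway file created with mktemp (the file contents are invented for the demo):

```shell
tmp=$(mktemp)
printf 'foo bar\nfoofoo foo\n' > "$tmp"

# Rewrite the file in place, keeping the original as "$tmp.bak"
perl -p -i.bak -e 's/\bfoo\b/bar/g' "$tmp"

cat "$tmp"      # isolated "foo" words are now "bar"; "foofoo" is untouched
cat "$tmp.bak"  # the pristine original
rm -f "$tmp" "$tmp.bak"
```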

Note how the -p and -n switches are used. The -n switch is used when you want to print data explicitly. The -p switch implicitly inserts a print $_ statement in the loop produced by the -n switch. Thus, the -p switch is better for full processing of a file, while the -n switch is better for selective file processing, where only specific data needs to be printed.

Examples of in-place editing can also be found in the " One-liners 101 " article.

Reversing the contents of a file is not a common task, but the following one-liners show that the -n and -p switches are not always the best choice when processing an entire file.

Listing 6: Reversal of files' fortunes
# 1. command-line that reverses the whole input by lines
#    (printing each line in reverse order)
perl -e 'print reverse <>' file1 file2 file3 ....

# 2. command-line that shows each line with its characters backwards
perl -nle 'print scalar reverse $_' file1 file2 file3 ....

# 3. find palindromes in the /usr/dict/words dictionary file
perl -lne '$_ = lc $_; print if $_ eq reverse' /usr/dict/words

# 4. command-line that reverses all the bytes in a file
perl -0777e 'print scalar reverse <>' f1 f2 f3 ...

# 5. command-line that reverses each paragraph in the file but prints
#    them in order
perl -00 -e 'print reverse <>' file1 file2 file3 ....

The -0 (zero) flag is very useful if you want to read a full paragraph or a full file into a single string. (It also works with any character number, so you can use a special character as a marker.) Be careful when reading a full file in one command ( -0777 ), because a large file will use up all your memory. If you need to read the contents of a file backwards (for instance, to analyze a log in reverse order), use the CPAN module File::ReadBackwards. Also see " One-liners 101 ," which shows an example of log analysis with File::ReadBackwards.

Note the similarity between the first and second scripts in Listing 6. Despite appearances, they behave completely differently: the difference lies in using <> in scalar context (as -n does in the second script) or in list context (as the first script does).

The third script, the palindrome detector, did not originally have the $_ = lc $_; segment. I added that to catch those palindromes like "Bob" that are not the same backwards.

My addition can be written as $_ = lc; as well, but explicitly stating the subject of the lc() function makes the one-liner more legible, in my opinion.
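You can verify the effect of the case-folding step directly (using "Bob" as the test word, as in the text):

```shell
# With the lc step, mixed-case palindromes are caught:
echo 'Bob' | perl -lne '$_ = lc $_; print if $_ eq reverse'
# prints "bob"

# Without it, "Bob" reversed is "boB", so nothing is printed:
echo 'Bob' | perl -lne 'print if $_ eq reverse'
```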

Paul Joslin's list

Paul Joslin was kind enough to send me some of his one-liners for this article.

Listing 7: Rewrite with a random number
# replace string XYZ with a random number less than 611 in these files
perl -i.bak -pe "s/XYZ/int rand(611)/e" f1 f2 f3

This is a filter that replaces XYZ with a random number less than 611 (that number is arbitrarily chosen). Remember the rand() function returns a random number between 0 and its argument.

Note that XYZ will be replaced by a different random number every time, because the substitution evaluates "int rand(611)" every time.

Listing 8: Revealing the files' base nature
# 1. Run basename on contents of file
perl -pe "s@.*/@@gio" INDEX

# 2. Run dirname on contents of file
perl -pe 's@^(.*/)[^/]+@$1\n@' INDEX

# 3. Run basename on contents of file
perl -MFile::Basename -ne 'print basename $_' INDEX

# 4. Run dirname on contents of file
perl -MFile::Basename -ne 'print dirname $_' INDEX

One-liners 1 and 2 came from Paul, while 3 and 4 were my rewrites of them with the File::Basename module. Their purpose is simple, but any system administrator will find these one-liners useful.

Listing 9: Moving or renaming, it's all the same in UNIX
# 1. write command to mv dirs XYZ_asd to Asd
#    (you may have to preface each '!' with a '\' depending on your shell)
ls | perl -pe 's!([^_]+)_(.)(.*)!mv $1_$2$3 \u$2\E$3!gio'

# 2. Write a shell script to move input from xyz to Xyz
ls | perl -ne 'chop; printf "mv $_ %s\n", ucfirst $_;'

For regular users or system administrators, renaming files based on a pattern is a very common task. The scripts above will do two kinds of job: either remove the file name portion up to the _ character, or change each filename so that its first letter is uppercased according to the Perl ucfirst() function.

There is a UNIX utility called "mmv" by Vladimir Lanin that may also be of interest. It allows you to rename files based on simple patterns, and it's surprisingly powerful. See the Related topics section for a link to this utility.

Some of mine

The following is not a one-liner, but it's a pretty useful script that started as a one-liner. It is similar to Listing 7 in that it replaces a fixed string, but the trick is that the replacement itself for the fixed string becomes the fixed string the next time.

The idea came from a newsgroup posting a long time ago, but I haven't been able to find the original version. The script is useful in case you need to replace one IP address with another in all your system files -- for instance, if your default router has changed. The script includes $0 (in UNIX, usually the name of the script) in the list of files to rewrite.

As a one-liner it ultimately proved too complex, and the messages regarding what is about to be executed are necessary when system files are going to be modified.

Listing 10: Replace one IP address with another one
#!/usr/bin/perl -w

use Regexp::Common qw/net/;    # provides the regular expressions for IP matching

my $replacement = shift @ARGV; # get the new IP address

die "You must provide $0 with a replacement string for the IP 111.111.111.111"
 unless $replacement;

# we require that $replacement be JUST a valid IP address
die "Invalid IP address provided: [$replacement]"
 unless $replacement =~ m/^$RE{net}{IPv4}$/;

# replace the string in each file
foreach my $file ($0, qw[/etc/hosts /etc/defaultrouter /etc/ethers], @ARGV)
{
 # note that we know $replacement is a valid IP address, so this is
 # not a dangerous invocation
 my $command = "perl -p -i.bak -e 's/111.111.111.111/$replacement/g' $file";

 print "Executing [$command]\n";
 system($command);
}

Note the use of the Regexp::Common module, an indispensable resource for any Perl programmer today. Without Regexp::Common, you will be wasting a lot of time trying to match a number or other common patterns manually, and you're likely to get it wrong.

Conclusion

Thanks to Paul Joslin for sending me his list of one-liners. And in the spirit of conciseness that one-liners inspire, I'll refer you to " One-liners 101 " for some closing thoughts on one-line Perl scripts.
Articles by Teodor Zlatanov

Git gets demystified and Subversion control (Aug 27, 2009)

Build simple photo-sharing with Amazon cloud and Perl (Apr 06, 2009)

developerWorks: Use IMAP with Perl, Part 2 (May 26, 2005)

developerWorks: Complex Layered Configurations with AppConfig (Apr 11, 2005)

developerWorks: Perl 6 Grammars and Regular Expressions (Nov 09, 2004)

developerWorks: Genetic Algorithms Simulate a Multi-Celled Organism (Oct 28, 2004)

developerWorks: Cultured Perl: Managing Linux Configuration Files (Jun 15, 2004)

developerWorks: Cultured Perl: Fun with MP3 and Perl, Part 2 (Feb 09, 2004)

developerWorks: Cultured Perl: Fun with MP3 and Perl, Part 1 (Dec 16, 2003)

developerWorks: Inversion Lists with Perl (Oct 27, 2003)

developerWorks: Cultured Perl: One-Liners 102 (Mar 21, 2003)

developerWorks: Developing cfperl, From the Beginning (Jan 22, 2003)

IBM developerWorks: Using the xinetd program for system administration (Nov 28, 2001)

IBM developerWorks: Reading and writing Excel files with Perl (Sep 30, 2001)

IBM developerWorks: Automating UNIX system administration with Perl (Jul 22, 2001)

IBM developerWorks: A programmer's Linux-oriented setup - Optimizing your machine for your needs (Mar 25, 2001)

IBM developerWorks: Cultured Perl: Debugging Perl with ease (Nov 23, 2000)

IBM developerWorks: Cultured Perl: Review of Programming Perl, Third Edition (Sep 17, 2000)

IBM developerWorks: Cultured Perl: Writing Perl programs that speak English Using Parse::RecDescent (Aug 05, 2000)

IBM developerWorks: Perl: Small observations about the big picture (Jul 02, 2000)

IBM developerWorks: Parsing with Perl modules (Apr 30, 2000)

[Dec 27, 2016] Perl is a great choice for a variety of industries

Dec 27, 2016 | opensource.com

Opensource.com

Earlier this year, ActiveState conducted a survey of users who had downloaded our distribution of Perl over the prior year and a half. We received 356 responses: 99 from commercial users and 257 from individual users. I've been using Perl for a long time, and I expected that lengthy experience would be typical of the Perl community. Our survey results, however, tell a different story.

Almost one-third of the respondents have three or fewer years of experience. Nearly half of all respondents reported using Perl for fewer than five years, a statistic that could be attributed to Perl's outstanding, inclusive community. The powerful and pragmatic nature of Perl and its supportive community make it a great choice for a wide array of uses across a variety of industries.

For a deeper dive, check out this video of my talk at YAPC North America this year.

Perl careers

Right now you can search online and find Perl jobs related to Amazon and BBC, not to mention several positions at Boeing. A quick search on Dice.com, an IT and engineering career website, yielded 3,575 listings containing the word Perl at companies like Amazon, Athena Health, and Northrop Grumman. Perl is also found in the finance industry, where it's primarily used to pull data from databases and process it.

Perl benefits

Perl's consistent utilization is the result of myriad factors, but its open source background is a powerful attribute.

Projects using Perl reduce upfront costs and downstream risks, and when you factor in how clean and powerful Perl is, it becomes quite a compelling option. Add to this that Perl sees yearly releases (more than that, even, as Perl has seen seven releases since 2012), and you can begin to understand why Perl still runs large parts of the web.

Mojolicious, Dancer, and Catalyst are just a few of the powerful web frameworks built for Perl. Designed for simplicity and scalability, these frameworks provide aspiring Perl developers an easy entry point to the language, which might explain some of the numbers from the survey I mentioned above. The inclusive nature of the Perl community draws developers, as well. It's hard to find a more welcoming or active community, and you can see evidence of that in the online groups, open source projects, and regular worldwide conferences and workshops.

Perl modules

Perl also has a mature installation tool chain and a strong testing culture. Anyone who wants to create automated test suites for Perl projects has the assistance of the over 400 testing and quality modules available on CPAN (Comprehensive Perl Archive Network). They won't have to sort through all 400 to choose the best, though: Test::Most is a one-stop shop for the most commonly used test modules. CPAN is one of Perl's biggest advantages over other programming languages. The archive hosts tens of thousands of ready-to-use modules for Perl, and the breadth and variety of those modules is astounding.

Even with a quick search you can find hardcore numerical modules, ODE (ordinary differential equations) solvers, and countless other types of modules written over the last 20 years by thousands of contributors. This contribution-based archive network helps keep Perl fresh and relevant, proliferating modules like pollen that will blow around to the incredible number of Perl projects out in the world.

You might think that community modules aren't the most reliable, but every distribution of modules on CPAN has been tested on myriad platforms and Perl configurations. As a testament to the determination of Perl users, the community has constructed a testing network and they spend time to make sure each Perl module works well on every available platform. They also maintain extensively-checked libraries that help Perl developers with big data projects.

What we're seeing today is a significant, dedicated community of Perl developers. This is not only because the language is pragmatic, effective, and powerful, but also because of the incredible community that these developers compose. The Perl community doesn't appear to be going anywhere, which means neither is Perl.

[Dec 26, 2016] Perl Advent Calendar Enters Its 17th Year

Dec 26, 2016 | developers.slashdot.org
(perladvent.org)

Posted by EditorDavid on Saturday December 03, 2016 @10:34AM

An anonymous reader writes: Thursday brought this year's first new posts on the Perl Advent Calendar, a geeky tradition first started back in 2000.

Friday's post described Santa's need for fast, efficient code, and the day that a Christmas miracle occurred during Santa's annual code review (involving the is_hashref subroutine from Perl's reference utility library). And for the last five years, the calendar has also had its own Twitter feed.

But in another corner of the North Pole, you can also unwrap the Perl 6 Advent Calendar, which this year celebrates the one-year anniversary of the official launch of Perl 6. Friday's post was by brian d foy, a writer on the classic Perl textbooks Learning Perl and Intermediate Perl (who's now also crowdfunding his next O'Reilly book, Learning Perl 6).

foy's post talked about Perl 6's object hashes, while the calendar kicked off its new season Thursday with a discussion about creating Docker images, using webhooks triggered by GitHub commits as an example of Perl 6's "whipupitude".

[Nov 16, 2015] undef can be used as a dummy variable in split function

Instead of

($id, $not_used, $credentials, $home_dir, $shell) = split /:/;

You can write

($id, undef, $credentials, $home_dir, $shell) = split /:/;

In Perl 5.22 they even added some pretty fancy (and generally useless) stuff. Instead of

my(undef, $card_num, undef, undef, undef, $count) = split /:/;

You can write

use v5.22; 
my(undef, $card_num, (undef)x3, $count) = split /:/;

[Nov 15, 2015] Web Basics with LWP

Aug 20, 2002 | Perl.com

LWP (short for "Library for WWW in Perl") is a popular group of Perl modules for accessing data on the Web. Like most Perl module-distributions, each of LWP's component modules comes with documentation that is a complete reference to its interface. However, there are so many modules in LWP that it's hard to know where to look for information on doing even the simplest things.

Introducing you to using LWP would require a whole book--a book that just happens to exist, called Perl & LWP. This article offers a sampling of recipes that let you perform common tasks with LWP.

Getting Documents with LWP::Simple

If you just want to access what's at a particular URL, the simplest way to do it is to use LWP::Simple's functions.

In a Perl program, you can call its get($url) function. It will try getting that URL's content. If it works, then it'll return the content; but if there's some error, it'll return undef.

  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
    # Just an example: the URL for the most recent /Fresh Air/ show
  use LWP::Simple;
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;

  # Then go do things with $content, like this:

  if($content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  } else {
    print "Fresh Air is apparently jazzless today.\n";
  }

The handiest variant on get is getprint, which is useful in Perl one-liners. If it can get the page whose URL you provide, it sends it to STDOUT; otherwise it complains to STDERR.


  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"

This is the URL of a plain-text file. It lists new files in CPAN in the past two weeks. You can easily make it part of a tidy little shell command, like this one that mails you the list of new Acme:: modules:


  % perl -MLWP::Simple -e "getprint 'http://cpan.org/RECENT'"  \
     | grep "/by-module/Acme" | mail -s "New Acme modules! Joy!" $USER

There are other useful functions in LWP::Simple, including one function for running a HEAD request on a URL (useful for checking links, or getting the last-revised time of a URL), and two functions for saving and mirroring a URL to a local file. See the LWP::Simple documentation for the full details, or Chapter 2, "Web Basics" of Perl & LWP for more examples.

The Basics of the LWP Class Model

LWP::Simple's functions are handy for simple cases, but its functions don't support cookies or authorization; they don't support setting header lines in the HTTP request; and generally, they don't support reading header lines in the HTTP response (most notably the full HTTP error message, in case of an error). To get at all those features, you'll have to use the full LWP class model.

While LWP consists of dozens of classes, the two that you have to understand are LWP::UserAgent and HTTP::Response. LWP::UserAgent is a class for "virtual browsers," which you use for performing requests, and HTTP::Response is a class for the responses (or error messages) that you get back from those requests.

The basic idiom is $response = $browser->get($url), or fully illustrated:


  # Early in your program:
  
  use LWP 5.64; # Loads all important LWP classes, and makes
                #  sure your version is reasonably recent.

  my $browser = LWP::UserAgent->new;
  
  ...
  
  # Then later, whenever you need to make a get request:
  my $url = 'http://freshair.npr.org/dayFA.cfm?todayDate=current';
  
  my $response = $browser->get( $url );
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  die "Hey, I was expecting HTML, not ", $response->content_type
   unless $response->content_type eq 'text/html';
     # or whatever content-type you're equipped to deal with

  # Otherwise, process the content somehow:
  
  if($response->content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  } else {
    print "Fresh Air is apparently jazzless today.\n";
  }
There are two objects involved: $browser, which holds an object of the class LWP::UserAgent, and then the $response object, which is of the class HTTP::Response. You really need only one browser object per program; but every time you make a request, you get back a new HTTP::Response object, which will have some interesting attributes, such as is_success, status_line, content_type, and content.

Adding Other HTTP Request Headers

The most commonly used syntax for requests is $response = $browser->get($url), but in truth, you can add extra HTTP header lines to the request by adding a list of key-value pairs after the URL, like so:


  $response = $browser->get( $url, $key1, $value1, $key2, $value2, ... );

For example, here's how to send more Netscape-like headers, in case you're dealing with a site that would otherwise reject your request:


  my @ns_headers = (
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, 
        image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

  ...
  
  $response = $browser->get($url, @ns_headers);

If you weren't reusing that array, you could just go ahead and do this:



  $response = $browser->get($url,
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, 
        image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

If you were only going to change the 'User-Agent' line, you could just change the $browser object's default line from "libwww-perl/5.65" (or the like) to whatever you like, using LWP::UserAgent's agent method:


   $browser->agent('Mozilla/4.76 [en] (Win98; U)');

Enabling Cookies

A default LWP::UserAgent object acts like a browser with its cookies support turned off. There are various ways of turning it on, by setting its cookie_jar attribute. A "cookie jar" is an object representing a little database of all the HTTP cookies that a browser can know about. It can correspond to a file on disk (the way Netscape uses its cookies.txt file), or it can be just an in-memory object that starts out empty, and whose collection of cookies will disappear once the program is finished running.

To give a browser an in-memory empty cookie jar, you set its cookie_jar attribute like so:


  $browser->cookie_jar({});

To give it a copy that will be read from a file on disk, and will be saved to it when the program is finished running, set the cookie_jar attribute like this:


  use HTTP::Cookies;
  $browser->cookie_jar( HTTP::Cookies->new(
    'file' => '/some/where/cookies.lwp',
        # where to read/write cookies
    'autosave' => 1,
        # save it to disk when done
  ));

That file will be an LWP-specific format. If you want to access the cookies in your Netscape cookies file, you can use the HTTP::Cookies::Netscape class:


  use HTTP::Cookies;
    # yes, loads HTTP::Cookies::Netscape too
  
  $browser->cookie_jar( HTTP::Cookies::Netscape->new(
    'file' => 'c:/Program Files/Netscape/Users/DIR-NAME-HERE/cookies.txt',
        # where to read cookies
  ));

You could add an 'autosave' => 1 line as we did earlier, but at time of writing, it's uncertain whether Netscape might discard some of the cookies you could be writing back to disk.

Posting Form Data

Many HTML forms send data to their server using an HTTP POST request, which you can send with this syntax:


 $response = $browser->post( $url,
   [
     formkey1 => value1, 
     formkey2 => value2, 
     ...
   ],
 );
Or if you need to send HTTP headers:

 $response = $browser->post( $url,
   [
     formkey1 => value1, 
     formkey2 => value2, 
     ...
   ],
   headerkey1 => value1, 
   headerkey2 => value2, 
 );

For example, the following program makes a search request to AltaVista (by sending some form data via an HTTP POST request), and extracts from the HTML the report of the number of matches:


  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;
  
  my $word = 'tarragon';
  
  my $url = 'http://www.altavista.com/sites/search/web';
  my $response = $browser->post( $url,
    [ 'q' => $word,  # the Altavista query string
      'pg' => 'q', 'avkw' => 'tgz', 'kl' => 'XX',
    ]
  );
  die "$url error: ", $response->status_line
   unless $response->is_success;
  die "Weird content type at $url -- ", $response->content_type
   unless $response->content_type eq 'text/html';

  if( $response->content =~ m{AltaVista found ([0-9,]+) results} ) {
    # The substring will be like "AltaVista found 2,345 results"
    print "$word: $1\n";
  } else {
    print "Couldn't find the match-string in the response\n";
  }
Sending GET Form Data

Some HTML forms convey their form data not by sending the data in an HTTP POST request, but by making a normal GET request with the data stuck on the end of the URL. For example, if you went to imdb.com and ran a search on Blade Runner, the URL you'd see in your browser window would be:


  http://us.imdb.com/Tsearch?title=Blade%20Runner&restrict=Movies+and+TV

To run the same search with LWP, you'd use this idiom, which involves the URI class:


  use URI;
  my $url = URI->new( 'http://us.imdb.com/Tsearch' );
    # makes an object representing the URL
  
  $url->query_form(  # And here the form data pairs:
    'title'    => 'Blade Runner',
    'restrict' => 'Movies and TV',
  );
  
  my $response = $browser->get($url);

See Chapter 5, "Forms" of Perl & LWP for a longer discussion of HTML forms and of form data, as well as Chapter 6 through Chapter 9 for a longer discussion of extracting data from HTML.

Absolutizing URLs

The URI class that we just mentioned above provides all sorts of methods for accessing and modifying parts of URLs (such as asking what sort of URL it is with $url->scheme, what host it refers to with $url->host, and so on, as described in the docs for the URI class). However, the methods of most immediate interest are the query_form method seen above, and the new_abs method for taking a probably relative URL string (like "../foo.html") and getting back an absolute URL (like "http://www.perl.com/stuff/foo.html"), as shown here:


  use URI;
  $abs = URI->new_abs($maybe_relative, $base);

For example, consider this program that matches URLs in the HTML list of new modules in CPAN:


  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;
  
  my $url = 'http://www.cpan.org/RECENT.html';
  my $response = $browser->get($url);
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;
  
  my $html = $response->content;
  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {    
      print "$1\n";  
  }

When run, it emits output that starts out something like this:


  MIRRORING.FROM
  RECENT
  RECENT.html
  authors/00whois.html
  authors/01mailrc.txt.gz
  authors/id/A/AA/AASSAD/CHECKSUMS
  ...

However, if you actually want to have those be absolute URLs, you can use the URI module's new_abs method, by changing the while loop to this:


  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {    
      print URI->new_abs( $1, $response->base ) ,"\n";
  }

(The $response->base method from HTTP::Message is for returning the URL that should be used for resolving relative URLs--it's usually just the same as the URL that you requested.)

That program then emits nicely absolute URLs:


  http://www.cpan.org/MIRRORING.FROM
  http://www.cpan.org/RECENT
  http://www.cpan.org/RECENT.html
  http://www.cpan.org/authors/00whois.html
  http://www.cpan.org/authors/01mailrc.txt.gz
  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
  ...

See Chapter 4, "URLs", of Perl & LWP for a longer discussion of URI objects.

Of course, using a regexp to match hrefs is a bit simplistic, and for more robust programs, you'll probably want to use an HTML-parsing module like HTML::LinkExtor, or HTML::TokeParser, or even maybe HTML::TreeBuilder.
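As a sketch of that more robust approach (assuming HTML::LinkExtor is installed; a literal HTML string stands in here for $response->content, and passing a base URL as the second constructor argument makes the extracted links come back already absolutized):

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# A literal snippet stands in for $response->content in this sketch:
my $html = '<a href="RECENT.html">recent</a> '
         . '<a href="authors/00whois.html">authors</a>';

# With a base URL as the second argument, links() returns absolute URLs.
my $extractor = HTML::LinkExtor->new( undef, 'http://www.cpan.org/' );
$extractor->parse($html);
$extractor->eof;

for my $link ( $extractor->links ) {
    my ( $tag, %attrs ) = @$link;
    print $attrs{href}, "\n" if $tag eq 'a' and $attrs{href};
}
```

Unlike the regexp, the parser handles attribute quoting variations for you.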

Other Browser Attributes

LWP::UserAgent objects have many attributes for controlling how they work. Here are a few notable ones:
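For instance, a sketch using three well-known LWP::UserAgent attribute methods (the particular values here are arbitrary):

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
$browser->agent('MyApp/0.1');     # the User-Agent string to send
$browser->timeout(15);            # give up on a request after 15 seconds
$browser->max_size(512 * 1024);   # read at most half a megabyte of content
```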

For more options and information, see the full documentation for LWP::UserAgent.

Writing Polite Robots

If you want to make sure that your LWP-based program respects robots.txt files and doesn't make too many requests too fast, you can use the LWP::RobotUA class instead of the LWP::UserAgent class.

The LWP::RobotUA class is just like LWP::UserAgent, and you can use it like so:


  use LWP::RobotUA;
  my $browser = LWP::RobotUA->new(
    'YourSuperBot/1.34', 'you@yoursite.com');
    # Your bot's name and your email address

  my $response = $browser->get($url);

But LWP::RobotUA adds these features:
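Notably, it consults each site's robots.txt and throttles itself between requests to the same host. A sketch of adjusting that throttle (the delay method is expressed in minutes):

```perl
use strict;
use warnings;
use LWP::RobotUA;

my $browser = LWP::RobotUA->new(
  'YourSuperBot/1.34', 'you@yoursite.com' );

# Wait at least 20 seconds between requests to any given server.
$browser->delay( 20 / 60 );   # delay takes minutes, not seconds
```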

For more options and information, see the full documentation for LWP::RobotUA.

Using Proxies

In some cases, you will want to (or will have to) use proxies for accessing certain sites or for using certain protocols. This is most commonly the case when your LWP program is running (or could be running) on a machine that is behind a firewall.

To make a browser object use proxies that are defined in the usual environment variables (HTTP_PROXY, etc.), just call the env_proxy method on a user-agent object before you go making any requests on it. Specifically:


  use LWP::UserAgent;
  my $browser = LWP::UserAgent->new;
  
  # And before you go making any requests:
  $browser->env_proxy;

For more information on proxy parameters, see the LWP::UserAgent documentation, specifically the proxy, env_proxy, and no_proxy methods.

HTTP Authentication

Many Web sites restrict access to documents by using "HTTP Authentication". This isn't just any form of "enter your password" restriction, but is a specific mechanism where the HTTP server sends the browser an HTTP code that says "That document is part of a protected 'realm', and you can access it only if you re-request it and add some special authorization headers to your request".

For example, the Unicode.org administrators stop email-harvesting bots from harvesting the contents of their mailing list archives by protecting them with HTTP Authentication, and then publicly stating the username and password (at http://www.unicode.org/mail-arch/)--namely username "unicode-ml" and password "unicode".

For example, consider this URL, which is part of the protected area of the Web site:


  http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html

If you access that with a browser, you'll get a prompt like "Enter username and password for 'Unicode-MailList-Archives' at server 'www.unicode.org'", or a similar dialog in a graphical browser.

In LWP, if you just request that URL, like this:


  use LWP 5.64;
  my $browser = LWP::UserAgent->new;

  my $url =
   'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
  my $response = $browser->get($url);

  die "Error: ", $response->header('WWW-Authenticate') || 
    'Error accessing',
    #  ('WWW-Authenticate' is the realm-name)
    "\n ", $response->status_line, "\n at $url\n Aborting"
   unless $response->is_success;

Then you'll get this error:


  Error: Basic realm="Unicode-MailList-Archives"
   401 Authorization Required
   at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
   Aborting at auth1.pl line 9.  [or wherever]

because the $browser doesn't know the username and password for that realm ("Unicode-MailList-Archives") at that host ("www.unicode.org"). The simplest way to let the browser know about this is to use the credentials method to let it know about a username and password that it can try using for that realm at that host. The syntax is:


  $browser->credentials(
    'servername:portnumber',
    'realm-name',
    'username' => 'password'
  );

In most cases, the port number is 80, the default TCP/IP port for HTTP; and you usually call the credentials method before you make any requests. For example:


  $browser->credentials(
    'reports.mybazouki.com:80',
    'web_server_usage_reports',
    'plinky' => 'banjo123'
  );

So if we add the following to the program above, right after the $browser = LWP::UserAgent->new; line:


  $browser->credentials(  # add this to our $browser 's "key ring"
    'www.unicode.org:80',
    'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode'
  );

and then when we run it, the request succeeds, instead of causing the die to be called.

Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL would--if your LWP installation has HTTPS support (via an appropriate Secure Sockets Layer library). For example:


  use LWP 5.64;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked!  I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the response will be unsuccessful, and you'll get this error message:


  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.   [or whatever program and line]

If your LWP installation does have HTTPS support installed, then the response should be successful, and you should be able to consult $response just like with any normal HTTP response.

For information about installing HTTPS support for your LWP installation, see the helpful README.SSL file that comes in the libwww-perl distribution.

Getting Large Documents

When you're requesting a large (or at least potentially large) document, a problem with the normal way of using the request methods (like $response = $browser->get($url)) is that the response object in memory will have to hold the whole document--in memory. If the response is a 30-megabyte file, this is likely to be quite an imposition on this process's memory usage.

A notable alternative is to have LWP save the content to a file on disk, instead of saving it up in memory. This is the syntax to use:


  $response = $ua->get($url,
                         ':content_file' => $filespec,
                      );

For example,


  $response = $ua->get('http://search.cpan.org/',
                         ':content_file' => '/tmp/sco.html'
                      );

When you use this :content_file option, the $response will have all the normal header lines, but $response->content will be empty.

Note that this ":content_file" option isn't supported under older versions of LWP, so you should consider adding use LWP 5.66; to check the LWP version, if you think your program might run on systems with older versions.

If you need to be compatible with older LWP versions, then use this syntax, which does the same thing:


  use HTTP::Request::Common;
  $response = $ua->request( GET($url), $filespec );

Resources

Remember, this article is just the most rudimentary introduction to LWP. To learn more about LWP and LWP-related tasks, see the LWP documentation and the book Perl & LWP.


Copyright 2002, Sean M. Burke. You can redistribute this document and/or modify it, but only under the same terms as Perl itself.

[Mar 14, 2012] Perl vs Python: Why the debate is meaningless


August 22, 2009 | The ByteBaker

AaronD

If you want an interactive shell in perl, the easiest way is to fire up the perl debugger in command mode. Just run

perl -de 1
Be careful declaring lexical my variables this way: their scope becomes a single line.

Aaron

Five Habits for Successful Regular Expressions

O'Reilly

1. Use Whitespace and Comments

The extended whitespace feature of most regex implementations allows programmers to extend their regular expressions over several lines, with comments at the end of each. Why do so few programmers use this feature? Perl 6 regular expressions, for example, will be in extended whitespace mode by default. Until your language makes extended whitespace the default, turn it on yourself.

The only trick to remember with extended whitespace is that the regex engine ignores whitespace. So if you are hoping to match whitespace, you have to say so explicitly, often with \s.

In Perl, add an x to the end of the regex, so m/foo|bar/ becomes:

m/
  foo
  |
  bar
 /x
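A quick sanity check that the two spellings behave identically (whitespace inside an /x pattern is ignored by the engine):

```perl
use strict;
use warnings;

my $string = 'foobar';

print "plain matched\n"    if $string =~ m/foo|bar/;
print "extended matched\n" if $string =~ m/
                                 foo   # first alternative
                                 |
                                 bar   # second alternative
                               /x;
```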

...

The value of whitespace and comments becomes more important when working with more complex regular expressions. Consider the following regular expression to match a U.S. phone number:

\(?\d{3}\)? ?\d{3}[-.]\d{4}

This regex matches phone numbers like "(314)555-4000". Ask yourself if the regex would match "314-555-4000" or "555-4000". The answer is no in both cases. Writing this pattern on one line conceals both flaws and design decisions. The area code is required and the regex fails to account for a separator between the area code and prefix.

Spreading the pattern out over several lines makes the flaws more visible and the necessary modifications easier.

In Perl this would look like:

/ 
    \(?     # optional parentheses
      \d{3} # area code required
    \)?     # optional parentheses
    [-\s.]? # separator is either a dash, a space, or a period.
      \d{3} # 3-digit prefix
    [-.]    # another separator
      \d{4} # 4-digit line number
/x

The rewritten regex now has an optional separator after the area code so that it matches "314-555-4000." The area code is still required. However, a new programmer who wants to make the area code optional can quickly see that it is not optional now, and that a small change will fix that.

2. Write Tests

There are three levels of testing, each adding a higher level of reliability to your code. First, you need to think hard about what you want to match and whether you can deal with false matches. Second, you need to test the regex on example data. Third, you need to formalize the tests into a test suite.

Deciding what to match is a trade-off between making false matches and missing valid matches. If your regex is too strict, it will miss valid matches. If it is too loose, it will generate false matches. Once the regex is released into live code, you probably will not notice either way. Consider the phone regex example above; it would match the text "800-555-4000 = -5355". False matches are hard to catch, so it's important to plan ahead and test.

Sticking with the phone number example, if you are validating a phone number on a web form, you may settle for ten digits in any format. However, if you are trying to extract phone numbers from a large amount of text, you might want to be more exact to avoid an unacceptable number of false matches.

When thinking about what you want to match, write down example cases. Then write some code that tests your regular expression against the example cases. Any complicated regular expression is best written in a small test program, as the examples below demonstrate:

In Perl:

#!/usr/bin/perl

my @tests = ( "314-555-4000",
              "800-555-4400",
	      "(314)555-4000",
              "314.555.4000",
              "555-4000",
              "aasdklfjklas",
              "1234-123-12345"         
            );

foreach my $test (@tests) {
    if ( $test =~ m/
                   \(?     # optional parentheses
                     \d{3} # area code required
                   \)?     # optional parentheses
                   [-\s.]? # separator is either a dash, a space, or a period.
                     \d{3} # 3-digit prefix
                   [-\s.]  # another separator
                     \d{4} # 4-digit line number
                   /x ) {
        print "Matched on $test\n";
     }
     else {
        print "Failed match on $test\n";
     }
}

.... ... ...

Running the test script exposes yet another problem in the phone number regex: it matched "1234-123-12345". Include tests that you expect to fail as well as those you expect to match.
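One way to reject strings like "1234-123-12345" is to fence the pattern with zero-width lookaround assertions, so a candidate number cannot sit inside a longer digit run. A sketch:

```perl
use strict;
use warnings;

my $phone_re = qr/
    (?<!\d)       # no digit immediately before the match
    \(?           # optional parentheses
      \d{3}       # area code required
    \)?           # optional parentheses
    [-\s.]?       # separator: dash, space, or period
      \d{3}       # 3-digit prefix
    [-.]          # another separator
      \d{4}       # 4-digit line number
    (?!\d)        # no digit immediately after the match
/x;

print "valid\n"    if '314-555-4000'   =~ $phone_re;
print "rejected\n" if '1234-123-12345' !~ $phone_re;
```

Rerun against the test list above, this pattern still matches the first four numbers while rejecting the last three.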

Ideally, you would incorporate these tests into the test suite for your entire program. Even if you do not have a test suite already, your regular expression tests are a good foundation for a suite, and now is the perfect opportunity to start on one. Even if now is not the right time (really, it is!), you should make a habit to run your regex tests after every modification. A little extra time here could save you many headaches.

3. Group the Alternation Operator

The alternation operator (|) has a low precedence. This means that it often alternates over more than the programmer intended. For example, a regex to extract email addresses out of a mail file might look like:

^CC:|To:(.*)

The above attempt is incorrect, but the bugs often go unnoticed. The intent of the above regex is to find lines starting with "CC:" or "To:" and then capture any email addresses on the rest of the line.

Unfortunately, the regex doesn't actually capture anything from lines starting with "CC:" and may capture random text if "To:" appears in the middle of a line. In plain English, the regular expression matches lines beginning with "CC:" and captures nothing, or matches any line containing the text "To:" and then captures the rest of the line. Usually, it will capture plenty of addresses and nobody will notice the failings.

If that were the real intent, you should add parentheses to say it explicitly, like this:

(^CC:)|(To:(.*))

However, the real intent of the regex is to match lines starting with "CC:" or "To:" and then capture the rest of the line. The following regex does that:

^(CC:|To:)(.*)

This is a common and hard-to-catch bug. If you develop the habit of wrapping your alternations in parentheses (or non-capturing parentheses -- (?:...)) you can avoid this error.
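A short demonstration of the difference:

```perl
use strict;
use warnings;

my $line = 'CC: alice@example.com';

# Ungrouped: the "^CC:" branch contains no capture, so $1 stays undef.
if ( $line =~ /^CC:|To:(.*)/ ) {
    print defined $1 ? "captured: $1\n" : "matched, but captured nothing\n";
}

# Grouped: both prefixes share the capture of the rest of the line.
if ( $line =~ /^(CC:|To:)(.*)/ ) {
    print "captured:$2\n";
}
```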

4. Use Lazy Quantifiers

Most people avoid using the lazy quantifiers *?, +?, and ??, even though they are easy to understand and make many regular expressions easier to write.

Lazy quantifiers match as little text as possible while still aiding the success of the overall match. If you write foo(.*?)bar, the quantifier will stop matching the first time it sees "bar", not the last time. This may be important if you are trying to capture "###" in the text "foo###bar+++bar". A regular quantifier would have captured "###bar+++".
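Concretely:

```perl
use strict;
use warnings;

my $text = 'foo###bar+++bar';

$text =~ /foo(.*)bar/  and print "greedy: $1\n";   # greedy: ###bar+++
$text =~ /foo(.*?)bar/ and print "lazy: $1\n";     # lazy: ###
```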

Let's say you want to capture all of the phone numbers from an HTML file. You could use the phone number regular expression example we discussed earlier in this article. However, if you know that the file contains all of the phone numbers in the first column of a table, you can write a much simpler regex using lazy quantifiers:

<tr><td>(.+?)</td>

Many beginning regular expression programmers avoid lazy quantifiers by using negated character classes instead. They write the above code as:

<tr><td>([^>]+)</td>

That works in this case, but leads to trouble if the text you are trying to capture contains common characters from your delimiter (in this case, </td>). If you use lazy quantifiers, you will spend less time kludging character classes and produce clearer regular expressions.

Lazy quantifiers are most valuable when you know the structure surrounding the text you want to capture.

5. Use Available Delimiters

Perl and PHP often use the forward slash to mark the start and end of a regular expression. Python uses a variety of quotes to mark the start and end of a string, which may then be used as a regular expression. If you stick with the slash delimiter in Perl and PHP, you will have to escape any slashes in your regex. If you use regular quotes in Python, you will have to escape all of your backslashes. Choosing different delimiters or quotes allows you to avoid escaping half of your regex. This makes the regex easier to read and reduces the potential for bugs when you forget to escape something.

Perl and PHP allow you to use any non-alphanumeric or whitespace character as a delimiter. If you switch to a new delimiter, you can avoid having to escape the forward slashes when you are trying to match URLs or HTML tags such as "http://" or "<br />".

For example:

/http:\/\/(\S)*/

could be rewritten as:

#http://(\S)*#

Common delimiters are #, !, and |. If you use square brackets, angle brackets, or curly braces, the opening and closing brackets must match. Here are some common uses of delimiters:

  m##    m!!    m{}
  s|||   s[][]   s<>//   (the substitution forms are Perl only)
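For example, with the brace delimiter the URL pattern needs no escaping, and you can check that both forms agree:

```perl
use strict;
use warnings;

my $text = 'see http://www.perl.com/ for details';

# Default delimiter: every literal slash must be escaped.
print "slash form: $1\n" if $text =~ /http:\/\/(\S+)/;

# Brace delimiter: the same pattern, no escapes needed.
print "brace form: $1\n" if $text =~ m{http://(\S+)};
```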

In Python, regular expressions are treated as strings first. If you use quotes -- the regular string delimiter -- you will have to escape all of your backslashes. However, you can use raw strings, r'', to avoid this. If you use raw triple-quoted strings with the re.VERBOSE option, it allows you to include newlines.

For example:

regex = "(\\w+)(\\d+)"

could be rewritten as:

regex = r'''
           (\w+)
           (\d+)
         '''

[Aug 19, 2009] Checking perl syntax in VIM on each save

May 10, 2004
au BufWritePost *.pl,*.pm !perl -c %

Every time you save a .pl or .pm file, it executes perl -c and shows you the output.

~~
naChoZ

[Aug 18, 2009] RegExp reminder

Feb.22, 2006

I was just reminded about this small thing, which is so easy to forget: regular expressions that have markers of line start (^) and/or line end ($) are much faster than regexps without these markers. The thing is that with a line start/end marker the regexp engine needs to attempt only one match/substitution, whereas without such markers it has to retry the match/substitution at every character of the string.

In practice, it's unbelievable how much difference this can make. Especially when using complex regular expressions over large data sets.

P.S.: I understand that it is not always possible to use these markers, but I think that they can be used much more often than they are. Everywhere.




FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.


Copyright 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.



Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: October, 19, 2017