Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Grep and Map in Perl

News Recommended Links Reference Sort in Perl
Regular expressions History Humor Etc

There are two related functions in Perl grep that resembles Unix grep and map. Both are essentially an shorthand for the foreach loop. Thier main value is not a new functionality, but the ability to make the code more compact.  Both functions accept two arguments: the first is an expression a nd the second is the list or array.  Both are shorthand for foreach look.

The main difference between them is that grep can just select certain elements from the array or list while map can transform them into a new area. 

Notes:
  • Those pages are written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • This is a Spartan WHYFF (We Help You For Free) site. It cannot replace the best teachers and the best books.
  • The site contain some obsolete pages as it develops like a living tree... Some links on older pages are broken. Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.

Search Amazon by keywords:

Google   
Open directory

Research Index

 

Old News ;-)

SAGE ;login - Effective Perl Programming

Welcome to Effective Perl Programming, the column. In this and coming articles I'm going to discuss ways that you can use Perl more effectively, whether by mastering Perl idioms, using Perl modules, or finding new applications for Perl programs. I begin by covering two very powerful but underused language features: Perl's map operator and its cousin, the grep operator. I'll cover the basics quickly, then show you some useful techniques and a few neat tricks as well.

The Basics of map and grep

The map operator creates a transformed copy of a list by evaluating a specified expression or code block for each element in the list. Each time the expression or block is evaluated, $_ contains the value of the current element. The result from the expression or block is appended to the list returned by map. The syntax for map looks like this:

@result = map expression, list
@result = map { code } list

Rewritten without using map, the effect is like this:

@result = ();
foreach (list) { push @result, expression; }

For example:

@times_ten = map $_ * 10, 1..10;

returns the list (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), and

@uppercased = map { ucfirst $_ } qw(george jane judy elroy);

returns the list ('George', 'Jane', 'Judy', 'Elroy'). The transform expression (or block) is evaluated in a list context. It does not have to return a single element ­ it can return two or more or none at all (an empty list). More on that later.

The grep operator resembles the map operator syntactically:

@result = grep expression, list
@result = grep { code } list

However, unlike the map operator, which constructs a transformed copy of a list, the grep operator selects items from a list. The selection expression (or block) is evaluated for each element of its argument, with $_ set to the current element. If the result is true (anything other than the empty string or the string '0'), a copy of the element is appended to the result from grep. For example:

# Returns (2, 4, 6, 8, 10).
@even = grep { not $_ % 2 } 1..10;

# Returns a list of text files in the current directory.
@text_files = grep -T, glob "*";

# Classic grep -- imitating Unix grep. Prints lines containing the
# word 'Joseph'.

print grep /\bJoseph\b/, <>;

The grep operator has been around for a long time, but the map operator is new in Perl 5 (as much as anything that is four years old can be called "new," anyway). The map operator is more versatile and can do anything that grep can:

# Another way to get a list of text files.
@text_files = map { (-T) ? $_ : () } glob "*";

map and grep Idioms

The map operator is obviously useful for simple one-to-one transformations:

# Print out the contents of a hash.

print map "$_: $hash{$_}\n", sort keys %hash;

Be careful, though: this approach creates a lot of temporary structures in memory. For a very large hash it would be more appropriate to use an each loop:

while (($key, $val) = each %hash) { print "$key: $val\n" }

Using map to construct hashes is an important idiom. You can construct existence hashes that are used to test whether a particular value has been seen; in this case, set all the values in the hash to 1 (or some other "true" value). You can also use map to construct hashes where the value is computed from the key. To use map to construct a hash, return two values for each original element ­ the key and its corresponding value.

# Create keys for all the "words" in $text, so that we can test for
# a word later with if $seen{$word}.

%words_seen = map { $_ => 1 } split /\s+/, $text;

# After this, $file_size{$file} gives -s $file -- saves time if
# we need to use it more than once.

%file_size = map { $_ => -s } @files;

The map operator is handy for "nesting" and "slicing" multidimensional data structures. Using an anonymous array (or hash) constructor inside map creates nested structures. For example, you can blend parallel arrays into a single 2-d structure:

# Blend @x, @y, and @z into a single 2-d array @xyz ... $xyz[0][0]
# is $x[0], $xyz[0][1] is $y[0], and so on.

@xyz = map [$x[$_], $y[$_], $z[$_]], 0..$#x;

You can use the same technique to create a hash of arrays:

# Cache the results from stat into a hash of arrays ... then
# $info{'file'}[7] gives the size of 'file', $info{'file'}[5]
# gives the owner's uid, and so on.

%info = map { $_, [ stat $_ ] } @files;

Extracting a slice of a nested structure is just as easy. Just use a subscript inside map:

# This will extract @x from @xyz (undoing what we did above) ...
# $x[0] is $xyz[0][0], $x[1] is $xyz[1][0], and so on.

@x = map $_->[0], @xyz;

The grep operator isn't as versatile as map, but it is usually the most succinct way to select items from a list. Don't forget that it can be used on complex structures:

# Select elements from @xyz whose "coordinates" are all >0.
# @gt_zero is still a 2-d array with the same organization as @xyz.

@gt_zero = grep {$_->[0] > 0 and $_->[1] > 0 and $_->[2] > 0} @xyz;

Cool Tricks with map

You can use map to read several lines of input at a time:

# Read 10 lines from STDIN.
@ten_lines = map scalar(<STDIN>), 1..10;

The "Schwartzian Transform" (named after fellow Perl trainer and author Randal L. Schwartz) is a sort surrounded by maps. It is generally preferred over other techniques when the sorting process requires time-intensive key transformations:

# Sort files in descending order of size.

@files_by_size =
map { $_->[0] } # 3. slice out the original list, now sorted
sort { $b->[1] <=> $a->[1] } # 2. sort the list of tuples
map { [$_, -s $_] } # 1. create a list of tuples by nesting
@files; # the data to be sorted

You can use map for some set operations. Here is an example of using it to find the elements in one hash (%hash1) that are not in another hash (%hash2). Depending on the relative sizes of the hashes involved, this can be more efficient than other methods (like using the delete operator):

# keys %result contains 2 4 6 7 8 9 when this is done.

%hash1 = map { $_, 1 } 1..9;    # some sample data
%hash2 = map { $_, 1 } 1, 3, 5; # more sample data
%result = map { $_, $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;

# Another way to do the same thing, with delete.

%result = %hash1;
delete @result{keys %hash2};

Because map's transform expression is evaluated in a list context, using map in combination with a pattern match that contains some parentheses can produce unusually succinct code:

# Create a hash of user name vs. user id from lines in /etc/passwd.

open PASSWD, "/etc/passwd" or
die "couldn't open password file: $!\n";
%name_to_id = map /(.*?):.*?:(.*?):/, <PASSWD>;

The map operator can even be useful for some string operations:

# Convert a string like 'ABC' into its
# hex equivalent, '\x41\x42\x43'.

$hexed = join '', map { sprintf "\\x%x", ord $_ } split //, $str;

# An alternative using s///, which is slightly slower
# for long strings.

($hexed = $str) =~ s/(.)/sprintf "\\x%x", ord $1/ge;

That should be enough for now. I hope you've enjoyed this little tour of map and grep. My next column will be something of a change of pace ­ I will introduce object-oriented programming in Perl.

Use the map function to make your code more concise:

One of Perl's less commonly used functions (at least, by novices) is the map function. Learning to use this function effectively can make you a more efficient programmer by eliminating the need to code many of the mindless iterative loops that occur in the course of any programming project.

How many times do you have an array of values, and you want to do an operation on each of them -- perhaps to print them all out, each indented and on a separate line? The brute force method that most novice Perl weenies use might look something like this:

print "ARGV[] is:\n";
for ($i=0; $i<$#ARGV; $i++) {
    print "  '$ARGV[$i]'\n";
}
print "--------\n";

This code snippet, saved to file t.1 and run from the command line, results in output like the following:

bash) perl ./t.1 foo bar baz blah
ARGV[] is:
  'foo'
  'bar'
  'baz'
  'blah'
--------

Sure, it works, but that's an awful lot of code to do such a simple operation. Plus, if you're running with use strict; (as you should be!), you've got to declare the variable $i. Let's try the same using the map function. Quoth the camel:

map BLOCK LIST
map EXPR, LIST
This function evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation.

In plainer terms, map takes an array, returns a new array in which each of the elements has changed to whatever is specified in the BLOCK or EXPR.

Here's that same example, written using map:

print "ARGV[] is:\n",
      map("  '$_'\n", @ARGV),
      "--------\n";

Save this to a file t.2, run it, and you'll see the same results as from the previous example. Notice, there's no need to declare a loop variable, set up a loop, etc. With this trivial operation, the difference is minimal, but as you start doing more complicated operations on the array elements, you'll really see a difference in the compactness of the code you need to write. (Some may find this a code a little dense, but trust me, the more you work with it, the more you'll see it as a clearer way to do it.)

For those of you who want more, notice all of the explicit newlines ("\n") in the above. Let's use the join function to eliminate the need for specifying all of these:

print join "\n",
           "ARGV[] is:",
           map("  '$_'", (@ARGV)),
           "--------",
           "";        # This one causes a trailing newline

And for those of you writing CGI applications, who are familiar with Lincoln Stein's excellent CGI.pm module, consider the following:

use CGI qw/:html/;

print div({ALIGN => 'CENTER'},
          table({BORDER => 1}, join "\n",
             caption("ARGV[] List:"),
             map( { TR(td($_))} @ARGV)
          )    # End-table
      ),      # End-div
      "";     # This one causes a trailing newline

Way cool.

Perl's map function can come in handy when you need to simplify potentially repetitive operations, such as capitalizing strings of text. In this article, we'll offer several examples of how you can put map to work. Then, we'll turn our attention to Perl's parsing capabilities with a look at various ways you can parse your program's command line to extract switches or other information.

The power of map
Perl offers many functions that help to simplify and shorten code.

Among the more powerful is map, which takes a list, evaluates a specified block or expression on each element, and then returns a list of all the results. Inside the block, map locally assigns $_ as an alias to the current list item.

One of the simplest uses of map is to capitalize an entire array by applying the uc function to each element:
    @caps = map uc, @phrases;

In the next sample, mapping a regular expression to the array returns the first word of every phrase:
    @first_word = map { /(\S+)/ } @phrases;

Each element need not necessarily map to a single item. If multiple values are created, map returns them all as a single, flattened list. For example, you could split all words in all phrases into a single list:
    @words = map split, @phrases;

Still another use for map might be to convert a string to title case. You can do this by splitting a string into individual words, converting each to lowercase and then initial capitalization, and finally joining the words back into a single string:
    $title = join ' ', map { ucfirst lc } split / /, $name;

Our final example uses map to put the sorted key/value pairs of a hash into a two-column HTML table:
    print "<table>\n";
    print map {"<tr><td>$_</td><td>$hash{$_}</td></tr>\n"} sort keys %hash;
    print "</table>\n";


Command-line parsing
When you need to determine the command-line switches passed into a Perl program, you can take various approaches. An easy way of identifying expected Boolean switches is to loop through @ARGV, setting a flag for each option that is encountered:
    foreach $arg (@ARGV) {
        $a = 1, next if $arg eq '-a';
        $b = 1, next if $arg eq '-b';
        $c = 1, next if $arg eq '-c';
    }


Another simple option is to use Perl's -s switch. In this case, Perl will create variables named the same as each switch and then remove them from @ARGV. For example:
    perl -s prog.pm -a -b -c

When prog.pm is executed, the variables $a, $b, and $c are all defined and set to 1. Only the switches listed before any nonswitch argument or "--" will be handled. Therefore, the following may not work as desired:
    perl -s prog.pm -a -b 13 -c 6/6/2001

Here, $a and $b are set to 1 and @ARGV contains ('13', '-c', '6/6/2001'). To have $b set to 13 and $c set to 6/6/2001, the command line could be entered as:
    perl -s prog.pm -a -b=13 -c=6/6/2001

A more robust alternative to using -s is to use either Getopt::Std or Getopt::Long. These modules will parse the command line and set global variables (named the same as the switch but prefixed with opt_) for each option. In the following sample, -a is a Boolean option, -b requires that an integer be specified, -c requires a string, and -d accepts an optional string:
    use Getopt::Long;
    GetOptions("a!", "b=i", "c=s", "d:s");


The variables set would be $opt_a, $opt_b, $opt_c, and $opt_d, respectively.
 


[Courses] [Perl] Part 17: grep and map

Dan Richter daniel.richter at wimba.com
Fri Nov 28 13:01:50 EST 2003


LinuxChix Perl Course Part 17: grep and map

1) Introduction
2) grep - filters a list
3) map - transforms the values of a list
4) What "grep" and "map" have in common
5) Exercises
6) Answer to Previous Exercise
7) Past Information
8) Credits
9) Licensing

             -----------------------------------

1) Introduction

Before we finish looking at arrays in Perl, I thought we should take a 
quick look at two handy Perl functions: "grep" and "map". Both functions 
are technically operators because they allow you to do magical things 
that a function can't do, but syntactically they look like functions, so 
we refer to them as functions here.

Let me add that this will be the last e-mail before January. It's that 
busy-busy-busy time of year and I'm afraid that I have no more time to 
write about Perl than you have to read about it.

             -----------------------------------

2) grep - filters a list

The "grep" function returns only the elements of a list that meet a 
certain condition:

   @positive_numbers = grep($_ > 0, @numbers);

As you can see, each element is refered to as "$_". This (plus the fact 
that parentheses are optional) allows you write commands that look 
similar to invocations of the Unix "grep" program:

   @non_blank_lines = grep /\S/, @lines;

In addition, you can specify a code block rather than a single condition:

   @non_blank_lines = grep { /\S/ } @lines;     # Equivalent to the above.

Obviously it doesn't matter in this case, but code blocks are helpful 
when you want a complex filter with multiple lines of code. The result 
of the code block is the result of the last statement executed:

   # All positive numbers can be used as exponents,
   # but negative exponents must be integers.
   @can_be_used_as_exponent = grep {
     if ( $_ < 0 ) {
       ! /\./;          # No decimal point -> integer.
     }
     else {
       1;               # Always true.
     }
   } @array;

             -----------------------------------

3) map - transforms the values of a list

The "map" function applies a transformation to each element of a list 
and returns the result, leaving the original list unchanged (unless you 
mess it up; more on that in a moment).

   @lines_with_newlines = map( $_ . "\n", @lines_without_newlines);

As with "grep", each value in the list is refered to as "$_".

"map" can also take a block of code:

   # Replace "x at y.z" with "x at y dot z" to confuse spammers.
   @disguised_addresses = map {
       my $email = $_;
       $email =~ s/\@/ at /;
       $email =~ s/\./ dot /g;
       $email;
     } @email_addresses;

Note that it's important not to change "$_" because that would change 
the original "@email_addresses" (and you wouldn't get what you wanted in 
"@disguised_addresses").

"map" needs not be a one-to-one mapping. For example, in the following code:

   @words   =   map   m/\b(\w+)\b/g,   @lines;    # Spaces are for clarity.

the regular expression splits a string into a list of words. The "map" 
function returns the result of joining all the small lists. If a line 
contains no words, the regular expression will return an empty list, and 
that's okay.

             -----------------------------------

4) What "grep" and "map" have in common

"grep" and "map" have a lot in common. They both "magically" take a 
piece of code (either an expression or a code block) as a parameter. You 
need to put a comma after an expression but shouldn't put a comma after 
a code block.

Changing "$_" in "grep" or "map" will change the original list. This 
isn't generally a good idea because it makes the code hard to read. 
Remember that "map" builds a list of results by evaluating an 
expression, NOT by setting "$_".

A side effect of this fact is that you should not use "s///" with "map". 
The "s///" operator changes "$_" rather than returning a result, so you 
won't get what you would expect if you use it with "map" (and you 
CERTAINLY shouldn't use it with "grep").

             -----------------------------------

5) Exercises

a) Write some Perl code that, given a list of numbers, generates a list 
of square roots of those numbers. (The square root function in Perl is 
"sqrt".)

b) Modify the code to filter out any negative numbers. The result should 
be as though the negative numbers were never in the original list.

c) Write a Perl program that reads two files and outputs only the lines 
that are common to both of them.

             -----------------------------------

6) Answer to Previous Exercise

The following program reads the password file and outputs a list of 
usernames and UIDs, ordered by username:

   #!/usr/bin/perl -w
   use strict;

   open FILE, '< /etc/passwd' or die "Couldn't open file: $!";
   my @data = sort(<FILE>);
   close FILE;

   my @result;
   foreach (@data) {
     my @fields = split(/:/);    # Equivalent to split(/:/, $_)
     push @result, $fields[0] . ' -> ' . $fields[2];
   }

   print join("\n", at result) . "\n";

The above program is a nice review of Perl functions. But of course, 
There Is More Than One Way To Do It, and we could replace the bottom 
half with:

   foreach (@data) {
     s/^(.*?):.*?:(\d*):.*$/$1 -> $2/;
   }
   print join("\n", at result) . "\n";

Or to make the program really short:

   $_ = join '', @data;
   s/^(.*?):.*?:(\d*):.*$/$1 -> $2/gm;
   print;        # Prints "$_"

-----------------------------------

7) Past Information

Part 16: Array Functions
      http://linuxchix.org/pipermail/courses/2003-November/001359.html

Part 15: More About Lists
      http://linuxchix.org/pipermail/courses/2003-November/001351.html

Part 14: Arrays
      http://linuxchix.org/pipermail/courses/2003-October/001350.html

Part 13: Perl Style
      http://linuxchix.org/pipermail/courses/2003-October/001349.html

Part 12: Side Effects with Perl Variables
      http://linuxchix.org/pipermail/courses/2003-October/001347.html

Part 11: Perl Variables
      http://linuxchix.org/pipermail/courses/2003-October/001345.html

Parts 1-10: see the end of:
      http://linuxchix.org/pipermail/courses/2003-October/001345.html

             -----------------------------------

8) Credits

Works cited:
a) man perlfunc
b) Kirrily Robert, Paul Fenwick and Jacinta Richardson's
    "Intermediate Perl", which you can find (along with their
    "Introduction to Perl") at:
    http://www.perltraining.com.au/notes.html

Thanks to Jacinta Richardson for fact checking.

             -----------------------------------

9) Licensing

This course (i.e., all parts of it) is copyright 2003 by Alice Wood and 
Dan Richter, and is released under the same license as Perl itself 
(Artistic License or GPL, your choice). This is the license of choice to 
make it easy for other people to integrate your Perl code/documentation 
into their own projects. It is not generally used in projects unrelated 
to Perl.


Map Function

The road to better programming Chapter 4

The map() function

The map() function is like a rubber stamp applied to all the elements of a list (see "perldoc -f map" for more information). It consists of two parts: a block or an expression, and a list:


Listing 1. How map() works
 

# map {BLOCK GOES HERE} LIST;
map {$_++} @p;                          # increment every element of @p 
# map EXPRESSION, LIST;
map -f,@p;                              # file test every element of @p

 

The foreach() loop will usually do better than map() in benchmarks, so don't use map() for CPU-intensive calculations unless you test its performance. Do use it when it makes your code more elegant and simple without a loss of efficiency.

There are many neat things that map() can do. First of all, it can modify the array elements as it goes through them by changing the $_ variable. Inside the block or (less commonly) the expression, $_ is the current element of the list. You don't know which element you are looking at -- the whole point is to map one single function to all the elements independently. It is my experience that in about 80% of loops over an array you don't need to know the offset of the current element inside the array. Thus, map() can improve coding efficiency and style by forcing the programmer to think independently of array offsets.


Listing 2. foreach() vs. map()
 


# "normal" (procedural)  way
foreach (sort keys %ENV)
{
 print "$_ => $ENV{$_}\n";
}
# FP way
map { print "$_ => $ENV{$_}\n" } sort keys %ENV;

In Listing 2, you can see how the FP way is not fundamentally different, yet manages to convey a flow of functions from right to left. First the list of keys is obtained, then it is sorted, then the print() function is applied to each element of the sorted key list.


Listing 3. Modifying a list on the fly with map()
 



# these are the users
@users = ('joe', 'ted', 'larry');
# and this is an on-the-fly substitution of user names with hash references
map { $_ = { $_ => length } } @users;
# @users is now ( { 'joe' => 3 },
#                 { 'ted' => 3 },
#                 { 'larry' => 5 } )

 

Listing 3 demonstrates how the list passed to map() can be completely rewritten. In this case, the array @users contained only user names, but after map() was applied, the array contained hash references with one key-value pair, username => byte length of user name. How about quickly filling in file information?


Listing 4. Modifying a list on the fly with map(), part 2
 


use File::stat;
use Data::Dumper;
@files = ('/etc/passwd', '/etc/group', '/etc/fstab', '/etc/vfstab');
print Dumper 
     map { $sb = stat $_; 
           $_ = (defined $sb) ? 
                { 
                  name => $_,  
                  size =>$sb->size(), 
                  mtime => $sb->mtime() 
                } : 
                undef
         } @files;

 

In Listing 4 we create a list of files, and then in one statement create a list of hashes with entries for the name, size, and mtime (modification time) of each file. Furthermore, non-existent files get an "undef" reference instead of a hash reference. Finally, Data::Dumper is used to print out a nice view of the entire product list.

Needless to say, code like Listing 4 should be heavily documented for other people's sake. Don't try to cram it all into one line, either. Elegant code is just an ugly duckling without proper formatting and comments.


 

 
Back to top


 

The grep() function

For anyone who has used UNIX, the grep() function is simple to learn and use. It acts just like the grep utility -- elements that satisfy a test pass through, while everything else gets dropped.

The syntax of grep() is just like map(). A block or an expression can be passed, and $_ is aliased to the current element under examination. It is not a good idea to modify elements of the list passed to grep(). That's what map() is for. Use grep() to grep, and map() to map. The only exception to this rule is if you must create temporary hash fields or array entries while sorting, but make sure you remove them afterwards.


Listing 5. How grep() works
 
# grep {BLOCK GOES HERE} LIST;
grep {$_ > 1} @p;                       # only accept numbers more than 1
grep {$_++} @p;                         # please don't do this
# grep EXPRESSION, LIST;
grep /hi/,@p;                           # only accept matching elements
grep !/hi/,@p;                          # do not accept matching elements

It can be very convenient to use grep() for quick filtering, but remember that a foreach() loop may be faster under some circumstances. When in doubt, benchmark.


Listing 6. Using grep() to filter out odd numbers
 

my @list = (1, 2, 3, 'hi');
my @results;
# the procedural approach
foreach (@list)
{
 push @results, $_
  unless $_%2;
}
# using grep - FP approach
@results = grep { !($_ % 2) } @list;

 

Here is another example. Say we need to look in a directory and retrieve all the file names from it:


Listing 7. Using grep() to get all the filenames in a directory
 
opendir(DIR, ".") || die "can't opendir: $!"; # get the directory handle
my @f = grep { /^[^\.]/ && -f } readdir(DIR); # filter only files into @f

Line 1 of Listing 7 just opens the current directory, or exits the program with the appropriate notice.

Line 2 invokes the readdir() function, which returns a list of filenames, and runs a grep() that filters out hidden files (filenames must not begin with a "." character) and non-file objects such as directories.

In two lines we do as much work as four or five lines of a foreach() loop might do. Don't forget to comment such tight code; the short comments shown are not sufficient for production code. Sometimes, grep() is used in scalar context (for instance, to test if Perl interpreters are running [the Proc::ProcessTable module is from CPAN]):


Listing 8. Using grep() to get all the Perl processes running
 
use Proc::ProcessTable;                 # get this module from CPAN
use strict;
my $table = new Proc::ProcessTable;
my @procs;
if (@procs = grep { defined $_->cmndline && 
                            $_->cmndline =~ /^perl/ } 
                @{$table->table})
{
 print $_->pid, "\n" foreach @procs;
}
else
{
 print "No Perl interpreters seem to be running.\n";
}

 

Here, we simultaneously assign the return from grep() to the @procs array, and we test to see if it contained any elements at all. If there were no elements matching the pattern, we print a message to that effect. The same code with a foreach() loop would probably take five or six lines. In case it hasn't been said enough, code like Listing 8 should be commented well enough that someone else could look at it and immediately know the intent and effect of that code. It's no use writing production code if you are the only one who will ever be able to read it.
 

Sorting with map: the Schwartzian and Guttman-Rosler transforms

The sort() function in Perl is "sort of" procedural, in that it takes a code block or a function name and uses it to sort all elements. The comparison function has to be written as if it were only looking at two elements -- it doesn't know which ones specifically out of the whole list. Like map() and grep(), sort() deals with references to the values being compared, so modifying them would modify the values being compared. Don't do this (for more information on the sort() function, see "perldoc -f sort").

Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:

Listing 9. The default sort()
 

@new_list = sort @old_list;             # sort @old_list alphabetically

 

The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "1" comes before "010" in a string comparison, for example. Numbers have to be compared by value, not as strings.

Fortunately, this is easy to do and is in fact a common Perl idiom:


Listing 10. The numeric sort()
 
@old_list = (1, 2, 5, 200, '010');
@new_list = sort { $a <=> $b } @old_list; # sort @old_list numerically

 

I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.

If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.

To reverse the sorted list, you can either apply the reverse() function to the list after it's sorted, or you can change the sorting function. For example, the reverse sort of the one in Listing 10 would have the comparison code block be { $b <=> $a }.

See "perldoc perlop" for more information on the <=> and cmp operators, which are essential to all sorting in Perl. The cmp operators are what's used in the default search in Listing 9 behind the scenes.

Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:


Listing 11. The sort by a hash member
 
# create a list with two hash references
@old_list = ( { name => "joe" }, { name => "moe" } );
# sort @old_list by the value of the 'name' key
@new_list = sort { $a->{name} cmp $b->{name} } @old_list;

 

Now we get into the interesting stuff. What if it's expensive to obtain data from the objects being sorted? Say we need to apply the split() function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run a split() every time the comparison value is needed, and your co-workers would laugh at you. You could build a temporary list of the comparison values, but that's not so easy to do and can easily introduce bugs. You are probably better off using the Schwartzian transform, named after Randal Schwartz.

The Schwartzian transform looks like this:


Listing 12. The Schwartzian transform
 
# create a list with some strings
@old_list = ( '5 eagles', '10 hawks', '2 bulls', '8 cows');
# sort @old_list by the first word in each string, numerically
@new_list = map($_->[1], 
                sort( { $a->[0] <=> $b->[0] }
                       map( [ (split)[0], $_ ], 
                            @old_list)));

 

Look at it from right to left. First, we apply a map() to the @old_list list to create a new temporary list. Remember that map() can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from a split() of the string (this is the comparison value) and the string itself, for each string in @old_list. The result is a new list; no changes are made to @old_list.

Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.

Then, we perform another map() on the sorted list to reduce it back to just strings by mapping only the second array element into the current variable. $_->[1] means "rewrite $_ to be the value stored in the second object in the list that $_ refers to."

Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.

The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding O(n) notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.


When to use FP

Once again, I will say this: know your tools. Functional programming is an excellent tool, as we have seen so far. It can simplify some pretty hairy problems and make others a little easier. So when should you use FP?

Exercises

  1. Write a code snippet that uses map() to transform a list of user names into user IDs.
  2. Write a program that uses (1) to look up user IDs. Allow filtering of the user list by partial names.
  3. Write code to check if any processes owned by root are running (on a UNIX system).
  4. Benchmark the Schwartzian transform versus your own sorting code versus the Guttman-Rosler transform on a small data set. Use the Benchmark module to do this.
  5. Do (4) on a large data set. For instance, sort all the file names on your system by size. Look at the File::Find module for ideas.
  6. Read about Erlang, Scheme, and Haskell in the comp.lang.functional FAQ. Look at other FP languages, and see if any of them have neat ideas that you can use in Perl.
  7. Write a Perl version of the grep program. Did you think of the grep() function right away? You shouldn't use grep() in this case because you may have to process a large file, and there's no sense in keeping the contents of the whole file in memory just to run grep() on them. Think of a better solution.
Resources

Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     

perlfunc - Perl builtin functions

Simpler code: Tips on using Perl's map function and command-line ...

Robert's Perl Tutorial


Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: June 02, 2008