|
Softpanorama |
May the source be with you, but remember the KISS principle ;-)
|
| News | Recommended Links | Reference | Sort in Perl |
| Regular expressions | History | Humor | Etc |
There are two related functions in Perl grep that resembles Unix grep and map. Both are essentially an shorthand for the foreach loop. Thier main value is not a new functionality, but the ability to make the code more compact. Both functions accept two arguments: the first is an expression a nd the second is the list or array. Both are shorthand for foreach look.
The main difference between them is that grep can just select certain elements from the array or list while map can transform them into a new area.
Suppose we have an array of filenames called @files, and we
want a second list called @jpgs containing only those filenames
which end in the extension ``.jpg''. Instead of using a loop, we could
employ grep as follows:
@jpgs = grep(/\.jpg$/,@files);
As another example, suppose we have a list of files and we want to eliminate all the directories from the list.
@non_a_direcotory = grep(! -d ,@list);
map The first argument is evaluated for each element of a list and the results are returned by map as a new list. Default variable $_ represents the elements of the list (one per iteration). Look how short is the code for printing an array using map, actually this is a classic Perl one liner:
print "Program arguments are:\n", map(" '$_'\n", @ARGV);
It also can simplify creating an array in which each element represents the
calue of the function applied to each element of the initial array.
Suppose we have a array called @files containing filenames,
and we wish to create a similar array called @sizes containing
the sizes of those files.
@sizes = map(-s,@files); # the same can be achieved by: foreach @files { $sizes($i++)= -s; }
The expression ``-s'', uses $_ as its default argument,
so it will return the size of each file in turn. Using map saves shortens
the code probably in half and save us from trouble of using explicit iteration
counter for the resulting array.
Map often can be used to create a hash in whihc keys are elements of the array and values is some result of avluation of each key. For example:
map($sizes{$_} = -s ,@files);
In this case, we created the hash %sizes as the expression
that we used as the first argument is evaluated once for each element of the
arrray.
You can use a function as the first argument to map, but it should operation on the default variable $_ as the only paramaer, since map does not pass any arguments to a function. If this function is used for other purposes you can simply pass $_ as one of the arguments
|
SAGE ;login - Effective Perl Programming
Welcome to Effective Perl Programming, the column. In this and coming articles I'm going to discuss ways that you can use Perl more effectively, whether by mastering Perl idioms, using Perl modules, or finding new applications for Perl programs. I begin by covering two very powerful but underused language features: Perl's map operator and its cousin, the grep operator. I'll cover the basics quickly, then show you some useful techniques and a few neat tricks as well.The Basics of map and grep
The map operator creates a transformed copy of a list by evaluating a specified expression or code block for each element in the list. Each time the expression or block is evaluated, $_ contains the value of the current element. The result from the expression or block is appended to the list returned by map. The syntax for map looks like this:
@result = map expression, list
@result = map { code } listRewritten without using map, the effect is like this:
@result = ();
foreach (list) { push @result, expression; }For example:
@times_ten = map $_ * 10, 1..10;
returns the list (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), and
@uppercased = map { ucfirst $_ } qw(george jane judy elroy);
returns the list ('George', 'Jane', 'Judy', 'Elroy'). The transform expression (or block) is evaluated in a list context. It does not have to return a single element it can return two or more or none at all (an empty list). More on that later.
The grep operator resembles the map operator syntactically:
@result = grep expression, list
@result = grep { code } listHowever, unlike the map operator, which constructs a transformed copy of a list, the grep operator selects items from a list. The selection expression (or block) is evaluated for each element of its argument, with $_ set to the current element. If the result is true (anything other than the empty string or the string '0'), a copy of the element is appended to the result from grep. For example:
# Returns (2, 4, 6, 8, 10).
@even = grep { not $_ % 2 } 1..10;# Returns a list of text files in the current directory.
@text_files = grep -T, glob "*";# Classic grep -- imitating Unix grep. Prints lines containing the
# word 'Joseph'.
print grep /\bJoseph\b/, <>;The grep operator has been around for a long time, but the map operator is new in Perl 5 (as much as anything that is four years old can be called "new," anyway). The map operator is more versatile and can do anything that grep can:
# Another way to get a list of text files.
@text_files = map { (-T) ? $_ : () } glob "*";map and grep Idioms
The map operator is obviously useful for simple one-to-one transformations:
# Print out the contents of a hash.
print map "$_: $hash{$_}\n", sort keys %hash;
Be careful, though: this approach creates a lot of temporary structures in memory. For a very large hash it would be more appropriate to use an each loop:
while (($key, $val) = each %hash) { print "$key: $val\n" }
Using map to construct hashes is an important idiom. You can construct existence hashes that are used to test whether a particular value has been seen; in this case, set all the values in the hash to 1 (or some other "true" value). You can also use map to construct hashes where the value is computed from the key. To use map to construct a hash, return two values for each original element the key and its corresponding value.
# Create keys for all the "words" in $text, so that we can test for
# a word later with if $seen{$word}.
%words_seen = map { $_ => 1 } split /\s+/, $text;# After this, $file_size{$file} gives -s $file -- saves time if
# we need to use it more than once.
%file_size = map { $_ => -s } @files;The map operator is handy for "nesting" and "slicing" multidimensional data structures. Using an anonymous array (or hash) constructor inside map creates nested structures. For example, you can blend parallel arrays into a single 2-d structure:
# Blend @x, @y, and @z into a single 2-d array @xyz ... $xyz[0][0]
# is $x[0], $xyz[0][1] is $y[0], and so on.
@xyz = map [$x[$_], $y[$_], $z[$_]], 0..$#x;You can use the same technique to create a hash of arrays:
# Cache the results from stat into a hash of arrays ... then
# $info{'file'}[7] gives the size of 'file', $info{'file'}[5]
# gives the owner's uid, and so on.
%info = map { $_, [ stat $_ ] } @files;Extracting a slice of a nested structure is just as easy. Just use a subscript inside map:
# This will extract @x from @xyz (undoing what we did above) ...
# $x[0] is $xyz[0][0], $x[1] is $xyz[1][0], and so on.
@x = map $_->[0], @xyz;The grep operator isn't as versatile as map, but it is usually the most succinct way to select items from a list. Don't forget that it can be used on complex structures:
# Select elements from @xyz whose "coordinates" are all >0.
# @gt_zero is still a 2-d array with the same organization as @xyz.
@gt_zero = grep {$_->[0] > 0 and $_->[1] > 0 and $_->[2] > 0} @xyz;Cool Tricks with map
You can use map to read several lines of input at a time:
# Read 10 lines from STDIN.
@ten_lines = map scalar(<STDIN>), 1..10;The "Schwartzian Transform" (named after fellow Perl trainer and author Randal L. Schwartz) is a sort surrounded by maps. It is generally preferred over other techniques when the sorting process requires time-intensive key transformations:
# Sort files in descending order of size.
@files_by_size = map { $_->[0] } # 3. slice out the original list, now sorted sort { $b->[1] <=> $a->[1] } # 2. sort the list of tuples map { [$_, -s $_] } # 1. create a list of tuples by nesting @files; # the data to be sorted You can use map for some set operations. Here is an example of using it to find the elements in one hash (%hash1) that are not in another hash (%hash2). Depending on the relative sizes of the hashes involved, this can be more efficient than other methods (like using the delete operator):
# keys %result contains 2 4 6 7 8 9 when this is done.
%hash1 = map { $_, 1 } 1..9; # some sample data
%hash2 = map { $_, 1 } 1, 3, 5; # more sample data
%result = map { $_, $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;# Another way to do the same thing, with delete.
%result = %hash1;
delete @result{keys %hash2};Because map's transform expression is evaluated in a list context, using map in combination with a pattern match that contains some parentheses can produce unusually succinct code:
# Create a hash of user name vs. user id from lines in /etc/passwd.
open PASSWD, "/etc/passwd" or
die "couldn't open password file: $!\n";
%name_to_id = map /(.*?):.*?:(.*?):/, <PASSWD>;The map operator can even be useful for some string operations:
# Convert a string like 'ABC' into its
# hex equivalent, '\x41\x42\x43'.$hexed = join '', map { sprintf "\\x%x", ord $_ } split //, $str;
# An alternative using s///, which is slightly slower
# for long strings.($hexed = $str) =~ s/(.)/sprintf "\\x%x", ord $1/ge;
That should be enough for now. I hope you've enjoyed this little tour of map and grep. My next column will be something of a change of pace I will introduce object-oriented programming in Perl.
map function to make your code more concise:One of Perl's less commonly used functions (at least, by novices) is the
map function. Learning to use this function effectively can
make you a more efficient programmer by eliminating the need to code many of the
mindless iterative loops that occur in the course of any programming project.
How many times do you have an array of values, and you want to do an operation on each of them -- perhaps to print them all out, each indented and on a separate line? The brute force method that most novice Perl weenies use might look something like this:
print "ARGV[] is:\n";
for ($i=0; $i<$#ARGV; $i++) {
print " '$ARGV[$i]'\n";
}
print "--------\n";
This code snippet, saved to file t.1 and run from the command
line, results in output like the following:
bash) perl ./t.1 foo bar baz blah ARGV[] is: 'foo' 'bar' 'baz' 'blah' --------
Sure, it works, but that's an awful lot of code to do such a simple operation.
Plus, if you're running with use strict; (as you should be!),
you've got to declare the variable $i. Let's try the same using the
map function. Quoth the camel:
map BLOCK LIST map EXPR, LISTThis function evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation.
In plainer terms, map takes an array, returns a new array
in which each of the elements has changed to whatever is specified in the BLOCK
or EXPR.
Here's that same example, written using map:
print "ARGV[] is:\n",
map(" '$_'\n", @ARGV),
"--------\n";
Save this to a file t.2, run it, and you'll see the same
results as from the previous example. Notice, there's no need to declare a loop
variable, set up a loop, etc. With this trivial operation, the difference is minimal,
but as you start doing more complicated operations on the array elements, you'll
really see a difference in the compactness of the code you need to write. (Some
may find this a code a little dense, but trust me, the more you work with it, the
more you'll see it as a clearer way to do it.)
For those of you who want more, notice all of the explicit newlines ("\n")
in the above. Let's use the join function to eliminate the need
for specifying all of these:
print join "\n",
"ARGV[] is:",
map(" '$_'", (@ARGV)),
"--------",
""; # This one causes a trailing newline
And for those of you writing CGI applications, who are familiar with Lincoln
Stein's excellent CGI.pm module, consider the following:
use CGI qw/:html/;
print div({ALIGN => 'CENTER'},
table({BORDER => 1}, join "\n",
caption("ARGV[] List:"),
map( { TR(td($_))} @ARGV)
) # End-table
), # End-div
""; # This one causes a trailing newline
Way cool.
Perl's map function can come in handy when you need to simplify potentially
repetitive operations, such as capitalizing strings of text. In this article, we'll
offer several examples of how you can put map to work. Then, we'll turn our
attention to Perl's parsing capabilities with a look at various ways you can parse
your program's command line to extract switches or other information.
The power of map
Perl offers many functions that help to simplify and shorten code.
Among the more powerful is map, which takes a list, evaluates a specified
block or expression on each element, and then returns a list of all the results.
Inside the block, map locally assigns $_ as an alias to the current list
item.
One of the simplest uses of map is to capitalize an entire array by applying
the uc function to each element:
@caps = map uc, @phrases;
In the next sample, mapping a regular expression to the array returns the first
word of every phrase:
@first_word = map { /(\S+)/ } @phrases;
Each element need not necessarily map to a single item. If multiple values are created,
map returns them all as a single, flattened list. For example, you could
split all words in all phrases into a single list:
@words = map split, @phrases;
Still another use for map might be to convert a string to title case. You
can do this by splitting a string into individual words, converting each to lowercase
and then initial capitalization, and finally joining the words back into a single
string:
$title = join ' ', map { ucfirst lc } split
/ /, $name;
Our final example uses map to put the sorted key/value pairs of a hash into
a two-column HTML table:
print "<table>\n";
print map {"<tr><td>$_</td><td>$hash{$_}</td></tr>\n"} sort keys
%hash;
print "</table>\n";
Command-line parsing
When you need to determine the command-line switches passed into a Perl program,
you can take various approaches. An easy way of identifying expected Boolean switches
is to loop through @ARGV, setting a flag for each option that is encountered:
foreach $arg (@ARGV) {
$a = 1, next if $arg eq '-a';
$b = 1, next if $arg eq '-b';
$c = 1, next if $arg eq '-c';
}
Another simple option is to use Perl's -s switch. In this case, Perl will create
variables named the same as each switch and then remove them from @ARGV. For example:
perl -s prog.pm -a -b -c
When prog.pm is executed, the variables $a, $b, and $c are all defined and set to
1. Only the switches listed before any nonswitch argument or "--" will be handled.
Therefore, the following may not work as desired:
perl -s prog.pm -a -b 13 -c 6/6/2001
Here, $a and $b are set to 1 and @ARGV contains ('13', '-c', '6/6/2001'). To have
$b set to 13 and $c set to 6/6/2001, the command line could be entered as:
perl -s prog.pm -a -b=13 -c=6/6/2001
A more robust alternative to using -s is to use either Getopt::Std or Getopt::Long.
These modules will parse the command line and set global variables (named the same
as the switch but prefixed with opt_) for each option. In the following sample,
-a is a Boolean option, -b requires that an integer be specified, -c requires a
string, and -d accepts an optional string:
use Getopt::Long;
GetOptions("a!", "b=i", "c=s", "d:s");
The variables set would be $opt_a, $opt_b, $opt_c, and $opt_d, respectively.
Dan Richter daniel.richter at wimba.com
Fri Nov 28 13:01:50 EST 2003
- Previous message: [Courses] [Perl] Nothing this week
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
LinuxChix Perl Course Part 17: grep and map 1) Introduction 2) grep - filters a list 3) map - transforms the values of a list 4) What "grep" and "map" have in common 5) Exercises 6) Answer to Previous Exercise 7) Past Information 8) Credits 9) Licensing ----------------------------------- 1) Introduction Before we finish looking at arrays in Perl, I thought we should take a quick look at two handy Perl functions: "grep" and "map". Both functions are technically operators because they allow you to do magical things that a function can't do, but syntactically they look like functions, so we refer to them as functions here. Let me add that this will be the last e-mail before January. It's that busy-busy-busy time of year and I'm afraid that I have no more time to write about Perl than you have to read about it. ----------------------------------- 2) grep - filters a list The "grep" function returns only the elements of a list that meet a certain condition: @positive_numbers = grep($_ > 0, @numbers); As you can see, each element is refered to as "$_". This (plus the fact that parentheses are optional) allows you write commands that look similar to invocations of the Unix "grep" program: @non_blank_lines = grep /\S/, @lines; In addition, you can specify a code block rather than a single condition: @non_blank_lines = grep { /\S/ } @lines; # Equivalent to the above. Obviously it doesn't matter in this case, but code blocks are helpful when you want a complex filter with multiple lines of code. The result of the code block is the result of the last statement executed: # All positive numbers can be used as exponents, # but negative exponents must be integers. @can_be_used_as_exponent = grep { if ( $_ < 0 ) { ! /\./; # No decimal point -> integer. } else { 1; # Always true. } } @array; ----------------------------------- 3) map - transforms the values of a list The "map" function applies a transformation to each element of a list and returns the result, leaving the original list unchanged (unless you mess it up; more on that in a moment). @lines_with_newlines = map( $_ . "\n", @lines_without_newlines); As with "grep", each value in the list is refered to as "$_". "map" can also take a block of code: # Replace "x at y.z" with "x at y dot z" to confuse spammers. @disguised_addresses = map { my $email = $_; $email =~ s/\@/ at /; $email =~ s/\./ dot /g; $email; } @email_addresses; Note that it's important not to change "$_" because that would change the original "@email_addresses" (and you wouldn't get what you wanted in "@disguised_addresses"). "map" needs not be a one-to-one mapping. For example, in the following code: @words = map m/\b(\w+)\b/g, @lines; # Spaces are for clarity. the regular expression splits a string into a list of words. The "map" function returns the result of joining all the small lists. If a line contains no words, the regular expression will return an empty list, and that's okay. ----------------------------------- 4) What "grep" and "map" have in common "grep" and "map" have a lot in common. They both "magically" take a piece of code (either an expression or a code block) as a parameter. You need to put a comma after an expression but shouldn't put a comma after a code block. Changing "$_" in "grep" or "map" will change the original list. This isn't generally a good idea because it makes the code hard to read. Remember that "map" builds a list of results by evaluating an expression, NOT by setting "$_". A side effect of this fact is that you should not use "s///" with "map". The "s///" operator changes "$_" rather than returning a result, so you won't get what you would expect if you use it with "map" (and you CERTAINLY shouldn't use it with "grep"). ----------------------------------- 5) Exercises a) Write some Perl code that, given a list of numbers, generates a list of square roots of those numbers. (The square root function in Perl is "sqrt".) b) Modify the code to filter out any negative numbers. The result should be as though the negative numbers were never in the original list. c) Write a Perl program that reads two files and outputs only the lines that are common to both of them. ----------------------------------- 6) Answer to Previous Exercise The following program reads the password file and outputs a list of usernames and UIDs, ordered by username: #!/usr/bin/perl -w use strict; open FILE, '< /etc/passwd' or die "Couldn't open file: $!"; my @data = sort(<FILE>); close FILE; my @result; foreach (@data) { my @fields = split(/:/); # Equivalent to split(/:/, $_) push @result, $fields[0] . ' -> ' . $fields[2]; } print join("\n", at result) . "\n"; The above program is a nice review of Perl functions. But of course, There Is More Than One Way To Do It, and we could replace the bottom half with: foreach (@data) { s/^(.*?):.*?:(\d*):.*$/$1 -> $2/; } print join("\n", at result) . "\n"; Or to make the program really short: $_ = join '', @data; s/^(.*?):.*?:(\d*):.*$/$1 -> $2/gm; print; # Prints "$_" ----------------------------------- 7) Past Information Part 16: Array Functions http://linuxchix.org/pipermail/courses/2003-November/001359.html Part 15: More About Lists http://linuxchix.org/pipermail/courses/2003-November/001351.html Part 14: Arrays http://linuxchix.org/pipermail/courses/2003-October/001350.html Part 13: Perl Style http://linuxchix.org/pipermail/courses/2003-October/001349.html Part 12: Side Effects with Perl Variables http://linuxchix.org/pipermail/courses/2003-October/001347.html Part 11: Perl Variables http://linuxchix.org/pipermail/courses/2003-October/001345.html Parts 1-10: see the end of: http://linuxchix.org/pipermail/courses/2003-October/001345.html ----------------------------------- 8) Credits Works cited: a) man perlfunc b) Kirrily Robert, Paul Fenwick and Jacinta Richardson's "Intermediate Perl", which you can find (along with their "Introduction to Perl") at: http://www.perltraining.com.au/notes.html Thanks to Jacinta Richardson for fact checking. ----------------------------------- 9) Licensing This course (i.e., all parts of it) is copyright 2003 by Alice Wood and Dan Richter, and is released under the same license as Perl itself (Artistic License or GPL, your choice). This is the license of choice to make it easy for other people to integrate your Perl code/documentation into their own projects. It is not generally used in projects unrelated to Perl.
The road to better programming Chapter 4
The map() functionThe
map()function is like a rubber stamp applied to all the elements of a list (see "perldoc -f map" for more information). It consists of two parts: a block or an expression, and a list:
Listing 1. How map() works
# map {BLOCK GOES HERE} LIST; map {$_++} @p; # increment every element of @p # map EXPRESSION, LIST; map -f,@p; # file test every element of @p
The
foreach()loop will usually do better thanmap()in benchmarks, so don't usemap()for CPU-intensive calculations unless you test its performance. Do use it when it makes your code more elegant and simple without a loss of efficiency.There are many neat things that
map()can do. First of all, it can modify the array elements as it goes through them by changing the$_variable. Inside the block or (less commonly) the expression,$_is the current element of the list. You don't know which element you are looking at -- the whole point is to map one single function to all the elements independently. It is my experience that in about 80% of loops over an array you don't need to know the offset of the current element inside the array. Thus,map()can improve coding efficiency and style by forcing the programmer to think independently of array offsets.
Listing 2. foreach() vs. map()
# "normal" (procedural) way foreach (sort keys %ENV) { print "$_ => $ENV{$_}\n"; } # FP way map { print "$_ => $ENV{$_}\n" } sort keys %ENV;
In Listing 2, you can see how the FP way is not fundamentally different, yet manages to convey a flow of functions from right to left. First the list of keys is obtained, then it is sorted, then theprint()function is applied to each element of the sorted key list.
Listing 3. Modifying a list on the fly with map()
# these are the users @users = ('joe', 'ted', 'larry'); # and this is an on-the-fly substitution of user names with hash references map { $_ = { $_ => length } } @users; # @users is now ( { 'joe' => 3 }, # { 'ted' => 3 }, # { 'larry' => 5 } )
Listing 3 demonstrates how the list passed to
map()can be completely rewritten. In this case, the array @users contained only user names, but aftermap()was applied, the array contained hash references with one key-value pair, username => byte length of user name. How about quickly filling in file information?
Listing 4. Modifying a list on the fly with map(), part 2
use File::stat; use Data::Dumper; @files = ('/etc/passwd', '/etc/group', '/etc/fstab', '/etc/vfstab'); print Dumper map { $sb = stat $_; $_ = (defined $sb) ? { name => $_, size =>$sb->size(), mtime => $sb->mtime() } : undef } @files;
In Listing 4 we create a list of files, and then in one statement create a list of hashes with entries for the name, size, and mtime (modification time) of each file. Furthermore, non-existent files get an "undef" reference instead of a hash reference. Finally, Data::Dumper is used to print out a nice view of the entire product list.
Needless to say, code like Listing 4 should be heavily documented for other people's sake. Don't try to cram it all into one line, either. Elegant code is just an ugly duckling without proper formatting and comments.
Back to top
For anyone who has used UNIX, the
grep()function is simple to learn and use. It acts just like the grep utility -- elements that satisfy a test pass through, while everything else gets dropped.The syntax of
grep()is just likemap(). A block or an expression can be passed, and$_is aliased to the current element under examination. It is not a good idea to modify elements of the list passed togrep(). That's whatmap()is for. Usegrep()to grep, andmap()to map. The only exception to this rule is if you must create temporary hash fields or array entries while sorting, but make sure you remove them afterwards.
Listing 5. How grep() works
# grep {BLOCK GOES HERE} LIST; grep {$_ > 1} @p; # only accept numbers more than 1 grep {$_++} @p; # please don't do this # grep EXPRESSION, LIST; grep /hi/,@p; # only accept matching elements grep !/hi/,@p; # do not accept matching elements
It can be very convenient to usegrep()for quick filtering, but remember that aforeach()loop may be faster under some circumstances. When in doubt, benchmark.
Listing 6. Using grep() to filter out odd numbers
my @list = (1, 2, 3, 'hi'); my @results; # the procedural approach foreach (@list) { push @results, $_ unless $_%2; } # using grep - FP approach @results = grep { !($_ % 2) } @list;
Here is another example. Say we need to look in a directory and retrieve all the file names from it:
Listing 7. Using grep() to get all the filenames in a directory
opendir(DIR, ".") || die "can't opendir: $!"; # get the directory handle my @f = grep { /^[^\.]/ && -f } readdir(DIR); # filter only files into @f
Line 1 of Listing 7 just opens the current directory, or exits the program with the appropriate notice.Line 2 invokes the
readdir()function, which returns a list of filenames, and runs agrep()that filters out hidden files (filenames must not begin with a "." character) and non-file objects such as directories.In two lines we do as much work as four or five lines of a
foreach()loop might do. Don't forget to comment such tight code; the short comments shown are not sufficient for production code. Sometimes,grep()is used in scalar context (for instance, to test if Perl interpreters are running [the Proc::ProcessTable module is from CPAN]):
Listing 8. Using grep() to get all the Perl processes running
use Proc::ProcessTable; # get this module from CPAN use strict; my $table = new Proc::ProcessTable; my @procs; if (@procs = grep { defined $_->cmndline && $_->cmndline =~ /^perl/ } @{$table->table}) { print $_->pid, "\n" foreach @procs; } else { print "No Perl interpreters seem to be running.\n"; }
Here, we simultaneously assign the return from
grep()to the @procs array, and we test to see if it contained any elements at all. If there were no elements matching the pattern, we print a message to that effect. The same code with aforeach()loop would probably take five or six lines. In case it hasn't been said enough, code like Listing 8 should be commented well enough that someone else could look at it and immediately know the intent and effect of that code. It's no use writing production code if you are the only one who will ever be able to read it.
Sorting with map: the Schwartzian and Guttman-Rosler transforms
The
sort()function in Perl is "sort of" procedural, in that it takes a code block or a function name and uses it to sort all elements. The comparison function has to be written as if it were only looking at two elements -- it doesn't know which ones specifically out of the whole list. Likemap()andgrep(),sort()deals with references to the values being compared, so modifying them would modify the values being compared. Don't do this (for more information on thesort()function, see "perldoc -f sort").Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:
Listing 9. The default sort()
@new_list = sort @old_list; # sort @old_list alphabetically
The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "1" comes before "010" in a string comparison, for example. Numbers have to be compared by value, not as strings.
Fortunately, this is easy to do and is in fact a common Perl idiom:
Listing 10. The numeric sort()
@old_list = (1, 2, 5, 200, '010'); @new_list = sort { $a <=> $b } @old_list; # sort @old_list numerically
I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.
If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.
To reverse the sorted list, you can either apply the
reverse()function to the list after it's sorted, or you can change the sorting function. For example, the reverse sort of the one in Listing 10 would have the comparison code block be{ $b <=> $a }.See "perldoc perlop" for more information on the
<=>and cmp operators, which are essential to all sorting in Perl. The cmp operators are what's used in the default search in Listing 9 behind the scenes.Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:
Listing 11. The sort by a hash member
# create a list with two hash references @old_list = ( { name => "joe" }, { name => "moe" } ); # sort @old_list by the value of the 'name' key @new_list = sort { $a->{name} cmp $b->{name} } @old_list;
Now we get into the interesting stuff. What if it's expensive to obtain data from the objects being sorted? Say we need to apply the
split()function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run asplit()every time the comparison value is needed, and your co-workers would laugh at you. You could build a temporary list of the comparison values, but that's not so easy to do and can easily introduce bugs. You are probably better off using the Schwartzian transform, named after Randal Schwartz.The Schwartzian transform looks like this:
Listing 12. The Schwartzian transform
# create a list with some strings @old_list = ( '5 eagles', '10 hawks', '2 bulls', '8 cows'); # sort @old_list by the first word in each string, numerically @new_list = map($_->[1], sort( { $a->[0] <=> $b->[0] } map( [ (split)[0], $_ ], @old_list)));
Look at it from right to left. First, we apply a
map()to the @old_list list to create a new temporary list. Remember thatmap()can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from asplit()of the string (this is the comparison value) and the string itself, for each string in @old_list. The result is a new list; no changes are made to @old_list.Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.
Then, we perform another
map()on the sorted list to reduce it back to just strings by mapping only the second array element into the current variable. $_->[1] means "rewrite$_to be the value stored in the second object in the list that$_refers to."Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.
The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding O(n) notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.
When to use FPOnce again, I will say this: know your tools. Functional programming is an excellent tool, as we have seen so far. It can simplify some pretty hairy problems and make others a little easier. So when should you use FP?
- First of all, remember that in Perl, FP is only an approach. The actual solution will be procedural, even though it simulates a functional solution. The question is not when FP should be used, but how much it should be used, from "not at all" to "as much as possible."
- Any time you need to do complex sorting, see if the Schwartzian transform or the Guttman-Rosler transform are appropriate. They are drop-in functional replacements for regular sorting.
- If your functions are chained often, consider FP. For example, modification of a list in steps by several functions can probably be accomplished with an FP approach.
- If you have a lot of temporary variables that are thrown away as soon as they are used, consider FP to decrease their number.
- Filtering, sorting, and general transforming functions applied to lists or hashes are candidates for FP.
- If your functions have a lot of side effects, and their parameters are more than a few, FP is probably not going to work too well.
- Recursive algorithms can go either way with regard to FP. They are not clearly better or worse when done with the FP approach.
- Avoid FP if performance is very important. Use the Benchmark module to check your approach -- sometimes FP will speed things up considerably (for example, the Schwartzian transform is significantly faster because of its cache of comparison values), but sometimes it will cause the performance of the code to drop significantly.
- One-liners work well with FP.
- Obfuscated Perl code has always favored
grep()andmap()as ways to obscure the actions of code. Unless you are writing obfuscated Perl code for a contest, don't usegrep()andmap()without at least some commenting.- Learn, practice, and use FP in your daily programming work. You will gain insight into all of your other code, see new ways ahead, and make life easier. Don't use FP just because it's there, but do use it because it works well for your specific problem.
Resources
- Write a code snippet that uses
map()to transform a list of user names into user IDs.- Write a program that uses (1) to look up user IDs. Allow filtering of the user list by partial names.
- Write code to check if any processes owned by root are running (on a UNIX system).
- Benchmark the Schwartzian transform versus your own sorting code versus the Guttman-Rosler transform on a small data set. Use the Benchmark module to do this.
- Do (4) on a large data set. For instance, sort all the file names on your system by size. Look at the File::Find module for ideas.
- Read about Erlang, Scheme, and Haskell in the comp.lang.functional FAQ. Look at other FP languages, and see if any of them have neat ideas that you can use in Perl.
- Write a Perl version of the grep program. Did you think of the
grep()function right away? You shouldn't usegrep()in this case because you may have to process a large file, and there's no sense in keeping the contents of the whole file in memory just to rungrep()on them. Think of a better solution.
- Read the previous chapters of The road to better programming.
- The comp.lang.functional FAQ offers a wide array of information on all aspects of functional programming languages, including general topics, such as history and motivation, journals and conferences, and schools and workshops; technical topics, such as performance and applications; and other resources, such as Web pages, research groups, and newsgroups. This is the best starting point for anyone interested in learning more about functional programming.
- Go to the Schwartzian transform for "Far More Than Everything You've Ever Wanted to Know About Sorting."
- Read A Fresh Look at Efficient Perl Sorting by Uri Guttman and Larry Rosler.
- Read Teodor's other Perl articles in the developerWorks "Cultured Perl" series:
- A programmer's Linux-oriented setup
- Application configuration with Perl
- Automating UNIX system administration with Perl
- Debugging Perl with ease
- The elegance of JAPH
- Genetic algorithms applied with Perl
- One-liners 101
- Parsing with Perl modules
- Perl 5.6 for C and Java programmers
- Reading and writing Excel files with Perl
- Review of Programming Perl, Third Edition
- Small observations about the big picture
- Writing Perl programs that speak English
- Browse more Linux resources on developerWorks.
- Browse more Open source resources on developerWorks.
perlfunc - Perl builtin functions
Simpler code: Tips on using Perl's map function and command-line ...
Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: June 02, 2008