Softpanorama
May the source be with you, but remember the KISS principle ;-)
version 0.2 (raw draft)
Scripting languages usually do not have the concept of pointers (a.k.a. references). Perl is an exception to the rule, and it has shown that this concept is quite useful even for the class of very high-level languages that Perl belongs to.
Initially Perl had a very limited implementation of references. Perl 4 permits only symbolic references, which are difficult to use. For example, in Perl 4 you have to use names to index into an associative array called _main{} that holds the symbol names for a package. Perl 5 lets you have hard references to data.
One problem that came up all the time in Perl 4 was how to represent a hash whose values were lists. Perl 4 had hashes, of course, but the values had to be scalars; they couldn't be lists.
The second problem is multidimensional arrays. Arrays in Perl are one-dimensional, and the easiest way to emulate a multidimensional array is to use references.
That is why references were introduced in Perl 5: to provide the capability to create complex data structures like multidimensional arrays and nested hashes.
Another very useful class of references is references to subroutines. They proved useful for extending Perl with OO programming constructs.
There are several ebooks that contain chapters about references.
Like in C, a reference is simply a pointer to something, such as a Perl variable, array, hash (also known as an associative array), or even a subroutine. We will use the terms pointer and reference interchangeably.
In other words a reference is simply an address of a variable. References are useful in creating complex data structures in Perl. In fact, you cannot really define any complicated structures in Perl without using references.
Like links in Unix filesystems, references in Perl 5 come in two types: hard and symbolic. A symbolic reference is a string that contains the name of a variable. Symbolic references are useful for creating new variable names and addressing them at runtime. Hard references are similar to hard links in the Unix file system: another path to the same underlying item.
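The difference between the two kinds can be sketched in a few lines. This is only an illustration: the variable names are made up for the example, and the symbolic lookup has to be wrapped in no strict 'refs' because strict mode forbids it.

```perl
use strict;
use warnings;

our $color = 'red';          # a package variable, so it has a name in the symbol table

# Hard reference: holds the address of $color.
my $hard = \$color;

# Symbolic reference: holds only the *name* of the variable (hypothetical example name).
my $symbolic = 'color';

my $via_hard = $$hard;       # follows the address
my $via_symbolic;
{
    no strict 'refs';        # symbolic references are rejected under strict
    $via_symbolic = $$symbolic;   # looks up $main::color by name at runtime
}

print "hard: $via_hard, symbolic: $via_symbolic\n";
```

Both lookups reach the same variable, but only the hard reference keeps working if the code is moved to another package or the variable is made lexical with my.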
Hard references keep track of reference counts. When the reference count of an item drops to zero, Perl automatically frees the storage of that item. If the item happens to be a Perl object, the object is destructed -- its memory is returned to the memory pool.
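Reference counting can be observed with a small experiment. The sketch below assumes a made-up class called Tracked whose DESTROY method counts how many objects have been destructed; nothing in it comes from the original text.

```perl
use strict;
use warnings;

package Tracked;             # hypothetical class, used only for this demonstration
my $destroyed = 0;
sub new             { my ($class) = @_; return bless {}, $class; }
sub DESTROY         { $destroyed++; }   # called by Perl when the count reaches zero
sub destroyed_count { return $destroyed; }

package main;

my $first  = Tracked->new;   # one reference to the object
my $second = $first;         # two references to the same object

undef $first;                # one reference left, the object is still alive
my $after_first = Tracked::destroyed_count();

undef $second;               # no references left, DESTROY runs here
my $after_second = Tracked::destroyed_count();

print "destroyed after first undef:  $after_first\n";
print "destroyed after second undef: $after_second\n";
```

The object survives as long as at least one hard reference to it exists, which is the behavior described above.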
Hard references are easy to use in Perl as long as you use them as scalars. To use hard references as anything but scalars, you have to explicitly de-reference the variable and tell it how you want it to behave.
A reference is a scalar value that refers to a variable, an entire array, an entire hash, or just about anything else. To create a reference, you put a \ in front of a variable on the right side of the assignment statement. It means "take the address instead of the value". The symbol @ would probably have been a better choice, but it is already used for arrays.
$scalar_var = 2009;
$pointer = \$scalar_var;
print "\n Pointer *($pointer) points to $$pointer\n";
In the preceding code, the variable $pointer contains the address of a variable $scalar_var, not the value itself. To get the value, you have to de-reference $pointer with two $$.
Pay attention to how the address is shown in the printed pointer variable. If the word SCALAR is followed by a long hexadecimal number, then this is a reference to a scalar variable.
Once the reference is stored in a variable like $pointer, you can copy it or store it like any other scalar value:
$ref = $pointer; # $ref now holds a reference to $scalar_var
$p[3] = $ref; # $p[3] now holds a reference to $scalar_var
$new_ref = $p[3]; # $new_ref now holds a reference to $scalar_var
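A short sketch of what copying a reference means in practice: every copy leads to the same variable, so a change made through one copy is visible through all of them. The variable names follow the fragment above; the assigned values are arbitrary.

```perl
use strict;
use warnings;

my $scalar_var = 2009;
my $pointer    = \$scalar_var;

my @p;
my $ref     = $pointer;      # copy of the reference, not of the value
$p[3]       = $ref;          # references can be stored in array elements
my $new_ref = $p[3];

$$new_ref = 2010;            # writes through the reference into $scalar_var

print "scalar_var is now $scalar_var\n";
print "seen through the first pointer: $$pointer\n";
```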
The same backslash operator makes references to arrays and hashes:
@array = (1, 2, 3);
$aref = \@array; # $aref now holds a reference to @array
$href = \%hash; # $href now holds a reference to %hash
print "\n Pointer *($aref) points to @$aref\n";
There is another way to create a reference to an array: [ list ] makes a new, anonymous array and returns a reference to that array. The array is called anonymous because it does not have a regular name.
For example:
$aref = [ 1, 2, 3 ];
@array = (1, 2, 3);
$aref = \@array;
The first line is an abbreviation for the following two lines, except that it doesn't create the superfluous array variable @array.
Similarly, { list } makes a new, anonymous hash and returns a reference to that hash.
For example:
$href = { APR => 4, AUG => 8 }; # $href now holds a reference to a hash
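As a small illustration, anonymous arrays and hashes behave like their named counterparts once they are dereferenced with @ or %. The extra element and the SEP key below are additions made up for this example.

```perl
use strict;
use warnings;

my $aref = [ 1, 2, 3 ];               # reference to an anonymous array
my $href = { APR => 4, AUG => 8 };    # reference to an anonymous hash

push @$aref, 4;                       # the array grows like any named array
${$href}{SEP} = 9;                    # add a key through the reference

my $count  = scalar @$aref;
my @months = sort keys %$href;

print "elements: $count; months: @months\n";
```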
A good introduction to references to hashes can be found in the perl.com article Managing Rich Data Structures. Here is one example from it:
I thought about finding some way to store those hashes as an array of anonymous hashes (one hash per ad), but then I realized that an array wouldn't let me access a particular ad's data easily. The hashes would be in the order in which I saved them into the array, but that wouldn't translate easily to the ad for a particular date. For example, how would I know where to find the data for next Monday's newsletter? Is it in $array[8] or $array[17]?
Hmm. Each anonymous hash could be identified by a particular date--the key (!) to locating the ad for any particular date. What kind of data structure associates a unique key with a value? A hash, of course! My data would fit nicely into a hash of hashes.
The name I chose for the hash was %data_for_ad_on. Choosing a hash name that ends in a preposition provides a more natural-reading and meaningful name; the key for data for the December 8, 2005 banner ad would be 2005_12_08, for example, and the way to access the value associated with that key would be $data_for_ad_on{'2005_12_08'}.
In code, this is how the data for two days of newsletters could be represented as a hash of hashes:
%data_for_ad_on = (
    '2005_12_08' => {
        'url'      => 'http://roadrunners-r-us.com/index.html',
        'gif'      => 'http://myserver.com/banners/roadrunners_banner.gif',
        'headline' => 'Use Roadrunners R Us for speedy, reliable deliveries!',
    },
    '2005_12_09' => {
        'url'      => 'http://acme.com/index.html',
        'gif'      => 'http://myserver.com/banners/acme_banner.gif',
        'headline' => 'Look to Acme for quality, inexpensive widgets!',
    },
);
The keys of the named hash are 2005_12_08 and 2005_12_09. Each key's value is a reference to an anonymous hash that contains its own keys and values. When a hash is created using braces instead of parentheses, its value is a reference to that unnamed, "anonymous" hash. I need to use a reference because a hash is permitted to contain only scalar keys and scalar values; another hash can't be stored as a value. A reference to that hash works, because it acts like a scalar.
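Reading such a hash of hashes back might look like the sketch below. It reuses the keys from the quoted example (without the gif field, for brevity); the 2005_12_10 entry and its placeholder values are assumptions added only to show how a new day is stored. The date keys are quoted because an unquoted 2005_12_08 would be read by Perl as the number 20051208.

```perl
use strict;
use warnings;

my %data_for_ad_on = (
    '2005_12_08' => {
        url      => 'http://roadrunners-r-us.com/index.html',
        headline => 'Use Roadrunners R Us for speedy, reliable deliveries!',
    },
    '2005_12_09' => {
        url      => 'http://acme.com/index.html',
        headline => 'Look to Acme for quality, inexpensive widgets!',
    },
);

# One ad: the outer key selects the day, the inner key selects the field.
my $url = $data_for_ad_on{'2005_12_09'}{url};

# All ads in date order; each value is a reference to an anonymous hash.
my @report;
for my $date (sort keys %data_for_ad_on) {
    my $ad = $data_for_ad_on{$date};
    push @report, "$date: $ad->{headline}";
}
print "$_\n" for @report;

# Adding another day needs only a new key and a new anonymous hash (placeholder data).
$data_for_ad_on{'2005_12_10'} = { url => 'http://example.com/', headline => 'Placeholder ad' };
```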
With any complex language construct you can cover 80% of the usage in 20% of the space, and the other 20% of the usage takes the remaining 80%. That is why Perl man pages often look so unhelpful: they don't distinguish what is important from what is not, or what is used frequently from what is not. Here are some tips that make reference usage more transparent.
If you ever see a string that looks like a type name such as SCALAR, ARRAY or HASH followed by a hexadecimal address in parentheses, you'll know you printed out a reference by mistake.
A side effect of this representation is that you still get a correct result if you use eq to see if two references refer to the same thing. But don't do this; always use == instead, because it is much faster.
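A minimal sketch of both tips, with variable names invented for the example: two references to the same array compare equal with ==, a reference to a different array with the same contents does not, and a reference used as a string shows the type name and address.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $ref_a = \@array;
my $ref_b = \@array;          # second reference to the same array
my $ref_c = [ 1, 2, 3 ];      # different array with equal contents

my $same      = ($ref_a == $ref_b) ? 1 : 0;   # numeric comparison of addresses
my $different = ($ref_a == $ref_c) ? 1 : 0;

# A reference interpolated into a string shows its type and address.
my $printed = "$ref_a";
my $looks_like_reference = ($printed =~ /^ARRAY\(0x[0-9a-f]+\)$/) ? 1 : 0;

print "same: $same, different: $different, printed form: $printed\n";
```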
We already know that a reference is a scalar value, and we've seen that you can store it as a scalar and get it back again just like any scalar. There are just two more ways to use it:
If $aref contains a reference to an array, then you can use {$aref} anywhere you would normally put the name of an array. For example, @{$aref} instead of @array. In most cases the curly braces can be dropped, so you can write @$aref.
Let's assume that $aref=\@a. Then the following are equivalent ways to address the values of array @a:
| @a | @{$aref} | An array |
| reverse @a | reverse @{$aref} | Reverse the array |
| $a[3] | ${$aref}[3] | An element of the array |
| $a[3] = 17; | ${$aref}[3] = 17; | Assigning an element |
On each line are two expressions that do the same thing. The left-hand versions operate on the array @a, and the right-hand versions operate on the array that is referred to by $aref, but once they find the array they're operating on, they do the same things to the arrays.
The same is true for references to a hash (again let's assume that $href=\%h), for example:
| %h | %{$href} | A hash |
| keys %h | keys %{$href} | Get the keys from the hash |
| $h{'red'} | ${$href}{'red'} | An element of the hash |
| $h{'red'} = 17; | ${$href}{'red'} = 17; | Assigning an element |
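The equivalences listed above can be checked directly. The sketch below uses arbitrary sample data; only the names @a, %h, $aref and $href come from the text.

```perl
use strict;
use warnings;

my @a = (10, 20, 30, 40);
my %h = (red => 1, green => 2);

my $aref = \@a;
my $href = \%h;

${$aref}[3]     = 17;                 # same effect as $a[3] = 17
${$href}{'red'} = 17;                 # same effect as $h{'red'} = 17

my @reversed = reverse @{$aref};      # same as reverse @a
my @names    = sort keys %{$href};    # same as keys %h (sorted for stable output)

print "a[3]=$a[3] h{red}=$h{red}\n";
print "reversed: @reversed; keys: @names\n";
```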
Most often, when you have an array or a hash, you want to get or set a single element from it. ${$aref}[3] and ${$href}{'red'} have too much punctuation, so Perl lets you abbreviate them with the arrow operator: $aref->[3] and $href->{'red'}.
Tips:
First, remember that [1, 2, 3] makes an anonymous array containing (1, 2, 3), and gives you a reference to that array.
Now think about
@a = ( [1, 2, 3],
[4, 5, 6],
[7, 8, 9]
);
@a is an array with three elements, and each one is a reference to another array.
$a[1] is one of these references. It refers to an array, the array containing (4, 5, 6), and because it is a reference to an array, the arrow notation lets us write $a[1]->[2] to get the third element from that array. $a[1]->[2] is the 6. Similarly, $a[0]->[1] is the 2. What we have here is like a two-dimensional array; you can write $a[ROW]->[COLUMN] to get or set the element in any row and any column of the array.
The notation still looks a little cumbersome, so there's one more abbreviation:
Tip: Between two subscripts, the arrow is optional. For example, instead of $a[1]->[2], we can write $a[1][2]. That means that we can use standard notation for multidimensional arrays in Perl.
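The two-dimensional array above can be exercised as follows. This is a sketch only; the column extraction with map and the assigned value 20 are additions for illustration.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]
        );

my $with_arrow    = $a[1]->[2];   # explicit arrow between the subscripts
my $without_arrow = $a[1][2];     # same element, arrow omitted

$a[0][1] = 20;                    # set row 0, column 1

# Each row is a reference to an array, so a column is collected row by row.
my @first_column = map { $_->[0] } @a;

for my $row (@a) {
    print join(' ', @$row), "\n";
}
```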
Another example:
$line = ['solid', 'black', ['1','2','3'] , ['4', '5', '6']];
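As you can see, the array does not need to be homogeneous: some elements can be scalar values, some can be references to arrays, and so on.
To check what a particular element of the array contains, you can use the function ref, which returns an empty string for a plain scalar and a type name such as ARRAY or HASH for a reference.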
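Applied to the $line array above, a check with ref might look like this sketch (the @kinds array is a name made up for the example):

```perl
use strict;
use warnings;

my $line = ['solid', 'black', ['1', '2', '3'], ['4', '5', '6']];

my @kinds;
for my $element (@$line) {
    my $type = ref $element;      # '' for a plain scalar, 'ARRAY' for an array reference
    push @kinds, $type eq '' ? 'scalar' : $type;
}

print "@kinds\n";
```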
In the same way you reference individual items such as arrays and scalar variables, you can also point to subroutines.
To construct such a reference, you use the following type of statement:
$pointer_to_sub = sub { ... declaration of sub ... } ;
Notice the semicolon at the end of the sub declaration: it is needed because the anonymous sub is part of an assignment statement. To call a subroutine by reference, you use the following type of call:
&$pointer_to_sub( parameters );
This is pretty logical: you are de-referencing the $pointer_to_sub and using it with the ampersand (&) as a pointer to a function.
The sub { ... } block is only a definition: the code within it is not executed when the assignment runs. It is compiled once and executed each time the subroutine is called through the reference.
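A runnable sketch of subroutine references follows. The sub bodies, the named sub double and the %dispatch table are assumptions chosen for illustration; the arrow call $pointer_to_sub->(...) is an equivalent Perl 5 spelling of &$pointer_to_sub(...).

```perl
use strict;
use warnings;

# Anonymous subroutine stored in a scalar.
my $pointer_to_sub = sub {
    my ($x, $y) = @_;
    return $x + $y;
};

my $old_style   = &$pointer_to_sub(2, 3);     # call with the ampersand
my $arrow_style = $pointer_to_sub->(2, 3);    # the same call with the arrow

# A reference to a named subroutine is taken with \&name.
sub double { return 2 * $_[0] }
my $named_ref = \&double;

# References to subs are scalars, so they fit in a hash (a "dispatch table").
my %dispatch = (add => $pointer_to_sub, double => $named_ref);
my $result   = $dispatch{double}->(21);

print "old: $old_style, arrow: $arrow_style, dispatched: $result\n";
```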
Sometimes, you have to write the same output to different output files. For example, an application programmer might want the output to go to the screen in one instance, the printer in another, and a file in another, or even all three at the same time. Rather than make separate statements for each handle, it would be nice to write something like the following:
spitOut(\*STDOUT);
spitOut(\*LPHANDLE);
spitOut(\*LOGHANDLE);
Notice that the file handle reference is sent with the \*FILEHANDLE syntax because you refer to the symbol table in the current package. In the subroutine that handles the output to the file handle, you would have code that looks something like the following:
sub spitOut {
    my $fh = shift;   # the file handle reference passed by the caller
    print $fh "Gee Wilbur, I like this lettuce\n";
}
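A self-contained version of the same idea is sketched below. Instead of a printer or log file, it writes to STDOUT and to an in-memory file handle opened on a scalar; that second handle is an assumption made so the example can run anywhere and its output can be inspected.

```perl
use strict;
use warnings;

sub spitOut {
    my $fh = shift;                       # a glob reference or a lexical file handle
    print $fh "Gee Wilbur, I like this lettuce\n";
}

# In-memory file handle: everything printed to it ends up in $captured.
my $captured = '';
open(my $memory_fh, '>', \$captured) or die "Cannot open in-memory file: $!";

spitOut(\*STDOUT);        # reference to the STDOUT typeglob
spitOut($memory_fh);      # a lexical handle is passed as an ordinary scalar
close $memory_fh;

print "captured: $captured";
```

Lexical file handles such as $memory_fh are already scalars, so they need no \* when passed; the \*NAME form is only required for bareword handles like STDOUT.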
In UNIX (and other operating systems), the asterisk is a sort of wildcard operator. In Perl, you can refer to other variables and so on by using the asterisk operator:
*iceCream;
When used in this manner, the asterisk is also known as a typeglob. The asterisk at the beginning of a term can be thought of as a wildcard match for all the mangled names generated internally by Perl.
You can use a typeglob in the same way you use a reference because the de-reference syntax always indicates the kind of reference you want. ${*iceCream} and ${\$iceCream} both indicate the same scalar variable. Basically, *iceCream refers to the entry in the internal _main associative array of all symbol names for the _main package. *kamran really translates to $_main{'kamran'} if you are in the _main package context. If you are in another package, the _packageName{} hash is used. (These are the Perl 4 names; in Perl 5 the symbol table hash of package main is %main::.)
When evaluated, a typeglob produces a scalar value that represents all the objects of that name, including any file handle, format specifier, or subroutine.
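A brief sketch of typeglob behavior, using package variables because typeglobs live in the symbol table. The alias name dessert and the sample values are made up for the example.

```perl
use strict;
use warnings;

our $iceCream = 'vanilla';
our @iceCream = ('cone', 'cup');

# Both expressions reach the same scalar variable.
my $via_glob = ${*iceCream};
my $via_ref  = ${\$iceCream};

# Assigning one typeglob to another aliases every variable of that name.
our ($dessert, @dessert);
*dessert = *iceCream;

$dessert = 'chocolate';          # also changes $iceCream
my @containers = @dessert;       # same array as @iceCream

print "glob: $via_glob, ref: $via_ref\n";
print "iceCream is now $iceCream; containers: @containers\n";
```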
In addition to consulting the obvious documents such as the Perl man pages, look at the Perl source code for more information. The 't/op' directory in the Perl source tree has some regression test routines that should definitely get you thinking. A lot of documents and references are available at the Web sites www.perl.com and PerlMonks.
| Q: | How do I know what type of address a pointer is pointing to? |
| A: | The address printed out with the print statement on a reference has a qualifier word in front of it. For example, a reference to a hash has the word HASH followed by an address value, an array has the word ARRAY, and so on. |
| Q: | How are multidimensional arrays possible using Perl? |
| A: | Arrays in Perl can hold only scalar values, but a reference is itself a scalar. A reference to an array refers to the entire array. Arrays can therefore contain references to other arrays, hashes, and so on. The way to create multidimensional arrays in Perl is by using arrays of references to arrays. |
| Q: | What's the best way to pass more than one array into a subroutine? |
| A: | Pass references to the arrays, using the \@arrayname syntax for each array passed, as in the following call: mysub(\@one, \@two); Within the subroutine, take each reference off one at a time: my ($first, $second) = @_; Now use @$first and @$second to get to the arrays passed into the subroutine. (Names other than $a and $b are safer, because sort uses those two variables.) |
| Q: | Why is *moo more efficient to use than $_main{'moo'}? Is there a difference in usage? |
| A: | Both *moo and $_main{'moo'} mean the same variable (as long as you aren't using a package). *moo is more efficient because the reference is looked up once at compile time, whereas $_main{'moo'} is evaluated at runtime and evaluated each time it is run. |
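The answer about passing more than one array into a subroutine can be sketched as follows. The body of mysub and the sample data are assumptions; only the call mysub(\@one, \@two) comes from the text.

```perl
use strict;
use warnings;

sub mysub {
    my ($first_ref, $second_ref) = @_;    # two scalars, each a reference to an array

    my $total = 0;
    $total += $_ for @$first_ref, @$second_ref;

    # The arrays keep their separate identities inside the subroutine.
    return (scalar @$first_ref, scalar @$second_ref, $total);
}

my @one = (1, 2, 3);
my @two = (10, 20);

my ($len_one, $len_two, $total) = mysub(\@one, \@two);

print "lengths: $len_one and $len_two, total: $total\n";
```

Passing the arrays themselves, as in mysub(@one, @two), would flatten them into one list, and the subroutine could no longer tell where the first array ends.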