Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Introduction to Perl for Unix System Administrators

(Perl without excessive complexity)

by Dr Nikolai Bezroukov


3.2. Operations on Arrays

From the point of view of a typical University data structure course an array is a data type that contains or stores numbered items sequentially in memory. Each numbered datum is called an element of the array, and the number assigned to an element is called its index.

But Perl arrays are different: they are closer to lists than to classic arrays. First, because Perl is a dynamically typed language, an element of an array may hold any scalar value, and different elements of the same array may hold values of different types (numbers, strings, references). Array elements may even contain references to other arrays, which allows you to create data structures that are arrays of arrays.

The second feature of Perl arrays is that they are dynamic: you do not declare a size, and an array grows automatically when you assign to an index beyond its current end. Array indexes need not be assigned in contiguous order. Note, however, that Perl arrays are not truly sparse: when you assign values only to the first and, say, the 1000th elements of an array, the array is extended to 1000 slots, and the 998 slots in between simply hold no value (reading them returns undef). If you need a genuinely sparse structure that works like an index in a database, use a hash.
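A minimal sketch of what Perl reports after only the first and the 1000th elements are assigned. The array name @sparse and the other variable names are made up for illustration:

```perl
use strict;
use warnings;

my @sparse;
$sparse[0]   = "first";
$sparse[999] = "thousandth";      # the array grows automatically

my $count         = scalar(@sparse);               # number of slots
my $last_index    = $#sparse;                      # highest index in use
my $middle_exists = exists($sparse[500]) ? 1 : 0;  # was slot 500 ever assigned?
my $defined_count = grep { defined } @sparse;      # elements that hold a value

print "slots: $count, last index: $last_index\n";
print "element 500 exists: $middle_exists, defined elements: $defined_count\n";
```

The array reports 1000 slots although only two of them hold values.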

As we already mentioned, it is very important to understand that Perl arrays should be considered not as arrays per se, but as an attempt to introduce the concept of a programmable editor buffer into the language (for example, the ed text buffer). From the point of view of data structures, Perl arrays are close to indexed lists. We will use the terms list and array more or less interchangeably, but there is a small difference between them that we will discuss later.

Perl arrays should be considered not as arrays per se, but as an attempt to introduce a concept of programmable editor buffer into the scripting language: indexes are just line numbers on this buffer

Anyway, as in other languages, arrays give access to their elements by an integer number (index), and as in C the first element of an array has index zero. For example, the first element of the array @week is $week[0], the second element is $week[1], and so on. Actually you will be better off thinking about indexes as line numbers in a text buffer. That helps to learn more quickly a useful innovation introduced by Perl that is very similar to ideas used in sed, vi and other text editors: we can count elements from the other end of the array using negative indexes.

Negative indexes are used quite ingeniously in Perl as a way of calculating the index in the reverse direction: the last element of the array @week is $week[-1], the element before last is $week[-2], and so on. This is quite a convenient shortcut worth remembering.
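A short sketch of positive and negative indexing on the same array (the scalar variable names are chosen only for this illustration):

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn", "Th", "Fr", "St", "Sn");

my $first        = $week[0];        # counting from the beginning
my $last         = $week[-1];       # counting from the end
my $before_last  = $week[-2];
my $same_as_last = $week[$#week];   # the longer, classic way to get the last element

print "$first $last $before_last\n";   # prints: Mn Sn St
```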

Being actually lists, arrays in Perl have no fixed upper bound (the lowest index is always zero) and can accept any type of scalar value (both numbers and strings). All array names are prefixed by an @ symbol (but individual elements should be prefixed with $, the nuance we already discussed). For example:

@workweek  = ("Mn", "Ts", "Wn", "Th","Fr"); # initialization of array @workweek with 5 values
@weekend = ("St", "Sn"); # another array with just two elements
@day_array = ($day); # an array with just one element

Please note the difference: @day_array is an array with one element, while $day is a scalar. List notation can be used wherever you can use an array in Perl. An important case is its use on the left side of assignment statements, which we will discuss next.

Index brackets in Perl enforce conversion to numeric, so $workweek["Mn"] is essentially equal to $workweek[0]: the string "Mn" does not start with a digit and therefore converts to zero (with a warning if warnings are enabled).
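The numeric conversion of an index can be checked with a small sketch. The variables $by_string and $by_number are illustrative; the warning that Perl would normally print for a non-numeric index is switched off only inside the inner block:

```perl
use strict;
use warnings;

my @workweek = ("Mn", "Ts", "Wn", "Th", "Fr");

my $by_string;
{
    no warnings 'numeric';           # silence "Argument "Mn" isn't numeric"
    $by_string = $workweek["Mn"];    # "Mn" converts to 0
}
my $by_number = $workweek["3"];      # the string "3" converts to 3

print "$by_string $by_number\n";     # prints: Mn Th
```

In real scripts such a warning usually points to a bug (a hash was probably intended), so it is better left enabled.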

Arrays and Lists, Arrays Initialization

A list in Perl is a notation very similar to an array. A list should be enclosed in parentheses and should contain zero or more (usually scalar) values separated by commas, for example ("Mn", "Ts", "Wn", "Th","Fr"). Lists can be assigned to arrays and vice versa (see below). That significantly increases the power and flexibility of the language.

Actually list notation has one difference from array notation: in a scalar context an array returns its length, but a list returns its last value (like the C comma operator), for example:

$last_day = ("Mn", "Ts", "Wn", "Th", "Fr");  # $last_day="Fr"

A list can also be used to assign initial values to several related scalar variables, for example:

($day, $month, $year)=(10,2,2000);

Arrays of integers can be initialized with so-called "ranges", for example:

    @all_days_of_the_month=(1..31);
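A brief sketch of list and range initialization. Note that a range has to be assigned to an array (a name with the @ prefix) to get all of its values; the variable names below are illustrative:

```perl
use strict;
use warnings;

my @all_days = (1 .. 31);            # range of integers
my @letters  = ('a' .. 'e');         # ranges also work on letters
my ($day, $month, $year) = (10, 2, 2000);

my $days_count = @all_days;          # array in scalar context: its length

print "$days_count days, last letter $letters[-1], date $day.$month.$year\n";
# prints: 31 days, last letter e, date 10.2.2000
```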

Array operations, push/pop, shift/unshift pairs of built-in functions

Perl provides a set of array operations very similar to PL/1 operations. One problem is that the set of array operations is different from the set of string operations. That's regrettable and complicates the language.

As arrays are actually a text editor buffer, Perl provides pretty rich functionality similar to that of programmable editors like sed. First of all, you can copy one array to another by simple assignment:

@week=@workweek; # buffer to buffer (Select all) copy

What is important is that on both sides of the assignment statement one can also use list notation. This way arrays can be used to make multiple assignments to scalar variables and vice versa. This is a very flexible and powerful feature that can be used for transposition of elements, extracting elements from an array, etc. For example:

($a, $b) = ($b, $a);	# Transposition of $a and  $b. It works...
($a, $b) = @week;	# $a and $b are the first two elements of @week.
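The same list assignments in a self-contained sketch. The names $x, $y, $only and $third are illustrative, and the undef placeholder is one way to skip an element that is not needed:

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn", "Th", "Fr");

my ($x, $y) = ("left", "right");
($x, $y) = ($y, $x);                   # transposition without a temporary variable

my ($first, $second) = @week;          # extra elements of @week are ignored
my ($only, undef, $third) = @week;     # undef skips the second element

print "$x $y $first $second $third\n"; # prints: right left Mn Ts Wn
```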

The push operation makes it possible to concatenate an array with scalars or with other arrays. The push built-in function appends its arguments to the end of the array:

push(@week, "St", "Sn"); # elements are always added to the end (concatenation)

Here is another example when we concatenate two arrays using push:

push(@week, @weekend); # same effect as above 

The push function returns the new number of elements in the array.

The opposite of the push function is pop: it removes the last item from an array and returns it, so the array becomes shorter by one element. It is essentially an analog of the chop function used for strings. Because of the probably questionable decision to permit chop on arrays with different semantics, a new name was needed; the possibility to simplify the language by extending the semantics of string functions to arrays was missed, and this definitely overcomplicates the language.

$holiday = pop(@week);	# Now $holiday will contain "Sn"

The built-in function pop extracts the same element as $ARRAY[$#ARRAY--]. If there are no elements in the array, pop returns the undefined value (although this may happen at other times as well, for example when the last element itself is undef). If ARRAY is omitted, pop operates on the @ARGV array in the main program and on the @_ array in subroutines, just like shift.

Actually we usually process an array from the beginning, and in this case the shift function is more convenient than pop. It removes the first element of an array and returns it, shortening the array by one. If there are no elements in the array, shift returns undef.

The opposite function, unshift, adds elements to the beginning of the array.
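The four functions can be tried together in a short sketch (variable names such as $len and $holiday are illustrative):

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn", "Th", "Fr");

my $len     = push(@week, "St", "Sn");   # append to the end, returns new length
my $holiday = pop(@week);                # remove and return the last element
my $first   = shift(@week);              # remove and return the first element
my $new_len = unshift(@week, $first);    # put it back at the beginning

print "$len $holiday $first $new_len\n"; # prints: 7 Sn Mn 6
print "@week\n";                         # prints: Mn Ts Wn Th Fr St
```

Used in pairs, push/pop give a stack and push/shift give a queue.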

Note: an array can contain empty or undef elements, so the index of the last element does not guarantee that all elements exist. It is just the highest index currently in use; it does not tell you how many elements actually hold values. The total number of slots in the array is one more than the index of the last element. To get it you can assign the array to a scalar variable or use the built-in function scalar.

$days_in_a_week = $#week+1; # will assign the index of the last element incremented by one
$days_in_a_week = scalar(@week); # same thing

Quotes turn the array into a string with a space between each element. So they act as an implicit call to the join function, not to the scalar function as you might expect:

$f = "@week"; 	# the result is a string that contains all elements of @week separated by
		# the value of $" (the list separator, a single blank by default)
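A sketch comparing interpolation in quotes, an explicit join, and scalar context. The variable names are illustrative, and the list separator is changed only locally inside a block:

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn");

my $spaced = "@week";              # elements separated by $" (a blank by default)
my $joined = join(" ", @week);     # the explicit equivalent

my $csv;
{
    local $" = ",";                # temporary list separator
    $csv = "@week";
}
my $count = @week;                 # scalar context gives the number of elements

print "$spaced|$csv|$count\n";     # prints: Mn Ts Wn|Mn,Ts,Wn|3
```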

Conversions between Strings and Array (split and unpack functions)

The function split can be classified as a string function, as an array function, and as a function that uses a regular expression, but we will discuss it as a string function. The function generally accepts two arguments: a delimiter (which is a regular expression, but we will limit ourselves to strings right now) and a target string. For example, in Unix a colon is a pretty common delimiter for configuration files, and we can get the elements of a line using the split function:

@pass_elems=split(':',$pass_line);

The target string or both arguments can be omitted, so if $_ contains the required string we can write @pass_elems=split(':'). Or, if we plan to split on whitespace, @pass_elems=split; At the same time Perl does not understand the notation split(,$pass_line); which has a clear semantic meaning: the default delimiter is used. Another similar example:

$_ = "Nick,Tanya,Sergey";
@family = split(',');
# equal to split(',',$_)

which has the same overall effect as

@family = ("Nick", "Tanya", "Sergey");

The default delimiter is whitespace (set of characters that includes blank, tab and newline).

A more interesting usage of split is parsing a string into several scalars by using list notation on the left side of the assignment statement:

$_ = "Nick Tanya Sergey"; # a default variable that split operates upon

($husband)=split; # $husband will contain Nick

($husband, $wife, $child)=split; # all three
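The following sketch puts the split variants discussed so far side by side, with an explicit target string instead of $_. The sample passwd-style line is invented for the illustration:

```perl
use strict;
use warnings;

my $line = "Nick Tanya Sergey";
my ($husband, $wife, $child) = split(' ', $line);       # split on whitespace

my @family = split(',', "Nick,Tanya,Sergey");           # split on a comma

my $pass_line  = "root:x:0:0:root:/root:/bin/bash";     # made-up sample line
my @pass_elems = split(':', $pass_line);

print "$husband $wife $child\n";                        # prints: Nick Tanya Sergey
print scalar(@pass_elems), " fields, home is $pass_elems[5]\n";
# prints: 7 fields, home is /root
```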

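We will discuss more complex cases of split in the regular expression chapter. For now, here is a typical example that parses the lines of /etc/passwd:

open(PASSWD, '/etc/passwd') or die "Cannot open /etc/passwd: $!";

while (<PASSWD>) {

   ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(":");

}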

In the example above the split function works on the default input variable $_, and we have a list of scalar values instead of an array on the left-hand side of the assignment statement. We already saw this use of the default variable in the tr function. It is another strange Perl creature, useful and dangerous at the same time, because you never know what will change this default variable: there are no firm rules in this regard in Perl and everything is decided on an ad hoc basis. If you do not supply a string argument to split, it will use $_ as the default argument. The line input operator (angle brackets around a filehandle), when it is the only thing in the condition of a while loop, puts the current line into this variable after reading a record.

The split function can be conveniently used for conversion from strings to arrays and for parsing simple streams of data with a fixed delimiter. Please remember that $_ is the default argument and whitespace is the default delimiter.

($weekend1,$weekend2)=split; # $_ is used as argument and whitespace as a default delimiter
($weekend1)=(split(','))[0]; # if 'St,Sn' is the string in $_ and ',' is
                           # the delimiter then result will be "St"
($cur_day, @rest_of_the_week) = @week;	#  here the second element is an array

There is an important nuance in using an array in list notation as in the example above: array assignments are greedy, so you should never put an array name anywhere but in the last position of the list:

# @new_week will be assigned all elements from @week and the variable
# $lastday will be left undefined.
(@new_week, $lastday) = @week;	# probably an error
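The greedy behaviour is easy to verify. In this sketch @copy is an extra illustrative array used to show one way of getting the last element without the mistake:

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn");

my (@new_week, $lastday) = @week;      # the array swallows everything
my ($cur_day, @rest)     = @week;      # array in the last position: fine

my @copy = @week;
my $last = pop(@copy);                 # the last element, taken from a copy

print scalar(@new_week), " ", (defined $lastday ? "defined" : "undef"), "\n";
# prints: 3 undef
print "$cur_day | @rest | $last\n";    # prints: Mn | Ts Wn | Wn
```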

One important shortcut is the ability of split to convert a string into array of characters. In this case zero length string should be used as the first parameter:

@char=split('',$string);

We will return to this function in the regular expressions chapter, but for now please read the manual page for the function, quoted below:

Split up a string using a regexp delimiter

Splits a string into an array of strings, and returns it. By default, empty leading fields are preserved, and empty trailing ones are deleted.

If not in list context, returns the number of fields found and splits into the @_ array. (In list context, you can force the split into @_ by using ?? as the pattern delimiters, but it still returns the list value.) The use of implicit split to @_ is deprecated, however, because it clobbers your subroutine arguments.

If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.)

If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop() would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.

A pattern matching the null string (not to be confused with a null pattern //, which is just one member of the set of patterns matching a null string) will split the value of EXPR into separate characters at each point it matches that way. For example:

    print join(':', split(/ */, 'hi there'));

produces the output 'h:i:t:h:e:r:e'.

The LIMIT parameter can be used to split a line partially

    ($login, $passwd, $remainder) = split(/:/, $_, 3);

When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4 by default. In time critical applications it behooves you not to split into more fields than you really need.

If the PATTERN contains parentheses, additional array elements are created from each matching substring in the delimiter.

    split(/([,-])/, "1-10,20", 3);

produces the list value

    (1, '-', 10, ',', 20)

If you had the entire header of a normal Unix email message in $header, you could split it up into fields and their values this way:

    $header =~ s/\n\s+/ /g;  # fix continuation lines
    %hdrs   =  (UNIX_FROM => split /^(\S*?):\s*/m, $header);

The pattern /PATTERN/ may be replaced with an expression to specify patterns that vary at runtime. (To do runtime compilation only once, use /$variable/o.)

As a special case, specifying a PATTERN of space (' ') will split on white space just as split() with no arguments does. Thus, split(' ') can be used to emulate awk's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. A split() on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field. A split() with no arguments really does a split(' ', $_) internally.

Example:

    open(PASSWD, '/etc/passwd');
    while (<PASSWD>) {
        ($login, $passwd, $uid, $gid,
         $gcos, $home, $shell) = split(/:/);
        #...
    }
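Some of the special cases described in the quoted manual text can be checked with a short sketch. The input strings are invented for the illustration:

```perl
use strict;
use warnings;

my $text = "  alpha beta";                 # two leading blanks

my @awk   = split(' ',    $text);          # awk-like: leading whitespace ignored
my @space = split(/ /,    $text);          # one empty field per leading blank
my @ws    = split(/\s+/,  $text);          # a single empty leading field

# LIMIT of 3: the third variable receives the unsplit remainder of the line
my ($login, $passwd, $remainder) =
    split(/:/, "root:x:0:0:root:/root:/bin/bash", 3);

print scalar(@awk), " ", scalar(@space), " ", scalar(@ws), "\n";  # prints: 2 4 3
print "$remainder\n";                      # prints: 0:0:root:/root:/bin/bash
```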

unpack

Unpack is probably the second most useful built-in function after split. Regrettably Perl does not support the scanf function from C. Instead it provides similar functionality with unpack. One can question the introduction of a new function instead of the tried and true scanf, but again things are already done...

unpack() takes a string and extracts the relevant parts of the string, returning them as a list (in scalar context, it returns the first value produced). Like the scanf function in C it uses a formatting TEMPLATE. Here I will mention the most frequently used formats. See the Perl documentation for the complete list:

    A   An ASCII string; trailing spaces and nulls are stripped from the result
    a   An ASCII string; trailing spaces and nulls are kept in the result
    x   A byte that you need to ignore (it is skipped)

When unpacking, "A" strips trailing spaces and nulls, but "a" does not.

It is interesting to note that one can emulate the substr function with unpack, for example:

$word2=substr("Hello world", 6,5); # get the second word
$word2=unpack("x6 A5", "Hello world"); # same thing: skip 6 bytes, then take 5

  
If you need to extract several words, a list should be used on the left side of the assignment statement:

#--------------------------12345678901

($a,$b)=unpack("a5 x1 a5","Hello world");

The same result can be achieved by using

#-----------------------12345678901

($a,$b)=unpack("A6 A5","Hello world"); # A6 will truncate trailing space

Perl permits generating the format string for the unpack function dynamically. If scalar variables are used in a double-quoted format string they will be interpolated, so one can use scalar variables instead of constant lengths, for example:

($i,$j,$k)=(5,1,5);

($a,$b)=unpack("a$i x$j a$k","Hello world");
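The unpack formats mentioned above can be compared in one sketch. The variable names are illustrative (they avoid $a and $b, which are reserved for sort):

```perl
use strict;
use warnings;

my $record = "Hello world";

my ($w1, $w2) = unpack("a5 x1 a5", $record);   # skip the blank explicitly
my ($t1, $t2) = unpack("A6 A5",    $record);   # A strips the trailing blank
my ($raw)     = unpack("a6",       $record);   # a keeps the trailing blank
my $second    = unpack("x6 A5",    $record);   # scalar context: first value produced

my ($i, $j, $k) = (5, 1, 5);
my ($d1, $d2) = unpack("a$i x$j a$k", $record); # template built at run time

print "[$w1] [$t1] [$raw] [$second] [$d2]\n";
# prints: [Hello] [Hello] [Hello ] [world] [world]
```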

Displaying arrays

Yes, it is possible to print the content of an array in a print statement, but user beware. Since context is important, it shouldn't be too surprising that the following all produce different results:

print @week;	# the list will be expanded -- all elements will be printed with no separator between them
print "@week";	# the elements will be printed separated by blanks (the value of $")
print @week."";	# a slightly perverse way to enforce a scalar context ;-) -- prints the number of elements
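To see the three results without guessing, here is a sketch that also stores what each print statement outputs in an illustrative variable (assuming the output separators $, and $\ are left at their defaults):

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn");

my $as_list      = join("", @week);   # what print @week outputs
my $interpolated = "@week";           # what print "@week" outputs
my $in_scalar    = @week . "";        # what print @week."" outputs

print @week, "\n";                    # prints: MnTsWn
print "@week\n";                      # prints: Mn Ts Wn
print @week . "", "\n";               # prints: 3
```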

The index of the last element of an array and the number of elements in the array

The way to get the index of the last element is to use the $# prefix, as in the example below:

$last_index = $#array;

The last index is one less than the number of elements, as indexes start with zero:

 @array = ('1','2');

 $i = $#array;    # $i now equals 1, not 2, because it is the index of the last element.

Again I would like to remind you that scalar(@array) provides the number of elements in the array, while the prefix $# (you can consider it to be a function) provides the index of the last element. For example:

for ($i = 0; $i < scalar(@X); $i++) {

    print $X[$i];

}

The index of the last element of an array is always one less than the number of elements, so the expressions ($#array+1) and scalar(@array) are always equal.

Actually the operator "<" presupposes a scalar on both sides (scalar context in Perl speak), so you can write:

for ($i = 0; $i < @X; $i++) {

    print $X[$i];

}

This is how such loops are usually written in Perl: you will see that almost nobody uses the scalar function.

The scalar built-in function converts its argument into a scalar. The returned value can be not that intuitive: for example, the scalar value of an array is the number of its elements. It just isn't the semantics you would expect, but that's how it is. For example:

@X = (1,2,3,4);

$X_size = scalar(@X);

print $X_size; # will print the number '4'.

Actually the scalar function is seldom used. For all operations that expect scalar arguments the conversion will be performed automatically and you do not need to specify the function explicitly. For example:

$number_of_elements = @X;                     # @X in a scalar context
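A compact sketch of the relations between the number of elements, the last index and loops over an array. The foreach form at the end is shown as a common alternative that needs no index at all; the variable names are illustrative:

```perl
use strict;
use warnings;

my @X = (1, 2, 3, 4);

my $size       = @X;            # scalar context: number of elements
my $last_index = $#X;           # index of the last element

my $sum = 0;
for (my $i = 0; $i < @X; $i++) {    # @X is in scalar context after "<"
    $sum += $X[$i];
}

my $sum2 = 0;
$sum2 += $_ foreach @X;             # the same loop without an index

print "$size $last_index $sum $sum2\n";   # prints: 4 3 10 10
```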

Grep and Map

There are two related functions in Perl: grep, which resembles Unix grep, and map. Both are essentially shorthand for the foreach loop. Their main value is not new functionality, but the ability to make the code more compact. Both functions accept two arguments: the first is an expression (or a block) that is evaluated for each element with $_ set to that element, and the second is the list or array.

The main difference between them is that grep just selects certain elements from the array or list, while map can transform them into a new list.
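A minimal sketch of both functions next to the foreach loop they replace. The selection criterion (days starting with "T") and the variable names are chosen only for illustration:

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn", "Th", "Fr", "St", "Sn");

my @t_days  = grep { /^T/ } @week;        # select elements matching a pattern
my @lower   = map  { lc($_) } @week;      # transform every element
my $s_count = grep { /^S/ } @week;        # in scalar context grep counts matches

# the foreach loop that the grep above is shorthand for
my @t_days_loop;
foreach my $day (@week) {
    push(@t_days_loop, $day) if $day =~ /^T/;
}

print "@t_days | $lower[0] | $s_count\n"; # prints: Ts Th | mn | 2
```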

Other Important Built-in Functions

sort

The function sort returns a sorted copy of a list or array, optionally using your own comparison routine. The possible forms are:

sort @array;
sort { expression } @array;
sort sort_function @array;

The default usage of sort is to sort in alphanumeric (string) order. If you wish to sort the list numerically you might use:

@array = sort { $a <=> $b } @array; # numeric sort

In this case the expression $a<=>$b is used as the comparison function for sort. The comparison should use the special variables $a and $b (which denote the two elements being compared) and should return a negative number, zero, or a positive number (the <=> and cmp operators return -1, 0 or +1) depending on whether $a should be placed before, together with, or after $b.

You can use a function name in place of the expression, so sorting can be quite flexible and can compare only certain fields instead of the whole strings $a and $b.
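A sketch of the three forms of sort. The comparison subroutine by_uid and the sample login:uid strings are hypothetical, introduced only to show sorting on one field:

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

my @as_strings = sort @nums;                  # default: string comparison
my @numeric    = sort { $a <=> $b } @nums;    # ascending numeric order
my @descending = sort { $b <=> $a } @nums;    # swap $a and $b to reverse

# compare only the second colon-separated field of each string
sub by_uid { (split /:/, $a)[1] <=> (split /:/, $b)[1] }

my @lines  = ("bob:1002", "alice:1001", "root:0");
my @by_uid = sort by_uid @lines;

print "@as_strings | @numeric | @by_uid\n";
# prints: 1 10 100 9 | 1 9 10 100 | root:0 alice:1001 bob:1002
```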

reverse

As with strings, this function takes an array and returns its elements in reverse order (the original array is not changed). For example, if we have

@week = ("Mn","Ts","Wn","Th","Fr","St","Sn"); # initialization of @week

then we can write

@reversed_week = reverse @week;

Although we did not mention it, reverse is also defined on strings (in scalar context it reverses the characters of a string), but I was unable to find useful examples to warrant its inclusion in the string built-in functions section.
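The two behaviours of reverse depend on context, as this small sketch shows (variable names are illustrative):

```perl
use strict;
use warnings;

my @week = ("Mn", "Ts", "Wn");

my @reversed  = reverse @week;               # list context: order of elements
my $backwards = reverse("Hello");            # scalar context: order of characters
my $glued     = scalar reverse("ab", "cd");  # arguments are concatenated first

print "@reversed | @week | $backwards | $glued\n";
# prints: Wn Ts Mn | Mn Ts Wn | olleH | dcba
```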

Additional functions

This chapter does not cover a few very useful functions, which you will have to look up elsewhere. These include splice and glob. Unfortunately the Perl sets of functions for manipulating arrays and for manipulating strings are unnecessarily different and do not exploit the similarities inherent in those two data structures.

Summary

To get the last element in a list or array, use $array[-1] instead of $array[$#array]. The former works on both lists and arrays, but the latter does not.

There are more than a dozen built-in functions for working with arrays; they are similar to the functions designed for working with strings, but are named differently and sometimes have slightly different semantics.

Questions

Will it work or not?

    ($first, @rest) = 1 .. Inf;

Would it be better if the type of operation was specified in the operator, as in the following proposal:

We might find some short prefix character stands in for ``list'' or ``scalar''. The obvious candidates are @ and $:

    @a $+ @b
    @a @/ @b

 

Supplement

Imitating strings using arrays

    use Tie::CharArray;
    my $foobar = 'a string';

    tie my @foo, 'Tie::CharArray', $foobar;
    $foo[0] = 'A';    # $foobar = 'A string'
    push @foo, '!';   # $foobar = 'A string!'
    print "@foo\n";   # prints: A   s t r i n g !

    tie my @bar, 'Tie::CharArray::Ord', $foobar; 
    $bar[0]--;        # $foobar = '@ string!'
    pop @bar;         # $foobar = '@ string'
    print "@bar\n";   # prints: 64 32 115 116 114 105 110 103

Alternative interface functions

    use Tie::CharArray qw( chars codes );
    my $foobar = 'another string';
    
    my $chars = chars $foobar;  # arrayref in scalar context
    push @$chars, '?';          # $foobar = 'another string?'

    $_ += 2 for codes $foobar;  # tied array in list context
                                # $foobar = 'cpqvjgt"uvtkpiA'

    my @array = chars $foobar;  # WARNING: @array isn't tied!

DESCRIPTION

In low-level programming languages such as C, and to some extent Java, strings are not primitive data types but arrays of characters, which in turn are treated as integers. This closely matches the internal representation of strings in the memory.

Perl, on the other hand, abstracts such internal details away behind the concept of scalars, which can be treated as either strings or numbers, and appear as primitive types to the programmer. This often better matches the way people think about the data, which facilitates programming by making common high-level manipulation tasks trivial.

Sometimes, though, the low-level view is better suited for the task at hand. Perl does offer functions such as ord()/chr(), pack()/unpack() and substr() that can be used to solve such tasks with reasonable efficiency. For someone used to the direct access to the internal representation offered by other languages, however, these functions may feel awkward. While this is often only a symptom of thinking in un-Perlish terms, sometimes being able to manipulate strings as character arrays really does simplify the code, making the intent more obvious by eliminating syntactic clutter.

This module provides a way to manipulate Perl strings through tied arrays. The operations are implemented in terms of the aforementioned string manipulation functions, but the programmer normally need not be aware of this. As Perl has no primitive character type, two alternative representations are provided:

Strings as arrays of single-character strings

The first way is to represent characters as strings of length 1. In most cases this is the most convenient representation, as such "characters" can be printed without explicit transformations and written as ordinary Perl string literals.

This representation is provided by the main class Tie::CharArray. As the class maps most array operations directly to calls to substr(), several features of that function apply. (Below, @foo is an array tied to Tie::CharArray and $n is a positive integer.)

In general, if you only put one-character strings into the array, and don't go beyond its end, there should be no problems.

Strings as arrays of small integers

While the representation described above is usually the most convenient one, it still does not allow direct arithmetic manipulation of the character code values. For tasks where this is needed, an alternative representation is provided by the subclass Tie::CharArray::Ord. Note that it is perfectly possible to manipulate a single string through both interfaces at the same time. As the array operations are still based on substr(), the first two of the above caveats apply here as well. Unicode support depends on whether and how the underlying perl implementation supports it.

Alternative interface functions

Since using tie() can sometimes seem inconvenient, Tie::CharArray can also export two functions to perform the tying internally. The functions are reproduced below in their entirety.

    sub chars ($) {
        tie my @chars, 'Tie::CharArray', $_[0];
        return wantarray ? @chars : \@chars;
    }
    sub codes ($) {
        tie my @codes, 'Tie::CharArray::Ord', $_[0];
        return wantarray ? @codes : \@codes;
    }

When called in scalar context, they return a reference to a tied array through which the characters of the string given to them can be manipulated. In list context the functions return the tied array itself.

This is of rather limited use, since the tied array is only temporary, and assigning it to a permanent array only copies the values it contains but does not tie the permanent array. However, if the temporary array is passed to a subroutine or a foreach loop, perl will alias the elements directly to the temporary array instead of copying them. What that means in practice is that you can write:

    foreach my $ch (chars $string) {
        # reverse bits in each character 
        $ch = pack "b*", unpack "B*", $ch;
    }




Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Created: November 7 1998; Last modified: September 07, 2009