Namespaces, visibility and scope of variables

Every programming language has a philosophy, and an important part of that philosophy is how the language defines the rules for allocation and visibility of variables. These rules have to do with the way the names of variables are managed. Details of which variables are visible to which parts of the program, what names mean what, and when, are of prime importance.

Perl has two types of variables

The problem of visibility of variables is one of the oldest problems that language designers have tried to solve. Fortran introduced the notion of common blocks. Algol introduced variables local to a procedure, which were allocated when the procedure was called and discarded when you exited from it (a stack was used for this purpose). This stack mentality was the dominant paradigm until PL/1 clearly distinguished the method of allocation and visibility as two separate categories. PL/1 introduced three classic types of allocation: static, automatic, and heap.

Separately PL/1 defined two classes of visibility:

C, being a simplification of PL/1 with respect to variable allocation and visibility, preserved this hierarchy. Java was one of the first mainstream languages in which objects were allocated on the heap, rather than on the stack or in a specific memory area, and in which unused objects were reclaimed by automatic garbage collection. Although from the point of view of language design Java was a very conservative, weak attempt to create Cobol with C syntax, the idea of automatic memory management proved valuable and confirmed that Perl had gotten it right years earlier.

Perl separates the issues of visibility and memory allocation less clearly than PL/1, but it still provides all three memory allocation categories, with an interesting twist: static variables and automatic variables are emulated on the heap.

Static variables correspond to state variables (Perl 5.10 and later).

Automatic variables correspond to my variables.

External variables are what Perl calls global variables. They are separated into several user-defined namespaces. If the user did not define any namespace, all global variables belong to the namespace main::

Global variables are sometimes called "package variables" because a namespace in Perl is confined to a package.

They can be written with the explicit namespace prefixed to the variable name:

$main::foo; $CGI::POST_MAX; @foo::bar;

All packages can access the variable $foo in the main symbol table (%main:: or the shorthand %::) by using $main::foo.
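
A minimal sketch of what "living in the main symbol table" means in practice. The names $foo, $hidden and in_main_symbol_table are invented for this illustration; the only point is that a package variable has an entry in %main::, while a my variable does not.

```perl
use strict;
use warnings;

our $foo    = 'package value';   # has an entry in the symbol table of package main
my  $hidden = 'lexical value';   # not stored in any symbol table

# Report whether a name has an entry in the main symbol table (%main::).
sub in_main_symbol_table {
    my ($name) = @_;
    return exists $main::{$name} ? 1 : 0;
}

print "foo:    ", in_main_symbol_table('foo'),    "\n";   # prints "foo:    1"
print "hidden: ", in_main_symbol_table('hidden'), "\n";   # prints "hidden: 0"

# The short name and the qualified name are the same variable.
print "same variable\n" if \$foo == \$main::foo;
```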

A lexically scoped variable is declared with my, or with state, an alternative to my that keeps its value between invocations:

my $foo; my $POST_MAX; my @bar;

Lexically scoped variables cannot be accessed outside the subroutine or block (scope) in which they were declared.

If you do not want to rely on perl -cw to check your scripts for misspellings, you can use strict. Then, if you try to access a variable that has not been declared, you'll get an error similar to the following:

Global symbol "$foo" requires explicit package name at C:\test.pl line 2.

Basically, Perl expects that you are trying to use a package variable, but left off the package name. The problem with using package names ($main::foo) is that strict ignores these variables and if you type $main::field in one place and $main::feild in another place, strict won't warn you of the typo.

our tries to help out by letting you use package variables without adding the package name. The following two variables are the same:
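
For example, assuming a package variable named $foo in package main (a hypothetical name), the short form declared with our and the fully qualified form can be sketched like this:

```perl
use strict;
use warnings;

our $foo = 10;     # declares the short name $foo for $main::foo

$main::foo++;      # modify it through the fully qualified name
$foo++;            # modify it through the short name

# Both names refer to one and the same package variable.
sub both_names {
    return ($foo, $main::foo);
}

print join(', ', both_names()), "\n";   # prints "12, 12"
```

Unlike a hand-typed $main::foo, a misspelled short name is still caught by strict, because only names declared with our are exempt.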

 

Perl has five types of variable

A global variable has a package name prepended to it:

$main::foo; $CGI::POST_MAX; @foo::bar;

 

Global variables are generally a bad idea, but they do crop up in a lot of code. If you omit the namespace name, Perl assumes that you are referring to the current package, which is main unless a package declaration says otherwise.

A lexically scoped variable is declared with my. It exists only while its scope is being executed, and a fresh, undefined variable is created each time the scope is entered. This can lead to very subtle coding errors. Try the following code:

        use strict;

        for ( 1 .. 3 ) { doit() }

        sub doit {
            our $foo;
            print ++$foo . "\n";
        }

 

That will print 1, 2, and 3 on successive lines. Change the our to a my and it prints 1, 1, and 1. Because the my variable goes out of scope at the end of the sub and the our variable doesn't (since it's global), you get wildly different results.

 

Lexical variables that share a name but are declared in different scopes are not the same variable, and they cannot be accessed outside the scope (delimited by curly braces) in which they were declared.

Coping with Scoping

Copyright 1998 The Perl Journal. Reprinted with permission.


Just the FAQs: Coping with Scoping

Package Variables

        $x = 1

Here, $x is a package variable. There are two important things to know about package variables:

  1. Package variables are what you get if you don't say otherwise.
  2. Package variables are always global.

Global means that package variables are always visible everywhere in every program. After you do $x = 1, any other part of the program, even some other subroutine defined in some other file, can inspect and modify the value of $x. There's no exception to this; package variables are always global.

Package variables are divided into families, called packages. Every package variable has a name with two parts. The two parts are analogous to the variable's given name and family name. You can call the Vice-President of the United States `Al', if you want, but that's really short for his full name, which is `Al Gore'. Similarly, $x has a full name, which is something like $main::x. The main part is the package qualifier, analogous to the `Gore' part of `Al Gore'. Al Gore and Al Capone are different people even though they're both named `Al'. In the same way, $Gore::Al and $Capone::Al are different variables, and $main::x and $DBI::x are different variables.

You're always allowed to include the package part of the variable's name, and if you do, Perl will know exactly which variable you mean. But for brevity, you usually like to leave the package qualifier off. What happens if you do?

The Current Package

If you just say $x, perl assumes that you mean the variable $x in the current package. What's the current package? It's normally main, but you can change the current package by writing

        package Mypackage;

in your program; from that point on, the current package is Mypackage. The only thing the current package does is affect the interpretation of package variables that you wrote without package names. If the current package is Mypackage, then $x really means $Mypackage::x. If the current package is main, then $x really means $main::x.
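
As a rough sketch of how the current package decides what a short name means. This example uses our so that it also runs under use strict, which is an assumption of the sketch; the article's own snippets use undeclared package variables instead. The helper both_x is a made-up name.

```perl
use strict;
use warnings;

{
    package Mypackage;
    our $x = 'set in Mypackage';   # short name for $Mypackage::x
}

{
    package main;
    our $x = 'set in main';        # short name for $main::x
}

# Fully qualified names work from anywhere, whatever the current package is.
sub both_x {
    return ($main::x, $Mypackage::x);
}

print "$_\n" for both_x();   # prints "set in main", then "set in Mypackage"
```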

If you were writing a module, let's say the MyModule module, you would probably put a line like this at the top of the module file:

        package MyModule;

From there on, all the package variables you used in the module file would be in package MyModule, and you could be pretty sure that those variables wouldn't conflict with the variables in the rest of the program. It wouldn't matter if both you and the author of DBI were to use a variable named $x, because one of those $xes would be $MyModule::x and the other would be $DBI::x.

Remember that package variables are always global. Even if you're not in package DBI, even if you've never heard of package DBI, nothing can stop you from reading from or writing to $DBI::errstr. You don't have to do anything special. $DBI::errstr, like all package variables, is a global variable, and it's available globally; all you have to do is mention its full name to get it. You could even say

        package DBI;
        $errstr = 'Ha ha Tim!';

and that would modify $DBI::errstr.

Package Variable Trivia

There are only three other things to know about package variables, and you might want to skip them on the first reading:

  1. The package with the empty name is the same as main. So $::x is the same as $main::x for any x.
  2. Some variables are always forced to be in package main. For example, if you mention %ENV, Perl assumes that you mean %main::ENV, even if the current package isn't main. If you want %Fred::ENV, you have to say so explicitly, even if the current package is Fred. Other names that are special this way include INC, all the one-punctuation-character names like $_ and $$, @ARGV, and STDIN, STDOUT, and STDERR.
  3. Package names, but not variable names, can contain ::. You can have a variable named $DBD::Oracle::x. This means the variable x in the package DBD::Oracle; it has nothing at all to do with the package DBD which is unrelated. Isaac Newton is not related to Olivia Newton-John, and Newton::Isaac is not related to Newton::John::Olivia. Even though it appears that they both begin with Newton, the appearance is deceptive. Newton::John::Olivia is in package Newton::John, not package Newton.
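
The three points above can be checked with a short sketch. The subroutine names are invented for illustration, and the DBD names are used purely as package names; no DBD module is loaded or needed.

```perl
use strict;
use warnings;

our $x = 'main';

{
    package Fred;

    # 1. $::x is shorthand for $main::x.
    sub empty_package_is_main { return \$::x == \$main::x ? 1 : 0 }

    # 2. %ENV means %main::ENV even though the current package is Fred.
    sub env_is_forced_to_main { return \%ENV == \%main::ENV ? 1 : 0 }

    # 3. $DBD::Oracle::x lives in package DBD::Oracle, not in package DBD.
    sub nested_name_is_separate {
        no warnings 'once';
        $DBD::Oracle::x = 'oracle';
        return defined $DBD::x ? 0 : 1;
    }
}

print Fred::empty_package_is_main(),   "\n";   # prints 1
print Fred::env_is_forced_to_main(),   "\n";   # prints 1
print Fred::nested_name_is_separate(), "\n";   # prints 1
```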

That's all there is to know about package variables.

Package variables are global, which is dangerous, because you can never be sure that someone else isn't tampering with them behind your back. Up through Perl 4, all variables were package variables, which was worrisome. So Perl 5 added new variables that aren't global.

Issues of Scope with my() and local()

Chapter 1 "Perl Overview," alluded to some issues related to scope. These issues are very important with relation to subroutines. In particular, all variables inside subroutines should be made lexical local variables (via my()) or dynamic local variables (via local()). In Perl 4.0, the only choice is local(), because my() was introduced in Perl 5.0.

Variables declared with the my() construct are considered to be lexical local variables. These variables are not entered in the symbol table for the current package; therefore, they are totally hidden from all contexts other than the local block within which they are declared. Even subroutines called from the current block cannot access lexical local variables in that block. Lexical local variables must begin with an alphanumeric character (or an underscore).

Variables declared by means of the local() construct are considered to be dynamic local variables. The value is local to the current block and any calls from that block. You can localize special variables as dynamic local variables, but you cannot make them into lexical local variables. These two differences from lexical local variables show the two cases in Perl 5.0 in which it is still advisable to use local() rather than my(): when a value must be visible to subroutines called from the block, and when a special variable must be given a temporary value.

In general, you should be using my instead of local, because it's faster and safer. Exceptions to this rule include the global punctuation variables, file handles and formats, and direct manipulation of the Perl symbol table itself. Format variables often use local, though, as do other variables whose current value must be visible to called subroutines.
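
The punctuation-variable exception can be sketched as follows. The subroutine names render and render_csv are invented for this illustration; $" is Perl's list separator, which is used when an array is interpolated into a string.

```perl
use strict;
use warnings;

# Interpolates its arguments using whatever list separator ($") is in effect.
sub render {
    my @items = @_;
    return "@items";
}

# Temporarily changes $" for itself and for everything it calls.
sub render_csv {
    local $" = ',';       # my $" would be rejected at compile time
    return render(@_);
}

print render_csv(1, 2, 3), "\n";   # prints "1,2,3"
print render(1, 2, 3), "\n";       # prints "1 2 3": the old value is back
```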

Lexical Variables

Perl's other set of variables are called lexical variables (we'll see why later) or private variables because they're private. They're also sometimes called my variables because they're always declared with my. It's tempting to call them `local variables', because their effect is confined to a small part of the program, but don't do that, because people might think you're talking about Perl's local operator, which we'll see later. When you want a `local variable', think my, not local.

The declaration

        my $x;

creates a new variable, named x, which is totally inaccessible to most parts of the program---anything outside the block where the variable was declared. This block is called the scope of the variable. If the variable wasn't declared in any block, its scope is from the place it was declared to the end of the file.

You can also declare and initialize a my variable by writing something like

        my $x = 119;

You can declare and initialize several at once:

        my ($x, $y, $z, @args) = (5, 23, @_);

Let's see an example of where some private variables will be useful. Consider this subroutine:

        sub print_report {
          @employee_list = @_;
          foreach $employee (@employee_list) {
            $salary = lookup_salary($employee);
            print_partial_report($employee, $salary);
          }
        }

If lookup_salary happens to also use a variable named $employee, that's going to be the same variable as the one used in print_report, and the works might get gummed up. The two programmers responsible for print_report and lookup_salary will have to coordinate to make sure they don't use the same variables. That's a pain. In fact, in even a medium-sized project, it's an intolerable pain.

The solution: Use my variables:

        sub print_report {
          my @employee_list = @_;
          foreach my $employee (@employee_list) {
            my $salary = lookup_salary($employee);
            print_partial_report($employee, $salary);
          }
        }

my @employee_list creates a new array variable which is totally inaccessible outside the print_report function. for my $employee creates a new scalar variable which is totally inaccessible outside the foreach loop, as does my $salary. You don't have to worry that the other functions in the program are tampering with these variables, because they can't; they don't know where to find them, because the names have different meanings outside the scope of the my declarations.

These `my variables' are sometimes called `lexical' because their scope depends only on the program text itself, and not on details of execution, such as what gets executed in what order. You can determine the scope by inspecting the source code without knowing what it does. Whenever you see a variable, look for a my declaration higher up in the same block. If you find one, you can be sure that the variable is inaccessible outside that block. If you don't find a declaration in the smallest block, look at the next larger block that contains it, and so on, until you do find one. If there is no my declaration anywhere, then the variable is a package variable.
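
The rule "look in the smallest enclosing block first, then the next larger one" can be illustrated with a small sketch. The variable $level and the subroutine innermost_wins are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

my $level = 'file';                  # scope: from here to the end of the file

sub innermost_wins {
    my @seen;
    push @seen, $level;              # no my in this block: the file-level $level
    {
        my $level = 'inner block';   # a new variable, visible only inside these braces
        push @seen, $level;
    }
    push @seen, $level;              # the inner variable is gone; file-level one again
    return @seen;
}

print join(' -> ', innermost_wins()), "\n";   # prints "file -> inner block -> file"
```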

my variables are not package variables. They're not part of a package, and they don't have package qualifiers. The current package has no effect on the way they're interpreted. Here's an example:

        my $x = 17;

        package A;
        $x = 12;

        package B;
        $x = 20;

        # $x is now 20.
        # $A::x and $B::x are still undefined

The declaration my $x = 17 at the top creates a new lexical variable named x whose scope continues to the end of the file. This new meaning of $x overrides the default meaning, which was that $x meant the package variable $x in the current package.

package A changes the current package, but because $x refers to the lexical variable, not to the package variable, $x=12 doesn't have any effect on $A::x. Similarly, after package B, $x=20 modifies the lexical variable, and not any of the package variables.

At the end of the file, the lexical variable $x holds 20, and the package variables $main::x, $A::x, and $B::x are still undefined. If you had wanted them, you could still have accessed them by using their full names.

The maxim you must remember is:

Package variables are global variables.
For private variables, you must use my.

local and my

Almost everyone already knows that there's a local function that has something to do with local variables. What is it, and how does it relate to my? The answer is simple, but bizarre:

my creates a local variable. local doesn't.

First, here's what local $x really does: It saves the current value of the package variable $x in a safe place, and replaces it with a new value, or with undef if no new value was specified. It also arranges for the old value to be restored when control leaves the current block. The variables that it affects are package variables, which get local values. But package variables are always global, and a local package variable is no exception. To see the difference, try this:

        $lo = 'global';
        $m  = 'global';
        A();

        sub A {
          local $lo = 'AAA';
          my    $m  = 'AAA';
          B();
        }

        sub B {
          print "B ", ($lo eq 'AAA' ? 'can' : 'cannot') ,
                " see the value of lo set by A.\n";

          print "B ", ($m  eq 'AAA' ? 'can' : 'cannot') ,
                " see the value of m  set by A.\n";
        }

This prints

        B can see the value of lo set by A.
        B cannot see the value of m  set by A.

What happened here? The local declaration in A saved a new temporary value, AAA, in the package variable $lo. The old value, global, will be restored when A returns, but before that happens, A calls B. B has no problem accessing the contents of $lo, because $lo is a package variable and package variables are always available everywhere, and so it sees the value AAA set by A.

In contrast, the my declaration created a new, lexically scoped variable named $m, which is only visible inside of function A. Outside of A, $m retains its old meaning: It refers to the package variable $m, which is still set to global. This is the variable that B sees. It doesn't see the AAA because the variable with that value is a lexical variable, and only exists inside of A.

What Good is local?

Because local does not actually create local variables, it is not very much use. If, in the example above, B happened to modify the value of $lo, then the value set by A would be overwritten. That is exactly what we don't want to happen. We want each function to have its own variables that are untouchable by the others. This is what my does.
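
A minimal sketch of that overwriting problem. The subroutine names outer_sub and inner_sub are invented here, standing in for A and B from the earlier example.

```perl
use strict;
use warnings;

our $lo = 'global';

# Stands in for B: it modifies the package variable directly.
sub inner_sub { $lo = 'changed by inner_sub' }

# Stands in for A: it gives $lo a temporary value, then calls inner_sub.
sub outer_sub {
    local $lo = 'AAA';
    inner_sub();
    return $lo;              # no longer 'AAA': the callee overwrote it
}

my $inside = outer_sub();
print "inside: $inside\n";   # prints "inside: changed by inner_sub"
print "after:  $lo\n";       # prints "after:  global" (restored on return)
```

With my $lo in outer_sub instead, inner_sub would have changed the package variable and left outer_sub's private copy alone.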

Why have local at all? The answer is 90% history. Early versions of Perl only had global variables. local was very easy to implement, and was added to Perl 4 as a partial solution to the local variable problem. Later, in Perl 5, more work was done, and real local variables were put into the language. But the name local was already taken, so the new feature was invoked with the word my. my was chosen because it suggests privacy, and also because it's very short; the shortness is supposed to encourage you to use it instead of local. my is also faster than local.

When to Use my and When to Use local

Always use my; never use local.

Wasn't that easy?

Other Properties of my Variables

Every time control reaches a my declaration, Perl creates a new, fresh variable. For example, this code prints x=1 fifty times:

        for (1 .. 50) {
          my $x;
          $x++;
          print "x=$x\n";
        }

You get a new $x, initialized to undef, every time through the loop.

If the declaration were outside the loop, control would only pass by it once, so there would only be one variable:

        { my $x;
          for (1 .. 50) {
            $x++;
            print "x=$x\n";
          }     
        }
        

This prints x=1, x=2, x=3, ... x=50.

You can use this to play a useful trick. Suppose you have a function that needs to remember a value from one call to the next. For example, consider a random number generator. A typical random number generator (like Perl's rand function) has a seed in it. The seed is just a number. When you ask the random number generator for a random number, the function performs some arithmetic operation that scrambles the seed, and it returns the result. It also saves the result and uses it as the seed for the next time it is called.

Here's typical code: (I stole it from the ANSI C standard, but it behaves poorly, so don't use it for anything important.)

        $seed = 1;
        sub my_rand {
          $seed = int(($seed * 1103515245 + 12345) / 65536) % 32768;
          return $seed;
        }

And typical output:

        16838
        14666
        10953
        11665
        7451
        26316
        27974
        27550

There's a problem here, which is that $seed is a global variable, and that means we have to worry that someone might inadvertently tamper with it. Or they might tamper with it on purpose, which could affect the rest of the program. What if the function were used in a gambling program, and someone tampered with the random number generator?

But we can't declare $seed as a my variable in the function:

        sub my_rand {
          my $seed;
          $seed = int(($seed * 1103515245 + 12345) / 65536) % 32768;
          return $seed;
        }

If we did, it would be initialized to undef every time we called my_rand. We need it to retain its value between calls to my_rand.

Here's the solution:

        { my $seed = 1;
          sub my_rand {
            $seed = int(($seed * 1103515245 + 12345) / 65536) % 32768;
            return $seed;
          }
        }

The declaration is outside the function, so it only happens once, at the time the program is compiled, not every time the function is called. But it's a my variable, and it's in a block, so it's only accessible to code inside the block. my_rand is the only other thing in the block, so the $seed variable is only accessible to the my_rand function.

$seed here is sometimes called a `static' variable, because it stays the same in between calls to the function. (And because there's a similar feature in the C language that is activated by the static keyword.)
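
Since Perl 5.10 a similar effect is available directly through state variables. A minimal sketch, assuming Perl 5.10 or later, that reuses the same arithmetic as my_rand above:

```perl
use strict;
use warnings;
use feature 'state';    # state variables need Perl 5.10 or later

sub my_rand {
    state $seed = 1;    # initialized once, then kept between calls
    $seed = int(($seed * 1103515245 + 12345) / 65536) % 32768;
    return $seed;
}

my @first = map { my_rand() } 1 .. 3;
print "@first\n";       # prints "16838 14666 10953"
```

Here $seed is visible only inside my_rand, so no enclosing block is needed.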

my Variable Trivia

  1. You can't declare a variable my if its name is a punctuation character, like $_, @_, or $$. You can't declare the backreference variables $1, $2, ... as my. The authors of my thought that that would be too confusing.
  2. Obviously, you can't say my $DBI::errstr, because that's contradictory---it says that the package variable $DBI::errstr is now a lexical variable. But you can say local $DBI::errstr; it saves the current value of $DBI::errstr and arranges for it to be restored at the end of the block.
  3. New in Perl 5.004, you can write

            foreach my $i (@list) {

    to confine $i to the scope of the loop. Similarly,

            for (my $i=0; $i<100; $i++) {

    confines the scope of $i to the for loop.

Declarations

If you're writing a function, and you want it to have private variables, you need to declare the variables with my. What happens if you forget?

        sub function {
          $x = 42;        # Oops, should have been my $x = 42.
        }

In this case, your function modifies the global package variable $x. If you were using that variable for something else, it could be a disaster for your program.

Recent versions of Perl have an optional protection against this that you can enable if you want. If you put

        use strict 'vars';

at the top of your program, Perl will require that package variables have an explicit package qualifier. The $x in $x=42 has no such qualifier, so the program won't even compile; instead, the compiler will abort and deliver this error message:

        Global symbol "$x" requires explicit package name at ...

If you wanted $x to be a private my variable, you can go back and add the my. If you really wanted to use the global package variable, you could go back and change it to

        $main::x = 42;

or whatever would be appropriate.

Just saying use strict turns on strict vars, and several other checks besides. See perldoc strict for more details.
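
To see the check in action without aborting a whole program, the snippet can be compiled inside a string eval. This is only a sketch; strict_vars_complaint is a made-up helper name.

```perl
use strict;
use warnings;

# Compile a snippet under strict vars and return the compiler's complaint,
# or the empty string if the snippet compiled and ran cleanly.
sub strict_vars_complaint {
    my ($code) = @_;
    my $ok = eval "use strict 'vars'; $code; 1";
    return $ok ? '' : $@;
}

print strict_vars_complaint('$x = 42')       || "ok\n";   # rejected: Global symbol "$x" ...
print strict_vars_complaint('my $x = 42')    || "ok\n";   # accepted: lexical variable
print strict_vars_complaint('$main::x = 42') || "ok\n";   # accepted: fully qualified
```

The last line also shows the limit of the check: a fully qualified name is accepted as written, so a misspelled qualified name would pass as well.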

Now suppose you're writing the Algorithms::KnuthBendix module, and you want the protections of strict vars. But you're afraid that you won't be able to finish the module because your fingers are starting to fall off from typing $Algorithms::KnuthBendix::Error all the time.

You can save your fingers and tell strict vars to make an exception:

        package Algorithms::KnuthBendix;
        use vars '$Error';

This exempts the package variable $Algorithms::KnuthBendix::Error from causing a strict vars failure if you refer to it by its short name, $Error.

You can also turn strict vars off for the scope of one block by writing

        { no strict 'vars';

          # strict vars is off for the rest of the block.

        }

Summary

Package variables are always global. They have a name and a package qualifier. You can omit the package qualifier, in which case Perl uses a default, which you can set with the package declaration. For private variables, use my. Don't use local; it's obsolete.

You should avoid using global variables because it can be hard to be sure that no two parts of the program are using one another's variables by mistake.

To avoid using global variables by accident, add use strict 'vars' to your program. It checks to make sure that all variables are either declared private, are explicitly qualified with package qualifiers, or are explicitly declared with use vars.


Notes

  1. The tech editors complained about my maxim `Never use local.' But 97% of the time, the maxim is exactly right. local has a few uses, but only a few, and they don't come up too often, so I left them out, because the whole point of a tutorial article is to present 97% of the utility in 50% of the space.

    I was still afraid I'd get a lot of tiresome email from people saying ``You forgot to mention that local can be used for such-and-so, you know.'' So in the colophon at the end of the article, I threatened to deliver Seven Useful Uses for local in three months. I mostly said it to get people off my back about local. But it turned out that I did write it, and it was published some time later.

    The Seven Useful Uses of local is now available on the web site. It appeared in The Perl Journal issue #14.

  2. Here's another potentially interesting matter that I left out for space and clarity. I got email from Robert Watkins with a program he was writing that didn't work. The essence of the bug looked like this:
            my $x;

            for $x (1..5) {
              show();
            }

            sub show { print "$x, " }
    

    Robert wanted this to print 1, 2, 3, 4, 5, but it did not. Instead, it printed , , , , , . Where did the values of $x go?

    The deal here is that normally, when you write something like this:

            for $x (...) { }
    

    Perl wants to confine the value of the index variable to inside the loop. If $x is a package variable, it pretends that you wrote this instead:

            { local $x; for $x (...) { } }
    

    But if $x is a lexical variable, it pretends you wrote this instead:

            { my $x;    for $x (...) { } }
    

    This means that the loop index variable won't get propagated to subroutines, even if they're in the scope of the original declaration.
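
    One way around this, sketched below under the assumption that the subroutine can simply be handed the value it needs, is to pass the loop variable as an argument. The names record_closure and record_argument are invented for this illustration.

```perl
use strict;
use warnings;

my $x = 'outer';
my @seen_by_closure;
my @seen_by_argument;

sub record_closure  { push @seen_by_closure,  $x }      # sees the file-level $x
sub record_argument { push @seen_by_argument, $_[0] }   # sees what it is handed

for $x (1 .. 3) {        # inside the loop, $x is confined to the loop
    record_closure();
    record_argument($x);
}

print "closure:  @seen_by_closure\n";    # prints "closure:  outer outer outer"
print "argument: @seen_by_argument\n";   # prints "argument: 1 2 3"
```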

    I probably shouldn't have gone on at such length, because the perlsyn manual page describes it pretty well:

    ...the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. (Note that a lexically scoped variable can cause problems if you have subroutine or format declarations within the loop which refer to it.)

    In my opinion, lexically scoping the index variable was probably a mistake. If you had wanted that, you would have written for my $x ... in the first place. What I would have liked it to do was to localize the lexical variable: It could save the value of the lexical variable before the loop, and restore it again afterwards. But there may be technical reasons why that couldn't be done, because this doesn't work either:

    	my $m;
    	{ local $m = 12;
    	  ...
    	}
    

    The local fails with this error message:

    	Can't localize lexical variable $m...
    

    There's been talk on P5P about making this work, but I gather it's not trivial.


mjd-perl-faqs-id-iwe+bgebuph+@plover.com