Before we proceed, let's make the healthy assumption that we want to develop
our code under the strict pragma and avoid global variables, using my()
scoped variables wherever possible.
Let's look at this code:
nested.pl
-----------
#!/usr/bin/perl

use strict;

sub print_power_of_2 {
  my $x = shift;

  sub power_of_2 {
    return $x ** 2;
  }

  my $result = power_of_2();
  print "$x^2 = $result\n";
}

print_power_of_2(5);
print_power_of_2(6);
Don't let the odd subroutine names fool you: the
print_power_of_2() subroutine should print the square of
the number passed to it. The script calls it twice:
print_power_of_2(5);
print_power_of_2(6);
Let's run it and see whether it works:
% ./nested.pl
5^2 = 25
6^2 = 25
Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work correctly with the number 6? Let's try again using 5 and 7:
print_power_of_2(5);
print_power_of_2(7);
And run it:
% ./nested.pl
5^2 = 25
7^2 = 25
Wow, does it work only for 5? How about using 3 and 5:
print_power_of_2(3);
print_power_of_2(5);
and the result is:
% ./nested.pl
3^2 = 9
5^2 = 9
Now we start to understand: only the first call to the
print_power_of_2() function works correctly. It looks as if
our code remembers the result of the first execution and
ignores the arguments of subsequent executions.
Let's follow the guidelines and use the -w flag. Now execute the code:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 9.
5^2 = 25
6^2 = 25
We have never seen such a warning message before and we don't quite
understand what it means. The diagnostics pragma will certainly help us. Let's add this pragma before the strict pragma in our code:
#!/usr/bin/perl -w
use diagnostics;
use strict;
And execute it:
% ./nested.pl
Variable "$x" will not stay shared at ./nested.pl line 10 (#1)

    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.

    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable. In
    other words, the variable will no longer be shared.

    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.

    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax. When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.

5^2 = 25
6^2 = 25
Well, now everything is clear. Our code has an inner subroutine, power_of_2(), and an outer subroutine, print_power_of_2().
When the inner power_of_2() subroutine is called for the first
time, it sees the value of the outer print_power_of_2()
subroutine's $x variable. On subsequent calls the $x seen by the
inner subroutine is not updated, no matter what value $x has in the
outer subroutine. That's why the warning says that the $x variable will no longer be shared.
The diagnostics pragma suggests using an anonymous subroutine instead. An anonymous subroutine that refers to lexical variables of its enclosing scope is also known as a closure. Let's rewrite the code to use this technique:
anonymous.pl
--------------
#!/usr/bin/perl

use strict;

sub print_power_of_2 {
  my $x = shift;

  my $func_ref = sub {
    return $x ** 2;
  };

  my $result = $func_ref->();
  print "$x^2 = $result\n";
}

print_power_of_2(5);
print_power_of_2(6);
Now $func_ref contains a reference to an anonymous function, which we call when we
need the square. Since the anonymous function is
generated afresh every time print_power_of_2() is called, it is bound to
the current $x and the correct answer is given. Let's verify:
% ./anonymous.pl
5^2 = 25
6^2 = 36
Indeed, it works as advertised.
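To see that each anonymous subroutine really is bound to its own copy of the variable, here is a minimal sketch. The make_square() helper is a hypothetical name introduced only for this illustration; it returns the anonymous subroutine instead of calling it right away.

```perl
#!/usr/bin/perl -w
use strict;

# make_square() is a hypothetical helper: each call creates a new $x
# and a new anonymous sub bound to that particular $x.
sub make_square {
    my $x = shift;
    return sub { return $x ** 2 };
}

# three independent closures, each remembering its own $x
my @subs = map { make_square($_) } 3, 5, 7;

print join(", ", map { $_->() } @subs), "\n";   # prints: 9, 25, 49
```

Each closure keeps the $x it was created with, so calling them later, in any order, still yields the right squares.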
First, you might wonder why in the world anyone would need to define an inner subroutine. Here is one scenario: to eliminate the startup overhead of Perl scripts, you decide to write a daemon that compiles the scripts and modules only once and keeps the compiled code cached in memory. When a script needs to be executed, you just tell the daemon the name of the script to run and it does the rest.
Seems like an easy task, and it is. The only problem is: once the script is
compiled, how do you execute it? Or let's put it the other way: after it
was executed for the first time and it stays compiled in the daemon's memory,
how do you call it again? If you could require developers to code their
scripts so that each one has a subroutine called run() that
actually executes the code in the script, you would have half of the problem
solved.
But how does the daemon know how to refer to some specific script if they all run in the main:: name space? An obvious solution is to ask the developers to declare a package in each and every script, and for the package name to be derived from the script name. Moreover, since there is a chance that there will be more than one script with the same name residing in different directories, the directory has to be a part of the package name in order to prevent name-space collisions. And don't forget that a script can be moved from directory to directory, so you would have to make sure that the package name is corrected every time the script gets moved.
But why impose these strange rules on developers when we can arrange for
our daemon to do this work? Every script that the daemon is about to
execute for the first time should be wrapped inside a package whose
name is constructed from the mangled path to the script, and inside a subroutine
called run(). For example, if the daemon is about to execute
the script /tmp/hello.pl:
hello.pl
--------
#!/usr/bin/perl
print "Hello\n";
Prior to running it, the daemon will change the code to be:
wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;

sub run{
  #!/usr/bin/perl
  print "Hello\n";
}
The package name is constructed from the prefix cache::, with each directory-separating slash replaced by :: and each character that is not valid in a package name encoded as an underscore followed by its hexadecimal character code, so the . becomes _2e.
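The mangling rule can be sketched in a few lines. This is only an illustration under stated assumptions: package_name_for() is a hypothetical helper, not the daemon's or Apache::Registry's actual code, and it ignores corner cases such as path components that start with a digit.

```perl
#!/usr/bin/perl -w
use strict;

# package_name_for() is a hypothetical helper for illustration only.
sub package_name_for {
    my $path = shift;

    # split the path on slashes, dropping the empty leading component
    my @parts = grep { length } split m{/}, $path;

    for my $part (@parts) {
        # encode anything other than letters, digits and underscores
        # as "_" followed by the two-digit hex code of the character
        $part =~ s/([^A-Za-z0-9_])/sprintf("_%02x", ord $1)/ge;
    }

    return join '::', 'cache', @parts;
}

print package_name_for('/tmp/hello.pl'), "\n";   # cache::tmp::hello_2epl
```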
Now when the daemon is asked to execute the script
/tmp/hello.pl, all it has to do is build the package name as before, based on the
location of the script, and call its run() subroutine:
use cache::tmp::hello_2epl;
cache::tmp::hello_2epl::run();
We have just written a partial prototype of the daemon we wanted. The only piece left undefined is how to pass the path of the script to the daemon. This detail is left to the reader as an exercise.
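The wrap-once, run-many idea can be sketched as follows. This is a simplified illustration, not the real daemon: run_cached() and %compiled are hypothetical names, and the script source is passed in as a string rather than read from disk. Because the sketch compiles the wrapped code in memory with a string eval, it calls run() by name and needs no use() statement.

```perl
#!/usr/bin/perl -w
use strict;

my %compiled;   # packages that have already been compiled

# run_cached() is a hypothetical name; $source stands for the script text.
sub run_cached {
    my ($package, $source) = @_;

    unless ($compiled{$package}) {
        # wrap the script body into "package ...; sub run { ... }"
        eval "package $package;\nsub run {\n$source\n}\n1;"
            or die "Cannot compile $package: $@";
        $compiled{$package} = 1;
    }

    no strict 'refs';
    return &{"${package}::run"}();   # call the cached run() by name
}

# the second call reuses the code compiled by the first one
run_cached('cache::tmp::hello_2epl', 'print "Hello\n";') for 1 .. 2;
```

Only the first call pays the compilation cost; later calls go straight to the already compiled run() subroutine.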
If you are familiar with the Apache::Registry module, you know that it works in almost the same way. It uses a different
package prefix and the generic function is called handler()
and not run(). The script to run is determined from the HTTP
request.
Now you can see that there are cases where your normal subroutines become inner ones. Suppose your script is as simple as this:
simple.pl
---------
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
Wrapped into a run() subroutine it becomes:
simple.pl
---------
package cache::simple_2epl;

sub run{
  #!/usr/bin/perl
  sub hello { print "Hello" }
  hello();
}
Therefore, hello() is now an inner subroutine. If it uses
my() scoped variables that are defined and altered outside of it,
it won't work correctly from the second call onwards,
as was explained in the previous section.
First of all, there is nothing to worry about: if you do happen to have
``the my() scoped variable in the inner subroutine'' problem,
Perl will always alert you, as long as you don't forget to turn warnings on.
Suppose you have a script that has this problem. What are the ways to solve it? There are many of them and we will discuss some of them here.
We will use the following code to show different solutions.
multirun.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  my $counter = 0;

  increment_counter();
  increment_counter();

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\n";
  }

} # end of sub run
This code executes the run() subroutine three times. Each time,
run() initializes the $counter variable to 0 and then calls the inner
increment_counter() subroutine twice; increment_counter() prints
$counter's value after incrementing it. One might expect to see the following
output:
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
But as we have already learned from the previous sections, this is not what we are going to see. Indeed, when we run the script we see:
% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !
Obviously, the $counter variable seen by increment_counter() is not reinitialized on each execution of run().
The inner subroutine stays bound to the $counter of the first run() call, so it preserves
the value from the previous execution and increments it to the next
value.
One of the workarounds is to use globally declared variables, with the
vars pragma.
multirun1.pl
-----------
#!/usr/bin/perl -w

use strict;
use vars qw($counter);

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  $counter = 0;

  increment_counter();
  increment_counter();

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\n";
  }

} # end of sub run
If you run this and the other solutions offered below, the expected output will be generated:
% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
By the way, the warning we saw before has gone, and so has the problem, since
there is no my() (lexically defined) variable used in the nested subroutine.
Another approach is to use fully qualified variables. This is a better one, since less memory will be used, but it adds a typing overhead:
multirun2.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  $main::counter = 0;

  increment_counter();
  increment_counter();

  sub increment_counter{
    $main::counter++;
    print "Counter is equal to $main::counter !\n";
  }

} # end of sub run
You can also pass the variable to the subroutine by value and make the subroutine return it after it was updated. This adds time and memory overhead, so it's not a good idea if the variable can be very large.
Don't rely on the fact that the variable is small during the development of the application; it can grow quite big in situations you didn't expect. For example, a very simple HTML form text entry field can return a few megabytes of data if one of your users is bored and wants to test how good your code is. It's not uncommon to see users copy and paste core dump files 10MB in size into a form's text fields and submit them for your script to process.
multirun3.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  my $counter = 0;

  $counter = increment_counter($counter);
  $counter = increment_counter($counter);

  sub increment_counter{
    my $counter = shift || 0 ;
    $counter++;
    print "Counter is equal to $counter !\n";
    return $counter;
  }

} # end of sub run
Finally, you can use references to do the job.
increment_counter() accepts a reference to a $counter variable and increments its value by first dereferencing it. The $counter variable outside gets affected by this change as well.
multirun4.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  my $counter = 0;

  increment_counter(\$counter);
  increment_counter(\$counter);

  sub increment_counter{
    my $r_counter = shift;   # a reference to the caller's $counter
    $$r_counter++;
    print "Counter is equal to $$r_counter !\n";
  }

} # end of sub run
Here is yet another, even more obscure, technique. We modify the value
of $counter inside the subroutine by using the fact that the elements of @_ are aliases for the variables passed by the caller, so if you directly modify one of the elements of the
array, the value of the variable that was passed changes as well.
multirun5.pl
-----------
#!/usr/bin/perl -w

use strict;

for (1..3){
  print "run: [time $_]\n";
  run();
}

sub run {
  my $counter = 0;

  increment_counter($counter);
  increment_counter($counter);

  sub increment_counter{
    $_[0]++;
    print "Counter is equal to $_[0] !\n";
  }

} # end of sub run
Now you have at least five workarounds to choose from.
For more information please refer to the perlref and perlsub manpages.
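For completeness, the fix recommended by the diagnostics pragma earlier, turning the inner named subroutine into an anonymous one, can be applied to the same example. The following is only a sketch: the $increment_counter variable name and the return value of run() are additions made for this illustration.

```perl
#!/usr/bin/perl -w
use strict;

for (1..3) {
    print "run: [time $_]\n";
    run();
}

sub run {
    my $counter = 0;

    # an anonymous sub is re-created, and bound to the fresh $counter,
    # on every call to run()
    my $increment_counter = sub {
        $counter++;
        print "Counter is equal to $counter !\n";
    };

    $increment_counter->();
    $increment_counter->();

    return $counter;   # returned only so the result is easy to check
}
```

Since there is no named inner subroutine any more, the "will not stay shared" warning does not appear, and the counter starts from zero on every run.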
@INC is a special Perl variable that is the equivalent of the shell's PATH variable. While PATH contains a list of directories in which executables are looked up, @INC contains a list of directories from which Perl modules and libraries can be loaded.
When you use(), require() or do() a
filename or a module, Perl gets the list of directories from the @INC variable and searches them for the file it was asked to load. If the file that
you want to load is not located in one of the listed directories, you have
to tell Perl where to find it, either by providing a path relative to one
of the directories in @INC or by providing the full path to the file.
%INC is another special Perl variable, used to cache the names of the
files and modules that were successfully loaded and compiled by the
use(), require() or do() functions.
Before attempting to load a file or a module with use() or require(), Perl checks whether it's
already in the %INC
hash. If it's there, the loading, and therefore the compilation of the code,
is not performed at all. Otherwise the file is loaded into memory and
an attempt is made to compile it.
If the file is successfully loaded and compiled, a new key-value pair is
added to %INC. The key is the name of the file (for a module, the file name derived from the module name, e.g. strict.pm) as it was passed to
one of the three functions we have just mentioned. The value is the full path
to it in the file system if it was found in any of the @INC directories other than "."; if it was found relative to ".", the value is the relative path.
The following examples will make this logic easier to understand.
First, let's look at the contents of @INC on my system:
% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.
Notice the . (current directory) as the last directory in the list.
Now let's load the module strict.pm and see the contents of %INC:
% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
strict.pm => /usr/lib/perl5/5.00503/strict.pm
Since strict.pm was found in the /usr/lib/perl5/5.00503/ directory, and /usr/lib/perl5/5.00503/ is a part of @INC, %INC includes the full path as the value for the key strict.pm.
Now let's create the simplest module in /tmp/test.pm:
test.pm
-------
1;
It does nothing, but returns a true value when loaded. Now let's load it in different ways:
% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Since the file was found relative to . (the current directory), the relative path is inserted as the value. If we
alter @INC by adding /tmp to the end:
% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
we still get the relative path, since the module was found first relative
to ".": /tmp comes after . in the list. But if we execute the same code from a different directory, so that
the "." directory doesn't match:
% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
we get the full path. We can also prepend the path with
unshift(), so it will be used for matching before "." and therefore we get the full path as well:
% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
BEGIN{unshift @INC, "/tmp"}
can be replaced with the more elegant:
use lib "/tmp";
which has essentially the same effect as the BEGIN block above.
These approaches to modifying @INC can be labor intensive, since if you want to move the script around in the
file-system you have to modify the path. This can be painful, for example,
when you move your scripts from a development to a production server.
There is a FindBin module, which solves this problem in the plain Perl world, but
unfortunately it doesn't work correctly under mod_perl.
If you use this module, you don't need to write a hard coded path. The following snippet does all the work for you (the file is /tmp/load.pl):
load.pl
-------
#!/usr/bin/perl
use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";
In the above example $FindBin::Bin is equal to /tmp. If we move the script somewhere else, e.g. to /tmp/x,
$FindBin::Bin becomes /tmp/x.
% /tmp/load.pl
test.pm => /tmp/test.pm
This is just like use lib, but no hard coded path is required.
As mentioned earlier, FindBin will not work in the mod_perl environment: it's a module, and like any
module it's loaded only once. So the first script using it will have all
the settings correct, but the rest of the scripts will not if they are located in a
different directory from the first one.
Before we proceed, let's define what we mean by module and by library or file.
A library is a file which contains Perl subroutines and other code.
It generally doesn't include a package declaration.
Its last statement returns true.
It can be named in any way you like, but generally it has a .pl or .ph extension.
Examples:
config.pl
----------
$dir = "/home/httpd/cgi-bin";
$cgi = "/cgi-bin";
1;
mysubs.pl
----------
sub print_header{
  print "Content-type: text/plain\r\n\r\n";
}
1;
A module is a file which contains Perl subroutines and other code.
It generally declares a package name at the beginning.
Its last statement returns true.
The naming convention requires it to have a .pm extension.
Example:
MyModule.pm
-----------
package My::Module;
$My::Module::VERSION = 0.01;
sub new{ return bless {}, shift;}
END { print "Quitting\n"}
1;
require() reads a file containing Perl code and
compiles it. Before attempting to load the file, it looks up its argument in
%INC to see whether it has already been loaded. If it has, require()
just returns without doing a thing. Otherwise an attempt is made to
load and compile the file.
require() has to find the file it has to load. If the
argument is a full path to the file, it just tries to read it. For example:
require "/home/httpd/perl/mylibs.pl";
If the path is relative, require() will attempt to search for
the file in all the directories listed in @INC. For example:
require "mylibs.pl";
If files with the same name exist in more than one of the
directories listed in @INC, the first occurrence will be used.
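The search order can be mimicked with a short loop over @INC. This is only a sketch of the lookup, not what Perl does internally: find_in_inc() is a hypothetical helper, and it simply skips any code-reference hooks that may be present in @INC.

```perl
#!/usr/bin/perl -w
use strict;

# find_in_inc() is a hypothetical helper that mimics the search only;
# it does not load or compile anything.
sub find_in_inc {
    my $file = shift;
    for my $dir (@INC) {
        next if ref $dir;            # skip code-reference hooks
        my $path = "$dir/$file";
        return $path if -f $path;    # the first match wins
    }
    return;                          # not found in any directory
}

my $found = find_in_inc('strict.pm');
print "strict.pm would be loaded from $found\n" if defined $found;
```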
The file must return a true value as its last statement to indicate successful execution of any
initialization code. Since you never know what changes the file will go
through in the future, you cannot be sure that the last statement will
always return true. That's why it is recommended to put ``1;'' at the end of the file.
You should use the real filename for most files, but if the file is a module, you may use the following convention instead:
require My::Module;
This is equal to:
require "My/Module.pm";
If require() fails to load the file, because it
couldn't find the file in question, because the code failed to compile, or because the file didn't
return a true value at the end, the program will die(), unless the
require() statement is enclosed in an
eval() block, as in this example:
require.pl
----------
#!/usr/bin/perl -w

eval { require "/file/that/does/not/exists"};
if ($@) {
  print "Failed to load, because : $@"
}
print "\nHello\n";
When we execute the program:
% ./require.pl
Failed to load, because : Can't locate /file/that/does/not/exists in @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.

Hello
We see that the program didn't die(), because Hello was printed. This trick is useful when you want to check whether a user has some module installed,
but it's not critical if she hasn't: maybe the program can run without
this module with a reduced set of functionality.
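The optional-module trick might look like the following sketch. Time::HiRes is a core module on modern Perls; No::Such::Module::Hypothetical is a deliberately made-up name that is assumed not to be installed, and the variable names are chosen only for this example.

```perl
#!/usr/bin/perl -w
use strict;

# eval returns 1 only if require() succeeded; otherwise it returns undef
# and the error message is left in $@
my $have_hires = eval { require Time::HiRes; 1 } ? 1 : 0;
my $have_fake  = eval { require No::Such::Module::Hypothetical; 1 } ? 1 : 0;

# fall back to the builtin time() when the optional module is missing
my $now = $have_hires ? Time::HiRes::time() : time();

print "time: $now (high resolution: ", ($have_hires ? "yes" : "no"), ")\n";
print "optional module is missing, carrying on without it\n"
    unless $have_fake;
```

Ending the eval block with 1 makes the success test independent of whatever value the loaded file itself returns.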
If we remove the eval() part and try again:
require1.pl
-----------
#!/usr/bin/perl -w

require "/file/that/does/not/exists";
print "\nHello\n";
% ./require1.pl
Can't locate /file/that/does/not/exists in @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.
The program just die()s in the last example, which is what you
want in most cases.
For more information refer to the perlfunc manpage.
use(), just like require(), loads and compiles
files containing Perl code, but it works with modules only. The only way to pass a module to load is by its module name, not its
filename. If the module is located in MyCode.pm, the correct way to use() it is:
use MyCode
and not:
use "MyCode.pm"
use() translates the passed argument into a
file name by replacing :: with / and appending .pm at the end. So
My::Module becomes My/Module.pm.
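The translation can be written out as a one-line substitution. The module_to_filename() helper below is a hypothetical name used only for this sketch; the resulting string is also the key under which a loaded module appears in %INC.

```perl
#!/usr/bin/perl -w
use strict;
use File::Basename ();   # loaded only to have a known entry in %INC

# module_to_filename() is a hypothetical helper for illustration
sub module_to_filename {
    my $module = shift;
    (my $file = $module) =~ s{::}{/}g;   # My::Module -> My/Module
    return "$file.pm";
}

print module_to_filename('My::Module'), "\n";   # prints: My/Module.pm

my $key = module_to_filename('File::Basename');
print "$key => $INC{$key}\n" if exists $INC{$key};
```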
use() is exactly equivalent to:
BEGIN { require Module; import Module LIST; }
Internally it calls require() to do the loading and
compilation chores. When require() finishes its job,
import() is called, unless () is the second argument. The following pairs are equivalent:
use MyModule;
BEGIN {require MyModule; import MyModule; }
use MyModule qw(foo bar);
BEGIN {require MyModule; import MyModule ("foo","bar"); }
use MyModule ();
BEGIN {require MyModule; }
When no parameters are passed to import(), it imports the
default symbols, if any were defined inside the module. import() is not a builtin function; it's just an ordinary static method call into
the ``MyModule'' package, telling the module to import the list of features back into the
current package. See the Exporter manpage for more information.
There's a corresponding ``no'' command that un-imports symbols imported by use(), i.e., it calls unimport Module LIST instead of
import().
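The equivalence can be tried out with modules that ship with Perl. The sketch below spells out the BEGIN blocks by hand; List::Util and File::Basename were picked only as convenient core modules, and the import() call is written with the arrow method syntax rather than the indirect object form shown above.

```perl
#!/usr/bin/perl -w
use strict;

# same effect as: use List::Util qw(sum max);
BEGIN { require List::Util; List::Util->import(qw(sum max)); }

# same effect as: use File::Basename ();
# nothing is imported, so the function must be fully qualified
BEGIN { require File::Basename; }

print sum(1, 2, 3), "\n";                                 # prints: 6
print max(4, 9, 2), "\n";                                 # prints: 9
print File::Basename::basename("/tmp/hello.pl"), "\n";    # prints: hello.pl
```

Because the BEGIN blocks run at compile time, sum() and max() are already known to the compiler when the rest of the file is parsed.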
While do() behaves almost identically to
require(), it reloads the file unconditionally. It doesn't
check %INC to see whether the file was already loaded.
If do() cannot read the file, it returns undef and sets $! to report the error. If do() can read the file but cannot
compile it, it returns undef and sets an error message in $@. If the file is successfully compiled, do() returns the value
of the last expression evaluated.
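The difference between the two can be observed with a tiny library that counts how many times it has been executed. This is a sketch under stated assumptions: the counter_lib.pl file name and the $loads variable are made up for the example, and the file is written to a temporary directory so that nothing is left behind.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

our $loads = 0;   # incremented by the library each time it is executed

# create a throw-away library file (hypothetical name)
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter_lib.pl";
open my $fh, '>', $file or die "Cannot write $file: $!";
print $fh "\$main::loads++;\n1;\n";
close $fh;

# require() executes the file once, then finds it in %INC
for (1 .. 3) { require $file; }
print "after three require() calls: $loads\n";   # prints: 1

# do() executes the file every time
for (1 .. 3) { do $file; }
print "after three do() calls: $loads\n";        # prints: 4
```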
Next month we will get to the real stuff and start looking at how one should code in mod_perl: which techniques should be used and which should be avoided.