mod_perl Coding Guidelines - Part I

By: Stas Bekman


perldoc's Rarely Known But Very Useful Options

First of all, I want to stress that you cannot become a Perl hacker without knowing how to read the Perl documentation and search through it. Books are good, but an easily accessible and searchable Perl reference at your fingertips is a great time saver.

While you can use the online Perl documentation on the Web, the perldoc utility gives you access to the documentation installed on your system. To find out what Perl manpages are available, execute:

  % perldoc perl

To find what functions perl has, execute:

  % perldoc perlfunc

To learn the syntax of a specific function and to find examples of its use, execute (e.g. for open()):

  % perldoc -f open

Note: In perl5.00503 and earlier, there is a bug in this option and in the -q option of perldoc: it doesn't call pod2man, but displays the section in POD format instead. The output is still readable and very useful.

To search through the Perl FAQ (perlfaq manpage) sections, use the -q option (e.g. for the keyword open):

  % perldoc -q open

This returns all the matching Q&A sections, still in POD format.

To read the perldoc manpage itself, execute:

  % perldoc perldoc


Tracing Warnings Reports

Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places if it's located inside a subroutine.

Here is an example:

  warnings.pl
  -----------
  #!/usr/bin/perl -w
  
  correct();
  incorrect();
  
  sub correct{
    print_value("Perl");
  }
  
  sub incorrect{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

In the code above, print_value() prints the passed value, correct() passes the value to print and in incorrect() we forgot to pass it. When we run the script:

  % ./warnings.pl

we get the warning:

  Use of uninitialized value at ./warnings.pl line 16.

Perl complains about an undefined variable $var at the line that attempts to print its value:

  print "My value is $var\n";

But how do we know why it is undefined? The only possible reason is that the calling function didn't pass the argument. But how do we know who the caller was? In our example there are two possible offending callers; in the general case there can be many of them. We could use the built-in caller() function, which tells us who called the current subroutine, but even that may not be enough, since the call chain can be several levels deep:

  sub third{
    second();
  }
  sub second{
    my $var = shift;
    first($var);
  }
  sub first{
    my $var = shift;
    print "Var = $var\n";
  }
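As a minimal sketch of the caller() approach, here is a variant of print_value() that reports who called it. The subroutine names mirror warnings.pl; returning a string instead of printing the value is an assumption made only to keep the example easy to check.

```perl
use strict;
use warnings;

# caller(1) describes the frame one level above the current
# subroutine; element 3 of the returned list is that frame's
# fully qualified subroutine name.
sub print_value {
    my $var = shift;
    my $caller = (caller(1))[3];
    $caller = 'the main script' unless defined $caller;
    return "print_value() was called by $caller";
}

sub correct   { return print_value("Perl") }
sub incorrect { return print_value() }

for my $line ( correct(), incorrect() ) {
    print "$line\n";
}
```

This names only the immediate caller. To see the whole chain you would have to walk caller(2), caller(3) and so on, which is the work a stack trace does for you.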

The solution is quite simple. What we need is a full call stack trace leading to the call that triggered the warning.

The Carp module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.

  warnings2.pl
  -----------
  #!/usr/bin/perl -w
  
  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;
  
  correct();
  incorrect();
  
  sub correct{
    print_value("Perl");
  }
  
  sub incorrect{
    print_value();
  }
  
  sub print_value{
    my $var = shift;
    print "My value is $var\n";
  }

Now when we execute it, we see:

  Use of uninitialized value at ./warnings2.pl line 19.
    main::print_value() called at ./warnings2.pl line 14
    main::incorrect() called at ./warnings2.pl line 7

Take a moment to understand the call stack trace. The deepest calls are printed first. The second line tells us that the warning was triggered in print_value(), and the third line tells us that print_value() was called by the incorrect() subroutine:

  script => incorrect() => print_value()

We look at incorrect() and indeed see that we forgot to pass the variable. Of course, when you write a subroutine like print_value() it is a good idea to check the passed arguments before starting execution. But the example was ``good'' enough to show you how to ease the debugging process.
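Checking the arguments could look roughly like the sketch below. It uses croak() from the same Carp module; the error message text and the return value are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;
use Carp ();

# Refuse to run without an argument. croak() dies with a message
# that carries call-site information.
sub print_value {
    my $var = shift;
    Carp::croak("print_value() called without a value")
        unless defined $var;
    print "My value is $var\n";
    return 1;
}

print_value("Perl");

# The missing argument is now a trappable error, not a warning.
eval { print_value() };
print "Caught: $@" if $@;
```

Whether a missing argument should be fatal is a design choice. A subroutine could equally fall back to a default value and call Carp::carp() to warn instead.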

Sure, you say. I could find that problem by simple inspection of the code. You're right, but I promise you that your task would be quite complicated and time consuming for code of some thousands of lines.

In addition, under mod_perl certain uses of the eval operator and ``here documents'' are known to throw off Perl's line numbering, so the line numbers are often incorrect when reporting warnings and errors. Getting the trace helps a lot. In the future I'll show how to correct the line numbering.


Using global variables and sharing them between modules/packages


Making the variables global

When you first write $x in your code, you create a global variable. It is visible everywhere in the file in which you use it, or, if you define it inside a package, everywhere inside that package. But this works only if you do not use the strict pragma, and you HAVE to use this pragma if you want to run your scripts under mod_perl.


Making the variables global with strict pragma On

First you use:

  use strict;

Then you use:

  use vars qw($scalar %hash @array);

From this moment on, the variables are global in the package in which you declared them. If you want to share global variables between packages, here is what you can do.
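To illustrate, here is a small sketch of a package that declares its globals with use vars under the strict pragma. My::Counter and its variables are hypothetical names invented for this example.

```perl
package My::Counter;    # hypothetical package, for illustration only
use strict;
use vars qw($count @events %seen);

$count = 0;

# Inside the package the short names pass the strict checks.
sub record {
    my $name = shift;
    $count++;
    push @events, $name;
    $seen{$name}++;
    return $count;
}

package main;

My::Counter::record('start');
My::Counter::record('stop');

# From another package the variables need their full names.
print "count:  $My::Counter::count\n";
print "events: @My::Counter::events\n";
```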


Using Exporter.pm to share global variables

Assume that you want to share the CGI.pm's object (I will use $q) between your modules. For example you create it in the script.pl, but want it to be visible in My::HTML. First - you make $q global.

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
  $q = new CGI;
  
  My::HTML::printmyheader();
  ----------------

Note that we have imported $q from My::HTML. Here is My::HTML, which exports $q:

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);
  
  }
  
  use vars qw($q);
  
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  }
  1;
  -------------------

So the $q is being shared between the My::HTML package and the script.pl. It will work vice versa as well, if you create the object in the My::HTML but use it in the script.pl. You have a true sharing, since if you change $q in script.pl, it will be changed in My::HTML as well.
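The sharing can be checked in a single file with the sketch below. My::Shared is a hypothetical stand-in for My::HTML, a plain string stands in for the CGI object, and the explicit import() call inside BEGIN is an assumption that imitates what a use statement does when the package lives in its own file.

```perl
package My::Shared;     # hypothetical stand-in for My::HTML
use strict;
use Exporter ();
use vars qw(@ISA @EXPORT_OK $q);

BEGIN {
    @ISA       = qw(Exporter);
    @EXPORT_OK = qw($q);
}

sub current_q { return $q }

package main;
use vars qw($q);

# Imitate "use My::Shared qw($q);" for a package defined in
# the same file: call import() at compile time.
BEGIN { My::Shared->import(qw($q)) }

# A change made in main is seen inside My::Shared ...
$q = 'set in main';
my $seen_inside = My::Shared::current_q();
print "$seen_inside\n";

# ... and a change made in My::Shared is seen in main.
$My::Shared::q = 'set in My::Shared';
print "$q\n";
```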

What if you need to share $q between more than 2 packages? For example you want My::Doc to share $q as well.

You leave My::HTML untouched and modify script.pl to include:

  use My::Doc qw($q);

And write My::Doc exactly like My::HTML, though of course the content is different :).

One possible pitfall arises when you want to use My::Doc in both My::HTML and script.pl. The $q will be shared only if you add:

  use My::Doc qw($q);

to My::HTML. Otherwise $q in My::HTML and $q in My::Doc will be two different variables. To make things clear, here is the code:

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.); 
  use My::HTML qw($q); # My/HTML.pm in the same dir as script.pl
  use My::Doc  qw($q); # Ditto
  $q = new CGI;
  
  My::HTML::printmyheader();
  ----------------

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::HTML::ISA         = qw(Exporter);
    @My::HTML::EXPORT      = qw();
    @My::HTML::EXPORT_OK   = qw($q);
  
  }
  
  use vars     qw($q);
  use My::Doc  qw($q);
  
  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  
    My::Doc::printtitle('Guide');
  }
  1;
  -------------------

  My/Doc.pm
  ----------------
  package My::Doc;
  use strict;
  
  BEGIN {
    use Exporter ();
  
    @My::Doc::ISA         = qw(Exporter);
    @My::Doc::EXPORT      = qw();
    @My::Doc::EXPORT_OK   = qw($q);
  
  }
  
  use vars qw($q);
  
  sub printtitle{
    my $title = shift || 'None';
    
    print $q->h1($title);
  }
  1;
  -------------------


Using aliasing perl feature to share global variables

As the title says, you can import a variable into a script or module without using Exporter.pm. I have found it useful to keep all the configuration variables in one module, My::Config. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons: it pollutes other packages' name spaces with extra symbols, which raises the memory requirements, and it adds the overhead of keeping track of which variables should be exported from the configuration module and which should be imported by each particular package. I solve this problem by keeping all the variables in one hash, %c, and exporting only that. Here is an example of My::Config:

  package My::Config;
  use strict;
  use vars qw(%c);
  %c = (
    # All the configs go here
    scalar_var => 5,
  
    array_var  => [
                   'foo',
                   'bar',
                  ],
  
    hash_var   => {
                   foo => 'Foo',
                   bar => 'BARRR',
                  },
  );
  1;

Now in packages that want to use the configuration variables I have either to use fully qualified names like $My::Config::test, which I dislike, or to import them as described in the previous section. But since we have only one variable to handle, we can make things even simpler and save the loading of the Exporter.pm package. We will use Perl's aliasing feature to do the export and save keystrokes:

  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;
  
    # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course %c is global everywhere you use it as described above, and if you change it somewhere it will affect any other packages in which you have aliased %My::Config::c.
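The following single-file sketch shows that the alias really is the same hash under two names. The accessor subroutines scalar_var() and set_scalar_var() are hypothetical helpers added for the demonstration, and the configuration is trimmed to one key.

```perl
package My::Config;
use strict;
use vars qw(%c);
%c = ( scalar_var => 5 );

package My::HTML;
use vars qw(%c);

# %My::HTML::c becomes another name for %My::Config::c.
*c = \%My::Config::c;

sub scalar_var     { return $c{scalar_var} }
sub set_scalar_var { $c{scalar_var} = shift; return }

package main;

my $before = My::HTML::scalar_var();
print "before: $before\n";

# A change made through the alias shows up in My::Config.
My::HTML::set_scalar_var(42);
print "after:  $My::Config::c{scalar_var}\n";
```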

Note that aliases work only with global or local() variables. You cannot write:

  my *c = \%My::Config::c;

which is an error. But you can write:

  local *c = \%My::Config::c;


The Scope of the Special Perl Variables

Special Perl variables like $| (buffering), $^T (time), $^W (warnings), $/ (input record separator), $\ (output record separator) and many more are all global variables. This means that you cannot scope them with my(); only local() is permitted to do that. Since the child server doesn't usually exit, if one of your scripts modifies a global variable, it stays changed for the rest of the process's life and affects all the scripts executed by the same process.

We will demonstrate the case with the input record separator variable. If you undefine this variable, the diamond operator will suck in the whole file at once, if you have enough memory. With this in mind, you should never write code like the example below.

  $/ = undef; 
  open IN, "file" ....
    # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to put the local() keyword before the special variable when you change it, like this:

  local $/ = undef; 
  open IN, "file" ....
    # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a catch. local() will propagate the changed value to any of the code below it. The modified value will be in effect until the script terminates, unless it is changed again somewhere else in the script.

A cleaner approach is to enclose the whole of the code that is affected by the modified variable in a block, like this:

  {
    local $/ = undef; 
    open IN, "file" ....
      # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way when Perl leaves the block it restores the original value of the $/ variable, and you don't need to worry about its value anywhere else in your program.
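A runnable sketch of the same idea follows. It wraps the slurp in a hypothetical slurp_file() subroutine, whose body is the enclosing block, and uses a temporary file created with File::Temp so the example needs no existing file. Both are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The subroutine body is the block: $/ is undefined only
# while slurp_file() runs and is restored on return.
sub slurp_file {
    my $path = shift;
    local $/ = undef;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my $content = <$in>;
    close $in;
    return $content;
}

# Create a small throwaway file to read back.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "line one\nline two\n";
close $fh;

my $all = slurp_file($path);
print $all;

# Outside the subroutine $/ has its usual value again.
print "separator restored\n" if $/ eq "\n";
```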


Compiled Regular Expressions

When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not vary during the execution of the program, a standard optimization technique consists of adding the /o modifier to the regexp pattern. This directs the compiler to build the internal table once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider:

  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }

This is usually a big win in loops over lists, or when using grep() or map() operators.

In long-lived mod_perl scripts, however, this can pose a problem if the variable changes according to the invocation. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by the httpd child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is dependent on. Your script will appear broken.
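The effect can be reproduced outside mod_perl. In this sketch two successive calls in one process stand in for two requests served by the same httpd child. The subroutine names are hypothetical.

```perl
use strict;
use warnings;

# With /o the pattern is compiled on the first call only;
# later calls reuse it even though $pat has changed.
sub matches_with_o {
    my ($pat, $string) = @_;
    return $string =~ /$pat/o ? 1 : 0;
}

# Without /o the current value of $pat is used on every call.
sub matches_without_o {
    my ($pat, $string) = @_;
    return $string =~ /$pat/ ? 1 : 0;
}

my @with    = ( matches_with_o('^foo$', 'foo'),
                matches_with_o('^bar$', 'bar') );
my @without = ( matches_without_o('^foo$', 'foo'),
                matches_without_o('^bar$', 'bar') );

print "with /o:    @with\n";
print "without /o: @without\n";
```

The second call to matches_with_o() fails because 'bar' is still being tested against the pattern from the first call, ^foo$.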

There are two solutions to this problem:

The first is to use eval q//, to force the code to be evaluated each time. Just make sure that the eval block covers the entire loop of processing, and not just the pattern match itself.

The above code fragment would be rewritten as:

  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  };

Just saying:

  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }

is going to be a horribly expensive proposition.

You can use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an m// or s///), you can rely on the property of the null pattern, that reuses the last pattern seen. This leads to the second solution, which also eliminates the use of eval.

The above code fragment becomes:

  my $pat = '^foo$';
  "something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
    print if //;
  }

The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the // will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities.

If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match:

  "$pat" =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the non-searchable \377 character as follows:

  "\377" =~ /$pat|^[\377]$/; # guaranteed if meta-characters present

Another approach:

Its usefulness depends on the complexity of the regexp you apply this technique to. One common usage where a compiled regexp is usually more efficient is to ``match any one of a group of patterns'' over and over again.

This is easier to remember with a helper routine. Here is one, slightly modified from Jeffrey Friedl's example in his book ``Mastering Regular Expressions''.

  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:

  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }
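The usage above depends on an open ACCESS_LOG filehandle and on get_browser_field(), neither of which is shown. As a self-contained sketch, the same helper can be tried on a few made-up User-Agent strings. The sample data and the variable names are assumptions.

```perl
use strict;
use warnings;

# Same helper as above, repeated so the sketch runs on its own.
sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
}

my $known_browser = Build_MatchMany_Function(qw(Mozilla Lynx MSIE));

# Made-up User-Agent strings standing in for log fields.
my @agents  = ('Mozilla/4.0', 'Lynx/2.8', 'Wget/1.5');
my @unknown = grep { !$known_browser->($_) } @agents;

print "Unknown Browser: $_\n" for @unknown;
```

Each call to Build_MatchMany_Function() compiles a new subroutine, so the /o modifiers lock in the patterns of that particular list only.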


Next month

Next month I'll cover a few other very important Perl topics you have to know for blissful mod_perl programming.