mod_perl Coding Guidelines - Part III

By: Stas Bekman



Summary

After you have refreshed your Perl knowledge it's time to talk about writing mod_perl friendly code and porting existing scripts to run under mod_perl.

There are quite a lot of things to talk about, so in the following articles we will cover most of the details you need to know to have a painless coding experience with mod_perl.

In this article we will expose Apache::Registry secrets and discuss why scripts sometimes work and sometimes don't. We will also talk about a script's namespace and the behavior of @INC under mod_perl.


Exposing Apache::Registry secrets

Let's start with some simple code and see what can go wrong with it, detect bugs and debug them, discuss possible pitfalls and how to avoid them.

I will use a simple CGI script, that initializes a $counter to 0, and prints its value to the screen while incrementing it.

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  use strict;
  
  print "Content-type: text/plain\r\n\r\n";
  
  my $counter = 0;
  
  for (1..5) {
    increment_counter();
  }
  
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\r\n";
  }

You would expect to see the output:

  Counter is equal to 1 !
  Counter is equal to 2 !
  Counter is equal to 3 !
  Counter is equal to 4 !
  Counter is equal to 5 !

And that's what you see when you execute this script the first time. But let's reload it a few times... See, suddenly after a few reloads the counter doesn't start its count from 1 any more. We continue to reload and see that it keeps on growing, but not steadily: the last value printed on successive reloads looks almost random, for example 10, 10, 10, 15, 20... Weird...

  Counter is equal to 6 !
  Counter is equal to 7 !
  Counter is equal to 8 !
  Counter is equal to 9 !
  Counter is equal to 10 !

We saw two anomalies in this very simple script: an unexpected increase of our counter beyond 5, and inconsistent growth across reloads. Let's investigate this script.


The First Mystery

First let's peek into the error_log file. Since we have enabled the warnings what we see is:

  Variable "$counter" will not stay shared 
  at /home/httpd/perl/conference/counter.pl line 13.

The Variable "$counter" will not stay shared warning is generated when the script contains a named nested subroutine (a subroutine that is not anonymous and is defined inside another subroutine) that refers to a lexically scoped variable defined outside this nested subroutine. This effect was explained in one of the previous articles.

Do you see a nested named subroutine in my script? I don't! What's going on? Maybe it's a bug? But wait, maybe the perl interpreter sees the script in a different way, maybe the code goes through some changes before it actually gets executed? The easiest way to check what's actually happening is to run the script with a debugger.

But since we must debug it when it's being executed by the webserver, a normal debugger wouldn't help, because the debugger has to be invoked from within the webserver. Luckily Doug MacEachern wrote the Apache::DB module and we will use it to debug my script. While Apache::DB allows you to debug the code interactively, we will do it non-interactively.

Modify the httpd.conf file in the following way:

  PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
  PerlModule Apache::DB
  <Location /perl>
    PerlFixupHandler Apache::DB
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    PerlSendHeader On
  </Location>

Restart the server and issue a request to counter.pl as before. On the surface nothing has changed--we still see the correct output as before, but two things happened in the background:

First, the file /tmp/db.out was written, with a complete trace of the code that was executed.

Second, error_log now contains the real code that was actually executed. This is produced as a side effect of reporting the Variable "$counter" will not stay shared at... warning that we saw earlier.

Here is the code that was actually executed:

  package Apache::ROOT::perl::conference::counter_2epl;
  use Apache qw(exit);
  sub handler {
    BEGIN {
      $^W = 1;
    };
    $^W = 1;
    
    use strict;
    
    print "Content-type: text/plain\r\n\r\n";
    
    my $counter = 0;
    
    for (1..5) {
      increment_counter();
    }
    
    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\r\n";
    }
  }

The original code wasn't indented. I've indented it for you to stress that the code was wrapped inside the handler() subroutine.

What do we learn from this?

First, that every CGI script is cached under a package whose name is formed from the Apache::ROOT:: prefix and the relative part of the script's URL (perl::conference::counter_2epl), by replacing all occurrences of / with :: and encoding characters that are not valid in a package name (here the . in counter.pl became _2e). That's how mod_perl knows which script should be fetched from the cache--each script is just a package with a single subroutine named handler.

Second, you see now why Perl complained about a variable that will not stay shared, a warning that concerns inner (nested) subroutines--increment_counter is actually a nested subroutine.

With mod_perl, each subroutine in every Apache::Registry script is nested inside the handler subroutine.
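
The same effect can be reproduced outside the web server. The following is only a minimal sketch: the handler() here is a hand-written stand-in for the wrapper that Apache::Registry generates, and @seen is a helper array added so the values can be inspected. Calling handler() twice plays the role of two requests served by the same child.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;   # helper for this sketch: collects the values that were printed

# stand-in for the sub that Apache::Registry wraps around the script
sub handler {
    my $counter = 0;

    for (1..5) {
        increment_counter();
    }

    # a named sub nested in handler(): perl warns at compile time that
    # $counter "will not stay shared"
    sub increment_counter {
        $counter++;     # always the $counter of the FIRST handler() call
        push @seen, $counter;
        print "Counter is equal to $counter !\n";
    }
}

handler();   # first "request":  counts 1..5
handler();   # second "request": counts 6..10, the reset to 0 is not seen
```

The second call does reset a $counter to 0, but it is a new variable; increment_counter() is still bound to the one created by the first call.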

It's important to understand that the inner subroutine effect happens only with code that Apache::Registry wraps with a declaration of the handler subroutine. If you put your code into a library or module, which the main script require()'s or use()'s, this effect doesn't occur.

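For example, if we move the counter and the subroutines that use it into mylib.pl, save it in the same directory as the main script and require() it, there will be no problem at all. Note that $counter has to move together with increment_counter(): a lexical declared with my in counter.pl is not visible inside mylib.pl, so the library keeps its own file-scoped $counter and the script resets it through reset_counter() on every request. (Don't forget the 1; at the end of the library or the require() might fail.)

  mylib.pl:
  ---------
  use strict;
  
  # file-scoped lexical, shared by the subs below (they are not nested)
  my $counter;
  
  sub reset_counter{
    $counter = 0;
  }
  
  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\r\n";
  }
  1;

  counter.pl:
  ----------
  #!/usr/bin/perl -w
  
  use strict;
  require "./mylib.pl";
  
  print "Content-type: text/plain\r\n\r\n";
  
  # start from 0 on every request
  reset_counter();
  
  for (1..5) {
    increment_counter();
  }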

Personally, unless the script is very short, I tend to write all the code in external libraries, and to have only a few lines in the main script. Generally the main script simply calls the main function of my library. Usually I call it init(). I don't worry about nested subroutine effects anymore (unless I create them myself :).
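
A rough sketch of that layout follows. My::Counter is a hypothetical module name; normally it would live in its own My/Counter.pm file and be loaded with use, but it is declared inline here so the example runs as a single file. Because $counter is declared inside init() and passed around explicitly, every call gets a fresh variable and nothing is nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # hypothetical library; in real life this block is the file My/Counter.pm
    package My::Counter;

    sub init {
        my $counter = 0;            # created afresh on every call
        print "Content-type: text/plain\r\n\r\n";
        for (1..5) {
            $counter = increment_counter($counter);
        }
        return $counter;
    }

    # takes the current value and returns the new one, no shared state
    sub increment_counter {
        my ($counter) = @_;
        $counter++;
        print "Counter is equal to $counter !\r\n";
        return $counter;
    }
}

# this is all that is left in the main script
my $first  = My::Counter::init();
my $second = My::Counter::init();   # a second "request" also ends at 5
```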

Other possible workarounds for this problem were discussed in the previous articles.

You shouldn't be intimidated by this issue at all, since Perl is your friend. Just keep the warnings mode On and Perl will gladly tell you whenever you have this effect, by saying:

  Variable "$counter" will not stay shared at ...[snipped]

Just don't forget to check your error_log file, before going into production!

By the way, the above example was pretty boring. In my first days of using mod_perl, I wrote a simple user registration program. I'll give a very simple representation of this program.

  use CGI;
  $q = new CGI;
  my $name = $q->param('name');
  print_respond();
  
  sub print_respond{
    print "Content-type: text/plain\r\n\r\n";
    print "Thank you, $name!";
  }

My boss and I checked the program at the development server and it worked OK. So we decided to put it in production. Everything was OK, but my boss decided to keep on checking by submitting variations of his profile. Imagine the surprise when after submitting his name (let's say ``The Boss'' :), he saw the response ``Thank you, Stas Bekman!''.

What happened is that I tried the production system as well. I was new to mod_perl stuff, and was so excited with the speed improvement that I didn't notice the nested subroutine problem. It hit me. At first I thought that maybe Apache had started to confuse connections, returning responses from other people's requests. I was wrong of course.

Why didn't we notice this when we were trying the software on our development server? Keep reading and you will understand why.


The Second Mystery

Let's return to our original example and proceed with the second mystery we noticed. Why did we see inconsistent results over numerous reloads?

That's very simple. Every time a server gets a request to process, it hands it over to one of the children, generally in a round robin fashion. So if you have 10 httpd children alive, the first 10 reloads might seem to be correct, because the effect we've just talked about starts to appear only from the second invocation within the same child. Subsequent reloads then return unexpected results.

Moreover, requests can appear at random and children don't always run the same scripts. At any given moment one of the children could have served the same script more times than any other, and another may never have run it. That's why we saw the strange behavior.
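
To see how this produces the kind of numbers we observed, here is a small simulation in plain Perl. It is an illustration only, not how Apache dispatches requests: each closure made by the hypothetical make_child() stands for one child process whose $counter is never reset, and requests are handed out in strict round robin order among three children.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# each "child" keeps its own $counter between requests
sub make_child {
    my $counter = 0;
    return sub {
        $counter++ for 1..5;    # one run of the script body
        return $counter;        # last value printed for this request
    };
}

my @children = map { make_child() } 1..3;

# seven reloads, dispatched round robin
my @last_values = map { $children[ $_ % @children ]->() } 0..6;

print "@last_values\n";    # 5 5 5 10 10 10 15
```

With real traffic the dispatch order is less regular, which is why the growth looked random.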

Now you see why we didn't notice the problem with the user registration system in the example. First, we didn't look at the error_log. (As a matter of fact we did, but there were so many warnings in there that we couldn't tell what were the important ones and what were not). Second, we had too many server children running to notice the problem.

A workaround is to run the server as a single process. You achieve this by invoking the server with the -X parameter (httpd -X). Since there are no other servers (children) running, you will see the problem on the second reload.

But before that, let the error_log help you detect most of the possible errors--most of the warnings can become errors, so you should make sure to check every warning that is detected by perl, and probably you should write the code in such a way that no warnings appear in the error_log. If your error_log file is filled up with hundreds of lines on every script invocation, you will have difficulty noticing and locating real problems.

Of course none of the warnings will be reported if the warning mechanism is not turned On.


Sometimes it Works, Sometimes it Doesn't

When you start running your scripts under mod_perl, you might find yourself in a situation where a script seems to work, but sometimes it screws up. And the more it runs without a restart, the more it screws up. Often the problem is easily detectable and solvable. You have to test your script under a server running in single process mode (httpd -X).

Generally the problem comes from using global variables. Because global variables keep their values from one script invocation to the next unless you reset them, your scripts can end up doing strange things.

Let's look at three real world examples:


An Easy Break-in

The first example is amazing--Web Services. Imagine that you enter some site where you have an account, perhaps a free email account. Now you want to see other users' mail.

You type in a username you want to peek at and a dummy password and try to enter the account. On some services this will work!!!

You say, why in the world does this happen? The answer is simple: Global Variables. You have entered the account of someone who happened to be served by the same server child as you. Because of sloppy programming, a global variable was not reset at the beginning of the program and voila, you can easily peek into others' email! Here is an example of sloppy code:

  use vars qw($authenticated);
  my $q = new CGI;
  my $username = $q->param('username');
  my $passwd   = $q->param('passwd');
  authenticate($username,$passwd);
    # failed, break out
  unless ($authenticated){
    print "Wrong passwd";
    exit;
  }
    # user is OK, fetch user's data
  show_user($username);
  
  sub authenticate{
    my ($username,$passwd) = @_;
        # some checking
    $authenticated = 1 if SOME_USER_PASSWD_CHECK_IS_OK;
  }

Do you see the catch? With the code above, I can type in any valid username and any dummy passwd and enter that user's account, if someone has successfully entered his account before me using the same child process! Since $authenticated is global--if it becomes 1 once, it'll stay 1 for the remainder of the child's life!!! The solution is trivial--reset $authenticated to 0 at the beginning of the program.
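
The break-in can be simulated without a web server. In this sketch, which makes several assumptions, %passwd_of is a made-up user database and handle_request() is a hypothetical stand-in for one run of the script inside a long-lived child; calling it twice models two requests served by the same process.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use vars qw($authenticated);

# hypothetical user database, for this sketch only
my %passwd_of = (stas => 'secret');

sub authenticate {
    my ($username, $passwd) = @_;
    # sloppy: sets the global on success but never clears it on failure
    $authenticated = 1
        if exists $passwd_of{$username} && $passwd_of{$username} eq $passwd;
}

# one call = one request served by the same child
sub handle_request {
    my ($username, $passwd) = @_;
    authenticate($username, $passwd);
    return $authenticated ? "Welcome, $username" : "Wrong passwd";
}

my $honest   = handle_request('stas', 'secret');
my $intruder = handle_request('stas', 'dummy');

print "$honest\n";      # Welcome, stas
print "$intruder\n";    # Welcome, stas  (the global is still 1)
```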

A cleaner solution of course is not to rely on global variables, but rely on the return value from the function.

  my $q = new CGI;
  my $username = $q->param('username');
  my $passwd   = $q->param('passwd');
  my $authenticated = authenticate($username,$passwd);
    # failed, break out
  unless ($authenticated){
    print "Wrong passwd";
    exit;
  }
    # user is OK, fetch user's data
  show_user($username);
  
  sub authenticate{
    my ($username,$passwd) = @_;
        # some checking
    return (SOME_USER_PASSWD_CHECK_IS_OK) ? 1 : 0;
  }

Of course this example is trivial--but believe me it happens!


Thinking mod_cgi

Just another little one liner that can spoil your day, assuming you forgot to reset the $allowed variable. It works perfectly OK in plain mod_cgi:

  $allowed = 1 if $username eq 'admin';

But under mod_perl, if your system administrator, who has superuser access rights, has previously used the system, anybody lucky enough to be served later by the same child that served the administrator will gain the same rights.

The obvious fix is:

  $allowed = $username eq 'admin' ? 1 : 0;


Regular Expression Memory

Another good example is usage of the /o regular expression modifier, which compiles a regular expression once, on its first execution, and never compiles it again. This problem can be difficult to detect, as after restarting the server each request you make will be served by a different child process, and thus the regex pattern for that child will be compiled afresh. Only when you make a request that happens to be served by a child which has already cached the regex will you see the problem. Generally you miss that. When you press reload, you see that it works (with a new, fresh child). Eventually it doesn't, because you get a child that has already cached the regex and won't recompile because of the /o modifier.

An example of such a case would be:

  my $pat = $q->param("keyword");
  foreach( @list ) {
    print if /$pat/o;
  }
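
Here is a self-contained sketch of the same trap, where two calls to a sub stand in for two requests handled by one child. The sample @list and the sub names are invented for the illustration, and search_fresh() shows just one possible fix (compiling the pattern with qr// on every call); simply dropping /o works too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = qw(apple banana cherry);   # sample data for this sketch

# buggy: /o compiles the pattern on first use and then ignores $pat
sub search_once {
    my ($pat) = @_;
    return grep { /$pat/o } @list;
}

# one possible fix: compile a fresh regex object for each call
sub search_fresh {
    my ($pat) = @_;
    my $re = qr/$pat/;
    return grep { $_ =~ $re } @list;
}

my @first  = search_once('an');    # banana
my @second = search_once('ch');    # banana again, 'ch' was never compiled
my @fixed  = search_fresh('ch');   # cherry

print "@first | @second | @fixed\n";
```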

To make sure you don't miss these bugs, always test your CGI in single process mode.

We have talked about this issue in detail in one of the previous articles.


Script's name space

Scripts under Apache::Registry do not run in package main, they run in a unique name space based on the requested URI. For example, if your URI is /perl/test.pl the package will be called Apache::ROOT::perl::test_2epl.
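
The mapping from URI to package name can be approximated in a few lines. This is a simplified sketch, not the Apache::Registry source: uri_to_package() is a hypothetical helper, and it only covers the two transformations visible in the example above (the . becoming _2e and each / becoming ::).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# uri_to_package() is a hypothetical helper, not part of Apache::Registry
sub uri_to_package {
    my ($uri) = @_;
    my $name = $uri;
    # characters that cannot appear in a package name become _XX (hex code)
    $name =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord($1))/eg;
    # each / becomes a package separator
    $name =~ s{/}{::}g;
    return "Apache::ROOT" . $name;
}

my $package = uri_to_package("/perl/test.pl");
print "$package\n";    # Apache::ROOT::perl::test_2epl
```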


@INC and mod_perl

The basic Perl @INC behaviour was explained in the previous articles.

When running under mod_perl, once the server is up @INC is frozen and cannot be updated. The only opportunity to temporarily modify @INC is while the script or the module is being loaded and compiled for the first time. After that its value is reset to the original one. The only way to change @INC permanently is to modify it at Apache startup.

Two ways to alter @INC at server startup:
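
As one hedged illustration, assuming your httpd.conf loads a startup file when the server starts (for example through the PerlRequire directive), that file can extend @INC with the standard lib pragma before any script is compiled. The file name startup.pl and the directory below are placeholders.

```perl
# startup.pl (placeholder name), loaded once when the server starts
use strict;
use warnings;

# placeholder directory; use lib puts it at the front of @INC
use lib qw(/home/httpd/perl);

print "first \@INC entry: $INC[0]\n";

1;
```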


Next month

Next month we will talk about various techniques for reloading files and modules, in both development and production environments.