It has been said by even the world's greatest programmers that developing software is not easy. Of course, the challenge is largely responsible for making software development fun and interesting. Still, few, if any, would pass on an opportunity to ease the development cycle. This article will present you with such an opportunity, to ease one of the more frustrating areas of software development; debugging.
No matter what stage of development a piece of software may be in, chances are high it will not always behave as it is expected to. Such conditions are classified as a "bug". In general, a bug is simply a piece of data who's value at a particular point triggers abnormal behavior of an application. However, something so "simple" is often most difficult to resolve, such as "finding a needle in a haystack".
Many bugs are experienced by the programmer during development or by users, who normally report the bug through various channels, such as email, online reporting tools, telephone and yelling across a room in the office. The report data will hopefully lead a developer in the right direction to finding and squashing the bug. In any case, some level of "debugging" is most always required to solve the problem at hand.
The universal debugging tool across nearly all platforms and programming languages is the printf() or equivalent output function, which can send data to the console, a file, application window and so on. With an idea of where and when the bug is triggered, a developer can insert printf() statements in the source code to examine the value of data at certain points of execution. Many programs and libraries provide handy wrappers around an output function to dump complex data structures, include the source file name and line information, a date stamp, and anything else that may help lead to the cause of strange behavior. However, it is rather difficult to anticipate all possible directions a program might take and what data to suspect of causing trouble. In addition, inline debugging code tends to add bloat and degrade performance of an application. Although, most applications offer inline debugging as a compile time option to avoid these hits. In any case, this information tends to only be useful to the programmer who added the trace statement in first place.
The most effective way to track down a bug is running the program with an interactive debugger. The majority of programming languages have such a tool available that allows one to see what is happening inside an application while it is running. Basic features of an interactive debugger allow you to:
Most C and C++ programmers are familiar with the interactive GNU debugger (gdb). GDB is a standalone program that requires your code to be compiled with debugging symbols to be useful. While gdb can be used to debug the perl interpreter program itself, it cannot be used to debug your own Perl programs. Not to worry, Perl provides it's own interactive debugger, called perldb. Giving control of your Perl program to the interactive debugger is simply a matter of specifying the -d command line switch. When this switch is used, Perl will insert debugging hooks into the program syntax tree, but leaves the actual job of debugging to a Perl module outside of the perl binary program itself.
During the three years that mod_perl has been available to the public, only a small handful of people have ever asked how to use the Perl interactive debugger with mod_perl. In fact, it was not until recent weeks the interactive debugger could actually be used in the mod_perl environment and nobody had really complained about this in the past. This leads me to believe very few mod_perl users have taken advantage of this valuable Perl feature in the past. Myself included, it was not until recent weeks that I became dependent on the debugger when it saved my life.
I was pulled in to help a project meet a tight release deadline. A project that had been in development for 18+ months, already rolled umpteen releases, touched by more than a dozen different developers with over 100 custom Perl modules, close to 100 scripts and countless CPAN modules, adding up to well over 100,000 lines of source code. A project whose source code was brand new to me. While I can understand Perl code quite well by looking just looking at it, no mere mortal can begin to understand what will actually happen in such a large application until the code is running. With so many different files and lines of code, I didn't know where to start adding trusty printf() statements to see what was happening inside. So, I invested some time in making the Perl interactive debugger work in the mod_perl environment. This new module, Apache::DB allowed me to jump right into the action and add new features without digging around in the application source code or adding a single trace statement anywhere in the code. Not only did it help me, it helped other developers on the team a great deal, who have been intimate with the source code for quite some time.
This article will start off by introducing a few basic concepts and commands of the Perl interactive debugger. These basic warm up examples are all run from the command line, outside of the mod_perl, but are all still relevant once we do go inside Apache.
You may want to keep the perldebug manpage handy for reference during this article and for future debugging sessions on your own.
The interactive debugger will attach to the current terminal and present you with a prompt just before the first program statement is executed. For example:
% perl -d -le 'print "hello world"' Loading DB routines from perl5db.pl version 1.0402 Emacs support available. Enter h or `h h' for help. main::(-e:1): print "hello world" DB<1>The source line shown is that which Perl is about to execute, the next command will cause this line to be executed and stop again right before the next line:
main::(-e:1): print "hello world" DB<1> n hello world Debugged program terminated. Use q to quit or R to restart, use O inhibit_exit to avoid stopping after program termination, h q, h R or h O to get additional info. DB<1>In this case, our example code is only one line long, so we are done interacting after the first line of code is executed. Let's try again with a long example which is the following script:
my $word = 'hello'; my @to = qw(world); print "$word @to\n";Save the script in a file named hello.pl and run with the -d switch:
% perl -d hello.pl Loading DB routines from perl5db.pl version 1.0402 Emacs support available. Enter h or `h h' for help. main::(hello.pl:1): my $word = 'hello'; DB<1> n main::(hello.pl:2): my @to = qw(world); DB<1>At this point, the first line of code has been executed and the variable $word has been assigned to the value hello. We can check that assumption by using the print command:
main::(hello.pl:2): my @to = qw(world); DB<1> p $word helloThe print command works just like the Perl built in print() function, but adds a trailing newline and outputs to the $DB::OUT filehandle, which is normally opened to the terminal where perl was launched from. Let's carry on:
DB<2> n main::(hello.pl:4): print "$word @to\n"; DB<2> p @to world DB<3> n hello world Debugged program terminated. Use q to quit or R to restart, use O inhibit_exit to avoid stopping after program termination, h q, h R or h O to get additional info. DB<3>You should notice by now, there is some valuable information to the left of each executable statement:
main::(hello.pl:4): print "$word @to\n"; DB<2>First is the current package name, in this case main::. Next is the current filename and statement line number, hello.pl and 4 in the example above. The number presented at the prompt is the command number which can be used to recall commands in session history. For example, !1 would repeat the first command:
% perl -d -e0 Loading DB routines from perl5db.pl version 1.0402 Emacs support available. Enter h or `h h' for help. main::(-e:1): 0 DB<1> p $] 5.00556 DB<2> !1 p $]5.00556 DB<3>Things start to get more interesting as the code does. In the example script below we've increased the number of source files and packages by including the standard Symbol module, along with invoking its gensym() function:
use Symbol (); my $sym = Symbol::gensym(); print "$sym\n";Let's give it a run:
% perl -d hello.pl Loading DB routines from perl5db.pl version 1.0402 Emacs support available. Enter h or `h h' for help. main::(hello.pl:3): my $sym = Symbol::gensym(); DB<1> n main::(hello.pl:5): print "$sym\n"; DB<1> n GLOB(0x829f990)First, notice the debugger did not stop at the first line of the file, this is because use ... is a compile-time statement, not a run-time statement. Also notice, there was more work going on than the debugger revealed. The next command does not enter subroutine calls, so we introduce the step command which does:
main::(hello.pl:3): my $sym = Symbol::gensym(); DB<1> s Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:86): 86: my $name = "GEN" . $genseq++; DB<1>Notice the source line information has changed to the Symbol::gensym package and the Symbol.pm file. We can carry on by hitting the return key at each prompt, which causes the debugger to repeat the last step or next command. The debugger will return out of the subroutine and back to our main program:
DB<1>
Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:87):
87: my $ref = \*{$genpkg . $name};
DB<1>
Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:88):
88: delete $$genpkg{$name};
DB<1>
Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:89):
89: $ref;
DB<1>
main::(hello.pl:5): print "$sym\n";
DB<1>
GLOB(0x81ab870)
Our line-by-line debugging approach has served us well for this small
program, but imagine the time it takes to step through a large
application at the same pace. There are several ways to speed up a
debugging session, one of which is known as "setting a breakpoint".
The breakpoint command can be used for instructing the debugger
to stop at a named subroutine or at line of a given file. In this
example session, we will set a breakpoint at the
Symbol::gensym subroutine at the first prompt, telling the
debugger to stop at the first line of this routine when it is called.
Rather than move along with next or step we enter the
continue command which tells the debugger to execute each line
without stopping until it reaches a breakpoint:
% perl -d hello.pl Loading DB routines from perl5db.pl version 1.0402 Emacs support available. Enter h or `h h' for help. main::(hello.pl:3): my $sym = Symbol::gensym(); DB<1> b Symbol::gensym DB<2> c Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:86): 86: my $name = "GEN" . $genseq++;Now let's pretend we are debugging a large application where Symbol::gensym might be called in various places. When the subroutine breakpoint is reached, the debugger does not reveal where it was called from by default. One way to find out this information is with the Trace command:
DB<2> T $ = Symbol::gensym() called from file `hello.pl' line 3 DB<2>In this example, the call stack is only one level deep, so only that line is printed, we'll look at an example with a deeper stack later. The left-most character reveals the context in which the subroutine was called. In our case, $ represents a scalar context, in others you may see @ which represent a list context or . which represents a void context.
Below we've made our hello.pl example a little more complex. First, a My::World package declaration at the top so we are no longer working in the main package. Next, we've added a subroutine named do_work() which invokes Symbol::gensym, along with the Symbol::qualify function and returns a hash reference of the results. The do_work() routine is invoked inside a for loop which will be run twice:
package My::World;
use Symbol ();
for (1,2) {
do_work("now");
}
sub do_work {
my($var) = @_;
return undef unless $var;
my $sym = Symbol::gensym();
my $qvar = Symbol::qualify($var);
my $retval = {
'sym' => $sym,
'var' => $qvar,
};
return $retval;
}
We'll start by setting our breakpoint again and continue until it is
reached:
% perl -d hello.pl
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
My::World::(hello.pl:5): for (1,2) {
DB<1> b Symbol::qualify
DB<2> c
Symbol::qualify(/usr/local/devperl/lib/5.00556/Symbol.pm:95):
95: my ($name) = @_;
Oops! We wanted to break at Symbol::gensym, not
Symbol::qualify. The List command will display the
current breakpoints:
DB<2> L /usr/local/devperl/lib/5.00556/Symbol.pm: 95: my ($name) = @_; break if (1)The filename and line number of the breakpoint are displayed just before the source line itself. After that is the condition on which to stop, in our case as the constant value 1 indicates, we will always stop at this breakpoint. Later on you'll see how to specify a certain condition. The delete command is used to remove certain breakpoints by specifying the line number of the breakpoint:
DB<2> d 95The Delete command (with a capital `D'), could have also been used in our case to remove all currently installed breakpoints.
Next in our example session, a breakpoint is set a Symbol::gensym and we continue. When the breakpoint is reached we ask for a Trace, where this time we see there is another level in the call stack:
DB<3> b Symbol::gensym
DB<4> c
Symbol::gensym(/usr/local/devperl/lib/5.00556/Symbol.pm:86):
86: my $name = "GEN" . $genseq++;
DB<4> T
$ = Symbol::gensym() called from file `hello.pl' line 12
. = My::World::do_work('now') called from file `hello.pl' line 6
DB<4>
As you can also see, the stack trace prints the values which are
passed into the subroutine. Ah, and perhaps we've found our first
bug, as we can see do_work() was called in a void context, so
the return value was lost into thin air. Let's change the for loop
logic to check the return value of do_work():
for (1,2) {
my $stuff = do_work("now");
if ($stuff) {
print "work is done\n";
}
}
In this session we will set a breakpoint at line 7 of
hello.pl where we check the return value of do_work():
% perl -d hello.pl
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
My::World::(hello.pl:5): for (1,2) {
DB<1> b 7
DB<2> c
My::World::(hello.pl:7): if ($stuff) {
DB<2>
Our program is still small, but it is getting more difficult to
understand the context of just one line of code, the window
command will list the first few lines of code that surround the
current line:
DB<2> w
4
5: for (1,2) {
6: my $stuff = do_work("now");
7==>b if ($stuff) {
8: print "work is done\n";
9 }
10 }
11
12 sub do_work {
13: my($var) = @_;
The arrow points to the line which is about to be executed and also
contains a `b' indicating we have set a breakpoint at this
line. The breakable lines of code include a `:' just after the
line number.
Now, let's take a look at the value of the $stuff variable with the trustly old print command:
DB<2> p $stuff HASH(0x82b8924)Shucks, that's not very useful information. Remember, the print command works just as the builtin print() functions does. The x command evaluates a given expression and prints the results in a "pretty" fashion:
DB<3> x $stuff
0 HASH(0x82b8924)
'sym' => GLOB(0x82a5350)
-> *Symbol::GEN0
'var' => 'My::World::now'
There, things seem to be okay, lets double check by calling
do_work() with a different value and print the results:
DB<4> x do_work('later')
0 HASH(0x82f814c)
'sym' => GLOB(0x82f9e78)
-> *Symbol::GEN1
'var' => 'My::World::later'
We can see the symbol was incremented from GEN0 to GEN1
and the variable later was qualified, as expected.
Now let's change the test program a little to iterate over a list of arguments held in @args and print a slightly different message:
package My::World;
use Symbol ();
my @args = qw(now later);
for my $arg (@args) {
my $stuff = do_work($arg);
if ($stuff) {
print "do your work $arg\n";
}
}
sub do_work {
my($var) = @_;
return undef unless $var;
my $sym = Symbol::gensym();
my $qvar = Symbol::qualify($var);
my $retval = {
'sym' => $sym,
'var' => $qvar,
};
return $retval;
}
There are only two arguments in the list, so stopping to look at each
one isn't too time consuming, but consider the debugging pace with a
large list of 100 or so entries. It is possible to customize
breakpoints by specifying a condition. Each time a breakpoint
is reached, the condition is evaluated, stopping only if the condition
is true. In the session below the window command shows
breakable lines and we set a breakpoint at line 7 with the
condition $arg eq 'later'. As we continue, the breakpoint is
skipped when $arg has the value of now and stops when it
has the value of later:
% perl -d hello.pl
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
My::World::(hello.pl:5): my @args = qw(now later);
DB<1> w
2
3: use Symbol ();
4
5==> my @args = qw(now later);
6: for my $arg (@args) {
7: my $stuff = do_work($arg);
8: if ($stuff) {
9: print "do your work $arg\n";
10 }
11 }
DB<1> b 7 $arg eq 'later'
DB<2> c
do your work now
My::World::(hello.pl:7): my $stuff = do_work($arg);
DB<2> n
My::World::(hello.pl:8): if ($stuff) {
DB<2> x $stuff
0 HASH(0x81f6d54)
'sym' => GLOB(0x81f6cf4)
-> *Symbol::GEN1
'var' => 'My::World::later'
DB<3> c
do your work later
Debugged program terminated. Use q to quit or R to restart,
use O inhibit_exit to avoid stopping after program termination,
h q, h R or h O to get additional info.
DB<3>
There are plenty more tricks left to pull from the perldb bag, but you should understand enough about the debugger to try them on your own with the perldebug manpage by your side. Now we'll turn to looking at how the interactive debugger is used in an Apache/mod_perl environment. The Apache::DB module available from CPAN provides a wrapper around perldb for debugging Perl code running under mod_perl.
The server must be run in non-forking mode to use the interactive debugger, this mode is turned on by passing the -X flag to apache. It is convenient to use an IfDefine section around the Apache::DB configuration, the example below does this using the name PERLDB. With this setup, debugging is only turned on by passing -D PERLDB to apache. This section should be at the top of your main configuration file, before any Perl code is pulled in, so debugging symbols will be inserted into the syntax tree, triggered by the call to Apache::DB->init. The Apache::DB::handler can be configured using any of the Perl*Handler directives, in this case we use a PerlFixupHandler so handlers in the response phase will bring up the debugger prompt:
<IfDefine PERLDB>
<Perl>
use Apache::DB ();
Apache::DB->init;
</Perl>
<Location />
PerlFixupHandler Apache::DB
</Location>
</IfDefine>
In our first example, we will debug the standard Apache::Status
module, which is configured like so:
PerlModule Apache::Status
<Location /perl-status>
PerlHandler Apache::Status
SetHandler perl-script
</Location>
When the server is started with the debugging flag, a notice will be
printed to the console:
% ./bin/httpd -X -DPERLDB [notice] Apache::DB initialized in child 880The debugger prompt will not be available until the first request is made, in this case to http://localhost/perl-status. Once we are at the prompt, all the standard debugging commands are available. First we run the window for some context of the code being debugged, move to the next statement after $r has been assigned to and print the request uri. If no breakpoints are set, the continue command will give control back to Apache and the request will finish with the Apache::Status main menu showing up in the browser window:
Loading DB routines from perl5db.pl version 1.0402
Emacs support available.
Enter h or `h h' for help.
Apache::Status::handler(/usr/local/devperl/lib/site_perl/
5.00557/i686-linux/Apache/Status.pm:55):
55: my($r) = @_;
DB<1> w
52 }
53
54 sub handler {
55==> my($r) = @_;
56: Apache->request($r); #for Apache::CGI
57: my $qs = $r->args || "";
58: my $sub = "status_$qs";
59: no strict 'refs';
60
61: if($qs =~ /^noh_/) {
DB<1> n
Apache::Status::handler(/usr/local/devperl/lib/site_perl/
5.00557/i686-linux/Apache/Status.pm:56):
56: Apache->request($r); #for Apache::CGI
DB<1> p $r->uri
/perl-status
DB<2> c
All the techniques from our warm-up session can be applied, but we'll
leave that exercise for you to play with.
Debugging Apache::Registry scripts is somewhat different, because the handler routine does quite a bit of work before it reaches your script. In this example, we make a request for /perl/test.pl, which consists of this code:
use strict;
my $r = shift;
$r->send_http_header('text/plain');
print "hello world";
When a request is issued, the debugger stops at line 24
of Apache/Registry.pm. We set a breakpoint at line 135,
which is the line that actually calls the script wrapper subroutine.
The continue command will bring us to that line, where we can
step into the script handler:
Apache::Registry::handler(/usr/local/devperl/lib/site_perl/
5.00557/i686-linux/Apache/Registry.pm:24):
24: my $r = shift;
DB<1> b 135
DB<2> c
Apache::Registry::handler(/usr/local/devperl/lib/site_perl/
5.00557/i686-linux/Apache/Registry.pm:135):
135: eval { &{$cv}($r, @_) } if $r->seqno;
DB<2> s
Apache::ROOT::perl::test_2epl::handler((eval 61):3):
3: my $r = shift;
Notice the funny package name, that's generated from the uri of the
request for namespace protection. The filename is not displayed,
since the code was compiled via eval(), but the print
command can be used to show you $r->filename:
DB<2> n
Apache::ROOT::perl::test_2epl::handler((eval 61):4):
4: $r->send_http_header('text/plain');
DB<3> p $r->filename
/home/dougm/httpd/perl/test.pl
The line number might seem off too, but the window command will
give you a better idea where you are:
DB<4> w
1: package Apache::ROOT::perl::test_2epl
;use Apache qw(exit);sub handler {use strict;
2
3: my $r = shift;
4==> $r->send_http_header('text/plain');
5
6: print "hello world";
7
8 }
9 ;
The code from the test.pl file is between lines 2 and 7,
the rest is the Apache::Registry magic to cache your code
inside a subroutine.
It will always take some practice and patience when putting together debugging strategies that make effective use of the interactive debugger for various situations. Once you do have a good strategy in mind, bug squashing can actually be quite a bit of fun!