From Perl 4 to Perl 5
By: Mark-Jason Dominus
What's That Mean?
Hi. Today we're going to see a line of Perl code that looks like it
should work, but doesn't. Here it is:
print "From: mjd@plover.com\n";
What could possibly be wrong with this?
Of course, since something is wrong, Perl will tell you:
In string, @plover now must be written as \@plover at ...
Or, in older versions of Perl (5.003 and earlier):
Literal @plover now requires backslash at ...
This can be a frustrating error message. It's so clear, and the
clarity is annoying: Since Perl knows that the backslash is missing,
why doesn't it just put it in for you? And why is it required anyway?
And how come Perl doesn't deliver that message consistently?
Sometimes you get it, sometimes you don't.
The basic problem here is easy to understand: Perl is trying to decide
whether `@plover' should be taken literally or whether it should try
to insert the value of the array `@plover'. But on top of this simple
problem is piled layer upon layer of historical complication. To
unravel the history, let's take a trip in the Wayback Machine, back to
the very Dawn of Perl Itself... Sherman, set the dials for 1987!
In Perl 1 (and later, Perl 2) the situation was simple: Arrays didn't
interpolate into double-quoted strings. The ambiguity that gives
Perl problems in 1999 didn't exist: When Perl saw
print "From: mjd@plover.com\n";
it knew you meant to print out `@plover' literally, and not to look
for an array `@plover'. If you really wanted to print out the
elements of some array, say `@array', you'd have had to do something
like this:
print "@array contains: [", (join " ", @array), "].\n";
Perl 1 was surprisingly limited by modern standards: $a[1]
wouldn't interpolate either; if @a contained (4, 5, 6) and $a
contained `ouch', then
print "$a[1]"
would print "ouch[1]".
Now let's jump forward to 1989, and Perl 3.
Perl 1 was recognizably Perl; you couldn't mistake it for anything
else. But Perl 3 is the first version that really _feels_ like Perl.
Perl 3 introduced packages, sockets, tied hashes, and a lot of other
stuff, including the ability to interpolate arrays and array values
into double-quoted strings.
For the first time, Perl had to deal with the possible ambiguity of
print "From: mjd@plover.com\n";
and decide whether you wanted to interpolate the @plover array or not.
The example here isn't so good any more, because it's obvious to a
human that the `@plover' should not be interpolated. But suppose it
looked like this instead:
print "Three kinds of plovers are [@plover]\n";
To Perl, this looks just like the first example, so we can't expect Perl
to read our minds and know when we want the arrays interpolated.
From here on we'll see only the first example, but even so, please remember
that the decision about whether or not to interpolate can go either
way.
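Here is a small sketch of the case where you *do* want the interpolation, written so that it runs on a present-day perl. The contents of `@plover' and the variable names `$whole' and `$one' are my own choices for illustration:

```perl
use strict;
use warnings;

# Sample data; any list would do.
my @plover = ('Semipalmated', 'Piping', 'Snowy');

# The whole array interpolates, with its elements joined by the
# list separator (a single space by default).
my $whole = "Three kinds of plovers are [@plover]";

# A single element interpolates too.
my $one = "The second kind is $plover[1]";

print "$whole\n";   # Three kinds of plovers are [Semipalmated Piping Snowy]
print "$one\n";     # The second kind is Piping
```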
With the addition of the new array interpolation feature to Perl 3, of
course there was a compatibility problem: There were now pre-existing
programs written for Perl 1 and Perl 2 that used @ signs in strings,
never dreaming that Perl would be trying to decide whether or not to
do array interpolation. Perl couldn't simply interpolate every
possible array, because that would have changed the meaning of a Perl
2 program that contained the `print' line above. Instead Perl 3
needed to use a rule to decide when to interpolate and when not, and
the rule had to be convenient for Perl 3 programmers while still
respecting code written for Perl 1 and Perl 2.
The rule that was chosen was this: By the time the string in
print "From: mjd@plover.com\n";
is evaluated, Perl knows whether or not you actually used the array
`@plover' in your program. If you did, Perl will interpolate it into
the string. But otherwise, if you never used `@plover' anywhere, Perl
will assume that the `@' should be taken literally.
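To make the rule concrete, here is a toy model of it. This is only a sketch: the function `interpolate_perl3_style' and its hash of "arrays the program uses" are hypothetical names I made up, and the real Perl 3 did nothing so tidy internally.

```perl
use strict;
use warnings;

# Hypothetical model of the Perl 3 and Perl 4 rule, not real perl internals.
# $arrays maps the name of each array "used somewhere in the program"
# to a reference to its contents. A known array is interpolated;
# an unknown one is left in the text literally.
sub interpolate_perl3_style {
    my ($template, $arrays) = @_;
    $template =~ s{\@(\w+)}{
        exists $arrays->{$1} ? join(' ', @{ $arrays->{$1} }) : '@' . $1
    }gex;
    return $template;
}

# Single quotes, so that perl itself leaves the @ alone.
my $template = 'From: mjd@plover.com';

# A program that never uses @plover:
my $without_array = interpolate_perl3_style($template, {});

# A program that does use @plover somewhere:
my $with_array = interpolate_perl3_style($template, { plover => ['Piping', 'Snowy'] });

print "$without_array\n";   # From: mjd@plover.com
print "$with_array\n";      # From: mjdPiping Snowy.com
```

The same string gives two different results depending on something that appears nowhere in the string itself. That is the root of all the trouble to come.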
What if you wanted to have an array named `@plover' and still use
`@plover' in a string without interpolation? The solution is
familiar: Put a \ before the @:
print "From: mjd\@plover.com\n";
This never interpolates, whether or not you have an @plover array.
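A quick check of that claim on a present-day perl, with a real `@plover' in scope. The single-quoted variant is my own addition; single-quoted strings never interpolate anything, so they are another way to get a literal @.

```perl
use strict;
use warnings;

# A real @plover array exists, so interpolation is a live possibility.
my @plover = ('Semipalmated', 'Piping');

# The backslash keeps the @ literal even though @plover exists.
my $escaped = "From: mjd\@plover.com";

# Single quotes do no interpolation at all.
my $single = 'From: mjd@plover.com';

print "$escaped\n";   # From: mjd@plover.com
print "$single\n";    # From: mjd@plover.com
```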
These rules persisted for a long time, through Perl 3 and Perl 4,
until Perl 5 came out in 1994.
In Perl 5 there was a subtle change in the way strings worked.
As you probably know, Perl is a `demicompiler', which means that it
runs in two phases. The first phase is a compilation phase, in which
it reads and parses your program and translates it into internal data
structures that explain how to execute it. When the program is
completely compiled, Perl enters the second phase, the `run' phase,
in which it executes your program. Because of this, it makes sense to
distinguish between the things that Perl does at compile time and the
things it does at run time.
Perl 3 figured out how to interpolate strings at run time: It would
get to the part of your program that constructed the string, and then
it would look through the string for things to interpolate. This
means that in
for (1 .. 1_000_000) { print "mjd@plover.com\n" }
Perl 3 would look through the string for things that look like arrays,
see `@plover', and decide (not) to interpolate, over and over, one
million times. Clearly this is horribly inefficient because the
string never changes; it's better to figure out how to parse the
string all at once, at compile time, and figure out how to do the
interpolation then, and not to worry about it again.
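One way to picture what "figuring it out at compile time" buys you: once the string has been parsed, an interpolated array amounts to a `join' on the list separator `$"', glued to the literal pieces around it. The sketch below only checks that the two spellings produce the same string. It is not a claim about exactly what the compiler generates, and the variable names are mine.

```perl
use strict;
use warnings;

my @plover = ('Semipalmated', 'Piping');

# What you write: perl has to find @plover inside the string.
my $interpolated = "Kinds: @plover.";

# What it amounts to once the string has been taken apart:
# literal pieces plus a join on the list separator $".
my $spelled_out = "Kinds: " . join($", @plover) . ".";

print "$interpolated\n";   # Kinds: Semipalmated Piping.
print "$spelled_out\n";    # Kinds: Semipalmated Piping.
```

If the taking-apart is done once, at compile time, the million-iteration loop only has to do the cheap part.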
But there's a problem with that. Perl 3 would decide whether or not
to interpolate `@plover' based on whether or not you had used the
`@plover' array somewhere else in the program. It could do this
because it was making the decision at run time, after it had already
seen and compiled the entire program. But if you want to make the
decision earlier, at compile time, you run into difficulties: If the
Perl 5 compiler sees
print "From: mjd@plover.com\n";
on line 3, how can it know in advance whether or not the array
`@plover' will be mentioned on line 997 when it hasn't read that far
yet? Obviously it can't. You might think it could solve the problem
by reading ahead, but it can't do that either, because `@plover' might
not be mentioned in the file at all, but rather in some other file
that is loaded in by `do' or `require' much later on, when the program
is actually running. So the old rule is unworkable; the information
about whether or not `@plover' is used somewhere simply isn't
available at compile time.
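You can watch this happen on a present-day perl. In the sketch below, a string `eval' stands in for the `do' or `require' that loads code while the program is running, and I peek into the main package's symbol table to see whether anything named `plover' exists yet. The variable names are my own, and the symbol-table peek is just one convenient way to ask the question.

```perl
use strict;
use warnings;

my $seen_at_compile_time;
BEGIN {
    # A BEGIN block runs as soon as it has been parsed, before
    # anything further down in the file has been compiled.
    $seen_at_compile_time = exists $main::{plover} ? 1 : 0;
}

# Stand-in for `do' or `require': this code is not compiled until
# the eval actually runs, long after the main program was compiled.
eval q{ our @plover = ('Semipalmated', 'Piping'); 1 } or die $@;

my $seen_at_run_time = exists $main::{plover} ? 1 : 0;

# Prints: compile time: 0, run time: 1
print "compile time: $seen_at_compile_time, run time: $seen_at_run_time\n";
```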
Since the rule had to change anyway, the authors of Perl decided to
make it simpler: `@plover' would *always* be interpolated in a
double-quoted string, unless there was a backslash before it.
Unfortunately, this simple rule was a substantial change from Perl 3
and Perl 4, and a complete change from Perl 1 and Perl 2. It couldn't
be implemented right away, because doing so would break thousands of
old programs. For example, the line
print "From: mjd@plover.com\n";
which would print everything literally in every version of Perl up
through Perl 4, would instead print
From: mjd.com
with the new rule, because the nonexistent, and therefore empty, array
`@plover' would be interpolated. Any program that depended on the old
behavior would change its output, and worse, the change would be
*silent*, which means that it would be undiagnosed; you wouldn't know
anything was wrong until you suddenly started getting mysteriously
broken output
from a program that used to work.
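Here is a sketch of that silent change. It assumes a perl recent enough to apply the always-interpolate rule without the fatal error (as far as I know, 5.6.1 and later, where the complaint became a mere warning). I have deliberately left out `use strict', which would reject the undeclared array, and turned warnings off, to show what the output looks like when nothing complains.

```perl
# No `use strict' here on purpose: strict would refuse to compile
# a string that mentions the undeclared array @plover.
no warnings;   # silence the warning a recent perl would otherwise give

# @plover was never set, so it interpolates as an empty list
# and the characters simply vanish from the string.
my $line = "From: mjd@plover.com";

print "$line\n";   # From: mjd.com
```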
Obviously, silently breaking thousands of old programs that had worked
for years and years was not a viable option. But they had to be
broken, because the old rule just wouldn't work any more. So the only
option was to break them audibly instead of silently.
Perl 5 will *always* interpolate arrays in double-quoted strings,
unless they're preceded by a backslash. But it has to worry that
perhaps it's compiling an old program from Perl 4 or earlier that
didn't want the interpolation to occur. When it sees something like
print "From: mjd@plover.com\n";
it has to make a decision about whether interpolation is safe or not.
If it has seen you use `@plover' already in the part of the program
that it has already compiled, it can safely assume that `@plover' should
be interpolated, because that's what Perl 3 would have done; the Perl 5
and Perl 3 behaviors are the same in that case. But if it hasn't
seen you mention `@plover' earlier, it can't be sure, because Perl 3
would have chosen to interpolate or not based on whether `@plover'
appeared later, and it doesn't know yet what Perl 3 would have seen
later on. So rather than finish the compilation and leave you with a
program that might produce
the wrong output, it gives up and says
In string, @plover now must be written as \@plover at ...
and refuses to run your program until you've cleared up the
ambiguity. If you don't want any interpolation, you have to insert
the backslash. If you *do* want interpolation, you have to find some
way to warn the compiler that you're planning to use the array
`@plover' before you actually do use it. All you have to do is
mention it somehow. Saying this:
$plover[17] = 'Semipalmated';
will do, and so will this:
@plover = ();
If you don't have a real reason to mention the array in advance, you
can `declare' it at the top of your program by saying this:
use vars '@plover';
Any of these will warn the compiler that `@plover' is a real array, so
it knows that interpolating it into strings is safe, because any
version of Perl back to Perl 3 would have done the same thing.
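A sketch of declaring the array in advance, written to run on a present-day perl under `use strict'. The subroutine `report' and the sample data are mine. Note that the string is compiled before `@plover' is ever assigned to; the declaration at the top is what makes that acceptable. (These days `our @plover' is the more usual way to write the declaration, but `use vars' still works.)

```perl
use strict;
use warnings;

# Tell the compiler up front that @plover is a real array.
use vars '@plover';

# This string is compiled before any assignment to @plover appears.
sub report {
    return "Three kinds of plovers are [@plover]";
}

# The array gets its contents later, at run time.
@plover = ('Semipalmated', 'Piping', 'Snowy');

my $line = report();
print "$line\n";   # Three kinds of plovers are [Semipalmated Piping Snowy]
```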
Someday, when all Perl 4 programs are buried safely in the La Brea tar
pits, this error message might be removed, and Perl will just assume
that all strings that contain @ signs will interpolate arrays, and
that everyone who wants a literal @ sign in a string will always
precede it with a backslash. But for now, the world is still full of
good Perl 4 code that works and works well, and it can't be silently
ruined.
Now let's return to the questions I posed at the beginning of the
article.
"Since Perl knows that the backslash is missing,
why doesn't it just put it in for you?"
It doesn't really know that it's missing; it's not sure if you wanted
the array to be interpolated or not.
"And why is it required anyway?"
Because starting with Perl 5, @array in a double-quoted string
*always* interpolates the array. To prevent this, you must use the
backslash.
"And how come Perl doesn't deliver that message consistently?"
If you happened to mention the array before you mentioned the string,
Perl can be sure that nothing will break if it interpolates the array
into the string, so it goes ahead and does that. But if not, it can't
be sure, and it has to take the safe route and warn you that something
might be wrong.
Finally:
"What can I do to fix it?"
If you wanted the array to be interpolated, then declare it at the top
of your program with `use vars'.
If not, put a backslash before the @ sign.