mod_perl Strategy and Implementation - Part 3

By: Stas Bekman




Adding a Proxy Server in HTTP Accelerator Mode

In the beginning there were two servers: a plain apache server, very light and configured to serve static objects, and a mod_perl-enabled server, very heavy and dedicated to serving mod_perl scripts. We named them httpd_docs and httpd_perl respectively. The two servers coexisted at the same IP address (DNS name) by listening on different ports: 80 for httpd_docs (e.g. http://www.nowhere.com/images/test.gif ) and 8080 for httpd_perl (e.g. http://www.nowhere.com:8080/perl/test.pl ). Note that I did not write http://www.nowhere.com:80 in the first example, since port 80 is the default HTTP port. (Later on, I will be moving the httpd_docs server to port 81.)

Now I am going to convince you that you want to use a proxy server (in HTTP accelerator mode). The advantages are:

The disadvantages are:

Have I succeeded in convincing you that you want the proxy server?

If you are on a local area network (LAN), then the big benefit of the proxy buffering the output and feeding a slow client is gone. You are probably better off sticking with a straight mod_perl server in this case.

As of this writing, two proxy implementations are known to be used together with mod_perl: the squid proxy server and mod_proxy, which is part of the apache server. This month we will talk about apache's mod_proxy.


Apache's mod_proxy: Pros and Cons

I do not think the difference in speed between apache's mod_proxy and squid is relevant for most sites, since the real value of both is buffering for slow client connections. However, squid runs as a single process and probably consumes fewer system resources. The trade-off is that with apache, mod_rewrite makes it easy to spread parts of the site across different back-end servers, and mod_proxy knows how to fix up redirects that contain the back-end server's idea of the location. With squid you can run a redirector process to proxy to more than one back end, but it is hard to fix redirects in a way that keeps the client's view of both server names and port numbers in all cases. The difficult case is where you have several DNS aliases that map to the same IP address, and you want the redirect to use port 80 (when the server is really on a different port) while keeping the specific name the browser sent, so that it does not change in the client's Location window.
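
As a rough illustration of spreading a site across back ends with mod_rewrite, the front-end configuration might look something like the sketch below. The /shop/ path and the second back end on port 8081 are hypothetical, chosen only for this example. The [P] flag hands the rewritten request to mod_proxy, so mod_proxy has to be built into the front-end server as well.

  RewriteEngine On

  # hypothetical layout: two back-end servers on the same machine
  RewriteRule ^/perl/(.*)$ http://localhost:8080/perl/$1 [P]
  RewriteRule ^/shop/(.*)$ http://localhost:8081/shop/$1 [P]

  # fix up redirects issued by each back end
  ProxyPassReverse /perl/ http://localhost:8080/perl/
  ProxyPassReverse /shop/ http://localhost:8081/shop/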

The Advantages:

The Disadvantages:


Installation and Configuration

To build it into apache just add --enable-module=proxy during the apache configure stage.

Now we will talk about apache's mod_proxy and understand how it works.

The server on port 80 answers HTTP requests directly and proxies requests to the mod_perl-enabled server in the following way:

  ProxyPass        /modperl/ http://localhost:81/modperl/
  ProxyPassReverse /modperl/ http://localhost:81/modperl/

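ProxyPassReverse is the saving grace here, the feature that makes apache a win over squid. It rewrites a redirect on its way back to the client, so that the Location header points to the original URI on the front-end server rather than to the back-end server.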
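
To make the rewriting concrete, here is roughly what happens to a redirect under the two directives above. The example assumes that the front end answers as www.nowhere.com and that a hypothetical script /modperl/login.pl redirects to /modperl/welcome.pl. The back-end server builds the redirect from its own idea of its name and port:

  HTTP/1.1 302 Found
  Location: http://localhost:81/modperl/welcome.pl

Because the Location value starts with the URL given to ProxyPassReverse, the front end replaces that prefix before passing the response on to the client:

  HTTP/1.1 302 Found
  Location: http://www.nowhere.com/modperl/welcome.pl

Without the fix-up, the browser would be told to go to localhost:81, which is not an address the client can use to reach your server.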

You can control the buffering feature with the ProxyReceiveBufferSize directive:

  ProxyReceiveBufferSize 1048576

The above setting sets the buffer size to 1 MB. If it is not set explicitly, the default buffer size is used, which depends on the OS; for Linux I suspect it is somewhere below 32 KB. So, to release the mod_perl server immediately instead of leaving it waiting on a slow client, ProxyReceiveBufferSize should be set to a value greater than the biggest response generated by any mod_perl script.

The ProxyReceiveBufferSize directive specifies an explicit buffer size for outgoing HTTP and FTP connections. It has to be greater than 512 or set to 0 to indicate that the system's default buffer size should be used.

As the name states, the buffering applies only to downstream data (coming from the origin server to the proxy) and not to upstream data. In other words, data being uploaded from the client browser is not buffered by the proxy, so the httpd_perl origin server is still tied up during a large POST such as a file upload.

Apache does caching as well. It is relevant to mod_perl only if you produce proper headers, so that your scripts' output can be cached. See the apache documentation for more details on configuring this capability.
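
As a minimal sketch of what ``proper headers'' could mean, an Apache::Registry script might send Last-Modified and Expires headers so that a cache in front of it is allowed to store the response. The one-hour lifetime and the page body are assumptions made for this example, and the date format assumes the server runs in the C (or an English) locale so that day and month names come out in English. Caching itself still has to be enabled in the front-end configuration.

  use strict;
  use POSIX qw(strftime);

  my $r   = Apache->request;
  my $ttl = 60 * 60;                       # assumed lifetime: one hour
  my $fmt = "%a, %d %b %Y %H:%M:%S GMT";   # HTTP date format

  $r->content_type('text/html');
  $r->header_out('Last-Modified' => strftime($fmt, gmtime(time)));
  $r->header_out('Expires'       => strftime($fmt, gmtime(time + $ttl)));
  $r->send_http_header;

  $r->print("<html><body>Cacheable for an hour</body></html>\n");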

Ask Bjoern Hansen has written a mod_proxy_add_forward module for apache that sets the X-Forwarded-For field when doing a ProxyPass, similar to what squid can do. (Its location is specified in the help section.) Basically, that module adds an extra HTTP header to proxied requests. You can read that header in the mod_perl-enabled server and use it to set the IP address of the remote client. You won't need to compile anything into the back-end server. If you are using Apache::{Registry,PerlRun}, just put something like the following into start-up.pl:

  use Apache::Constants qw(OK);

  sub My::ProxyRemoteAddr ($) {
    my $r = shift;

    # Only trust the X-Forwarded-For header if the request
    # comes from our own proxy at localhost
    return OK unless ($r->connection->remote_ip eq "127.0.0.1");

    # The header may hold a comma-separated list; take the last entry
    if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
      $r->connection->remote_ip($ip);
    }

    return OK;
  }

And in httpd.conf:

  PerlPostReadRequestHandler My::ProxyRemoteAddr

Different sites have different needs. If you are using the header to set the IP address that apache believes it is dealing with (in logging and so on), you really don't want anyone but your own system to set the header. That is why the ``recommended code'' above checks where the request is really coming from before changing remote_ip.

From that point on, the remote IP address is correct. You should be able to access REMOTE_ADDR as usual.

You could do the same thing with other environment variables (though I think several of them are preserved; you will want to run some tests to see which ones).
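
One simple way to run such a test is a throwaway Apache::Registry script that dumps the environment as the back-end server sees it. This is only a sketch; request it once through the proxy and once directly from the back-end port, then compare the two listings.

  use strict;

  my $r = Apache->request;
  $r->send_http_header('text/plain');

  # print every environment variable visible to the script
  $r->print("$_ = $ENV{$_}\n") for sort keys %ENV;

  # the address the back end now believes the client has
  $r->print("remote_ip = ", $r->connection->remote_ip, "\n");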


HTTP Authentication with 2 servers + proxy

Assume that you have a setup with one ``front-end'' server that proxies the ``back-end'' (mod_perl) server. If you need to perform authentication in the ``back-end'' server, it should handle all authentication itself. If apache proxies correctly, it should pass through all authentication information, making the ``front-end'' apache somewhat ``dumb'', as it does nothing but pass the information through.

The only possible caveat in the config file is that your authentication directives need to be inside <Directory ...> ... </Directory> sections. If you use <Location /...> ... </Location> instead, the server doing the ProxyPass takes the auth info for its own authentication and does not pass it on.
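
For example, a basic authentication setup in the back-end server's httpd.conf could look roughly like this. The directory, realm name and password file are hypothetical and only serve the illustration:

  # back-end (httpd_perl) configuration
  <Directory /home/httpd/perl/private>
    AuthType Basic
    AuthName "Private Area"
    AuthUserFile /home/httpd/conf/passwd
    require valid-user
  </Directory>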


Next month

Next month I'll continue talking about proxy servers and will present the squid proxy server: its benefits and drawbacks, and its configuration details.