mod_perl Strategy and Implementation - Part 4

By: Stas Bekman


The Squid Server

The Advantages:

The Disadvantages:

The pros and cons presented above suggest that you will probably want squid mainly for its dynamic content buffering features, and only if your server serves mostly dynamic requests. In that situation it is better to have a plain apache server serving the static objects and squid proxying only the mod_perl enabled server, at least when performance is the goal.


Running 2 webservers and squid in httpd accelerator mode

While I have detailed the mod_perl server installation, you are on your own with installing the squid server. I run Linux, so I downloaded the rpm package, installed it, configured /etc/squid/squid.conf, fired off the server and was all set. Basically, once you have squid installed, you just need to modify the default squid.conf the way I explain below, and then you are ready to run it.

First, let's understand what we have in hand and what we want from squid. We have httpd_docs and httpd_perl servers listening on ports 81 and 8080 respectively (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forward each static object request to the port the httpd_docs server listens on, and each dynamic request to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached in squid), so the user never sees the other ports and never knows that there might be more than one server running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to the rules and forgets about it. The described functionality is known as httpd accelerator mode in proxy dialect.

You should understand that squid can be used as a straightforward proxy server, generally used at companies and ISPs to cut down incoming traffic by caching the most popular requests. However, we want to run it in httpd accelerator mode. Two directives, httpd_accel_host and httpd_accel_port, enable this mode. We will see more details in a few seconds. If you are currently using squid in the regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, either extend the existing squid configuration with the directives related to httpd accelerator mode, or create a new one from scratch.

As stated before, squid now listens on port 80, so we have to move the httpd_docs server to another port, for example port 81 (your mileage may vary :). To do that, modify the httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (but not before we get squid running, if you are working on a production server). And as you remember, httpd_perl listens on port 8080.

Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (about 60k+) and we would not use 95% of it, my suggestion is to write a new one including only the modified directives.

We want to enable the redirect feature, so that requests can be served by more than one server (in our case httpd_docs and httpd_perl). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces - Squid will bind to all of them.

  httpd_accel_host virtual

Then we define the default port - by default, if not redirected, httpd_docs will serve the pages. We assume that most requests will be of the static nature. We have our httpd_docs listening on port 81.

  httpd_accel_port 81

And as described before, squid listens to port 80.

  http_port 80

We do not use ICP (ICP is used for cache sharing between neighbor machines), which is more relevant in proxy mode.

  icp_port 0

hierarchy_stoplist defines a list of words which, if found in a URL, cause the object to be handled directly by this cache. In other words, use this to avoid querying neighbor caches for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents. If you named them differently, make sure to use the correct aliases here.

  hierarchy_stoplist /cgi-bin /perl

Now we tell squid not to cache dynamic pages.

  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY
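
To get a feel for which requests the two patterns in the acl above will catch, here is a small Perl sketch. It assumes that urlpath_regex treats each pattern as an ordinary unanchored regular expression matched against the URL path; the function name matches_query_acl and the sample paths are made up for illustration and are not part of squid.

```perl
use strict;
use warnings;

# The same two patterns as in the acl line above. The assumption here is
# that each one is used as an unanchored regex against the URL path.
my @patterns = ('/cgi-bin', '/perl');

# Hypothetical helper: returns 1 if any pattern matches the path, else 0.
sub matches_query_acl {
    my ($path) = @_;
    return (grep { $path =~ /$_/ } @patterns) ? 1 : 0;
}
```

Under that assumption a static page such as /perlfaq.html would also match the /perl pattern and would not be cached, so you may prefer patterns with a trailing slash or a leading ^ anchor.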

Please note that the acl QUERY and no_cache directives above are controversial. If you want your scripts to comply more closely with the HTTP standards, the headers of your scripts should carry the caching directives according to the HTTP specs. You will find a complete tutorial on this topic in Tutorial on HTTP Headers for mod_perl users by Andreas J. Koenig (at http://perl.apache.org ). If you set the headers correctly, there is no need to tell the squid accelerator NOT to try to cache something. The headers I am talking about are Last-Modified and Expires. What are they good for? Squid will not bother your mod_perl server a second time if a request is (a) cacheable and (b) still in the cache. Many mod_perl applications will produce identical results on identical requests, at least if not much time goes by between the requests. So your squid might have a hit ratio of 50%, which means that the mod_perl servers will have half as much work to do as before. This is only possible if you set the headers correctly.

Even if you insert a user ID and date in your page, caching can save resources when you set the expiration time to 1 second. A user might double click where a single click would do, thus sending two requests in parallel; squid could serve the second request.

But if you are lazy, or just have too many things to deal with, you can leave the above directives the way I described. But keep in mind that one day you will want to reread this snippet and Andreas' tutorial and squeeze even more power from your servers without investing money in additional memory and better hardware.
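
As a rough illustration of the two headers mentioned above, the following Perl sketch builds Last-Modified and Expires values in the HTTP date format. The helper names http_date and cache_headers are invented for this example, and the time-to-live is whatever your application can tolerate (the text above suggests that even 1 second helps). Treat it as a starting point, not as the method from Andreas' tutorial.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as an HTTP date in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d %s %04d %02d:%02d:%02d GMT",
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
        $hour, $min, $sec);
}

# Hypothetical helper: header name/value pairs for a response that was
# last modified at $modified and may be cached for $ttl seconds from $now.
sub cache_headers {
    my ($modified, $now, $ttl) = @_;
    return (
        'Last-Modified' => http_date($modified),
        'Expires'       => http_date($now + $ttl),
    );
}
```

In a mod_perl 1.x handler, pairs like these would typically be handed to $r->header_out before the headers are sent; in a plain CGI script they would simply be printed ahead of the body.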

While testing you might want to enable the debugging options and watch the log files in /var/log/squid/. But turn them off on your production server. I list the directive commented out (28 == access control routines).

  # debug_options ALL,1 28,9

We need to provide a way for squid to dispatch requests to the correct servers: static object requests should be redirected to httpd_docs (unless they are already cached), while dynamic requests should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons from the specified path and disables rewriting of any Host: headers in redirected requests (as suggested by squid's documentation). The redirection daemon script is listed below.

  redirect_program /usr/lib/squid/redirect.pl
  redirect_children 10
  redirect_rewrites_host_header off

Next comes the maximum allowed request size, in kilobytes. This one is pretty obvious. If you are using POST to upload files, then set this to the largest file's size plus a few extra kbytes.

  request_size 1000 KB

Then we have access permissions, which I will not explain. But you might want to read the documentation to avoid any security flaws.

  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all

Since squid should be run as a non-root user, you need these if you are invoking squid as root.

  cache_effective_user squid
  cache_effective_group squid

Now configure the memory size to be used for caching. The squid documentation warns that the actual size of squid can grow three times larger than the value you set.

  cache_mem 20 MB

Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid documents.

  memory_pools on

Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.

  cachemgr_passwd disable shutdown
  #cachemgr_passwd none all

Now the redirection daemon script (you should put it at the location you have specified with the redirect_program parameter in the config file above, and make it executable by the user squid runs as, of course):

  #!/usr/local/bin/perl
  
  # disable output buffering, so squid gets each reply immediately
  $| = 1;
  
  while (<>) {
      # redirect to the mod_perl server (httpd_perl)
      print($_), next if s|(:81)?/perl/|:8080/perl/|;
  
      # send it unchanged to the plain apache server (httpd_docs)
      print;
  }

In my scenario the proxy and the apache servers are running on the same machine, which is why I just substitute the port. In the presented squid configuration, requests that pass through squid are converted to point to the localhost (which is 127.0.0.1). The above redirector can be more complex of course, but you know Perl, right?
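
As one possible direction for a more complex redirector, the sketch below keeps the port substitution idea but drives it from a table, so that several URL prefixes can be sent to the mod_perl server. The %port_for table, the /cgi-bin/ entry and the rewrite_line name are assumptions made for this example; it also assumes that each line squid hands to the redirector starts with the full URL.

```perl
use strict;
use warnings;

# Hypothetical table: URL path prefix => port of the backend server.
# Anything not listed here is left alone and goes to httpd_docs.
my %port_for = (
    '/perl/'    => 8080,
    '/cgi-bin/' => 8080,
);

# Rewrite the port in the URL at the start of one redirector input line.
# Returns the line unchanged if no prefix matches.
sub rewrite_line {
    my ($line) = @_;
    for my $prefix (sort keys %port_for) {
        my $port = $port_for{$prefix};
        # scheme://host, an optional :port, then the prefix itself
        return $line
            if $line =~ s{^(\w+://[^/:\s]+)(?::\d+)?(\Q$prefix\E)}{$1:$port$2};
    }
    return $line;
}
```

To use it as the redirector, the main loop would set $| = 1 and print rewrite_line($_) for every line read from STDIN, exactly as the original script does with its single substitution.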

A few notes regarding the redirector script:

You must disable buffering. $|=1; does the job. If you do not disable buffering, STDOUT will be flushed only when the buffer becomes full, and its default size is about 4096 characters. So with an average URL of 70 chars, the buffer will be flushed only after about 59 (4096/70) requests, and only then will the requests finally reach the target server. Your users will just wait until the buffer fills up.

If you think that this is a very inefficient way to redirect, I'll try to prove the opposite. The redirector runs as a daemon: squid fires up N redirect daemons, so there is no problem with perl interpreter loading. Exactly as with mod_perl, perl is loaded all the time and the code is already compiled, so the redirect is very fast (no slower than if the redirector were written in C or the like). Squid keeps an open pipe to each redirect daemon, so there is not even the overhead of expensive system calls.
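
To make the buffering arithmetic from the first note concrete, here is a tiny sketch. The 4096 byte buffer and the 70 character average are the figures used above; the function name replies_before_flush is made up for this example. The sketch also shows the IO::Handle spelling of $|=1, in case you find it more readable.

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use IO::Handle;

# Same effect as $| = 1 while STDOUT is the selected output handle.
STDOUT->autoflush(1);

# Hypothetical helper: how many replies pile up before a block-buffered
# STDOUT of $buffer_size bytes is flushed, at $avg_len bytes per reply.
sub replies_before_flush {
    my ($buffer_size, $avg_len) = @_;
    return ceil($buffer_size / $avg_len);
}
```

With the numbers from the note, replies_before_flush(4096, 70) comes out at 59, which is the number of requests that would sit waiting in the buffer if autoflush were left off.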

Now it is time to restart the server; on Linux I do it with:

  /etc/rc.d/init.d/squid restart

Now the setup is complete ...


Next month

Next month I'll complete the "squid and 2 webservers" scenario and present a simpler "squid and 1 webserver" scenario, including the implementation details as usual.