mod_perl Strategy and Implementation - Part 5

By: Stas Bekman


Proxy Servers (Continued from the previous article)

We finished the squid setup in the last article. Almost...

When you try that setup, you will be surprised and upset to discover port 81 showing up in the URLs of static objects (such as HTML files). Hey, we did not want users to see port 81 and use it instead of 80! If they do, they bypass the squid server, and all the hard work we went through was a waste of time.

The solution is to run both squid and httpd_docs on the same port, 80. This works if each one binds to a different IP address: httpd_docs to the loopback interface (127.0.0.1) and squid to the server's public address. Modify the httpd.conf in the httpd_docs configuration directory:

  Port 80
  BindAddress 127.0.0.1
  Listen 127.0.0.1:80

Modify the squid.conf:

  http_port 80
  tcp_incoming_address 123.123.123.3
  tcp_outgoing_address 127.0.0.1
  httpd_accel_host 127.0.0.1
  httpd_accel_port 80

Replace 123.123.123.3 with the IP address of your main server. Now restart squid and httpd_docs, in either order, and voila: the port number is gone.

You must also have the following entry in /etc/hosts (chances are it is already there):

  127.0.0.1  localhost.localdomain   localhost
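Why does port 81 leak in the first place? The article does not say, so the following is my assumption. A likely cause is that Apache builds absolute self-referencing URLs from its server name and Port setting. One example is the redirect it sends when a directory is requested without its trailing slash. Apache prints the port in these URLs only when it is not the default, 80. The Python model below illustrates that rule under this assumption. It is not Apache code, and the function names are hypothetical.

```python
# Hypothetical model of how a server builds absolute self-referencing URLs;
# an illustration of the rule, not Apache's actual source code.
DEFAULT_PORTS = {"http": 80, "https": 443}


def self_url(server_name, port, path, scheme="http"):
    """Absolute URL pointing back at this server.

    The port is written out only when it differs from the scheme's default,
    so a server configured with Port 81 leaks ":81" into every such URL.
    """
    if port == DEFAULT_PORTS[scheme]:
        host = server_name
    else:
        host = f"{server_name}:{port}"
    return f"{scheme}://{host}{path}"


def directory_redirect(server_name, port, path):
    """Location for a directory requested without its trailing slash."""
    return self_url(server_name, port, path.rstrip("/") + "/")


# httpd_docs on port 81 (old setup) versus port 80 (fixed setup).
print(directory_redirect("www.example.com", 81, "/docs"))  # http://www.example.com:81/docs/
print(directory_redirect("www.example.com", 80, "/docs"))  # http://www.example.com/docs/
```

Once httpd_docs itself answers on port 80, the same rule yields port-less URLs. That is why binding both servers to port 80 makes the problem disappear.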

If your scripts generate HTML containing fully qualified self-references that use port 8080 (or any other non-80 port), fix them so the links point to port 80, which means leaving the port out of the URL entirely. If you do not, users will bypass squid, as if it were not there at all, by making direct requests to the mod_perl server's port.
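If your scripts build links from a configured base URL, the cleanest fix is to drop the port there. When absolute URLs come from a source you do not control directly, such as stored data, a post-processing helper can strip the port instead. Here is a minimal Python sketch. OWN_HOSTS and BACKEND_PORTS are hypothetical settings, and public_url is an illustrative name, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values: your site's public host names and the ports the
# back-end servers (httpd_docs, httpd_perl) used to be reached on.
OWN_HOSTS = {"www.example.com"}
BACKEND_PORTS = {81, 8080}


def public_url(url):
    """Strip a back-end port from an absolute self-referencing http URL.

    Links to other hosts, other schemes, relative links, and URLs without
    a back-end port are returned unchanged.
    """
    parts = urlsplit(url)
    if (parts.scheme != "http"
            or parts.hostname not in OWN_HOSTS
            or parts.port not in BACKEND_PORTS):
        return url
    # netloc ends with ":<port>"; drop that suffix and keep everything else.
    netloc = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))


print(public_url("http://www.example.com:8080/perl/search.pl?q=squid"))
# http://www.example.com/perl/search.pl?q=squid
```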

The only question left is what to do with users who bookmarked your services while the URLs still contained port 8080. Do not worry about it. The most important thing is for your scripts to return full URLs without the port. If a user arrives through a link with port 8080 in it, let it be; just make sure that every subsequent link your server generates is correct. Over time users will update their bookmarks. You can speed this up by emailing them, if you have their addresses, or by leaving a note on your pages asking them to update their bookmarks. You could have avoided this problem by not publishing the non-80 port in the first place.

To save you some keystrokes, here is the whole modified squid.conf:

  http_port 80
  tcp_incoming_address 123.123.123.3
  tcp_outgoing_address 127.0.0.1
  httpd_accel_host 127.0.0.1
  httpd_accel_port 80
  
  icp_port 0
  
  hierarchy_stoplist /cgi-bin /perl
  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY
  
  # debug_options ALL,1 28,9
  
  redirect_program /usr/lib/squid/redirect.pl
  redirect_children 10
  redirect_rewrites_host_header off
  
  request_size 1000 KB
  
  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all
  
  cache_effective_user squid
  cache_effective_group squid
  
  cache_mem 20 MB
  
  memory_pools on
  
  cachemgr_passwd disable shutdown

Note that all directives should start at the beginning of the line.
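The redirect_program line points squid at a redirector script, redirect.pl. Here is a minimal Python sketch of what such a program does, for readers who want to see the mechanism. It assumes the Squid 2.x redirector interface. Squid writes one request per line to the program's standard input: the URL first, then the client address, ident and method. The program answers each line with either a rewritten URL or an empty line, which means "fetch it unchanged". The back-end address 127.0.0.1:8080 and the /perl and /cgi-bin prefixes are assumptions based on this setup. This is not the author's redirect.pl.

```python
import re
import sys

# Hypothetical: where httpd_perl listens. Adjust to your setup.
PERL_SERVER = "127.0.0.1:8080"

# Requests for /perl and /cgi-bin (and nothing else) go to mod_perl.
DYNAMIC = re.compile(r"^http://[^/]+(/(?:perl|cgi-bin)(?:[/?].*)?)$")


def rewrite(line):
    """Map one redirector request line to a reply.

    The input is "URL ip-address/fqdn ident method"; the reply is the new
    URL, or "" to let squid fetch the original URL from httpd_docs.
    """
    fields = line.split()
    if not fields:
        return ""
    match = DYNAMIC.match(fields[0])
    return f"http://{PERL_SERVER}{match.group(1)}" if match else ""


def serve(infile=sys.stdin, outfile=sys.stdout):
    """Answer one line per request, flushing so squid is never kept waiting."""
    for line in infile:
        outfile.write(rewrite(line) + "\n")
        outfile.flush()
```

To deploy it, add a #!/usr/bin/env python3 line and an if __name__ == "__main__": serve() guard, make the file executable, and point redirect_program at it. The per-line flush matters because squid waits for each answer before sending the next request to that child.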


Running one webserver and squid in httpd accelerator mode

When I was first told about squid, I thought: ``Hey, now I can drop the httpd_docs server and have only squid and httpd_perl servers.'' Since squid would cache all my static objects, I would not need the light httpd_docs server. But that assumption was wrong. Why? Because you still have the overhead of loading each object into squid the first time it is requested. If your site has many such objects, not all of them will be cached unless you devote a huge chunk of memory to squid. As a result, the heavy mod_perl servers will still bear the overhead of serving static objects. How would one measure this overhead? The two servers differ only in memory consumption; everything else (e.g. I/O) should be equal. So you have to estimate the time needed to fetch each static object for the first time during a peak period. From that, you can work out the number of additional servers needed to serve static objects, and then the additional memory requirements. I can imagine this amount being significant in some installations.
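The estimate above is essentially Little's law; the name is my choice, as the article does not use it. The number of processes busy with cache misses equals the miss rate multiplied by the time each miss takes. Here is a minimal Python sketch with made-up peak figures; the function names are hypothetical.

```python
import math


def extra_servers(miss_rate, fetch_ms):
    """Processes kept busy by cache misses (Little's law).

    miss_rate: static requests per second that squid cannot serve from cache
    fetch_ms:  average milliseconds a back-end process spends on one miss
    """
    return math.ceil(miss_rate * fetch_ms / 1000)


def extra_memory_mb(miss_rate, fetch_ms, perl_mb, docs_mb):
    """Memory for those processes, with and without a separate httpd_docs."""
    n = extra_servers(miss_rate, fetch_ms)
    return {
        "processes": n,
        "mod_perl_mb": n * perl_mb,     # misses served by httpd_perl
        "httpd_docs_mb": n * docs_mb,   # misses served by httpd_docs
        "difference_mb": n * (perl_mb - docs_mb),
    }


# Hypothetical peak: 50 misses/s, 200 ms each, 12 MB vs 2 MB per process.
print(extra_memory_mb(50, 200, 12, 2))
# {'processes': 10, 'mod_perl_mb': 120, 'httpd_docs_mb': 20, 'difference_mb': 100}
```

With these invented numbers, serving the misses from mod_perl processes costs about 100 MB more than serving them from light httpd_docs processes. Plug in your own measurements to decide whether dropping httpd_docs is worth it.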

So I decided to accept even more administration overhead and stick with the squid, httpd_docs and httpd_perl scenario, where I can optimize and fine-tune everything. Of course, your case may be different. If you feel that the scenario from the previous section is too complicated, make it simpler: have only one server, with mod_perl built in, and let squid do most of the job that the plain light Apache used to do. As explained in the previous paragraph, pick this lighter setup only if you can make squid cache most of your static objects. If it cannot, your mod_perl server will end up doing the work we do not want it to do.

If you are still with me, install Apache with mod_perl, and squid. Then use a configuration similar to the one in the previous section, except that httpd_docs is gone. We also no longer need the redirector, and we set httpd_accel_host to the server's host name rather than to virtual. There is no need to bind two servers to the same port either, because there is only one Apache server and no redirection. As a result, httpd.conf no longer needs the BindAddress and Listen directives.

The modified configuration (see the explanations in the previous section):

  httpd_accel_host put.your.hostname.here
  httpd_accel_port 8080
  http_port 80
  icp_port 0
  
  hierarchy_stoplist /cgi-bin /perl
  acl QUERY urlpath_regex /cgi-bin /perl
  no_cache deny QUERY
  
  # debug_options ALL,1 28,9
  
  # redirect_program /usr/lib/squid/redirect.pl
  # redirect_children 10
  # redirect_rewrites_host_header off
  
  request_size 1000 KB
  
  acl all src 0.0.0.0/0.0.0.0
  acl manager proto cache_object
  acl localhost src 127.0.0.1/255.255.255.255
  acl myserver src 127.0.0.1/255.255.255.255
  acl SSL_ports port 443 563
  acl Safe_ports port 80 81 8080 443 563
  acl CONNECT method CONNECT
  
  http_access allow manager localhost
  http_access allow manager myserver
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  # http_access allow all
  
  cache_effective_user squid
  cache_effective_group squid
  
  cache_mem 20 MB
  
  memory_pools on
  
  cachemgr_passwd disable shutdown

That's all!
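One detail applies to the http_access lines in both configurations. Squid checks them from top to bottom, and the first line whose ACLs all match decides the outcome. If no line matches, Squid applies the opposite of the last line's action. Because the last active line here is a deny, requests that match no rule are allowed. That is why http_access allow all can stay commented out in this accelerator setup. The Python sketch below models only these rules, with each ACL reduced to a hypothetical predicate over a request dictionary. It illustrates the evaluation order and is not how Squid is implemented.

```python
# Each ACL from squid.conf becomes a predicate over a simplified request dict.
def manager(req):
    return req["proto"] == "cache_object"


def localhost(req):
    return req["src"] == "127.0.0.1"


myserver = localhost  # both ACLs are 127.0.0.1/255.255.255.255 here


def safe_ports(req):
    return req["port"] in {80, 81, 8080, 443, 563}


def ssl_ports(req):
    return req["port"] in {443, 563}


def connect(req):
    return req["method"] == "CONNECT"


def negate(acl):
    """The "!" prefix in an http_access line."""
    return lambda req: not acl(req)


# The active http_access lines, in file order.
HTTP_ACCESS = [
    ("allow", [manager, localhost]),
    ("allow", [manager, myserver]),
    ("deny", [manager]),
    ("deny", [negate(safe_ports)]),
    ("deny", [connect, negate(ssl_ports)]),
]


def check(req, rules=HTTP_ACCESS):
    """First matching line wins; otherwise the opposite of the last line."""
    for action, acls in rules:
        if all(acl(req) for acl in acls):
            return action
    return "allow" if rules[-1][0] == "deny" else "deny"


def request(src="10.0.0.5", port=80, method="GET", proto="http"):
    return {"src": src, "port": port, "method": method, "proto": proto}


print(check(request()))                      # allow: no line matches
print(check(request(proto="cache_object")))  # deny: manager from outside
print(check(request(port=25)))               # deny: not a Safe_port
```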


Next month

Next month I'll start the ``mod_perl coding guidelines'' series of articles.