The Advantages:
Caching of static objects. These are served much faster, provided your cache is big enough to keep the most frequently requested objects in it.
Buffering of dynamic content. Squid takes over the burden of returning the content generated by the mod_perl servers to slow clients, so the mod_perl servers no longer wait for slow clients to download the data. The freed servers immediately switch to serving other requests, so the number of servers you need goes down dramatically.
Non-linear URL space / server setup. You can use Squid to play tricks with the URL space and/or with domain-based virtual server support.
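A rough back-of-the-envelope sketch of why buffering cuts the number of mod_perl servers, using Little's law (busy servers = request rate × time each request holds a server). The request rate and timings are hypothetical numbers chosen only for illustration, and the model assumes squid can accept the whole response at once; the 16k buffering caveat among the disadvantages below limits this for large pages.

```perl
use strict;
use warnings;

# Little's law: average number of busy servers = arrival rate x time each
# request holds a server. All numbers below are hypothetical.
sub busy_servers {
    my ($requests_per_sec, $seconds_held) = @_;
    return $requests_per_sec * $seconds_held;
}

my $rate     = 10;     # requests per second (hypothetical)
my $generate = 0.1;    # seconds mod_perl spends producing a page (hypothetical)
my $download = 2.0;    # seconds a slow client needs to fetch it (hypothetical)

# Without a buffering proxy, the mod_perl process is held for both phases.
my $direct = busy_servers($rate, $generate + $download);

# With squid in front, it is released once squid has the response
# (squid's own time spent feeding the slow client is not counted here).
my $proxied = busy_servers($rate, $generate);

printf "without proxy: %.1f busy mod_perl servers\n", $direct;
printf "with proxy:    %.1f busy mod_perl servers\n", $proxied;
```

With these made-up figures, the mod_perl pool shrinks from about 21 busy processes to about 1, while squid absorbs the slow downloads.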
The Disadvantages:
Proxying dynamic content is not going to help much if all the clients are on a fast local net. Also, a message on the squid mailing list implied that squid only buffers in 16k chunks, so it would not allow a mod_perl server to complete immediately if its output is larger.
Speed. Squid is not very fast compared to the plain file-based web servers available today. Speed is a reason to use Squid only if you are using a lot of dynamic features such as mod_perl or similar, and then only if the application and server are designed with caching in mind.
Memory usage. Squid uses quite a bit of memory.
HTTP protocol level. Squid is pretty much an HTTP/1.0 server, which seriously limits the use of HTTP/1.1 features.
HTTP headers, dates and freshness. The squid server might give out ``old'' pages, confusing downstream/client caches. Chances are also that you will be giving out stale pages. (You update some documents on the site, but squid still serves the old ones.)
Stability. Compared to plain web servers, Squid is not the most stable.
These pros and cons suggest that you probably want squid mainly for its dynamic content buffering features, and only if your server serves mostly dynamic requests. In that situation it is better to have a plain Apache server serve static objects and have squid proxy only the mod_perl-enabled server, at least when performance is the goal.
While I have detailed the mod_perl server installation, you are on your own with installing the squid server. I run Linux, so I downloaded the RPM package, installed it, configured /etc/squid/squid.conf, fired off the server and was all set. Basically, once you have squid installed, you just need to modify the default squid.conf as I explain below, and then you are ready to run it.
First, let's understand what we have on hand and what we want from squid. We have the httpd_docs and httpd_perl servers listening on ports 81 and 8080 respectively (we have to move the httpd_docs server to port 81, since port 80 will be taken over by squid). Both reside on the same machine as squid. We want squid to listen on port 80, forward requests for static objects to the port httpd_docs listens on, and forward dynamic requests to httpd_perl's port. Both servers return the data to the proxy server (unless it is already cached in squid), so the user never sees the other ports and never knows that more than one server might be running. The proxy server makes all the magic behind it transparent to the user. Do not confuse this with mod_rewrite, where a server redirects the request somewhere according to its rules and forgets about it. In proxy terminology, the described functionality is known as httpd accelerator mode.
You should understand that squid can be used as a straightforward proxy server, generally used at companies and ISPs to cut down incoming traffic by caching the most popular requests. However, we want to run it in httpd accelerator mode. Two directives, httpd_accel_host and httpd_accel_port, enable this mode; we will look at them in more detail shortly. If you are currently using squid in regular proxy mode, you can extend its functionality by running both modes concurrently. To accomplish this, extend your existing squid configuration with the httpd accelerator mode directives, or just create a new configuration from scratch.
Since squid now listens on port 80, we have to move the httpd_docs server to another port, for example port 81 (your mileage may vary :). So modify httpd.conf in the httpd_docs configuration directory and restart the httpd_docs server (but if you are working on a production server, not before you have squid ready to run). And as you remember, httpd_perl listens on port 8080.
Let's go through the changes we should make to the default configuration file. Since this file (/etc/squid/squid.conf) is huge (over 60KB) and we will not use 95% of it, my suggestion is to write a new one that includes only the modified directives.
We want to enable the redirect feature so that requests can be served by more than one server (in our case, httpd_docs and httpd_perl). So we specify httpd_accel_host as virtual. This assumes that your server has multiple interfaces; Squid will bind to all of them.
httpd_accel_host virtual
Then we define the default port: if a request is not redirected, httpd_docs will serve it. We assume that most requests will be static in nature. We have our httpd_docs listening on port 81.
httpd_accel_port 81
And as described before, squid listens on port 80.
http_port 80
We do not use ICP (the Internet Cache Protocol, used for cache sharing between neighboring machines), which is more relevant in proxy mode.
icp_port 0
hierarchy_stoplist defines a list of words which, if found in a URL, cause the object to be handled directly by this cache. In other words, use it to avoid querying neighbor caches for certain objects. Note that I have configured the /cgi-bin and /perl aliases for my dynamic documents; if you named them differently, make sure to use the correct aliases here.
hierarchy_stoplist /cgi-bin /perl
Now we tell squid not to cache dynamic pages.
acl QUERY urlpath_regex /cgi-bin /perl
no_cache deny QUERY
Please note that the last two directives are controversial. If you want your scripts to comply more closely with the HTTP standards, their headers should carry the caching directives defined by the HTTP specification. You will find a complete tutorial on this topic in the Tutorial on HTTP Headers for mod_perl users by Andreas J. Koenig (at http://perl.apache.org). If you set the headers correctly, there is no need to tell the squid accelerator NOT to try to cache something. The headers I am talking about are Last-Modified and Expires. What are they good for? Squid will not bother your mod_perl server a second time if a request is (a) cacheable and (b) still in the cache. Many mod_perl applications produce identical results on identical requests, at least if not much time passes between the requests. So your squid might have a hit ratio of 50%, which means that the mod_perl servers will have only half as much work to do as before. This is only possible by setting the headers correctly.
Even if you insert a user ID and the date into your page, caching can save resources when you set the expiration time to 1 second. A user might double-click where a single click would do, sending two requests in parallel; squid could then serve the second request from its cache.
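A minimal sketch of computing the two headers in Perl, assuming you emit them yourself (for example through CGI.pm or the mod_perl request object; that part is not shown). HTTP dates are expressed in GMT in the RFC 1123 format. The day and month names are spelled out by hand so the output does not depend on the locale. The helper names `http_date` and `cache_headers` are hypothetical, and using the current time as Last-Modified is only a placeholder for the time your data actually changed.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an RFC 1123 HTTP date, always in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Headers that allow a cache to keep the response for $ttl seconds.
sub cache_headers {
    my ($last_modified, $now, $ttl) = @_;
    return (
        'Last-Modified' => http_date($last_modified),
        'Expires'       => http_date($now + $ttl),
    );
}

my $now = time;
my %h = cache_headers($now, $now, 1);    # expire one second from now
print "$_: $h{$_}\n" for sort keys %h;
```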
But if you are lazy, or just have too many things to deal with, you can leave the above directives as I described them. Keep in mind, though, that one day you will want to reread this section and Andreas' tutorial, and squeeze even more power from your servers without investing in additional memory and better hardware.
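If you do keep the no_cache directives, note that urlpath_regex patterns are regular expressions matched anywhere in the URL path, not path prefixes. The following Perl sketch imitates the acl above. It is an emulation, not squid's code: Perl's regex engine stands in for squid's, which should behave the same for plain literals like these. It shows that a static page such as /docs/perlfaq.html would also be excluded from the cache. Anchoring the patterns (for example ^/perl/) should avoid that; test it against your own squid version.

```perl
use strict;
use warnings;

# Emulation of "acl QUERY urlpath_regex /cgi-bin /perl": the acl matches
# if ANY pattern matches ANYWHERE in the URL path (patterns are unanchored).
my @patterns = (qr{/cgi-bin}, qr{/perl});

sub not_cached {
    my ($path) = @_;
    for my $re (@patterns) {
        return 1 if $path =~ $re;
    }
    return 0;
}

for my $path ('/perl/test.pl', '/cgi-bin/env.cgi', '/index.html', '/docs/perlfaq.html') {
    printf "%-20s %s\n", $path, not_cached($path) ? 'not cached' : 'cacheable';
}
```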
While testing, you might want to enable the debugging options and watch the log files in /var/log/squid/, but turn them off on your production server. I list the directive commented out (each entry is section,level; section 28 is access control).
# debug_options ALL,1 28,9
We need to provide a way for squid to dispatch requests to the correct servers: static object requests should be redirected to httpd_docs (unless they are already cached), while dynamic ones should go to the httpd_perl server. The configuration below tells squid to fire off 10 redirect daemons from the specified path and disables rewriting of the Host: header in redirected requests (as suggested by squid's documentation). The redirection daemon script is listed below.
redirect_program /usr/lib/squid/redirect.pl
redirect_children 10
redirect_rewrites_host_header off
Maximum allowed request size in kilobytes. This one is pretty obvious. If you are using POST to upload files, then set this to the largest file's size plus a few extra kbytes.
request_size 1000 KB
Then we have access permissions, which I will not explain here. You might want to read the documentation, though, so as to avoid security holes.
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl myserver src 127.0.0.1/255.255.255.255
acl SSL_ports port 443 563
acl Safe_ports port 80 81 8080 443 563
acl CONNECT method CONNECT
http_access allow manager localhost
http_access allow manager myserver
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
# http_access allow all
Squid should run as a non-root user, so you need these directives if you start squid as root.
cache_effective_user squid
cache_effective_group squid
Now configure the amount of memory to be used for caching. The squid documentation warns that the actual size of the squid process can grow to three times the value you set.
cache_mem 20 MB
Keep pools of allocated (but unused) memory available for future use. Read more about it in the squid documentation.
memory_pools on
Now tighten the runtime permissions of the cache manager CGI script (cachemgr.cgi, which comes bundled with squid) on your production server.
cachemgr_passwd disable shutdown
#cachemgr_passwd none all
Now the redirection daemon script. Put it at the location you specified with the redirect_program parameter in the config file above and, of course, make it executable by the server that runs it (squid, in this setup):
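#!/usr/local/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT: squid must get each reply immediately

while (<STDIN>) {
    # redirect /perl/ requests to the mod_perl server (httpd_perl)
    s|(:81)?/perl/|:8080/perl/|;
    # print the line, rewritten or unchanged (the latter goes to httpd_docs)
    print;
}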
In my scenario the proxy and the Apache servers run on the same machine, which is why I just substitute the port. In the presented squid configuration, requests that pass through squid are converted to point to localhost (127.0.0.1). The above redirector can of course be more complex, but you know Perl, right?
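One hypothetical direction for a more selective redirector: rewrite only URLs whose path begins with /perl/ (the script above also rewrites http://localhost/docs/perl/x, producing the broken http://localhost/docs:8080/perl/x), and pass back only the URL field of each input line. It assumes squid 2.x redirector input lines of the form "URL ip-address/fqdn ident method"; check the redirector documentation for your squid version. The sub names are illustrative, and the demo loop uses sample lines instead of STDIN so the sketch can be run directly.

```perl
#!/usr/bin/perl
# Hypothetical, stricter variant of the redirector above: only URLs whose
# path *starts* with /perl/ are sent to httpd_perl on port 8080.
use strict;
use warnings;

sub rewrite_url {
    my ($url) = @_;
    # scheme://host[:port]/perl/...  ->  scheme://host:8080/perl/...
    $url =~ s{^(\w+://[^/:]+)(?::\d+)?(/perl/)}{$1:8080$2};
    return $url;
}

# Assumed input: "URL ip-address/fqdn ident method"; only the URL is returned.
sub handle_line {
    my ($line) = @_;
    my ($url) = split ' ', $line;
    return defined $url ? rewrite_url($url) : '';
}

# Demo on sample input. A real redirector would set $| = 1 and loop over
# <STDIN>, printing handle_line($_) . "\n" for each line.
for my $line ("http://localhost/perl/test.pl 127.0.0.1/- - GET\n",
              "http://localhost:81/perl/x 127.0.0.1/- - GET\n",
              "http://localhost/docs/perl/x 127.0.0.1/- - GET\n") {
    print handle_line($line), "\n";
}
```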
A few notes regarding the redirector script:
You must disable buffering; $|=1; does the job. If you do not, STDOUT is flushed only when its buffer becomes full, and the default buffer size is about 4096 characters. So with an average URL of 70 characters, the buffer is flushed only after about 59 (4096/70) requests, and only then do the requests finally reach the target server. Until then, your users just wait for the buffer to fill up.
If you think this is a very inefficient way to redirect, let me try to convince you otherwise. The redirector runs as a daemon: squid fires up N redirect daemons, so the perl interpreter is not loaded per request. Exactly as with mod_perl, perl stays loaded all the time and the code is already compiled, so redirection is very fast (not much slower than a redirector written in C or the like). Squid keeps an open pipe to each redirect daemon, so there is not even the overhead of starting a new process for each request.
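To see the persistent-pipe arrangement in action, the following hypothetical test harness plays squid's role. It starts a copy of the redirector once as a child process with IPC::Open2 (a core Perl module; `$^X` is the path of the running perl), keeps both pipe ends open, and sends one URL at a time, reading each reply before sending the next. It embeds the redirector logic instead of reading /usr/lib/squid/redirect.pl, so it illustrates the interaction, not squid's implementation. If you delete `$| = 1` from the embedded code, `ask` blocks forever: the reply sits in the child's output buffer, which is exactly the problem described in the first note.

```perl
use strict;
use warnings;
use IPC::Open2;

# A copy of the redirector logic, run as a long-lived child process.
my $redirector = <<'END';
$| = 1;
while (<STDIN>) {
    s|(:81)?/perl/|:8080/perl/|;
    print;
}
END

# Start the "daemon" once; open2 turns on autoflush for $to_child.
my $pid = open2(my $from_child, my $to_child, $^X, '-e', $redirector);

# Send one URL and wait for the child's reply on the same open pipe.
sub ask {
    my ($url) = @_;
    print {$to_child} "$url\n";
    my $reply = <$from_child>;    # blocks until the child flushes its reply
    chomp $reply;
    return $reply;
}

my @replies = map { ask($_) } (
    'http://localhost/perl/test.pl',
    'http://localhost/index.html',
);
print "$_\n" for @replies;

close $to_child;      # EOF ends the child's while loop
close $from_child;
waitpid $pid, 0;
```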
Now it is time to restart the server; in Linux I do it with:
/etc/rc.d/init.d/squid restart
Now the setup is complete ...
Next month I'll complete the ``squid and 2 webservers'' scenario and present a simpler ``squid and 1 webserver'' scenario, including the implementation details as usual.