In this paper I will go over the steps involved in implementing redundant, load-distributed proxy caches using carp(4) (Common Address Redundancy Protocol) and round-robin DNS. This is an effective solution because CARP provides service redundancy, while round-robin DNS adds primitive (but effective) load distribution between the two Squid instances. Round-robin DNS does have known issues, in particular caching by nameservers and resolvers, which can keep a client on one address for as long as the cached answer lives. This may be a problem in some situations; it was not in our case. Some may prefer a method such as ARP balancing. I will describe everything in terms of FreeBSD 5.4, but the steps should be very similar on other systems. The specific service I will be discussing is Squid, an open-source caching web proxy.
Squid is part of the FreeBSD ports collection (www/squid) and can be installed from there. Configuring Squid is well beyond the scope of this guide, but there are a few important options to look at. In particular, we can save bandwidth by having the two Squid instances query each other for an item before fetching it remotely. The following example enables this peering between the two hosts:
# ICP (Internet Cache Protocol, RFC 2186)
# Used to query the neighbor cache for missing items before fetching them
icp_port 3130
acl my_proxies src host-a.example.com
acl my_proxies src host-b.example.com
icp_access deny !my_proxies
# On host A only:
cache_peer host-b.example.com sibling 3128 3130 proxy-only
# On host B only:
cache_peer host-a.example.com sibling 3128 3130 proxy-only
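The lookup order that sibling peering produces can be modelled in a few lines. The Python sketch below is a toy model under stated assumptions, not Squid's implementation: the `Cache` class and its methods are hypothetical, an ICP query is reduced to a dictionary lookup, and `proxy_only=True` mirrors the `proxy-only` option by not storing objects fetched from the sibling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cache:
    """Hypothetical model of one Squid instance with a single ICP sibling."""
    name: str
    store: Dict[str, str] = field(default_factory=dict)
    # repr/compare disabled because the two siblings reference each other.
    sibling: Optional["Cache"] = field(default=None, repr=False, compare=False)
    proxy_only: bool = True  # like "cache_peer ... proxy-only"
    origin_fetches: int = 0

    def icp_query(self, url: str) -> bool:
        # Answer a neighbor's ICP query: HIT only if we hold the object.
        return url in self.store

    def get(self, url: str) -> str:
        if url in self.store:
            return f"{self.name}: local hit"
        if self.sibling is not None and self.sibling.icp_query(url):
            body = self.sibling.store[url]
            if not self.proxy_only:
                self.store[url] = body  # keep a copy only without proxy-only
            return f"{self.name}: sibling hit via {self.sibling.name}"
        # Neither cache has it: fetch from the origin server and cache it.
        self.origin_fetches += 1
        self.store[url] = f"<content of {url}>"
        return f"{self.name}: fetched from origin"


if __name__ == "__main__":
    a, b = Cache("host-a"), Cache("host-b")
    a.sibling, b.sibling = b, a
    print(a.get("http://www.example.com/"))  # host-a: fetched from origin
    print(b.get("http://www.example.com/"))  # host-b: sibling hit via host-a
```

In this model the second request costs no external bandwidth, and because of `proxy_only` the object stays stored only on host A.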
Check out the default squid.conf installed with the port for other available options to suit your needs. To have Squid start automatically, simply add the following line to /etc/rc.conf:
squid_enable="YES"
You will also want to create a WPAD (Web Proxy Auto-Discovery) script so that clients can configure their proxy settings automatically. This script can be fairly complex, and there are some good examples available online [2]. A simple one looks like this:
function FindProxyForURL(url, host) {
    if (shExpMatch(host, "localhost") || isPlainHostName(host)) {
        return "DIRECT";
    } else {
        // Squid listens on port 3128 by default;
        // if the proxy is inaccessible, go DIRECT
        return "PROXY proxy.example.com:3128; DIRECT";
    }
}
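Because the PAC file is evaluated inside the browser, it can be handy to check its decisions offline. Below is a minimal Python port of the same logic. `is_plain_host_name` and `sh_exp_match` are hypothetical stand-ins written for this sketch to approximate the PAC built-ins `isPlainHostName()` and `shExpMatch()`, and the `:3128` port assumes Squid's default `http_port`.

```python
from fnmatch import fnmatchcase


def is_plain_host_name(host: str) -> bool:
    # Approximates PAC isPlainHostName(): true if the name has no dots.
    return "." not in host


def sh_exp_match(s: str, pattern: str) -> bool:
    # Approximates PAC shExpMatch(): shell-style wildcard match.
    return fnmatchcase(s, pattern)


def find_proxy_for_url(url: str, host: str) -> str:
    if sh_exp_match(host, "localhost") or is_plain_host_name(host):
        return "DIRECT"
    # Use the round-robin proxy name; fall back to DIRECT if it is unreachable.
    return "PROXY proxy.example.com:3128; DIRECT"


if __name__ == "__main__":
    for host in ("localhost", "intranet", "www.freebsd.org"):
        print(host, "->", find_proxy_for_url(f"http://{host}/", host))
```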
Make this script available on your web server, and be sure Apache is configured to serve it with the correct MIME type (see the mime.types file):
application/x-ns-proxy-autoconfig dat
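If you want to check the MIME type before touching Apache, a throwaway server is enough. This is a testing sketch using Python's standard `http.server`, not a substitute for the Apache configuration; the port 8080 and serving from the current directory are assumptions.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

PAC_MIME = "application/x-ns-proxy-autoconfig"


class WPADHandler(SimpleHTTPRequestHandler):
    # Start from the stock extension map and add the equivalent of the
    # "application/x-ns-proxy-autoconfig dat" line in mime.types.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".dat": PAC_MIME}


if __name__ == "__main__":
    # Serve the current directory (which should contain wpad.dat) on port 8080.
    handler = partial(WPADHandler, directory=".")
    HTTPServer(("", 8080), handler).serve_forever()
```

A request for http://localhost:8080/wpad.dat should then come back with a Content-Type of application/x-ns-proxy-autoconfig.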
You can set up ISC DHCP with the following options so that DHCP clients can automatically learn the URL for the WPAD file instead of performing a series of guesses. Add the following to your dhcpd.conf file:
option wpad-url code 252 = text;
option wpad-url "http://www.example.com/wpad.dat\n";
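For reference, DHCP options travel as a one-byte code, a one-byte length, and the payload (RFC 2132). The sketch below encodes the `wpad-url` option from the dhcpd.conf example that way; the helper name `encode_dhcp_option` is hypothetical. The trailing newline is commonly added because some versions of Internet Explorer reportedly drop the last character of this option, and it counts toward the length.

```python
def encode_dhcp_option(code: int, data: bytes) -> bytes:
    """Encode one DHCP option as code, length, data (RFC 2132 layout)."""
    if not 0 < code < 255:
        # 0 (pad) and 255 (end) have no length byte, so they are excluded here.
        raise ValueError("option code must be between 1 and 254")
    if len(data) > 255:
        raise ValueError("payload does not fit in a single option")
    return bytes([code, len(data)]) + data


WPAD_URL = "http://www.example.com/wpad.dat\n"  # trailing newline as in dhcpd.conf
option = encode_dhcp_option(252, WPAD_URL.encode("ascii"))

if __name__ == "__main__":
    print(option[0], option[1])  # 252 32
    print(option[2:])
```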
Your clients can either configure their browsers with the location of the WPAD file or select "Automatically Discover Proxy Settings." If, for some reason, the DHCPINFORM lookup does not work, most browsers fall back to a series of DNS lookups starting from their current domain. For example, host-c.office.example.com might try wpad.office.example.com and then wpad.example.com. It is therefore a good idea to create these entries in your DNS zone files.
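The DNS fallback can be approximated as "strip one label at a time". The hypothetical `wpad_candidates` helper below sketches that rule; the stopping point (never going above a two-label domain such as example.com, to avoid asking for wpad.com) is an assumption, and real browsers differ in the details.

```python
from typing import List


def wpad_candidates(fqdn: str, min_labels: int = 2) -> List[str]:
    """Return the wpad.* names a client might try, most specific first.

    Assumption: drop the host label, then one more label per step, and
    stop before fewer than `min_labels` labels remain (no wpad.com).
    """
    domain = fqdn.rstrip(".").split(".")[1:]  # drop the client's own host label
    names = []
    while len(domain) >= min_labels:
        names.append("wpad." + ".".join(domain))
        domain = domain[1:]
    return names


if __name__ == "__main__":
    print(wpad_candidates("host-c.office.example.com"))
    # ['wpad.office.example.com', 'wpad.example.com']
```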
You will need to have CARP support in the kernels of both hosts. This can be done simply by adding the line
device carp
to the kernel configuration of each host and rebuilding the kernel as described in the FreeBSD Handbook.
You will also need to edit /etc/rc.conf so that the interfaces are automatically created once the machine is rebooted:
host A:
cloned_interfaces="carp0 carp1"
ifconfig_carp0="up vhid 1 advbase 1 advskew 0 pass 4jsfkekUL3z 192.168.10.100/24"
ifconfig_carp1="up vhid 2 advbase 1 advskew 100 pass dlfKo3dF13 192.168.10.101/24"
host B:
cloned_interfaces="carp0 carp1"
ifconfig_carp0="up vhid 1 advbase 1 advskew 100 pass 4jsfkekUL3z 192.168.10.100/24"
ifconfig_carp1="up vhid 2 advbase 1 advskew 0 pass dlfKo3dF13 192.168.10.101/24"
This will create two CARP interfaces on both hosts. Note the `advskew` parameter, which is flipped between hosts. The carp(4) man page describes this parameter as follows:
The advbase parameter stands for "advertisement base". It is measured in seconds and specifies the base of the advertisement interval. The advskew parameter stands for "advertisement skew". It is measured in 1/256 of seconds. It is added to the base advertisement interval to make one host advertise a bit slower than the other does. Both advbase and advskew are put inside CARP advertisements.
Whichever host has a higher advskew on a particular address will advertise slightly slower, thus making the other host the "preferred master." This means we will have two virtual addresses, shared by both host A and host B. It should look something like this in a normal state:
host A: 192.168.10.100 (master)
192.168.10.101 (backup)
host B: 192.168.10.100 (backup)
192.168.10.101 (master)
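The election rule can be checked against the rc.conf values above. This sketch models only what the man page describes: each host advertises every advbase + advskew/256 seconds, and the most frequent advertiser for a vhid is the preferred master. `CONFIG`, `adv_interval` and `elect_masters` are names made up for this illustration, and ties are broken by host name purely to keep the sketch deterministic.

```python
from fractions import Fraction

# (host, vhid) -> (advbase, advskew), copied from the rc.conf examples.
CONFIG = {
    ("host A", 1): (1, 0),
    ("host A", 2): (1, 100),
    ("host B", 1): (1, 100),
    ("host B", 2): (1, 0),
}


def adv_interval(advbase: int, advskew: int) -> Fraction:
    # advbase is in seconds, advskew in 1/256 of a second.
    return advbase + Fraction(advskew, 256)


def elect_masters(config, up_hosts):
    """Return {vhid: preferred master} among the hosts that are up."""
    masters = {}
    for vhid in sorted({v for (_, v) in config}):
        candidates = [
            (adv_interval(*params), host)
            for (host, v), params in config.items()
            if v == vhid and host in up_hosts
        ]
        if candidates:
            # Shortest interval = most frequent advertiser = master.
            # Ties fall back to host name only to keep this deterministic.
            masters[vhid] = min(candidates)[1]
    return masters


if __name__ == "__main__":
    print(elect_masters(CONFIG, {"host A", "host B"}))  # {1: 'host A', 2: 'host B'}
    print(elect_masters(CONFIG, {"host B"}))            # {1: 'host B', 2: 'host B'}
```

The first result matches the normal-state table; the second shows host B holding both addresses while host A is down.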
You also need to enable preemption on both hosts so that each host takes an address back whenever it is the preferred master for it, for example after coming back up from a reboot:
# echo "net.inet.carp.preempt=1" >> /etc/sysctl.conf
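The effect of `net.inet.carp.preempt` is easiest to see on a timeline for a single vhid. The following is a deliberately simplified model (an assumption, not the kernel's state machine): a backup only takes over when the master disappears, unless preemption is enabled, in which case a live host with a shorter advertisement interval reclaims the address. The function names are hypothetical.

```python
def next_master(current, intervals, up, preempt):
    """One step of a simplified CARP election for a single vhid.

    intervals: {host: advbase + advskew/256 in seconds}; up: hosts alive now.
    """
    alive = {h: t for h, t in intervals.items() if h in up}
    if not alive:
        return None
    best = min(alive, key=alive.get)
    if current not in alive:
        return best  # master vanished: the best remaining backup takes over
    if preempt and alive[best] < alive[current]:
        return best  # preempt=1: the preferred master reclaims the address
    return current   # preempt=0: whoever is master keeps the address


def timeline(preempt):
    # vhid 1 (192.168.10.100): host A has advskew 0, host B has advskew 100.
    intervals = {"host A": 1.0, "host B": 1 + 100 / 256}
    master, history = "host A", []
    # Both up, then host A fails, then host A comes back.
    for up in ({"host A", "host B"}, {"host B"}, {"host A", "host B"}):
        master = next_master(master, intervals, up, preempt)
        history.append(master)
    return history


if __name__ == "__main__":
    print(timeline(preempt=False))  # ['host A', 'host B', 'host B']
    print(timeline(preempt=True))   # ['host A', 'host B', 'host A']
```

Without preemption, host B would keep both addresses after host A returns, which defeats the load distribution described next.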
Double-check all of your configurations and then reboot the machines to load the new kernel and create the interfaces.
Now that we have two shared addresses, we can distribute the load between them using round-robin DNS. The idea behind this is extremely simple, and the configuration is even simpler: create multiple address records for the same host name. The DNS server then rotates the order of the records in successive answers (at least in later versions of BIND). For example:
proxy.example.com.    IN A    192.168.10.100
proxy.example.com.    IN A    192.168.10.101
When an A record is requested for proxy.example.com, the server returns both addresses, but the first answer lists 192.168.10.100 first and the second lists 192.168.10.101 first. Since clients typically use the first address in the answer, successive clients are spread across both addresses, which results in a fairly even distribution over time. Remember that both of these addresses are CARP-enabled: if one host becomes unavailable (the entire host, or just that CARP-enabled address), the other host takes over its address. Don't forget to create the PTR records for those IP addresses as well.
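To see why the distribution evens out, here is a small simulation. It assumes the server rotates the record set by one position per query (cyclic ordering) and that each client connects to the first address in its answer; the caching resolvers mentioned in the introduction are not modelled, and `RoundRobinServer` is a hypothetical name.

```python
from collections import Counter, deque

RECORDS = ["192.168.10.100", "192.168.10.101"]  # the two CARP addresses


class RoundRobinServer:
    """Hypothetical server that rotates its A records once per query."""

    def __init__(self, records):
        self._ring = deque(records)

    def query(self):
        answer = list(self._ring)  # the whole record set, in the current order
        self._ring.rotate(-1)      # the next answer starts one record later
        return answer


def first_choice_counts(n_clients):
    # Each client connects to the first address in the answer it receives.
    server = RoundRobinServer(RECORDS)
    return Counter(server.query()[0] for _ in range(n_clients))


if __name__ == "__main__":
    print(first_choice_counts(10))
```

With an even number of fresh lookups the split is exactly 50/50; resolver caching in the real world makes it only approximately even.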
Verify your settings and restart any services you've changed to reload their configuration files.
It is always a good idea to firewall servers, and there are a few things to note in this setup in regard to firewalling. I won't go over how to write the rules, but here are a couple of tips:
Credit to Nick Hatch for proposing this idea as a solution to implement redundant critical services.