Caching Updates With Apt-Cacher

If you run Debian, or a distribution based on Debian such as Ubuntu, on more than one machine, and you keep all of the machines at the same distribution level, consider using apt-cacher to keep a local cache of distribution updates and patches. That way, each package is downloaded off the Internet once, rather than once per machine.

Choose a machine to hold the cache repository and set up apt-cacher there. We usually pick one that has enough spare disk space to support the cache, is visible to all of the other systems, and is running at all times.

Begin by installing the apt-cacher package thusly:

     sudo apt-get install apt-cacher

Once this is done, edit the apt-cacher configuration file (/etc/apt-cacher/apt-cacher.conf) to set up your specific options. Although you shouldn't have to change much, here are a few things for your consideration.

The default port that apt-cacher runs on is port 3142. You might want to change it if it collides with something else on your system (we can't think what, but ya never know).

By default, all hosts are allowed to use the repository cache. You probably want to lock this down so that only hosts on your local subnet can use it. For example, if your local subnet is 192.168.1.0/24, and you want to allow all machines on it, plus the local host, to use the cache, you might set:

     allowed_hosts=192.168.1.0/24, 127.0.1.1

The traditional local host (127.0.0.1) is allowed by default so it is not necessary to add it. Apparently, on Ubuntu boxes, 127.0.1.1 is also considered to be the local host (if this is really true, nice going boneheads -- just what we need is another way of doing the same old thing differently) so, just in case somebody has a go at the cache with that IP address we allow it too.
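
If you are not sure what to put in allowed_hosts, the network can be worked out from any address that lives on it. The following is only a sketch, and it assumes a plain /24 (255.255.255.0) netmask. The function name subnet24 is ours and has nothing to do with apt-cacher.

```shell
#!/bin/sh
# subnet24 ADDRESS
# Print the /24 network that contains the given dotted-quad IPv4 address.
# Assumption: the netmask really is 255.255.255.0. Any other mask needs
# proper bit arithmetic, which this sketch does not attempt.
subnet24() {
    # Drop the last octet, put 0 in its place and append the mask length.
    printf '%s.0/24\n' "${1%.*}"
}
```

Calling "subnet24 192.168.1.37", for instance, prints 192.168.1.0/24, which is the value used in the allowed_hosts example above.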

By default, apt-cacher creates a report on a daily basis on how efficient your cache was. You might think, what's the point, who needs this? If so, you can turn it off:

     generate_reports=0

The part of the configuration that you do have to set up is the path map. This tells apt-cacher where to look for the packages that are to be fetched for its clients and stored in the cache. You'll probably want to include in this list all of the repositories that the clients are going to use.

We begin making up the list by examining the original /etc/apt/sources.list files on one or more, perhaps all, of the client machines. Here's a sample:

/etc/apt/sources.list:

     deb http://us.archive.ubuntu.com/ubuntu jaunty main restricted universe multiverse
     deb http://us.archive.ubuntu.com/ubuntu jaunty-updates main restricted universe multiverse
     deb http://security.ubuntu.com/ubuntu jaunty-security main restricted universe multiverse
     deb http://archive.canonical.com/ubuntu jaunty partner
     deb-src http://us.archive.ubuntu.com/ubuntu jaunty-updates main restricted universe multiverse
     deb-src http://us.archive.ubuntu.com/ubuntu jaunty main restricted universe multiverse
     deb-src http://security.ubuntu.com/ubuntu jaunty-security main restricted universe multiverse
     deb-src http://archive.canonical.com/ubuntu jaunty partner

There's a lot of commonality in the URLs that are found in this sample file. Essentially, there are three types: the Ubuntu archives; the security archives; and the archives of the bonus packages from Canonical. In the apt-cacher path list, we need to map these URLs to an alias name that the clients can use to reference those archives. The names chosen for the aliases don't matter but it is better to choose something that makes sense to you. Based on the URLs in the original /etc/apt/sources.list, we might set the path list up like this:

     path_map = ubuntu us.archive.ubuntu.com/ubuntu ; \
                security security.ubuntu.com/ubuntu ; \
                canonical archive.canonical.com/ubuntu
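
To collect the URLs that need an alias, something along these lines can pull the distinct repository addresses out of one or more sources.list files. Treat it as a rough sketch: list_repo_urls is a name we made up, and it assumes the simple format of one source per line, like the sample above.

```shell
#!/bin/sh
# list_repo_urls FILE...
# Print each distinct repository address found on the deb and deb-src
# lines of the given files. The scheme ("http://") and any trailing
# slash are stripped, since path_map entries are written without them.
# Commented-out lines are skipped because their first field is "#".
list_repo_urls() {
    awk '$1 == "deb" || $1 == "deb-src" {
        # The address is the first field that contains "://".
        for (i = 2; i <= NF; i++) {
            if ($i ~ /:\/\//) {
                sub(/^[a-z]+:\/\//, "", $i)
                sub(/\/$/, "", $i)
                print $i
                break
            }
        }
    }' "$@" | sort -u
}
```

Running "list_repo_urls /etc/apt/sources.list" on each client gives you the right-hand sides for the path map. You still have to pick the alias names yourself.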

Normally, it's OK to use the URLs that your system installation chose. But, if you are experiencing performance delays or other problems, now is a good time to choose mirrors that actually work for you. If you need help deciding which ones to use, the list can be found at:

     https://launchpad.net/ubuntu/+archivemirrors

According to the Ubuntu documentation, "if you are unsure which mirror to select the best option is [iso-country-code].archive.ubuntu.com where iso-country-code is the two character country abbreviation of a country near you. For example, if you live in the United States you could choose us.archive.ubuntu.com".

Once you've set up aliases to all of the URLs in the path list, you can access them on the client machines like this:

     repository_cache_machine:port/alias_name

So, for instance, we can access the ubuntu and security repositories through:

     http://myrepository:3142/ubuntu
     http://myrepository:3142/security
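
Once the cache is running, a quick way to see whether an alias resolves is to ask for a file that every repository carries. Here is a small sketch that builds the URL and, if you want, tries it with curl. The function names are our own invention, and the dists/SUITE/Release path is an assumption based on the usual archive layout.

```shell
#!/bin/sh
# cache_url HOST PORT ALIAS
# Print the URL that clients use to reach the given alias on the cache.
cache_url() {
    printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# check_cache HOST PORT ALIAS SUITE
# Ask the cache for the Release file of the given suite and succeed only
# if the server answers without an HTTP error. Needs curl and a running
# apt-cacher, so nothing calls it here.
check_cache() {
    # -s: quiet, -f: fail on HTTP errors, -I: headers only, -o: discard output
    curl -sfI -o /dev/null "$(cache_url "$1" "$2" "$3")/dists/$4/Release"
}
```

For example, to try the ubuntu alias on the sample cache machine:

     check_cache myrepository 3142 ubuntu jaunty && echo "cache answers"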

Now that we've chosen all the local options, here is a complete listing of a sample apt-cacher configuration file:

/etc/apt-cacher/apt-cacher.conf:

     #################################################################
     # This is the config file for apt-cacher. On most Debian systems
     # you can safely leave the defaults alone.
     #################################################################
     # cache_dir is used to set the location of the local cache. This can
     # become quite large, so make sure it is somewhere with plenty of space.
     cache_dir=/var/cache/apt-cacher
     # The email address of the administrator is displayed in the info page
     # and traffic reports.
     admin_email=root@localhost
     # For the daemon startup settings please edit the file
     # /etc/default/apt-cacher.
     # Daemon port setting, only useful in stand-alone mode. You need to run the
     # daemon as root to use privileged ports (<1024).
     daemon_port=3142
     # optional settings, user and group to run the daemon as. Make sure they have
     # sufficient permissions on the cache and log directories. Comment the
     # settings to run apt-cacher as the native user.
     group=www-data
     user=www-data
     # optional setting, binds the listening daemon to specified IP(s). Use IP
     # ranges for more advanced configuration, see below.
     # daemon_addr=localhost
     # If your apt-cacher machine is directly exposed to the Internet and you are
     # worried about unauthorised machines fetching packages through it, you can
     # specify a list of IPv4 addresses which are allowed to use it and another
     # list of IPv4 addresses which aren't.
     # Localhost (127.0.0.1) is always allowed. Other addresses must be matched
     # by allowed_hosts and not by denied_hosts to be permitted to use the cache.
     # Setting allowed_hosts to "*" means "allow all".
     # Otherwise the format is a comma-separated list containing addresses,
     # optionally with masks (like 10.0.0.0/22), or ranges of addresses (two
     # addresses separated by a hyphen, no masks, like '10.100.0.3-10.100.0.56').
     allowed_hosts=192.168.1.0/24, 127.0.1.1
     denied_hosts=
     # And similarly for IPv6 with allowed_hosts_6 and denied_hosts_6.
     # Note that IPv4-mapped IPv6 addresses (::ffff:w.x.y.z) are truncated to
     # w.x.y.z and are handled as IPv4.
     allowed_hosts_6=fec0::/16
     denied_hosts_6=
     # This thing can be done by Apache but is much simpler here - limit access
     # to Debian mirrors based on server names in the URLs
     #allowed_locations=ftp.uni-kl.de,ftp.nerim.net,debian.tu-bs.de
     # Apt-cacher can generate usage reports every 24 hours if you set this
     # directive to 1. You can view the reports in a web browser by pointing
     # to your cache machine with '/apt-cacher/report' on the end, like this:
     #      http://yourcache.example.com/apt-cacher/report
     # Generating reports is very fast even with many thousands of logfile
     # lines, so you can safely turn this on without creating much 
     # additional system load.
     generate_reports=1
     # Apt-cacher can clean up its cache directory every 24 hours if you set
     # this directive to 1. Cleaning the cache can take some time to run
     # (generally in the order of a few minutes) and removes all package
     # files that are not mentioned in any existing 'Packages' lists. This
     # has the effect of deleting packages that have been superseded by an
     # updated 'Packages' list.
     clean_cache=1
     # Apt-cacher can be used in offline mode which just uses files already
     # cached, but doesn't make any new outgoing connections by setting this to 1.
     offline_mode=0
     # The directory to use for apt-cacher access and error logs.
     # The access log records every request in the format:
     # date-time|client ip address|HIT/MISS/EXPIRED|object size|object name
     # The error log is slightly more free-form, and is also used for debug
     # messages if debug mode is turned on.
     # Note that the old 'logfile' and 'errorfile' directives are
     # deprecated: if you set them explicitly they will be honoured, but it's
     # better to just get rid of them from old config files.
     logdir=/var/log/apt-cacher
     # apt-cacher can use different methods to decide whether package lists need
     # to be updated,
     # A) looking at the age of the cached files
     # B) getting HTTP header from server and comparing that with cached data.
     # This method is more reliable and avoids desynchronisation of data and index
     # files but needs to transfer a few bytes from the server every time somebody
     # requests the files ("apt-get update")
     # Set the following value to the maximum age (in hours) for method A or to 0
     # for method B
     expire_hours=0
     # Apt-cacher can pass all its requests to an external http proxy like
     # Squid, which could be very useful if you are using an ISP that blocks
     # port 80 and requires all web traffic to go through its proxy. The
     # format is 'hostname:port', eg: 'proxy.example.com:8080'.
     #http_proxy=proxy.example.com:8080
     # Use of an external proxy can be turned on or off with this flag.
     # Value should be either 0 (off) or 1 (on).
     use_proxy=0
     # External http proxy sometimes need authentication to get full access. The
     # format is 'username:password'.
     #http_proxy_auth=proxyuser:proxypass
     # Use of external proxy authentication can be turned on or off with this
     # flag.  Value should be either 0 (off) or 1 (on).
     use_proxy_auth=0
     # This sets the interface to use for the upstream connection.
     # Specify an interface name, an IP address or a host name.
     # If unset, the default route is used.
     #interface=
     # Rate limiting sets the maximum bandwidth in bytes per second to use
     # for fetching packages. Syntax is fully defined in 'man wget'.
     # Use 'k' or 'm' to use kilobits or megabits / second: eg, 'limit=25k'.
     # Use 0 or a negative value for no rate limiting.
     limit=0
     # Debug mode makes apt-cacher spew a lot of extra debug junk to the
     # error log (whose location is defined with the 'logdir' directive).
     # Leave this off unless you need it, or your error log will get very
     # big. Acceptable values are 0 or 1.
     debug=0
     # To enable data checksumming, install libberkeleydb-perl and set this option
     # to 1. Then wait until the Packages/Sources files have been refreshed once
     # (and so the database has been built up). You can also nuke them in the
     # cache to trigger the update.  
     # checksum=1
     # Print a 410 (Gone) HTTP message with the specified text when accessed via
     # CGI. Useful to tell users to adapt their sources.list files when the
     # apt-cacher server is being relocated (via apt-get's error messages while
     # running "update")
     #cgi_advise_to_use = Please use http://cacheserver:3142/ as apt-cacher \
     #                    access URL
     #cgi_advise_to_use = Server relocated. To change sources.list, run \
     #                    perl -pe "s,/apt-cacher\??,:3142," \
     #                    -i /etc/apt/sources.list
     # Server mapping - this allows to hide real server names behind virtual paths
     # that appear in the access URL. This method is known from apt-proxy. This is
     # also the only method to use FTP access to the target hosts. The syntax is
     # simple, the part of the beginning to replace, followed by a list of mirror
     # urls, all space separated. Multiple profile are separated by semicolons
     # Note that you need to specify all target servers in the allowed_locations
     # options if you make use of it. Also note that the paths should not overlap
     # each other. FTP access method not supported yet, maybe in the future.
     # path_map = debian ftp.uni-kl.de/pub/linux/debian \
     #            ftp2.de.debian.org/debian ; ubuntu archive.ubuntu.com/ubuntu ; \
     #            security security.debian.org/debian-security \
     #            ftp2.de.debian.org/debian-security
     path_map = ubuntu us.archive.ubuntu.com/ubuntu ; \
                security security.ubuntu.com/ubuntu ; \
                canonical archive.canonical.com/ubuntu
     # Permitted package files - this is a perl regular expression which matches
     # all package-type files (files that are uniquely identified by their
     # filename).  The default is: 
     #package_files_regexp = (?:\.deb|\.rpm|\.dsc|\.tar\.gz|\.diff\.gz|\.udeb|\
     #                       index\.db-.+\.gz|\.jigdo|\.template)$
     # Permitted Index files - this is the perl regular expression which matches
     # all index-type files (files that are uniquely identified by their full path
     # and need to be checked for freshness). 
     # The default is:
     #index_files_regexp = (?:Index|Packages\.gz|Packages\.bz2|Release|\
     #                     Release\.gpg|Sources\.gz|Sources\.bz2|\
     #                     Contents-.+\.gz|pkglist.\.bz2|release|release\..|\
     #                     srclist.*\.bz2|Translation-.+\.bz2)$

Note that, at this point in time, there is a bug that occurs with Translation files. Essentially, it is OK for the client to ask for a translation file that does not exist but, when it does so and the file is not found, apt-cacher returns an incorrect error code. It doesn't appear to cause any real problems (and there is a fix in the works) but you will notice the errors on your repository cache client machines. You may be tempted to try to fix it by altering the index_files_regexp (in the apt-cacher configuration file) but don't bother. It's a bug in apt-cacher itself, so no change to the configuration will cure it.

We never could understand the reasoning behind this but, when you install apt-cacher, it is installed deactivated. The startup script is put in the proper place in /etc/init.d and symlinks are added for the right run levels. Nonetheless, you still need to activate it manually:

/etc/default/apt-cacher:

     AUTOSTART=1
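
If you have several cache machines to set up, or just dislike opening an editor for a one-line change, the setting can be flipped from a script. This is a minimal sketch. The function name set_autostart is ours, and the only thing it assumes about the file is that the setting lives on a line beginning with AUTOSTART=.

```shell
#!/bin/sh
# set_autostart FILE
# Make sure FILE contains the line AUTOSTART=1, replacing an existing
# AUTOSTART= line if there is one and appending the line otherwise.
set_autostart() {
    file=$1
    tmp=$(mktemp) || return 1
    if grep -q '^AUTOSTART=' "$file"; then
        sed 's/^AUTOSTART=.*/AUTOSTART=1/' "$file" > "$tmp"
    else
        { cat "$file"; echo 'AUTOSTART=1'; } > "$tmp"
    fi
    # Write the result back over the original so that its owner and
    # permissions stay as they were.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

It would be run, as root, with "set_autostart /etc/default/apt-cacher".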

Once you've done this, you can restart apt-cacher:

     sudo /etc/init.d/apt-cacher restart

The next time you boot the system, it should come up all on its own.

After apt-cacher is successfully set up and running, it's time to update all of the repository cache clients so that they use the local cache instead of the WAN-based repositories. This is done by editing each client's sources list so that it points to the local repository cache. The first candidate for this treatment might as well be the machine where the repository cache actually resides, so that any packages it requests will prime the cache.

If the sample sources list shown above is altered to use the local repository cache, it should look something like this:

/etc/apt/sources.list:

     deb http://myrepository:3142/ubuntu jaunty main restricted universe multiverse
     deb http://myrepository:3142/ubuntu jaunty-updates main restricted universe multiverse
     deb http://myrepository:3142/security jaunty-security main restricted universe multiverse
     deb http://myrepository:3142/canonical jaunty partner
     deb-src http://myrepository:3142/ubuntu jaunty-updates main restricted universe multiverse
     deb-src http://myrepository:3142/ubuntu jaunty main restricted universe multiverse
     deb-src http://myrepository:3142/security jaunty-security main restricted universe multiverse
     deb-src http://myrepository:3142/canonical jaunty partner

Go through the configuration files on all of the client machines and update them in a similar fashion. Once that is done, you should be able to apply updates to them, via the local repository cache, in your usual fashion.
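
With more than a couple of clients, editing by hand gets old fast. The rewrite can be sketched as a sed filter, shown below. It assumes the three upstream URLs and the three aliases from the sample path map, plus the sample cache machine name and port. Adjust all of them to match your own setup. The function name rewrite_sources is ours.

```shell
#!/bin/sh
# Assumed cache address: the sample host name and the default port.
CACHE="http://myrepository:3142"

# rewrite_sources
# Read a sources.list on standard input and write a copy on standard
# output in which each upstream URL is replaced by its alias on the cache.
# Lines that mention none of the three URLs pass through unchanged.
rewrite_sources() {
    sed -e "s|http://us\.archive\.ubuntu\.com/ubuntu|$CACHE/ubuntu|" \
        -e "s|http://security\.ubuntu\.com/ubuntu|$CACHE/security|" \
        -e "s|http://archive\.canonical\.com/ubuntu|$CACHE/canonical|"
}
```

Since it is a filter, you can look the result over before trusting it:

     rewrite_sources < /etc/apt/sources.list > /tmp/sources.list.new

Once you are happy with the new file, copy it over /etc/apt/sources.list (keeping a backup of the old one) and run "sudo apt-get update".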

If you set up apt-cacher after you've already applied packages to the machine that has the repository cache on it, that machine probably still has those package files sitting in apt's own local repository (/var/cache/apt/archives). If you wish, you can import them into the apt-cacher repository, where they will be available to everyone. One of the ancillary scripts installed in the /usr/share/apt-cacher directory, which was created when you installed apt-cacher, is apt-cacher-import.pl, and it handles this task.

In a misguided attempt at security, the writer of the script forces it to run as the user/group set in /etc/apt-cacher/apt-cacher.conf. Unfortunately, this prevents it from doing anything useful with the local repository (it wants to move all of the packages that it finds from the local repository to the cache), since all of the files in the local repository are owned by whoever installed them (typically root).

One could change the user/group in /etc/apt-cacher/apt-cacher.conf to root/root temporarily, we suppose, but this defeats the purpose of using a separate user for the cache. If you'd like to keep the separate user/group (www-data, by default) for the cache files, just comment out the code in apt-cacher-import.pl, at about line 75, that reads:

     setup_ownership($cfg);

Change it to read:

     # setup_ownership($cfg);
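
Since the exact line number may differ between versions of the script, matching on the text is safer than counting lines. A sketch of the same edit as a sed filter follows. The function name comment_out_ownership is ours, and it assumes that the call sits on a line of its own, possibly indented.

```shell
#!/bin/sh
# comment_out_ownership
# Read the import script on standard input and write it to standard
# output with the setup_ownership($cfg); call commented out. The
# indentation is kept, and a call that is already commented out is left
# alone because the pattern allows only whitespace in front of it.
comment_out_ownership() {
    sed 's/^\([[:space:]]*\)setup_ownership(\$cfg);/\1# setup_ownership($cfg);/'
}
```

One way to use it is to filter the script into a temporary file, compare the two, and then copy the result into place:

     comment_out_ownership < /usr/share/apt-cacher/apt-cacher-import.pl > /tmp/apt-cacher-import.pl
     diff /usr/share/apt-cacher/apt-cacher-import.pl /tmp/apt-cacher-import.pl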

People go way too far overboard with security. If the guy is running as super user, he should be able to do whatever he wants. Peeling his permissions back to something less ain't going to cut it. He's just going to edit your stupid code and do what he wants. Think about that....

Anyway, once you've fixed the code, to import the package files from the system's local repository (/var/cache/apt/archives) to the apt-cacher repository, run:

     sudo /usr/share/apt-cacher/apt-cacher-import.pl /var/cache/apt/archives

The apt-cacher directory (/var/cache/apt-cacher/packages) should now be filled up with all of the packages that were in the apt local repository. At the same time, the local repository should now be empty, since apt-cacher-import.pl moves the packages that it finds to the repository cache.
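
A quick count of the package files in each directory, before and after the import, shows whether anything actually moved. This is just a sketch: count_debs is a name we made up, and it counts only files whose names end in .deb.

```shell
#!/bin/sh
# count_debs DIR
# Print the number of .deb files found anywhere under DIR.
count_debs() {
    # tr strips the padding that some versions of wc put before the number.
    find "$1" -type f -name '*.deb' | wc -l | tr -d ' '
}
```

Run it against both directories:

     count_debs /var/cache/apt/archives
     count_debs /var/cache/apt-cacher/packages

After a successful import, the first number should have dropped to zero and the second should have gone up by however many packages were moved.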

If you left the directive generate_reports set to 1 in the apt-cacher config file, apt-cacher will generate a report on cache usage every day. You can view the report by pointing your Web browser at the URL:

     http://myrepository:3142/report

Where "myrepository" is the name of the machine that runs apt-cacher.

If you need to regenerate the report at any time other than when it is normally done, run:

     sudo /usr/share/apt-cacher/apt-cacher-report.pl