Creating a Local Repository With Rsync

If your Linux distribution is Red Hat, or one based on it such as CentOS, and you keep several machines at the same distribution level, consider creating a local repository that holds a cached copy of the distribution and all of its updates and patches. That way, your machines need not each download the same files from the Internet.

The local repository can be primed using the distribution disk and then all of the updates can be downloaded regularly using rsync to keep the local repository in sync with one of the OS distribution mirrors. Since rsync only downloads differences in the files that it mirrors, it is quite efficient at reducing bandwidth usage.

These notes assume that you are using CentOS and are, therefore, specific to it. However, you should be able to get things working with Red Hat by altering some of the file and path names herein, as appropriate.

First, create two directories that will hold the local repository's RPM files for the OS and its updates. We like to put them in the /var/cache tree so that we know what they're for:

     su
     mkdir -p /var/cache/CentOS/6/os/i386
     mkdir -p /var/cache/CentOS/6/updates/i386

This creates the directory structure for the i386 distribution of the OS. If you are using the x86_64 version instead, make these directories:

     su
     mkdir -p /var/cache/CentOS/6/os/x86_64
     mkdir -p /var/cache/CentOS/6/updates/x86_64

Note that, regardless of which point release of the OS you are using, you must only use the major release number in the directory tree. Thus, for CentOS 6.0, 6.1, 6.2, etc., just use 6.
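The naming rule above can be sketched as a small shell helper. This is only an illustration: the function name repo_dirs and its optional third argument are our own invention, not part of CentOS or yum.

```shell
#!/bin/sh
# repo_dirs VERSION ARCH [ROOT]   (hypothetical helper, for illustration)
# Print the os and updates directories for a release, keeping only the
# major release number (6.3 -> 6).  ROOT defaults to /var/cache/CentOS.
repo_dirs() {
    version=$1
    arch=$2
    root=${3:-/var/cache/CentOS}
    major=${version%%.*}        # strip everything from the first dot on
    printf '%s\n' "$root/$major/os/$arch" "$root/$major/updates/$arch"
}

# Example: show the directories for CentOS 6.3, 64-bit.
repo_dirs 6.3 x86_64
```

If the output looks right, it could be fed to mkdir (as root), for example with "repo_dirs 6.3 x86_64 | xargs mkdir -p".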

If you'd like to change the permissions on the subdirectory tree, say to give group permissions to one of the worker bees who will maintain it, now's the time to do so. For example:

     su
     chown root:wbee -R /var/cache/CentOS
     chmod g+w -R /var/cache/CentOS

Now, copy all of the relevant files from the CentOS installation DVD (or its .iso image) into the repository. For the i386 distribution, for example, these are the i386.rpm and noarch.rpm files, as well as the repository files, found in the mounted i386 .iso directory or on the i386 DVD. You can either create the required subdirectories and copy the files directly into them from another machine on the network, as shown for 32-bit CentOS 5:

     su
     mkdir -p /var/cache/CentOS/5/os/i386/CentOS
     mkdir -p /var/cache/CentOS/5/os/i386/repodata
     (copy using FTP, NFS or Samba)

Or, as shown for 32-bit CentOS 6:

     su
     mkdir /var/cache/CentOS/6/os/i386/Packages
     mkdir /var/cache/CentOS/6/os/i386/repodata
     (copy using FTP, NFS or Samba)

Or, you can try mounting the .iso file, in loopback mode, if it is available to you, as shown for 32-bit CentOS 5:

     su
     mount -o loop,unhide -t iso9660 \
           -r /my/iso/images/CentOS-5.6-i386-bin-DVD.iso /mnt
     cp -rp /mnt/CentOS /mnt/repodata /var/cache/CentOS/5/os/i386/
     umount /mnt

Or, as shown for 32-bit CentOS 6:

     su
     mount -o loop,unhide -t iso9660 \
           -r /my/iso/images/CentOS-6.3-i386-bin-DVD1.iso /mnt
     cp -rp /mnt/Packages /mnt/repodata /var/cache/CentOS/6/os/i386/
     umount /mnt
     mount -o loop,unhide -t iso9660 \
           -r /my/iso/images/CentOS-6.3-i386-bin-DVD2.iso /mnt
     cp -rp /mnt/Packages /var/cache/CentOS/6/os/i386/
     umount /mnt

Or, you can actually mount the DVD directly on the repository machine and do the copy from it, as shown for 32-bit CentOS 5, like this:

     su
     mount /dev/cdrom /mnt
     cp -rp /mnt/CentOS /mnt/repodata /var/cache/CentOS/5/os/i386/
     umount /mnt

Or, as shown for 32-bit CentOS 6, like this:

     su
     mount /dev/cdrom /mnt    # the first DVD
     cp -rp /mnt/Packages /mnt/repodata /var/cache/CentOS/6/os/i386/
     umount /mnt
     mount /dev/cdrom /mnt    # the second DVD
     cp -rp /mnt/Packages /var/cache/CentOS/6/os/i386/
     umount /mnt
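The copy step is the same whichever way the media was mounted, so it can be wrapped in one function. The following is a minimal sketch, assuming the DVD or .iso is already mounted. The name copy_media is hypothetical, and the function only looks for the directory names used in the steps above (CentOS, Packages and repodata).

```shell
#!/bin/sh
# copy_media MOUNTPOINT DEST   (hypothetical helper, for illustration)
# Copy the package directory (CentOS on release 5, Packages on release 6)
# and, if present, the repodata directory from an already mounted DVD or
# .iso into DEST.  Returns non-zero if no package directory is found.
copy_media() {
    src=$1
    dest=$2
    found=1
    mkdir -p "$dest" || return 1
    for dir in CentOS Packages; do
        if [ -d "$src/$dir" ]; then
            cp -rp "$src/$dir" "$dest/" || return 1
            found=0
        fi
    done
    # repodata is copied whenever the mounted media provides it.
    if [ -d "$src/repodata" ]; then
        cp -rp "$src/repodata" "$dest/" || return 1
    fi
    return $found
}
```

With the first DVD mounted on /mnt, the call would be "copy_media /mnt /var/cache/CentOS/6/os/i386". Repeat it after mounting the second DVD.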

Again, if you don't like the permissions that got applied to the copied files or preserved from the DVD, you can set them to your liking. For example:

     su
     chown root:wbee -R /var/cache/CentOS/6/os/i386/
     chmod -R ug=rwX,o=rX /var/cache/CentOS/6/os/i386/

In order for yum (the package manager) to access the local repository cache, it must be set up as a Web service. We presume that you don't want to set it up on the main Web service port (i.e. port 80) for a number of reasons, not the least of which is that you are probably using that one for a real Web service. Another reason is so that you can block access, with your firewall, to the local repository port and not make your CentOS archive available to the entire world. Even if you do nothing, the port we've picked herein is not exposed to the outside world by most firewalls, whereas port 80 probably is.

We'll add a virtual host that listens on port 3142 (the same port as is used by apt-cacher) to the Apache configuration on your system. The following lines should be added to your Apache config file, in the virtual hosts section:

/etc/httpd/conf/httpd.conf:

          .
          .
          .
     ##
     ## CentOS repository Virtual Host Context
     ##
     Listen 3142

     <VirtualHost _default_:3142>
     #
     #  Document root directory for the CentOS repository.  This overrides the
     #  main Web server's document root.
     #
     DocumentRoot "/var/cache/CentOS/html"
     <Directory "/var/cache/CentOS/html">
         AllowOverride None
         Require all granted
     </Directory>
     #
     #  The centos directory is how yum gets its files.
     #
     Alias /centos "/var/cache/CentOS"
     <Directory "/var/cache/CentOS">
         Options Indexes
         AllowOverride None
         Require all granted
     </Directory>
     #
     #  Directories defined in the main server that we don't want people to see
     #  under this port.
     #
     Alias /manual "/var/cache/CentOS/limbo"
     Alias /doc "/var/cache/CentOS/limbo"
     </VirtualHost>
          .
          .
          .

Note that we must provide a document root directory that points somewhere or Apache will thoughtfully show you the index.html file from your main Web service (how convenient). In this case, we've aimed it at /var/cache/CentOS/html and we'll address that problem in the next step. You could aim the document root at a nonexistent directory, let's say /var/cache/CentOS/limbo, if you want, providing you don't mind Apache whining that the directory doesn't exist, every time it starts.

Also note that the directory permissions for /var/cache/CentOS apply equally to the document root directory, since it is a subdirectory thereof. Finally, most default Apache config files come with /manual and /doc defined. We alias these to limbo so that you don't see what's in those directories, by default.

Presuming that you chose to aim document root at /var/cache/CentOS/html, we now need to create an index file that will be shown whenever the user hits the bare URL (otherwise, Apache thoughtfully shows the one from the main Web service). Start with:

     su
     mkdir /var/cache/CentOS/html
     chown root:wbee /var/cache/CentOS/html
     chmod ug=rwx,o=rx /var/cache/CentOS/html

Then create an index.html file in this directory that will be shown by Apache whenever someone hits the bare URL. We find it convenient to give a little blurb and link to the actual repository:

/var/cache/CentOS/html/index.html:

     <html>
     <head>
     <title>CentOS Repository</title>
     </head>
     <body>
     <p>Welcome to the local CentOS repository cache.  The links will take you
     to the various distributions that are cached.</p>
     <p><a href="/centos/6">CentOS 6</a></p>
     </body>
     </html>

Note that you can update this page, as more distributions of the OS are added to the repository.

Recycle Apache so that the changes to its config file take effect:

     su
     /etc/init.d/httpd restart

At this point you have a local mirror for the distribution's installation media set up. To verify that it is working, launch your Web browser and open this URL:

     http://myrepository:3142/centos/6/os/i386

or

     http://myrepository:3142/centos/6/os/x86_64

Replace "myrepository" with the name or IP address of your repository server. You should see a directory listing containing two directories: repodata and Packages (or, for CentOS 5, repodata and CentOS). You should not see a "File not found" message.
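If you'd rather check from the command line than from a browser, something along these lines should do. It is a sketch, not a definitive test: it assumes curl is installed, the function names repomd_url and check_repo are our own, and it asks for repodata/repomd.xml, the index file that a yum repository keeps in its repodata directory.

```shell
#!/bin/sh
# repomd_url HOST PORT MAJOR TREE ARCH   (hypothetical helper)
# Build the URL of the repository index file for one tree (os or updates).
repomd_url() {
    printf 'http://%s:%s/centos/%s/%s/%s/repodata/repomd.xml\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# check_repo URL   (hypothetical helper)
# Succeed only if the Web server answers the request without an error.
check_repo() {
    curl --fail --silent --head --output /dev/null "$1"
}
```

For example: check_repo "$(repomd_url myrepository 3142 6 os i386)" && echo "repository is reachable".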

Now, on to the updates. Unlike the installation media, updates change often. This being the case, we need a way to keep our local repository in sync with the distribution updates on the update servers. Thanks to Tridge, rsync is the tool for the job. It compares a remote directory tree with a local one, scanning for any changes and then applying the changes to the local directory. Since it transfers only the deltas between the remote and local files, it is very efficient and economical on bandwidth.

To keep our repository up to date, we'll set up rsync to run once per day and resync the local copy of the repository by pulling all of the updates since yesterday. This can be done when cron runs the daily jobs, in the wee hours of the morning. To do so, we create a new daily job file (as root) for the resync and put the following content into it:

/etc/cron.daily/yum-repos-update:

     #!/bin/sh
     #
     # Source the clustering configuration.
     #
     if [ -f /etc/sysconfig/clustering ] ; then
         . /etc/sysconfig/clustering
     else
         SERVERROLE=Standalone
     fi
     if [ x"$SERVERROLE" = x ]; then
         SERVERROLE=Standalone
     fi
     #
     # Run rsync to synchronize the current CentOS 6 local repository with the
     # remote repository.
     #
     if [ "${SERVERROLE}" != "Secondary" ] ; then
         /usr/local/bin/rsync -av --delete \
             --exclude-from=/var/cache/CentOS/6/CentOS-6.3.excludes \
             rsync://your.rsync.mirror.server/centos/6.3/updates/i386 \
             --exclude=debug/ /var/cache/CentOS/6/updates/ > /dev/null
         chown root:wbee -R /var/cache/CentOS/6/updates/
         chmod g+w -R /var/cache/CentOS/6/updates/
     fi

Replace "rsync://your.rsync.mirror.server/centos" with an rsync server (such as "rsync://mirror.trouble-free.net/centos/") that's near you. You can choose the correct URL from the list at:

     http://www.centos.org/modules/tinycontent/index.php?id=30

Also, replace "6.3" with the version of the OS that you are actually mirroring. And, replace "i386" with "x86_64", if the archive that you're mirroring is the 64-bit archive. And, if you don't care about (or are happy with) the permissions set on the retrieved packages, you can remove the two lines that set the owner/group and permissions after the rsync.
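Before letting cron loose on the real thing, it can be handy to preview what rsync would do. Here is a minimal sketch, assuming rsync is on your PATH (the script above uses /usr/local/bin/rsync). The function names mirror_src and preview_sync are hypothetical. The --dry-run option makes rsync list the files it would transfer or delete without touching the local directory.

```shell
#!/bin/sh
# mirror_src MIRROR VERSION ARCH   (hypothetical helper)
# Build the rsync source for the updates tree, e.g.
# rsync://your.rsync.mirror.server/centos/6.3/updates/i386
mirror_src() {
    printf 'rsync://%s/centos/%s/updates/%s\n' "$1" "$2" "$3"
}

# preview_sync MIRROR VERSION ARCH DEST   (hypothetical helper)
# Show what a sync would transfer or delete, without changing anything.
preview_sync() {
    rsync -av --dry-run --delete --exclude=debug/ \
        "$(mirror_src "$1" "$2" "$3")" "$4"
}
```

For example: preview_sync your.rsync.mirror.server 6.3 i386 /var/cache/CentOS/6/updates/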

Next, make the file executable:

     su
     chmod u=rwx,go=rx /etc/cron.daily/yum-repos-update

Now, before we go any further, note that the rsync command line in the yum-repos-update script supplies an --exclude-from file name. This file holds file name patterns that match all of the stuff in the remote repository that you don't want. If you put the names of packages that you never install in this file, you'll save a lot of bandwidth by not downloading them.

You should create this file, before you run the script, as follows:

     su
     touch /var/cache/CentOS/6/CentOS-6.3.excludes
     chown root:wbee /var/cache/CentOS/6/CentOS-6.3.excludes
     chmod ug=rw,o=r /var/cache/CentOS/6/CentOS-6.3.excludes

You can leave the file empty, if you wish, or use your favorite text editor to edit it and add all of the packages you'd like to exclude from the local repository. Look at the remote HTML directory that corresponds to the rsync server's updates directory for the mirrored version of the OS and follow the directory tree down to either the RPMS or Packages/drpms directories. There, you'll see all of the files that will be mirrored. Pick the ones that you don't care about and make up file name patterns to exclude them. Here are our suggestions:

/var/cache/CentOS/6/CentOS-6.3.excludes:

     kde*
     libreoffice*

Note that the rules for the patterns that are used, etc., can be found in the rsync man page under the FILTER RULES section.

You can also look at your local repository, in the CentOS or Packages directory under the os/i386 or os/x86_64 tree, to see the file names of all of the packages that can be installed. This will give you an idea of other possibilities for file name patterns.
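To try out a pattern before committing it to the excludes file, you can test it against a few package file names. The sketch below uses the shell's own wildcard matching, which is only an approximation of rsync's filter rules (it should agree for simple patterns applied to plain file names, but rsync's rules for slashes and "**" differ). The function name is_excluded and the package names in the example are made up for illustration.

```shell
#!/bin/sh
# is_excluded NAME PATTERNFILE   (hypothetical helper, for illustration)
# Succeed if NAME matches any wildcard pattern listed in PATTERNFILE.
# Blank lines and lines starting with '#' or ';' are skipped.
is_excluded() {
    name=$1
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue
        case $pattern in
            '#'*|';'*) continue ;;
        esac
        # An unquoted variable in a case pattern is treated as a glob.
        case $name in
            $pattern) return 0 ;;
        esac
    done < "$2"
    return 1
}
```

For example, with a file containing "kde*", the call "is_excluded kdebase-4.3.4-6.el6.i686.rpm /var/cache/CentOS/6/CentOS-6.3.excludes" would succeed. A pattern without a wildcard, such as plain "kde", only matches a file named exactly that.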

OK, to prime the repository and ensure that the script works properly, run the update once, manually:

     su
     /etc/cron.daily/yum-repos-update

It will probably take several hours and afterwards there should be quite a few files in your updates directory that, for CentOS 5, will look like this:

     /var/cache/CentOS/5/updates/i386/repodata
                                     /repodata/filelists.sqlite.bz2
                                               filelists.xml.gz
                                               other.sqlite.bz2
                                                    .
                                                    .
                                                    .
                                     /RPMS
                                     /RPMS/acpid-1.0.4-9.el5_4.1.i386.rpm
                                           acpid-1.0.4-9.el5_4.2.i386.rpm
                                           autofs-5.0.1-0.rc2.131.el5_4.1.i386.rpm
                                                .
                                                .
                                                .

For CentOS 6, your updates directory will look like this:

     /var/cache/CentOS/6/updates/i386/drpms
                                     /drpms/389-ds-base-1.2.10.2...i686.drpm
                                            apr-1.3.9-3...i686.drpm
                                            bind-chroot-9.7.0...i686.drpm
                                                 .
                                                 .
                                                 .
                                     /Packages
                                     /Packages/389-ds-base-1.2.10.2...i686.rpm
                                               apr-1.3.9-5.el6_2.i686.rpm
                                               bind-9.8.2-0.10...i686.rpm
                                                    .
                                                    .
                                                    .
                                     /repodata
                                     /repodata/0041b...lists.sqlite.bz2
                                               00446...-prestodelta.xml.gz
                                               00787...-primary.xml.gz
                                                    .
                                                    .
                                                    .

You can visually check that the files were mirrored properly by looking at the remote HTML directory that corresponds to the rsync directory. As before, if you follow the directory tree down to the repodata and RPMS or Packages/drpms directories, you'll see all of the files that should have been mirrored. Inspect a few names to see if they all appear to be present except for those that you've specifically excluded in your excludes file.
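As a rough cross-check to go with the visual inspection, you can count the packages that landed in the local tree and compare the number with what the remote listing shows. This is a minimal sketch, and the function name count_rpms is our own.

```shell
#!/bin/sh
# count_rpms DIR   (hypothetical helper, for illustration)
# Print the number of .rpm files anywhere under DIR (.drpm files are
# not counted, since their names do not end in ".rpm").
count_rpms() {
    find "$1" -type f -name '*.rpm' | wc -l | tr -d ' '
}
```

For example: count_rpms /var/cache/CentOS/6/updates/i386/Packages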

Note that, as usual, if your repository is 64-bit, you should replace "i386" wherever it appears, above, with "x86_64".

To begin actually using your new repository, all of the individual clients will have to be modified so that they point to the local repository, instead of the standard, default repositories. The list of repositories that the CentOS package manager (yum) uses is maintained in files under the /etc/yum.repos.d directory.

In the files in that directory, you'll see either a parameter named "baseurl" or "mirrorlist". The former points directly at a repository whereas the latter uses the yum mirror plugin to point at a mirror list server, thereby allowing yum to automatically select the optimal repository server to use. Depending on how your default repositories are set up, you could see either of these two parameters. Regardless of which one is employed, you need to comment them out and replace them with a baseurl that points to the local repository. Do that with your favorite text editor:

/etc/yum.repos.d/CentOS-Base.repo:

          .
          .
          .
     [base]
     name=CentOS-$releasever - Base
     #lm mirrorlist=http://mirrorlist.centos.org/?release=$releasever\
     #lm    &arch=$basearch&repo=os
     #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
     baseurl=http://myrepository:3142/centos/$releasever/os/$basearch/
     gpgcheck=1
     gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

     #released updates
     [updates]
     name=CentOS-$releasever - Updates
     #lm mirrorlist=http://mirrorlist.centos.org/?release=$releasever\
     #lm    &arch=$basearch&repo=updates
     #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
     baseurl=http://myrepository:3142/centos/$releasever/updates/$basearch/
     gpgcheck=1
     gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
          .
          .
          .

In the above changes, we commented out the mirrorlist parameter with "#lm". The baseurl parameter was already commented out by the package installer when yum was upgraded to use the mirror list plugin. We added the baseurl parameter that points to our local repository. The server name "myrepository" should be replaced with your actual server's name or IP address and the port "3142" should be replaced with the port number that you used for the repository virtual host in the Apache config file (above). Note that the changes are the same, regardless of what release of CentOS you are mirroring (or even if you have mirror repositories for multiple versions of CentOS, for that matter).
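If you have many clients to change, it may help to generate the replacement lines rather than type them each time. The following sketch prints one stanza with a baseurl aimed at the local repository, so you can review it and paste it into the .repo file. The function name local_repo_stanza is hypothetical, and the gpgkey path is simply the one shown above for CentOS 6.

```shell
#!/bin/sh
# local_repo_stanza ID HOST PORT TREE   (hypothetical helper)
# Print a yum repository stanza whose baseurl points at the local
# repository.  $releasever and $basearch are left for yum to expand.
local_repo_stanza() {
    cat <<EOF
[$1]
name=CentOS-\$releasever - $1 (local)
baseurl=http://$2:$3/centos/\$releasever/$4/\$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
EOF
}

# Example: print stanzas for the base and updates repositories.
local_repo_stanza base myrepository 3142 os
local_repo_stanza updates myrepository 3142 updates
```

The output goes to the terminal only. Nothing under /etc/yum.repos.d is modified.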

Finally, we can run yum on the modified machine to see if the new repository works:

     su
     yum clean all
     yum update

If all runs well, you have properly set up a local CentOS repository. You can apply the repository list changes to all of the client machines that are to use the local repository for their updates.