J.P. Larocque (jp_larocque) wrote,

Debian repository caching

There are about a dozen programs out there that lazily cache Debian repositories. Each one acts as a proxy to upstream Debian apt repositories and serves apt clients on local Debian machines. Three key features drew me to the idea:

  1. Caching! But any HTTP proxy can do this.
  2. Indirection. To change to another mirror, change your configuration in one place.
  3. Keeping your cache when you move to another repository. A standard HTTP proxy on its own can't do this.

It all started with apt-proxy. Being a shell script web server (no joke), it suffered from all kinds of slowness and reliability problems. I threw that out about a year ago.

I don't remember how I ended up with Approx. Interestingly, it's written in OCaml. It's moderately fast. But it does have its shortcomings:

  1. Stale objects, at random, don't get refreshed when requested by a client. That is, Release will be up-to-date, but Sources.bz2 won't, and you'll have "MD5Sum mismatch" errors.
  2. Approx does the worst possible thing on a failed download: pretend that it's complete.
  3. As I recall, it also does "bad things" when you interrupt a download. Or worse, when you request the same object concurrently from two clients.

And then I realized there's an out-of-the-box solution. Set up a web server that proxies requests for /foo to http://some-repository.example.com/debian/foo, and configure apt clients to reach that indirection server through any caching HTTP proxy. Because the clients always request the same URLs, your cached downloads will carry over even if you switch to another repository. (In theory.)

Since I use Polipo as my proxy (the same instance I use for my normal Interwebs experience), I get the following additional wins: pipelining, IPv6, partially-cached objects, concurrent client access to the same object[1], and (oh oh!) STALE OBJECT INVALIDATION. You know, CORRECTNESS.

I have this stuffed in a <VirtualHost> directive:

# Repositories.
#ProxyPass /main/                       http://debian.oregonstate.edu/debian/ # Slow!  2006-10-21
#ProxyPass /main/                       http://mirrors.kernel.org/debian/ # Down!  2006-11-16
#ProxyPass /main/                       http://mirrors.usc.edu/pub/linux/distributions/debian/ # Faulty!  etch Packages lists are here, but not some packages.
#ProxyPass /main/                       http://ftp.us.debian.org/debian/ # Faulty!  See above.
ProxyPass /main/                        http://ftp-mirror.internap.com/pub/debian/
#ProxyPass /main/                       http://debian.crosslink.net/debian/
#ProxyPass /main/                       http://ftp.debian.org/debian/ # Not carrying some things: powerpc testing, i386-hurd unstable
ProxyPass /non-US/                      http://non-us.debian.org/debian-non-US/
ProxyPass /security/                    http://security.debian.org/
ProxyPass /blackdown-java/              http://mirrors.ibiblio.org/pub/mirrors/blackdown/debian/
#ProxyPass /marillat/                   ftp://ftp.nerim.net/debian-marillat/
ProxyPass /debian-mm/                   http://www.debian-multimedia.org/
ProxyPass /bunk/                        http://www.fs.tum.de/~bunk/debian/
#ProxyPass /amd64/                      http://mirror.espri.arizona.edu/debian-amd64/debian/ # Mirrored archive missing, 2006-10-11.
ProxyPass /amd64/                       http://debian.csail.mit.edu/debian-amd64/debian/
ProxyPass /debian-secure-testing/       http://secure-testing.debian.net/debian-secure-testing/

ProxyPass /ubuntu/                      http://ftp.osuosl.org/pub/ubuntu/
ProxyPass /ubuntu-security/             http://security.ubuntu.com/ubuntu/

# For GNU/Hurd.
ProxyPass /gnuab/                       http://ftp.gnuab.org/debian/
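
For the client side, here is a minimal sketch of what the apt configuration might look like. The hostnames repos.example.lan (the Apache virtual host above) and proxy.example.lan (the machine running Polipo) are placeholders I made up for illustration, and the file name under apt.conf.d is arbitrary. The port 8123 is Polipo's default. The suites and components are only examples for an etch-era setup.

```
# /etc/apt/sources.list
# Point apt at the indirection server, never at a real mirror.
# (repos.example.lan is a hypothetical name for the virtual host.)
deb http://repos.example.lan/main/     etch         main contrib non-free
deb http://repos.example.lan/security/ etch/updates main contrib non-free
```

```
// /etc/apt/apt.conf.d/01proxy  (file name is an arbitrary choice)
// Send all of apt's HTTP requests through the caching proxy.
// (proxy.example.lan is hypothetical; 8123 is Polipo's default port.)
Acquire::http::Proxy "http://proxy.example.lan:8123/";
```

With something like this in place, switching mirrors only means editing one ProxyPass line on the server. The clients keep asking for the same URLs, so whatever the proxy has already cached stays valid.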

  1. In theory. Empirical tests show that requests are serialized, but the second in line gets a cached copy.
Tags: debian