Summary

When updating the list of available packages, or upgrading to a new version of a package, most of the data is already on the system. Using an rsync-based download method should significantly speed up the download. The apt-sync program already does this, or at least most of it. apt-sync should be packaged and uploaded to Ubuntu karmic, and benchmarked to evaluate its impact on the archive, mirrors, and various kinds of systems running Ubuntu.

Release Note

The apt-sync package is now included in Ubuntu and will make upgrades faster.

Rationale

Bandwidth is scarce or expensive in large parts of the world. Users may have Internet access only via a dial-up modem, GPRS, 3G. Even users on fast connections may be paying by the byte, or have monthly download limits. Avoiding needless downloads is a worthy goal and would benefit a lot of Ubuntu users.

When downloading a new version of the Packages file, it will almost always be only slightly different from the one on the system already. By using the rsync algorithm, it is possible to download only the changed parts.

Likewise, when upgrading packages to newer versions, the new version is often mostly identical to the old one, with parts of it changed. Again, it would be good to only download the changed parts.

The rsync program of the rsync algorithm is unsuitable for this, primarily because it requires a lot of server-side resources. However, the zsync implementation works around this: the server is a standard HTTP server, and a new data file is added (containing the rsync "signature" data). All the rest of the computation will then happen on the client, which uses HTTP Range requests to fetch parts of the file from the server.

The apt-sync program was written in 2006 to implement this. It is not, however, deployed in Debian or Ubuntu at the current time.

apt-sync should be packaged for Debian and Ubuntu, and deployed in an experimental manner, to gather feedback from users. Additionally, some benchmarks should be run to see how big an impact it has on the archive mirrors, various kinds of client systems, and how much benefit it has as far as the amount of bandwidth spent downloading.

apt-sync may require changes to combine .aptsync files per source package: otherwise the number of files in the archive will increase by the number of .deb files, i.e., probably about a quarter million, and this is a big impact on the archive and mirror network.

User stories

Assumptions

Design

The RPM world has presto, which relies on the package archive generating delta-RPM packages. These are then downloaded normally, and applied by the client differently from normal RPMs. This requires upgrades to happen from a specific version, which the rsync/zsync/apt-sync approach avoids.

Benchmarks

At least the following results need to be produced by the benchmarks:

Implementation

Benchmarks

For bandwidth use:

For .deb reconstruction:

Impact on server resources:

Archive size impact:

Packages files:

Benchmarks

UI Changes

There should be no UI changes, except that apt-get and Synaptic and update-manager might be modified to tell how much data apt-sync managed to NOT transfer. This would allow users to see that it is working.

Code Changes

The apt package may require minor changes to support apt-sync as a download method, but probably not.

The apt-sync program should need fairly little changes to get it usable.

Launchpad will require code to generate .aptsync files for the entire archive, but this can happen later, after apt-sync has been proven to be a useful thing. For testing, the .aptsync files can be provided by different servers than the usual mirrors.

Migration

If the user is using a mirror without .aptsync files, everything should work just fine, the way it works now. Once .aptsync files get added to the mirrors, upgrade downloads should happen faster. The archives can add only those .aptsync files that are deemed necessary (e.g., only for -security).

If apt-sync turns out to work well, it should be installed and enabled by default, and users should not have to change anything.

Test/Demo Plan

Suggestion: we modify apt and update-manager to report how much data it managed to NOT transfer, thanks to apt-sync. This will subtly tell users that things work well.

Unresolved issues

Using HTTP content negotiation would allow another way to hide the large number of extra files from mirrors that don't want the impact. The apt-sync client would specifically tell the HTTP server it can handle apt-sync files, and the server would give it if necessary. However, this would complicate things for all mirrors who do want to support apt-sync, so the single apt-sync file per release, suggested by Colin, would probably work better.


CategorySpec

AptSyncInKarmicSpec (last edited 2009-10-21 14:39:46 by xdsl-237-224)