I see several problems with this scheme:
1) xdeltas on gzipped data tend to be very inefficient. A small change in the original data set tends to make the gzip stream radically different.
2) It requires that the user still have the old package on hand to patch.
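Point 1 can be checked with a small experiment. The following is a minimal Python sketch, assuming zlib's deflate as a stand-in for gzip (gzip uses the same deflate algorithm) and a synthetic word text. The helper names are hypothetical, and `reusable_fraction` is only a crude proxy for what a byte-level delta tool could reuse, not how xdelta actually works. The exact numbers depend on the input and on the kind of change.

```python
import random
import zlib


def make_text(n_words=20000, seed=1):
    """Build synthetic 'text' from a fixed random vocabulary (illustrative data only)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(500)]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode("ascii")


def common_prefix(a, b):
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_fraction(old, new, chunk=16):
    """Fraction of non-overlapping chunk-byte pieces of `new` found anywhere in `old`."""
    windows = {old[i:i + chunk] for i in range(len(old) - chunk + 1)}
    pieces = [new[i:i + chunk] for i in range(0, len(new) - chunk + 1, chunk)]
    return sum(p in windows for p in pieces) / len(pieces)


old_raw = make_text()
new_raw = old_raw[:100] + b"X" + old_raw[100:]  # a one-byte insertion near the start
old_z = zlib.compress(old_raw)
new_z = zlib.compress(new_raw)

raw_reuse = reusable_fraction(old_raw, new_raw)
z_prefix = common_prefix(old_z, new_z)
z_reuse = reusable_fraction(old_z, new_z)
print("uncompressed: reusable fraction", raw_reuse)
print("compressed: shared prefix", z_prefix, "of", len(new_z),
      "bytes, reusable fraction", z_reuse)
```

On the uncompressed data almost every piece of the new file can be found in the old one. The two compressed streams stop agreeing at or before the point where the insertion is encoded, and how much of the remainder is reusable is what the second printed line lets you inspect.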
I have a different proposal:
Have the delta-deb contain the full control info, plus xdeltas for all of the non-config files in the data.tar section. That way the user does not need the original .deb: if they have the package installed, they just need to download the xdeltas for the installed files and patch them in place.
The package system already knows which files are config files and which are not, and it knows the md5 sums for those files so it can verify that they are correct before patching them with the xdelta.
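The verification step could look roughly like the sketch below. It assumes the checksums are available as text in the `md5sums` layout dpkg keeps per package (one `<md5 hex>  <relative path>` entry per line) and that the config files are passed in as a set of relative paths. The function names `md5_of` and `patchable_files` are hypothetical and not part of dpkg.

```python
import hashlib
import os


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patchable_files(md5sums_text, conffiles, root="/"):
    """Return (ok, bad): relative paths that may / may not be patched in place.

    md5sums_text: lines of '<md5 hex>  <relative path>' (assumed layout).
    conffiles:    set of relative paths to skip, since config files are not patched.
    """
    ok, bad = [], []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        if rel_path in conffiles:
            continue
        full = os.path.join(root, rel_path)
        if os.path.isfile(full) and md5_of(full) == expected:
            ok.append(rel_path)   # installed file matches, safe to apply the delta
        else:
            bad.append(rel_path)  # modified or missing, would need the full file
    return ok, bad
```

Files that end up in `bad` would have to be fetched in full, since a binary delta applied to the wrong base produces garbage.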
I have noticed that bsdiff always produces smaller diffs than xdelta: usually more than 10% smaller, sometimes over 80% smaller. Perhaps bsdiff should be used instead? See my blog entry for raw data.
It has been suggested that zsync be used instead. This is similar to the rsync algorithm, but has the advantage of working well with gzipped files created without the --rsyncable option, and does not need any special software running on the server. Also, although zsync requires .zsync files to be put up on the web server, these are small control files, only ~1% of the size of the files to be downloaded.
Compared with the bsdiff/xdelta proposal, this has the advantage that we can put up a single .zsync file for each deb, and users can use zsync to reduce the bandwidth required for download regardless of which version of the package they have. In fact zsync will work with the output of dpkg-repack, so you can use zsync to upgrade installed packages for which we no longer have the original deb file.
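To make the "single control file per deb" idea concrete, here is a minimal Python sketch of rsync-style block matching, which is the general idea zsync builds on. It is not zsync's file format or code: the block size, the function names, and the use of MD5 as the strong checksum are assumptions for illustration, and zsync's special handling of gzipped content is not modelled at all.

```python
import hashlib

BLOCK = 64  # hypothetical block size, kept tiny for the example


def weak_sum(block):
    """rsync-style weak checksum (a, b) of one block."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    return a, b


def make_control(new_data, block=BLOCK):
    """Server side: checksums of every full block of the new file."""
    table = {}
    for start in range(0, len(new_data) - block + 1, block):
        chunk = new_data[start:start + block]
        entry = (hashlib.md5(chunk).digest(), start // block)
        table.setdefault(weak_sum(chunk), []).append(entry)
    return table


def blocks_already_present(old_data, table, block=BLOCK):
    """Client side: indices of new-file blocks found somewhere in the old data."""
    found = set()
    if len(old_data) < block:
        return found
    a, b = weak_sum(old_data[:block])
    pos = 0
    while True:
        for strong, index in table.get((a, b), ()):
            if hashlib.md5(old_data[pos:pos + block]).digest() == strong:
                found.add(index)
        if pos + block >= len(old_data):
            break
        out_byte, in_byte = old_data[pos], old_data[pos + block]
        a = (a - out_byte + in_byte) & 0xFFFF      # slide the window by one byte
        b = (b - block * out_byte + a) & 0xFFFF
        pos += 1
    return found


old = bytes((i * 37 + i // 7) % 251 for i in range(1000))
new = old[:300] + b"INSERTED" + old[300:]
table = make_control(new)
have = blocks_already_present(old, table)
total_blocks = sum(len(entries) for entries in table.values())
print("blocks to download:", total_blocks - len(have), "of", total_blocks)
```

The control table depends only on the new file, which is why one such file per deb serves every client whatever old data they hold. Only the blocks not found locally need to be fetched.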
However, zsync does not currently support the "ar" format used by deb, although it should be "easy" to modify to support this format. Even if it were modified to support ar files, it could not be used for debs packed using bzip2 rather than gzip. Furthermore, while zsync reduces the download by 70%, bsdiff is able to reduce the download by 90-95%. This means that bsdiff would make updating over a 56K modem feasible.
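The modem claim is simple arithmetic. The sketch below assumes a hypothetical 100 MB of pending updates and the nominal 56 kbit/s line rate with no protocol overhead. Real modem throughput is lower, so these are optimistic lower bounds on the time needed.

```python
def download_seconds(size_bytes, link_bits_per_second=56000):
    """Transfer time at the nominal line rate, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


FULL = 100 * 10**6  # hypothetical 100 MB of pending updates (illustrative figure)

# Fraction of the full download that still has to be transferred.
scenarios = {
    "full debs": 1.00,
    "zsync (70% saved)": 0.30,
    "bsdiff (90% saved)": 0.10,
    "bsdiff (95% saved)": 0.05,
}
for name, remaining in scenarios.items():
    minutes = download_seconds(FULL * remaining) / 60
    print("%-20s %6.0f minutes" % (name, minutes))
```

Under these assumptions the full download takes roughly four hours, the zsync case a bit over an hour, and the bsdiff cases somewhere between about 12 and 24 minutes.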
For this reason I propose that we have bsdiff based debdiffs against all the files on the official cd(s). This means that the user can immediately upgrade their Ubuntu install, without worrying about bandwidth. Since they have just installed Ubuntu, the cd will be in the drive with all the deb files needing to be patched. Because the bsdiffs are only against the files on the cd, this will only need an additional ~100MB on the servers.
We may also put up debdiffs against files that have been updated in the last 10 days. This means that people can regularly and efficiently keep their machines up-to-date without need for more than a few extra MB on the servers.
* New: Unfortunately users may not have access to the original deb files on their install cd, as the live cd and install cd are to be merged.
* Also, I know that many people believe that zsync could not possibly work effectively on gzipped files if --rsyncable is not used. However, please read how zsync achieves this in this paper.
* It is possible that the Ubuntu debs will be switched from gzip/bzip2 to the 7z format. Although it looks possible to modify zsync to support 7z, this would probably not be an easy task.
It is a must-have feature. 17 days after the Dapper release there are more than 100MB of updates to packages that ship on the cds.
It is not feasible to add --rsyncable support for 7z, as it would negate most of the increased compression gained by switching to that format from gzip. 7z derives most of its increased compression from the use of very large dictionaries, which require even larger data streams to operate on. --rsyncable causes gzip to compress small blocks at a time rather than the entire stream, which means even smaller effective dictionary sizes, and thus lower compression ratios, than normal.
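The cost of resetting the compressor can be seen with zlib. This is a rough Python sketch, not a model of 7z or of gzip's real --rsyncable logic: it assumes a full flush at fixed 4 KiB boundaries as a stand-in for the resets (gzip chooses its boundaries from the content), and it uses synthetic data whose only redundancy lies 8 KiB back, which deliberately exaggerates the effect.

```python
import random
import zlib


def compress_whole(data):
    """Compress the whole stream with one shared history window."""
    return zlib.compress(data, 6)


def compress_in_blocks(data, block=4096):
    """Compress with the history discarded after every `block` bytes."""
    comp = zlib.compressobj(6)
    out = []
    for i in range(0, len(data), block):
        out.append(comp.compress(data[i:i + block]))
        out.append(comp.flush(zlib.Z_FULL_FLUSH))  # forget everything seen so far
    out.append(comp.flush())
    return b"".join(out)


rng = random.Random(0)
unit = bytes(rng.randrange(256) for _ in range(8192))  # 8 KiB, no internal redundancy
data = unit * 32                                       # repeats only at 8 KiB distance

whole = compress_whole(data)
blocked = compress_in_blocks(data)
assert zlib.decompress(blocked) == data  # the blocked output is still a valid stream
print("input:", len(data), "whole:", len(whole), "blocked:", len(blocked))
```

With one shared window the compressor can refer back to the previous copy of the data. With resets every 4 KiB it never sees a repeat, so the output stays close to the input size. A format that relies on much larger dictionaries has correspondingly more to lose.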