Launchpad Entry: distributed-development-debian-import
Packages affected: bzr, bzr-builddeb
The next stage of DistributedDevelopment/Specification is to provide branches of both Debian and Ubuntu, with shared revision history. This specification is about how we get there from where we are now.
Ubuntu development is based on Debian: most of our packages come from there, and we regularly merge changes from there. Just having Ubuntu in bzr has limited benefit without also having Debian in bzr, so we want to make Debian import branches available, with a shared revision history, allowing you to merge from Debian in bzr.
- Anne wants to merge the upload that was just done to Debian experimental into the Ubuntu package.
- It's deep into Ubuntu's freeze and Robert finds a patch for a critical bug in a recently uploaded Debian package. The upload also contained lots of changes not suitable for that point in Ubuntu's release process, so he wishes to just merge the critical bug fix.
- Thomas wants to examine the differences between the Debian and Ubuntu packages, and then find who changed a particular line that is causing him problems.
- Lara wants to take the changes in the Ubuntu package and send them to the Debian maintainer as a diff against the latest Debian package.
- We don't have a large amount of history of uploads to Debian. We can only rely on having the versions in the various releases, plus a little more that can be found on e.g. snapshot.debian.net.
- Investigating the full history of changes to Debian using these branches is not going to be a common activity.
- Few people have work based on the current Ubuntu branches as they are currently hosted read-only.
- Debian rarely includes Ubuntu changelog stanzas if it merges changes from Ubuntu.
Debian branches will be created in much the same way as Ubuntu ones were. They will include revisions for each of the package versions that can easily be found. These branches will then be made available under a "debian" namespace so that they can be browsed/branched/merged as needed.
As part of this process the Ubuntu branch history will be re-written so that it merges from the Debian branches at appropriate points, as determined by the changelog. This serves two purposes. The first is to give a richer representation of history. The second is to give bzr the most information we can, so that merges can reduce the number of spurious conflicts. Depending on when the last merge was done from Debian, and how the packages have changed since, there may unfortunately be a few packages where the first merge will be more painful than it has to be. This is a limitation imposed by the lack of available Debian history.
As the branches are being re-written, records of what is being done will be created. These will form a map from original Ubuntu branch revision to new Ubuntu branch revision. These maps can then be used by an external tool to perform the same translation on any copies of the Ubuntu branches that people may have. This means that outstanding work does not have to be discarded when the Ubuntu branches are re-written.
Two lists are first built, one for Debian and one for Ubuntu, of all the package versions we have available for import. These two lists are then merged in version number order, with Debian coming first if there are equal version numbers. This merged list is then processed, with each package version being imported on to the correct branch.
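The ordering step can be sketched as follows. This is a minimal illustration, not the importer's actual code: the `Upload` type, `import_order` and `toy_version_key` are hypothetical names, and the version key is a simplified stand-in that ignores epochs and `~`. A real importer would need a full Debian version comparison.

```python
import re
from typing import List, NamedTuple, Tuple


class Upload(NamedTuple):
    distro: str   # "debian" or "ubuntu"
    version: str


# Debian sorts first when the same version exists in both distributions.
DISTRO_ORDER = {"debian": 0, "ubuntu": 1}


def toy_version_key(version: str) -> Tuple:
    """Simplified stand-in for Debian version ordering (assumption).

    Splits the string into numeric and non-numeric chunks so that numbers
    compare as integers. Epochs and '~' are NOT handled.
    """
    key = []
    for chunk in re.findall(r"\d+|\D+", version):
        if chunk.isdigit():
            key.append((1, int(chunk), ""))
        else:
            key.append((0, 0, chunk))
    return tuple(key)


def import_order(debian: List[str], ubuntu: List[str]) -> List[Upload]:
    """Merge both version lists into the order in which they are imported."""
    uploads = [Upload("debian", v) for v in debian]
    uploads += [Upload("ubuntu", v) for v in ubuntu]
    return sorted(
        uploads,
        key=lambda u: (toy_version_key(u.version), DISTRO_ORDER[u.distro]),
    )
```

Each entry of the result would then be imported on to the branch named by its `distro` field.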
As each version is imported on to a branch the added changelog entries are searched for in the other branches, and if they are found then a merge parent is added to the resulting revision. This means that if one distribution merges from the other this will be reflected in bzr, as long as the changelog stanzas were preserved. This is the norm for Ubuntu, but not for Debian, so most merges will be one way.
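A rough sketch of the parent selection, under some assumptions: the changelog versions added by the upload are already known (newest first), and the other distribution's branch is represented by a plain version-to-revision-id dictionary. The function name and the choice to stop at the newest match are illustrative, not taken from the importer.

```python
from typing import Dict, List, Optional


def merge_parents(
    added_versions: List[str],
    branch_tip: Optional[str],
    other_branch_revids: Dict[str, str],
) -> List[str]:
    """Return the parent revision ids for the revision being imported.

    added_versions: changelog versions new in this upload, newest first.
    branch_tip: current tip of the branch being imported on to, if any.
    other_branch_revids: version -> revision id for the other distribution.
    """
    parents = [branch_tip] if branch_tip is not None else []
    for version in added_versions:
        revid = other_branch_revids.get(version)
        if revid is not None:
            # Assumption: the newest matching version is enough, since the
            # older ones are already its ancestors on the other branch.
            parents.append(revid)
            break
    return parents
```

For an Ubuntu upload that preserved the Debian stanzas this yields two parents. For a Debian upload that dropped the Ubuntu stanzas it yields only the branch tip, which is why most merges will be one way.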
The before and after Ubuntu branches can then be compared to extract the revision id maps that are needed for the re-write tool, as we know that the versions in the two sets of branches will be the same.
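Since both sets of branches contain the same versions, the map can be derived by joining on the version string. The sketch below assumes each branch has already been reduced to a version-to-revision-id dictionary; `build_rewrite_map` is a hypothetical name.

```python
from typing import Dict


def build_rewrite_map(
    old_revids: Dict[str, str], new_revids: Dict[str, str]
) -> Dict[str, str]:
    """Map old Ubuntu revision ids to new ones by matching package versions."""
    missing = sorted(set(old_revids) - set(new_revids))
    if missing:
        # Every version in the old branch should also be in the new branch.
        raise ValueError("versions missing from new branch: %s" % ", ".join(missing))
    return {old_revids[v]: new_revids[v] for v in old_revids}
```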
The rewrite tool is what will be made available to those who have copies of the existing Ubuntu branches that they wish to preserve. Any exact copies could be discarded and new branches made, but any locally committed revisions would have to be preserved using this tool.
The tool is a fairly straightforward tweak to the rebase tool. It first searches backwards through the revision history of the branch to be re-written to collect all of the revisions that are not present in the re-write map. It then rebases those revisions on top of the corresponding revisions in the new branch. This is guaranteed not to conflict, as there are no content changes in the new revisions.
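The search step might look something like the sketch below. It only plans the rewrite and leaves the replaying of revisions to the rebase machinery. It assumes a linear local history described by a hypothetical `left_parent` dictionary, which is a simplification of what bzr actually stores.

```python
from typing import Dict, List, Optional, Tuple


def plan_rewrite(
    tip: str,
    left_parent: Dict[str, Optional[str]],
    rewrite_map: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Return (new base revision, local revisions to replay, oldest first)."""
    local: List[str] = []
    rev: Optional[str] = tip
    # Walk backwards until we reach a revision the rewrite map knows about.
    while rev is not None and rev not in rewrite_map:
        local.append(rev)
        rev = left_parent.get(rev)
    if rev is None:
        raise ValueError("branch shares no revision with the old Ubuntu branch")
    local.reverse()
    return rewrite_map[rev], local
```

An exact copy of an old branch produces an empty list of local revisions, which matches the observation that such copies can simply be discarded and re-branched.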
This tool will be made available as an external tool for Jaunty, so that the time pressures of getting it in before feature freeze don't impair the quality. This should cause little disruption as few people will need to re-write branches.
Richer history representation
Just basing the history representation on the inclusion of changelog entries leads to a poorer representation of history than would be desired. There are several strategies that could give a richer representation than we have. This representation can be expressed in a few places.
- Revision parents
- Revision properties, including the author, etc.
- Commit messages
There are a few packages that were created in Ubuntu, and are now in Debian. Some did not preserve the changelog entries from Ubuntu before the package was uploaded to Debian though, so this will not be represented in the revision history. In order to represent history more accurately these packages can be special cased to initialise the Debian branches from the Ubuntu ones. This will accurately reflect the package's origin, regardless of the changelog history. The list of packages will be taken by reviewing the list of packages where the changelog in the earliest version of the package recorded in Ubuntu contains an Ubuntu changelog entry.
It would also be possible to add merges of Ubuntu to Debian revisions where the Debian package includes a change from Ubuntu. The problem with doing this is that the extra parent tells bzr that Debian reviewed all of Ubuntu's changes and rejected any that are not included, when they may in fact have just reviewed one or two. This would lead to bzr doing the wrong thing in later merges.
Therefore we must only include these extra parents when we know that all changes were reviewed. We could add the parents where Debian did a full merge from Ubuntu. This could be spotted automatically if the merge of the current Ubuntu branch in to the old Debian version had changes, but the merge in to the new Debian version didn't (accounting for the maintainer field change). There may be cases where the merge does not reflect what actually happened, if Debian came up with the same solution independently. It's not clear whether this would be a problem.
This approach would give a little bit richer history, but would be expensive to check on every package version, so we will not pursue it.
The author of the revision is taken from the footer of the last changelog entry, which names the person that prepared the upload. There is a convention for indicating multiple authors of the version in the changelog body, which could be parsed out. However, bzr only allows for a single author to be specified. There is the committer field as well, but that is currently set to be the bot that does the imports so that it is clear it was an automatic import. While this is useful information, the field could be used for an author instead. However, this still only leaves us with two places to put authors, when there may be more than that. Inventing extra revision properties would allow us to store the extra information, but as there is no real standard they may not be used everywhere. It is possible to hook into bzr log from bzr-builddeb to make it display the properties, so we can do that if there are suitable things to store.
If we were to use committer for the person named in the footer of the changelog, and use the author for the person that did the bulk of the work on the change if different then it leaves us to decide who did the bulk of the work. This would have to be a manual step, and would be far too time consuming to be feasible. Any suggestion that Ubuntu contributors are placed there if they contributed to the Debian upload would be a bad one, as it would be raising their contributions above that of others when they may have done a minor part of the work on the upload.
We will list the extra authors in custom revision properties. We can extract this from the changelog using the multi-author convention of "[ Name ]", and when someone is thanked, where that can be parsed. This will allow us to more easily query for this information later.
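A minimal sketch of the extraction, assuming the changelog body is available as a single string. The regular expressions and the `extra_authors` name are illustrative only; real changelogs vary enough that some thanks lines will not parse, and no particular revision property names are implied here.

```python
import re
from typing import Dict, List

# A line consisting only of "[ Name ]" starts a block of changes by that person.
AUTHOR_HEADER = re.compile(r"^[ \t]*\[[ \t]*([^\]\n]+?)[ \t]*\][ \t]*$", re.MULTILINE)

# "thanks Name" or "thanks to Name": capture a run of capitalised words.
THANKS = re.compile(r"[Tt]hanks(?:[ \t]+to)?,?[ \t]+([A-Z][\w'-]*(?:[ \t]+[A-Z][\w'-]*)*)")


def extra_authors(changelog_body: str) -> Dict[str, List[str]]:
    """Collect names from "[ Name ]" headers and from thanks lines."""
    return {
        "authors": AUTHOR_HEADER.findall(changelog_body),
        "thanked": THANKS.findall(changelog_body),
    }
```

The two lists could then be stored in whatever custom revision properties are chosen.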
We could add a revision property that notes when an Ubuntu person contributed to the Debian upload, but again this wouldn't be shown by any tools, and so would be of little use, except for statistical purposes. Therefore we will not do this; other efforts, such as usertagging, are more effective for tracking this.
One revision property that is standard and could be used is the bugs one. It would be possible to automatically fill in more of this information as the importer runs. Where "Closes:" or "LP:" is mentioned in the imported changelog entry the property can be set. Using Launchpad we can also look up the corresponding bug number in the other distribution, where it has been reported and linked. This would allow us to record when an Ubuntu developer fixed a bug in Debian as well. As this will often be a bug where the Ubuntu developer forwarded the patch to Debian, highlighting the Debian bug will draw attention to that contribution.
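The changelog scan could be sketched as below. The bug URL forms and the "fixed" status on each line are assumptions about how the property would be filled in, and the function names are hypothetical. The Launchpad lookup of linked bugs in the other distribution is not shown.

```python
import re
from typing import List

# "Closes: #123, #456" (Debian) and "LP: #123, #456" (Launchpad).
CLOSES = re.compile(r"closes:\s*(?:bug)?#?\s?\d+(?:,\s*(?:bug)?#?\s?\d+)*", re.IGNORECASE)
LP = re.compile(r"\bLP:\s*#\d+(?:,\s*#\d+)*", re.IGNORECASE)

# Assumed URL forms for the two bug trackers.
DEBIAN_BUG_URL = "https://bugs.debian.org/%s"
LAUNCHPAD_BUG_URL = "https://launchpad.net/bugs/%s"


def bug_urls(changelog_entry: str) -> List[str]:
    """Return the URLs of all bugs the changelog entry claims to close."""
    urls: List[str] = []
    for match in CLOSES.finditer(changelog_entry):
        urls += [DEBIAN_BUG_URL % n for n in re.findall(r"\d+", match.group(0))]
    for match in LP.finditer(changelog_entry):
        urls += [LAUNCHPAD_BUG_URL % n for n in re.findall(r"\d+", match.group(0))]
    return urls


def bugs_property(changelog_entry: str) -> str:
    """Format the bugs as one "<url> fixed" line per bug."""
    return "\n".join("%s fixed" % url for url in bug_urls(changelog_entry))
```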
While we can't edit the changelog that is stored, we could tweak it when it is being used for the commit message. This would allow us to highlight certain things to people who read the commit message.
A common convention when including a patch that is attached to a bug report (as is usually done with forwarded patches from Ubuntu) is to thank the submitter, e.g.
... thanks James Westby
... thanks James Westby <email@example.com>
One thing we could do is transform the former into the latter when it is a known Ubuntu contributor. Just having this in the revision message wouldn't highlight the contribution though, so we will not attempt to do this.
We could also use the usertags to try and infer this. For instance, if the upload closes a bug that is tagged with "ubuntu-patch" but doesn't thank anybody, we could add the name and email of the person that submitted the patch to the commit message. There are however some concerns about stealing credit with this. For instance the Debian Developer may fix the bug in a completely different way to the Ubuntu supplied patch, and so having the Ubuntu person mentioned in the changelog when they contributed little to the fix would not be appropriate. Also, the person that forwards the patch from Ubuntu isn't always the person that made the patch. While the forwarding of patches is a valuable activity, just crediting the forwarder and not the author would be unwise, so we will not pursue this.
There should be few UI changes, as everything is already written with the idea of having branches for multiple distributions.
The code is already written to handle updating both sets of branches in an ongoing fashion, maintaining merge links between them. It needs to be extended to do the initialisation, but with the existing code this shouldn't be too much work.
The rewrite tool needs to be written, but is similar to the rebase plugin, so may be able to re-use code from that.
The Debian branches and new Ubuntu branches will be prepared in a separate area, with the old Ubuntu branches available as before. Once we are satisfied that the new branches have all the information that we need they will be deployed in place of the old branches. This will cause any existing branches people have to refuse to pull or merge from the new branches. We will be sure to advertise this widely first, with clear instructions for how to proceed. This will involve running the rewrite tool on the branches.
If the timing means that we reach the point when we are ready to do this at about the same time that we are ready to move the branches to Launchpad then we can combine the two transitions. We will push the new branches to Launchpad, and then remove the old ones. Doing both at once should reduce the interruption for anyone using the current branches.
If the transitions are combined then we can give users of the old branches a slightly better error if they still try and access the old branches, by slightly abusing bzr's branch pointers. We can replace the current branches with branch pointers that include a helpful message. This would then be displayed to the user as part of the error message when bzr attempts to resolve it. It wouldn't be pretty, but would save a little confusion for anyone that misses the announcements.
There are various things that must be tested. Firstly that the new branches reflect history well and contain all the necessary content. This is hard to verify for every revision, so a sampling method will be used. Secondly the mapping between the old Ubuntu branches and the new Ubuntu branches must be comprehensive. This test can easily be automated.
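The automated check of the mapping could be as simple as the following sketch, which assumes the revision ids of both branches and the map are available as plain Python collections. The function name and the exact set of checks are illustrative.

```python
from typing import Dict, Iterable, List


def check_rewrite_map(
    old_branch_revids: Iterable[str],
    new_branch_revids: Iterable[str],
    rewrite_map: Dict[str, str],
) -> List[str]:
    """Return a list of problems found; an empty list means the map is usable."""
    problems: List[str] = []
    # Every revision in the old branch must have a translation.
    for revid in old_branch_revids:
        if revid not in rewrite_map:
            problems.append("unmapped old revision: %s" % revid)
    # Every translation must point at a revision that exists in the new branch.
    new_set = set(new_branch_revids)
    for old, new in rewrite_map.items():
        if new not in new_set:
            problems.append("%s maps to unknown revision %s" % (old, new))
    # Two old revisions should never collapse into one new revision.
    if len(set(rewrite_map.values())) != len(rewrite_map):
        problems.append("two old revisions map to the same new revision")
    return problems
```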
The rewrite tool must be well tested. It should have unit-tests, and then be tested on various test branches to ensure that the results are as expected.
There are numerous demos we can do. For the transition we can show the use of the rewrite tool, but there will be a small audience for this. There is more use in demos of using the Debian branches for work, interrogating them, merging from them, preparing diffs to send back to Debian with them.
- Identifying the packages that originated in Ubuntu.