This is the specification for the importer part of the Distributed Development effort. The status of the blueprint can be tracked at

Summary

In order to allow some features of a version control system to be used in Ubuntu development, regardless of the package being worked on, bzr branches of the history of the uploads to Ubuntu will be made available for every package.

Release Note

None, this should not affect users.

Rationale

For various tasks, having the history of the package in a version control system, in addition to the source packages, can be much more efficient. Currently only a limited number of packages are available in version control, and there is no single version control system in use. Tools such as debcheckout help to work around the latter problem; however, you still need to know how to get the information from the particular version control system, and some packages only version the debian/ directory.

To provide these advantages for all packages, and to make sure the developer can work with whichever branch is available, the full history of each package in Ubuntu will be made available in bzr branches.

If supporting tools are then provided, this should give a convenient way to inspect the package history.

Use Cases

Amanda would like to find out in which version, and by whom, a particular line was introduced into the gcc package. She grabs the gcc branch that is available with a single command, and then uses "bzr gannotate" on the file and quickly finds the revision. She can then see who made the change, and see the change along with its changelog entry and the other changes made in the same package revision.

Kiran and Pete want to work together developing a major change to the packaging of postfix. They each grab the branch from launchpad with a single command and start working. They can then push their changes to personal branches on launchpad, and merge from each other to share their work. When they have completed their work they can upload the package. They can either let the importer take care of updating the package branch (which would lose the granularity of the commits they made), or push to the package branch and tag it as having been uploaded. If they cannot upload the package themselves then they can get sponsorship, and also ask their sponsor to push to the package branch.

Assumptions

Design

There will be an importer tool that, given a list of source packages, will turn them into two branches. One branch will be the "upstream" branch, which is just the ".orig.tar.gz" parts of the source packages extracted in sequence and committed. The second branch will be the "ubuntu" one, which will be the full extracted source packages committed in sequence. There will also be extra parents added to link to the upstream branch where appropriate (that is, on the first upload with a new upstream version).

These branches will then be made available on launchpad, under a new namespace specifically for source package branches. The exact details of the launchpad side are still being worked out, but we have broad agreement on what it will look like.

For the source package branches:

   http://bazaar.launchpad.net/<distro>/<suite>/<package>

e.g.

   http://bazaar.launchpad.net/ubuntu/hardy-updates/gcc

There will also be +trunk to refer to the current development release, e.g.

   http://bazaar.launchpad.net/ubuntu/+trunk/gcc

Personal branches will still live under ~<user>, but will have a similar form, e.g.

   http://bazaar.launchpad.net/~james-w/ubuntu/hardy-updates/gcc/fix-23456
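
The URL layout above can be sketched as two small helpers. This is only an illustration of the proposed namespace, not an existing launchpad API; the function names `package_branch_url` and `personal_branch_url` are hypothetical, and the host is simply the one used in the examples.

```python
BASE_URL = "http://bazaar.launchpad.net"  # host taken from the examples above


def package_branch_url(distro, suite, package):
    """URL of the source package branch for a suite (or +trunk)."""
    return "/".join([BASE_URL, distro, suite, package])


def personal_branch_url(user, distro, suite, package, name):
    """URL of a user's personal branch of a source package."""
    return "/".join([BASE_URL, "~" + user, distro, suite, package, name])


print(package_branch_url("ubuntu", "hardy-updates", "gcc"))
print(package_branch_url("ubuntu", "+trunk", "gcc"))
print(personal_branch_url("james-w", "ubuntu", "hardy-updates", "gcc", "fix-23456"))
```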

On the bzr client side, and possibly with redirects on launchpad, some of the parts may become optional to save having to remember all the parts.

As well as the initial importer there will be an incremental importer that watches for new versions being published and appends them to the correct branch. This means that the package branches should always be up to date with the archive, perhaps with short delays while the import happens.

The source package branches may be read only for the initial stages of this project. If they are not then the write access control will follow exactly that used for soyuz, so that you can only write to a package branch if you can upload the package. This should prevent anyone trying to abuse the system.

If the branches are not read only then there are two cases of people pushing to the branch that we need to handle. The first is when they are pushing as they upload. We will trust them to push exactly what they upload, with the client side tools helping to ensure that this will be the case (we could check on the server side if there are concerns). The client side tools will indicate to the importer that this is what the developer did, and so prevent the new upload being imported.

The second case is when the developer pushes, but does not upload. There are two concerns with this. One is other people grabbing the branch and thinking that the changes that have been pushed are in the archive when they are not; the other is someone else uploading the package without integrating the changes. For the former we may be able to make the tools indicate to the developer when this is happening, but we also need to make developers understand that it can happen. For the latter we can have the importer detect the situation. When it does, it will import the new upload anyway, overwriting the changes on the branch, and then inform the developers who made those changes that they have been overwritten and tell them how to resurrect them.

Implementation

The initial importer will run across the whole archive, creating a branch from each of the uploads to Ubuntu for that package. Tags will be added to each revision that reference the version it contains, to allow them to be found quickly later. These tags will be based on the version number. As it is possible to upload the same version number to both Debian and Ubuntu but have the content be different, the tags should include the name of the distribution to avoid tag conflicts when merging. The same clash could happen between different suites within Ubuntu, but this should be very rare, and so hopefully including the suite in the tag will not be necessary.
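
A minimal sketch of such a tag name follows. The spec does not fix the exact format, so the `<distro>-<version>` layout and the helper name `upload_tag` are assumptions made purely for illustration.

```python
def upload_tag(distro, version):
    """Build a tag name for an uploaded version (hypothetical format)."""
    return "%s-%s" % (distro, version)


# The same version number uploaded to both distributions gets distinct tags,
# so merging the two branches would not produce a tag conflict.
print(upload_tag("debian", "0.3-1"))
print(upload_tag("ubuntu", "0.3-1"))
```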

The importer will be based on code that currently resides in bzr-builddeb. There are plenty of changes to make, caused by several things, such as differing requirements, bad design decisions in that code, and of course bugs. That code does, however, have the core functionality needed for extracting the required information from a source package and then importing it on to a branch.

The importer will take each source package that it is given and then unpack it in two steps. The first step is to unpack the upstream part of the source package, and import this on to the upstream branch of the code. It then imports the full source package on to the debianised branch, and then commits it with the just committed upstream revision as one of the parents. This means that a branch is created that has a revision history something like

            0.1       0.2             0.3
upstream     *---------*---------------*
              \         \               \
debianised     *---------*-------*-------*
             0.1-1     0.2-1    0.2-2   0.3-1

and running bzr log in the debianised branch would give

revno: 5
...
  [details of version 0.3-1]
    -----------------------------------------
    revno: 1.1.2
    ...
      Import upstream version 0.3
-----------------------------------------
revno: 4
...
  [details of version 0.2-2]
-----------------------------------------
revno: 3
...
  [details of version 0.2-1]
    -----------------------------------------
    revno: 1.1.1
    ...
      Import upstream version 0.2
-----------------------------------------
revno: 2
...
  [details of version 0.1-1]
-----------------------------------------
revno: 1
...
  Import upstream version 0.1
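
The shape of that history can be modelled without bzr at all. The sketch below is not the bzr-builddeb code; it only tracks revision names and parents, and the names `upstream_version`, `import_packages` and the `upstream-` prefix are hypothetical. It assumes versions are given in upload order and that the upstream version is everything before the last hyphen.

```python
def upstream_version(version):
    """Strip the packaging revision: '0.2-2' -> '0.2'."""
    return version.rsplit("-", 1)[0]


def import_packages(versions):
    """Return (upstream, debianised) revision lists for versions in upload order.

    Each revision is a (name, parents) tuple. The previous revision on the
    debianised branch comes first, followed by the upstream revision when the
    upload carries a new upstream version.
    """
    upstream = []
    debianised = []
    for version in versions:
        up_name = "upstream-" + upstream_version(version)
        parents = []
        if debianised:
            parents.append(debianised[-1][0])
        # Step 1: a new upstream version is committed to the upstream branch.
        if not upstream or upstream[-1][0] != up_name:
            up_parents = [upstream[-1][0]] if upstream else []
            upstream.append((up_name, up_parents))
            parents.append(up_name)
        # Step 2: the full source package is committed to the debianised branch.
        debianised.append((version, parents))
    return upstream, debianised


upstream, debianised = import_packages(["0.1-1", "0.2-1", "0.2-2", "0.3-1"])
for name, parents in debianised:
    print(name, parents)
```

Only the first upload of each upstream version gains the extra parent, which matches the diagram: 0.2-2 has a single parent, while 0.2-1 and 0.3-1 each have two.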

The main part of the work needed to be done on the bzr-builddeb code to be able to do this is to change it to get the order of the parents correct (a mistake in its development meant that the parents were the wrong way round, and so the example log above would have been different). It could also be changed to use dpkg-source to extract the source package, rather than importing directly from the .orig.tar.gz and the .diff.gz parts. It could also use more information from the source package when committing the revision, for instance setting the uploader as the author of the revision.

The incremental importer will then use the same format of tags for its additions. Before importing a new version of a package it will check for the tags, and if they are present do nothing. This allows a developer to tag when they make an upload and push their changes directly, preserving their commit history since the last upload.
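
As a rough sketch of that check, assuming tags are available as a plain set of names and using a hypothetical `<distro>-<version>` tag format (neither is specified here), the incremental importer's decision could look like this:

```python
def needs_import(branch_tags, distro, version):
    """True if no tag for this upload exists on the branch yet."""
    tag = "%s-%s" % (distro, version)  # hypothetical tag format
    return tag not in branch_tags


def incremental_import(branch_tags, distro, published_versions):
    """Return the versions that would be imported, recording a tag for each."""
    imported = []
    for version in published_versions:
        if needs_import(branch_tags, distro, version):
            # The real importer would commit the source package here.
            imported.append(version)
            branch_tags.add("%s-%s" % (distro, version))
    return imported


tags = {"ubuntu-0.1-1", "ubuntu-0.2-1"}  # 0.2-1 was tagged and pushed by a developer
print(incremental_import(tags, "ubuntu", ["0.2-1", "0.2-2"]))
```

Running the importer a second time over the same list imports nothing, because every version now has its tag.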

When the importer detects that there is a new package upload that will overwrite some changes made in the VCS and not uploaded it can do one of two things.

  1. Move the branch off to one side, and then import the new revision on top of the last uploaded one, and then mail the developers who had their changes stomped on to let them know and ask them to merge it.
  2. Simply add the import on to the tip of the branch and again email the developers who had their changes stomped on and tell them how to resurrect them (again it's a merge, but it looks slightly different).

The second has the advantage that it doesn't cause existing branches to diverge, but it does introduce extra revisions into the history that may confuse someone looking at it. The current preference is for 2.
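
Option 2 can be sketched as follows. This is a toy model only: the branch is a list of dictionaries rather than a bzr branch, and the function name `import_on_tip`, the `"importer"` committer and the tag format are all hypothetical.

```python
def import_on_tip(branch, version, distro="ubuntu"):
    """Append an import revision to the tip and return the committers to notify.

    `branch` is a list of dicts with "committer" and "tag" keys, oldest first.
    A revision whose tag is not None corresponds to an upload.
    """
    # Find the most recent revision that was tagged as uploaded.
    last_uploaded = -1
    for index, revision in enumerate(branch):
        if revision["tag"] is not None:
            last_uploaded = index
    # Everything after it was pushed but never uploaded.
    stomped = branch[last_uploaded + 1:]
    to_notify = sorted({revision["committer"] for revision in stomped})
    # The import goes on the tip, so existing branches do not diverge.
    branch.append({"committer": "importer", "tag": "%s-%s" % (distro, version)})
    return to_notify


branch = [
    {"committer": "importer", "tag": "ubuntu-0.2-1"},
    {"committer": "kiran", "tag": None},
    {"committer": "pete", "tag": None},
]
print(import_on_tip(branch, "0.2-2"))
```

The untagged revisions stay in the history below the new import, which is the extra noise mentioned above.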

The users should be informed by email in some way, and a bug could also be filed. One solution would be to file a bug and then subscribe the relevant users. This assumes that the email address used in whoami is always present in launchpad. If it is not then we could just email that address and point them to the bug. If it is not a valid address then there is not a lot we can do.

UI Changes

The bzr command line client should have the lp: directory service updated to be able to fetch distro source packages, e.g.

      bzr branch lp:ubuntu/hardy/gcc

It would also be possible to slightly extend it to support

      bzr branch ubuntu:hardy/gcc

Also, there will be a revision spec added that allows you to specify the revision corresponding to a particular version of the package easily.

      bzr diff -r package:0.3.1-1..package:0.3.2-1

we may also want a package:latest so you can do

      bzr diff -r package:latest
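
One way such a revision spec could resolve is by looking up the upload tags. The sketch below is plain Python rather than a bzr plugin, and it assumes a hypothetical `<distro>-<version>` tag format and a list of (tag, revision id) pairs in upload order; `resolve_package_spec` is an illustrative name, not an existing bzr function.

```python
def resolve_package_spec(spec, tags, distro="ubuntu"):
    """Map 'package:VERSION' or 'package:latest' to a revision id.

    `tags` is a list of (tag_name, revision_id) pairs in upload order.
    """
    prefix, _, version = spec.partition(":")
    if prefix != "package" or not version:
        raise ValueError("not a package revision spec: %r" % spec)
    if version == "latest":
        if not tags:
            raise LookupError("no uploads recorded on this branch")
        return tags[-1][1]
    wanted = "%s-%s" % (distro, version)  # hypothetical tag format
    for name, revision_id in tags:
        if name == wanted:
            return revision_id
    raise LookupError("no revision tagged %s" % wanted)


tags = [("ubuntu-0.3.1-1", "rev-a"), ("ubuntu-0.3.2-1", "rev-b")]
print(resolve_package_spec("package:0.3.1-1", tags))
print(resolve_package_spec("package:latest", tags))
```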

(I'm not a huge fan of package for this.)

For the case where we need to disambiguate the Debian and Ubuntu versions there could be debian: and ubuntu: ones as well (comes later when we have Debian branches as well.)

There will be a new command/command set added specifically for Ubuntu development. This will be a wrapper over bzr that will remove some of the arguments that can be guessed as we know it is for Ubuntu. (Note that this does not exclude Debian, it should be easy once Debian branches are available to make it work for Debian development as well.)

An initial sketch of the commands could be (this is a very early draft)

      ubzr setup james-w

This creates the working area, does lp-login with the id provided, looks up the primary email address for that id and sets it as the whoami for the area below the working area.

      ubzr branch gcc

This grabs a branch of gcc below the working area, in a package-specific shared repository.

      ubzr upload

This would check that the user had appropriate permissions, build the package, upload it to the archive, then tag the release and push it back to launchpad. If the user did not have permission to upload, it would instead push the branch to a user branch of the package and do whatever is needed to request sponsorship.
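
The decision described above amounts to a single branch on the permission check. The following sketch only returns the planned steps as strings; `plan_upload` and the step names are illustrative and do not correspond to real commands.

```python
def plan_upload(can_upload):
    """Return the ordered steps `ubzr upload` would take (names are illustrative)."""
    if can_upload:
        return ["build", "upload to archive", "tag release", "push to package branch"]
    # Without upload rights the work goes to a personal branch for a sponsor.
    return ["push to user branch", "request sponsorship"]


print(plan_upload(True))
print(plan_upload(False))
```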

Code Changes

A broad overview of what we need:

Test/Demo Plan

Firstly the results of the importer need to be checked, so this will be:

For the bzr changes try the following things:

For the Ubuntu specific client:

Outstanding Issues

Further stages of this idea will include having branches for Debian of each package as well. Once the importer is written it could be set to work on Debian as well, however we require that the shared history of the branches be reflected correctly in the branch. It would be possible to write a shared importer that imports both at once, which would save having to stitch the histories together later (requiring everyone to change their branch to keep up), however it is not clear yet whether trying to do this would be enough work to delay having any branches at all for too long. The feasibility is being investigated. If it is decided to have imports of Debian available at the same time then it doesn't change the meat of this spec, it just allows more uses, such as merges in bzr.

It also has to be decided where the branches can be made available until the necessary changes in launchpad can be made. There are two options here:

The launchpad team need to be consulted on this I believe.

Wherever the branches are hosted can be made transparent to the client users as they can be handled by the directory service. Having a good way to transition should be thought about.

Discussion

Please make any comments here:


CategorySpec
CategoryDistributedDevelopment

DistributedDevelopment/ImporterSpecification (last edited 2009-04-02 09:32:53 by i59F72099)