Specification

See /Discussion too.

Vision

The benefits of revision control have been well established in software development for decades. Proper management of source code accelerates development, improves flexibility, and reduces the frequency and impact of mistakes. Ironically, some of the largest repositories of source code (Linux distributions) use revision control only indirectly, in a primitive form, or not at all.

By deploying revision control on a grand scale in Ubuntu, we hope to enable more efficient development, expand community development, and improve coordination of work, by simplifying the common tasks of Ubuntu package maintenance.

Goals

  • For every package included in Ubuntu, the source code is stored in a standard location, under revision control, and a working tree can be obtained with a single command.
  • Developers working within the project iteratively download a branch, make changes to it, build, test, commit, push and release a new version of the package to the archive.
  • Developers working outside the project branch their own version of the package, make changes, and submit this branch for review by an official developer, who processes them in the same manner as their own changes.
  • New upstream versions can be merged using a simple procedure.
  • New Debian versions (including those which themselves merge a new upstream version) can be merged using a simple procedure.

Risks

We may find that users do not use the VCS branches, preferring a non-VCS based approach. We raise this risk because there are existing tools to allow individual packages to use VCS packaging, but these are not consistently used by Ubuntu developers. We believe that there are several reasons for this:

  • Packages maintained by a single developer do not gain much from a VCS on its own (the package upload history provides a VCS of sorts).
  • Packages that are team-maintained are adopting VCSs.
  • The vast majority of Debian/Ubuntu toolchains assume the lack of a VCS, because there is no pervasive, consistent means of managing packages in a VCS.

By addressing the lack of a pervasive presence of VCS data for Ubuntu we will enable developers to assume the existence of a VCS as they build tools to work with Ubuntu, helping to bootstrap the direct use of a VCS for packaging. Secondly, as the phases of the implementation are completed, they will each add a network effect to the benefits of the use of a VCS for each package, increasing the returns gained for individual developers by adopting a VCS for packaging in Ubuntu.

Design overview

The design will include the following high-level elements:

  • Use of Bazaar and Launchpad to manage all code, whether the upstream and/or debian code is using Bazaar or not.
  • High-level tools to manage these Bazaar branches
  • Automatic import of new Ubuntu and Debian package versions
  • Automatic import of new upstream releases

Branches

For each package we will create a number of branches:

Ubuntu branches - managed by both automation in the data centre and Ubuntu developers:

  • Ubuntu packages: A package-ubuntu branch containing the tree on disk that 'debuild' is able to be invoked on.
  • Ubuntu Non-native packages: Adds an additional package-ubuntu-orig branch containing the contents of the .orig tarball, which allows clear inspection of the difference between the .orig tarball and a given commit of the packaging branch - which is what we need to generate a source package upload (the diff.gz, changes file etc).
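As a minimal illustration of why a separate -orig branch is useful, the Python sketch below diffs two trees held in memory. The dictionaries standing in for the package-ubuntu-orig and package-ubuntu trees, and the helper name tree_diff, are hypothetical; a real tool would read the trees from the Bazaar branches and would emit the source package files rather than a plain string.

```python
import difflib


def tree_diff(orig_tree, packaging_tree):
    """Return a unified diff between two trees given as {path: text} dicts.

    Hypothetical helper: each dict stands in for the contents of a branch.
    """
    lines = []
    for path in sorted(set(orig_tree) | set(packaging_tree)):
        # A path missing from one side is treated as an empty file.
        old = orig_tree.get(path, "").splitlines(keepends=True)
        new = packaging_tree.get(path, "").splitlines(keepends=True)
        if old == new:
            continue
        lines.extend(
            difflib.unified_diff(old, new, "orig/" + path, "packaging/" + path)
        )
    return "".join(lines)


# Illustrative data only: the packaging tree adds a debian/ directory.
orig = {"hello.c": "int main(void) { return 0; }\n"}
packaging = dict(orig)
packaging["debian/changelog"] = "hello (1.0-0ubuntu1) hardy; urgency=low\n"

print(tree_diff(orig, packaging))
```

Only files that differ between the two trees appear in the output, which is the information a diff.gz would need to carry.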

Debian branches - managed by automation in the data centre:

  • Debian packages: A package-debian branch containing the tree on disk that 'debuild' is able to be invoked on.
  • Debian Non-native packages: Adds an additional package-debian-orig branch containing the contents of the .orig tarball. This branch allows the ubuntu orig branch to be different (it is on occasion), and for us to merge new upstream releases via debian by merging on both the -orig and the ubuntu package branch.

Upstream branches - managed by automation in the data centre:

  • Release tarballs: A product/series-release branch containing the contents of release tarballs made by the project. This branch provides inspection of the changes made by the packager between the upstream release and the orig tarball (such as removing non-free firmware or documentation).
  • VCS detailed data: A product/series branch containing the per-commit imported history of the upstream branch. This branch enables cherrypicking of bugfixes and merging of crack-of-the-day code into packages.
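The branch layout above can be summarised in a short Python sketch. It assumes that "package", "product" and "series" in the branch names are placeholders to be replaced by real names; the function name branch_names and its parameters are hypothetical and not part of any existing tool.

```python
def branch_names(package, native, product=None, series=None):
    """List the branches this layout would create for one package.

    Assumption: "package", "product" and "series" in the specification
    are placeholders substituted with concrete names.
    """
    names = [f"{package}-ubuntu", f"{package}-debian"]
    if not native:
        # Non-native packages additionally track the .orig tarball contents.
        names += [f"{package}-ubuntu-orig", f"{package}-debian-orig"]
    if product and series:
        # Upstream branches: release tarballs, then per-commit VCS history.
        names += [f"{product}/{series}-release", f"{product}/{series}"]
    return names


print(branch_names("hello", native=False, product="hello", series="trunk"))
```

A native package with no known upstream product yields only the two packaging branches.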

Tools

Bzr and launchpad provide existing branch management tools for working with branches. Additional logic will be required to load data directly into bzr branches and provide the replacement/enhancement for existing tools such as uupdate.

The general design approach is to extend bzr via plugins to deliver the required functionality, and call through to those bzr commands when we want to make an existing tool work seamlessly with the bzr packaging branches.

One of the key benefits of the branch layout and namespace is that standard toolsets can be used for nearly all common operations:

  • Getting a working tree
  • Building a package
  • Committing changes
  • Merging of new Debian revisions
  • Merging of new upstream releases
  • Branch associations in Launchpad (e.g. bugs and specifications)
  • Personal and team branches (access control)
  • Merging a personal branch into the package trunk
  • Extracting diffs from the package history (e.g. for submission upstream)
  • Browsing source code (and history) on the web
  • Incremental development of changes which are not yet suitable for release (smaller commits)
  • Efficient storage of history (currently we store copies of every new package revision).

Automatic import of package versions

The bzr import-dsc command already reliably imports packages; simply putting it in a small shell script with appropriate output-naming conventions and push-pull with launchpad will do all the needed work.

Automatic import of new package releases

The launchpad product release finder is capable of finding new upstream tarballs, but the links to source packages are poorly maintained. bzr import is already capable of importing tarballs for us - we need to write a script that asks launchpad for tarballs for a source package and performs the import.

Implementation Strategy

Many of the benefits of using a VCS for Ubuntu are delivered if Ubuntu itself uses a VCS in isolation; as more links in the chain between Ubuntu and upstream join a common VCS graph, more benefits can be realised. Importantly, many benefits for Ubuntu can only be realised through pervasive, consistent implementation of VCS-based packaging.

Accordingly we will implement Ubuntu distributed development in a series of phases, starting with Ubuntu itself and working towards the upstream imports currently provided by Launchpad. Each phase will be fully deployed before the next phase is initiated (though development of the software for the next phase can overlap this clear completion line).

Deep history, while very interesting for upstream branches, is much less interesting for the day to day operations of Ubuntu developers, and as such is not planned for the import of the distribution packaging branches.

Phase 1: Ubuntu source code managed in bzr

  • Goals
    • Source code for all Ubuntu packages in Bazaar on Launchpad.
    • Layout is trivially buildable and uploadable.
  • Benefits attained
    • All the features of launchpad code become usable on Ubuntu packages:
      • Code viewing
      • Merge requests can be sent to maintainers.
      • Granular history of changes
      • Multiple branches can be used when preparing changes.
    • Homogeneous documentation and workflow for all packages: commit, build, upload.
  • Requirements
    • Namespace for putting package branches on launchpad
    • script to run across all packages and tarballs, incrementally & ongoing once users start using bzr

    • About 44GB (22GB (all source) * 2 copies (editable and mirrored)) available storage on the launchpad supermirror, storing 2 branches but relying on launchpad's stacked-branches feature.

Phase 2: Merges of Debian using bzr

  • Goals
    • Source code for all Debian [sid & experimental] packages in Bazaar on Launchpad, imported from source packages in the same layout as we are using for Ubuntu packages.

    • Consistent file-ids between Debian package branches and Ubuntu package branches
    • Debian branches can be merged from, and after a merge/sync occurs appropriate branch ancestry is stored.
  • Benefits attained
    • Many of the features of launchpad code become usable on Debian packages:
      • Code/history viewing
      • Multiple branches can be used when preparing changes.
    • Can merge new versions from Debian using standard Bazaar operations
    • Can cherry-pick changes (with upload granularity) from Debian into Ubuntu
  • Requirements
    • Extend the package and orig importer to import debian packages
    • A converter to migrate ubuntu-only history that has accumulated since phase 1 to have appropriate correspondence with the debian package and orig branches
    • About 44GB (22GB (all source) * 2 copies (editable and mirrored)) available storage on the launchpad supermirror, storing 4 branches but relying on launchpad's stacked-branches feature.

Phase 3: Current debian in Launchpad/Bazaar

  • Goals
    • Where debian packages have a VCS tree this is provided as the launchpad Bazaar branch for the debian package.
  • Benefits attained
    • Individual changes made to packages can be cherry picked using bzr
    • Greater detail is available in the debian package branches which can help when sending upstream or reviewing changes.
    • Changes to be submitted to Debian are able to be done against HEAD rather than $last-upload as the current HEAD can be merged to a temporary branch to allow conflict resolution.
  • Requirements
    • Mapping facility from debian package to the VCS maintained source [the VCS header may be sufficient]
    • The package import logic for debian packages needs to be extended to:
      • Pull data from debian packages with a VCS we choose to support
      • Combine packaging-only-vcs-branches with the orig tarballs to fit our model
      • identify and label uploads with the matching vcs source

Phase 4: Mergeable upstream releases

  • Goals
    • Upstream source code tarballs imported into Bazaar
    • Appropriate branch ancestry relating Debian, Ubuntu and upstream branches
  • Benefits attained
    • New releases of packaged software can be merged directly into the ubuntu-orig package branch.
    • Differences between upstream release and ubuntu-orig can be shown trivially via bzr/launchpad code.
  • Requirements
    • Mapping facility from ubuntu and debian packages to upstream release tarballs (the product release finder in launchpad can do this for launchpad product series but there is no comprehensive mapping from source package to series).
    • Deploy the tarball importer on the tarballs found in launchpad.
    • Extend the package branch converter to add links to appropriate upstream tarballs when they exist.
    • About 44GB (22GB (all source) * 2 copies (editable and mirrored)) available storage on the launchpad supermirror, storing 5 branches but relying on launchpad's stacked-branches feature.

Phase 5: Current upstream in Bazaar

  • Goals
    • Upstream source code in Bazaar w/VCS granularity where available
    • Appropriate branch ancestry relating Debian, Ubuntu and upstream branches
  • Benefits attained
    • Unified, fine-grained, browseable history of Debian and upstream
    • Can merge or cherry-pick fine-grained changes from upstream into Ubuntu
    • Changes to be submitted to Upstream are able to be done against HEAD rather than $last-release as the current HEAD can be merged to a temporary branch to allow conflict resolution (which upstreams often prefer/request).
  • Requirements
    • Extend the launchpad-imports system to handle additional VCSs for upstreams that are strategic to us (e.g. the linux kernel)
    • Reliable mapping from source package to upstream VCS branch/configuration
    • Extend the package branch converter to migrate tarball import branches as well, using upstream revisions that match, or otherwise heuristics/human confirmation of an exceptions list.

Design details

Package namespace

It is important to have a predictable namespace to upload branches to. Ideally launchpad will allow package branches to exist in the namespace - e.g. bzr+ssh://bazaar.launchpad.net/ubuntu/PACKAGE/packaging and bzr+ssh://bazaar.launchpad.net/ubuntu/PACKAGE/orig. Until launchpad has such a namespace, we define one that can be used consistently (and thus migrated by launchpad automatically when launchpad gains a packaging branch namespace):
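As a minimal sketch of the ideal URL scheme mentioned above (not the interim namespace), the following Python helper builds the two example URLs for a given package. The function name package_branch_urls is hypothetical; only the URL pattern comes from the text.

```python
# Base taken from the example URLs in the text.
BASE = "bzr+ssh://bazaar.launchpad.net/ubuntu"


def package_branch_urls(package):
    """Return the ideal packaging and orig branch URLs for a package.

    Hypothetical helper: substitutes the PACKAGE placeholder only.
    """
    return {
        "packaging": f"{BASE}/{package}/packaging",
        "orig": f"{BASE}/{package}/orig",
    }


for kind, url in package_branch_urls("hello").items():
    print(kind, url)
```

Because the URL is a pure function of the package name, tools can locate a branch without any lookup service, which is what makes the namespace predictable.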

Client tools to support the workflow

Either the standard Debian tools could be modified to work with the new system, or analogous tools could be provided.

Getting the source

To get the source, either "apt-get source" needs to be taught to try for an ubuntu packaging branch first, with fallback to accessing the archive, or a different tool needs to be written to do the same thing.
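The branch-first, archive-fallback behaviour could look roughly like the Python sketch below. Everything here is hypothetical: BranchNotFound, get_source and the two fetcher callables are stand-ins for whatever "apt-get source" or a replacement tool would actually call.

```python
class BranchNotFound(Exception):
    """Raised by the (hypothetical) branch fetcher when no branch exists."""


def get_source(package, fetch_branch, fetch_archive):
    """Try the packaging branch first, then fall back to the archive.

    Returns a (origin, result) pair so the caller can tell which path ran.
    """
    try:
        return "branch", fetch_branch(package)
    except BranchNotFound:
        return "archive", fetch_archive(package)


# Stub fetchers for illustration: only "hello" has a packaging branch.
def stub_fetch_branch(package):
    if package != "hello":
        raise BranchNotFound(package)
    return f"{package}/ (working tree)"


def stub_fetch_archive(package):
    return f"{package}.dsc (unpacked)"


print(get_source("hello", stub_fetch_branch, stub_fetch_archive))
print(get_source("other", stub_fetch_branch, stub_fetch_archive))
```

Only the "no branch" condition triggers the fallback; other failures propagate so that a network error is not silently hidden behind an archive download.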

Updating to a new upstream

To update to a new upstream, either "uupdate" should be taught to import the tarball as a new revision on the upstream branch and merge that, or a different tool should be provided to do the same thing.

Uploading a version

To upload a new version, the source package needs to be uploaded and the branch needs to be pushed. Either this can be done with modifications to "dput" (it should be given a branch and revision id in the .changes file via bzr builddeb, so that it can do a bzr push to the correct branch as part of uploads to ubuntu), or by wrapping dput in a command that does both steps.

Ubuntu / Debian imports

The bulk of the import code already exists in the bzr builddeb plugin.

bzr import-package [--initial] <distro> <package> <version>

Import a package into the system. The first time a package is imported, the procedure is:

  • run the import-package command - e.g. bzr import-package --initial debian hello-world 1.2.3

    • This will:
    • Obtain the source package .dsc and associated files from the appropriate mirror.
    • set up a temporary shared repo
    • run bzr import-dsc to create a new branch of both the .orig and the packaging data and set up <distro>=<version> tags.

    • Push both branches to launchpad
    • delete the repo

Once these branches have been set up, importing subsequent package versions is straightforward:

  • run the import-package command again - e.g. bzr import-package ubuntu hello-world 1.2.3-0ubuntu1

    • This will:
    • Obtain the source package .dsc and associated files from the appropriate mirror.
    • set up a temporary shared repo
    • pull down the existing distro and distro-orig branch for this package.
    • run bzr import-dsc to add a new revision to the packaging branch (and -orig branch if needed) with appropriate history against the parent distro where relevant tags exist.

    • Push the new commits back to launchpad
    • delete the temporary repository
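The two procedures differ only in whether existing branches are pulled first and in what import-dsc is asked to do. A minimal Python sketch of that difference follows; import_plan is a hypothetical helper that returns human-readable step descriptions rather than running any bzr commands.

```python
def import_plan(distro, package, version, initial=False):
    """Return the ordered steps for importing one package version.

    Hypothetical helper: the strings describe the steps listed in this
    section; nothing is executed.
    """
    steps = [
        f"obtain the .dsc and associated files for {package} {version} "
        f"from the {distro} mirror",
        "set up a temporary shared repository",
    ]
    if initial:
        steps.append("run bzr import-dsc to create the packaging and .orig branches")
    else:
        # Subsequent imports build on the branches already on Launchpad.
        steps.append(f"pull the existing {distro} and {distro}-orig branches")
        steps.append("run bzr import-dsc to add a new revision to the packaging branch")
    steps.append("push to Launchpad")
    steps.append("delete the temporary repository")
    return steps


for step in import_plan("ubuntu", "hello-world", "1.2.3-0ubuntu1"):
    print("-", step)
```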

Running this command from a wrapper script that checks for new versions in a loop will automatically generate a low-latency bzr mirror of all of ubuntu and debian. Ideally, once the kinks are worked out of the system, we can hook this into the end of the soyuz package build process to provide instant availability.
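One pass of such a wrapper could be sketched in Python as follows. The data structures are assumptions made for illustration: "published" maps a (distro, package) pair to its versions in upload order, and "imported" is the set of versions already present in bzr. The function name pending_imports is hypothetical.

```python
def pending_imports(published, imported):
    """Return (distro, package, version) triples that still need importing.

    published: {(distro, package): [versions, oldest upload first]}
    imported:  set of (distro, package, version) already in bzr
    Upload order is taken from the input; no version comparison is done.
    """
    todo = []
    for (distro, package), versions in sorted(published.items()):
        for version in versions:
            if (distro, package, version) not in imported:
                todo.append((distro, package, version))
    return todo


# Illustrative data only.
published = {
    ("debian", "hello-world"): ["1.2.3"],
    ("ubuntu", "hello-world"): ["1.2.3-0ubuntu1", "1.2.3-0ubuntu2"],
}
imported = {
    ("debian", "hello-world", "1.2.3"),
    ("ubuntu", "hello-world", "1.2.3-0ubuntu1"),
}

for distro, package, version in pending_imports(published, imported):
    print(f"bzr import-package {distro} {package} {version}")
```

Relying on the upload order supplied by the archive avoids reimplementing Debian version comparison in the wrapper.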

These imports are subject to a race condition: a user may upload a package after the packaging branch has been separately changed. In these cases the import will:

  • Append a commit to the packaging branch which contains the uploaded package source verbatim
  • File a bug on the package explaining that the separate changes have been backed out, and providing a bzr merge command to resurrect them.
  • CC the bug filing mail to the committer-id's of the commits which were reversed.
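The decision logic for this case can be sketched in Python under some simplifying assumptions: the packaging branch is modelled as a list of (revision id, committer) pairs, oldest first, and the importer remembers the revision id of the last upload it imported. The function reconcile_upload and the action names are hypothetical, and the bug text here only names the revision range rather than the real merge command.

```python
def reconcile_upload(branch_commits, last_imported, upload_version):
    """Decide what the importer does when an upload arrives.

    branch_commits: [(revision_id, committer), ...], oldest first
    last_imported:  revision id of the last upload the importer committed
    Returns a list of (action, detail) pairs; nothing is executed.
    Raises ValueError if last_imported is not on the branch.
    """
    ids = [rev for rev, _ in branch_commits]
    overridden = branch_commits[ids.index(last_imported) + 1:]

    # The uploaded source always wins and is committed verbatim.
    actions = [("commit_verbatim", upload_version)]
    if overridden:
        first, last = overridden[0][0], overridden[-1][0]
        actions.append(
            ("file_bug", f"revisions {first}..{last} backed out by {upload_version}")
        )
        actions.append(("cc", sorted({who for _, who in overridden})))
    return actions


commits = [("r1", "importer"), ("r2", "alice"), ("r3", "bob"), ("r4", "alice")]
for action in reconcile_upload(commits, "r1", "1.2.3-0ubuntu2"):
    print(action)
```

When no commits follow the last imported upload, the only action is the verbatim commit and no bug is filed.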

Upstream Imports

bzr import-upstream <product series> <upstream version number>

Import a new upstream release wholesale as a new revision on the project series tarball branch for the package. This is used in cases where Ubuntu is packaging upstream directly, rather than using Debian's source package, and could be done automatically based on a debian/watch file.

This allows new branches to be based directly on upstream code, and future upstream releases merged in.

The command is a simple wrapper around 'bzr import' to:

  • Get the release tarball from launchpad
  • Get the current branch from launchpad
  • import the release tarball onto the current branch, adding links to the upstream import branch if it exists.
  • push that branch back to launchpad

This should be run as part of the launchpad services connected to the upstream release finder; however during busy periods developers will likely need to run this themselves (consider when gnome releases and seb snarfs-and-uploads the entire release in a very short timeframe).

Uploading

NB: This is already in hardy except for the ppa changes distribution header support.

bzr build-deb [--to gutsy|hardy|...]

Marshal a Bazaar working tree into a Debian source package suitable for upload. This will tag the release in the bzr branch.

Branching for Ubuntu

branch-package --base debian|debian-orig|upstream-releases|upstream-vcs

Create a new Ubuntu branch based on:

  • The debian branch (source package derived from Debian)

  • The debian-orig branch (.orig.tar.gz shared with Debian, separate packaging)

  • The upstream-releases branch (based directly on upstream releases)

  • The upstream-vcs branch (based directly on upstream VCS)
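A command-line front end for this could be sketched with Python's argparse module. The four --base values come from the synopsis above; the positional package argument and the function name parse_args are assumptions added for illustration, since the synopsis does not show how the package is named.

```python
import argparse

# The four bases listed in the branch-package synopsis.
BASES = ["debian", "debian-orig", "upstream-releases", "upstream-vcs"]


def parse_args(argv):
    """Parse a branch-package command line (hypothetical front end)."""
    parser = argparse.ArgumentParser(prog="branch-package")
    parser.add_argument(
        "--base",
        choices=BASES,
        required=True,
        help="branch to base the new Ubuntu branch on",
    )
    # Assumption: the package is given as a positional argument.
    parser.add_argument("package")
    return parser.parse_args(argv)


args = parse_args(["--base", "debian-orig", "hello"])
print(args.base, args.package)
```

Restricting --base with choices means an unsupported base is rejected before any branch operation starts.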

Fixing Direct Uploads

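When someone uploads directly without tagging in bzr, the package importer will have filed a bug and sent mail about the discrepancy. The backed-out changes can be restored by merging them into a fresh copy of the source, using the merge command given in the bug: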

apt-get source package
cd working-directory
bzr merge -r <MERGE_STRING_FROM_BUG>
# work on package as normal

Rationale

VCS vs. Tarballs

It is generally acknowledged that a more complete data model is conceivable, wherein the graph also includes revisions from upstream VCS. In such a model, the upstream-releases branch would be derived from the upstream-vcs branch. In addition to being more elegant, this would provide the following tangible benefits:

  1. cherrypicking of patches from upstream-vcs branches to ubuntu branches using Bazaar

  2. reduced storage requirements through shared Bazaar repositories

However, at present, these benefits are not believed to outweigh the following problems:

  1. There is no reliable way to determine the correct upstream VCS revision from which the release tarball is derived.
    • A number of heuristics are possible, including measuring the size of the delta between the tarball and each revision, but none are considered to be accurate enough for unattended use. Automation is essential for scalability, as any approach which requires the confirmation of a human will not scale to hundreds or thousands of packages per developer (as is the case in Ubuntu).
  2. A substantial subset of the upstream VCS code for Ubuntu is not available in a standard VCS
    • While a facility exists for importing code from other VCS implementations into Bazaar, it is limited by:
    • an inherent unreliability derived from the need to reconstruct information missing from the upstream VCS (raising the same scalability concerns as above)
    • a lack of support for upstream branches (a common case which may be addressed in the future)
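To make the delta-size heuristic from the first problem concrete, here is a minimal Python sketch. Trees are modelled as {path: content} dictionaries and the delta is simply the number of differing paths; both choices, and the function names delta_size and guess_release_revision, are assumptions for illustration. As the text notes, a guess like this is not reliable enough for unattended use, which is why the sketch also reports ties.

```python
def delta_size(tarball, revision_tree):
    """Count paths that differ or exist on only one side."""
    paths = set(tarball) | set(revision_tree)
    return sum(1 for p in paths if tarball.get(p) != revision_tree.get(p))


def guess_release_revision(tarball, revisions):
    """Pick the revision whose tree is closest to the release tarball.

    revisions: {revision_id: tree}
    Returns (revision_id, ambiguous); ambiguous is True when another
    revision is equally close, so a human would need to confirm.
    """
    scored = sorted((delta_size(tarball, tree), rev) for rev, tree in revisions.items())
    best_score, best_rev = scored[0]
    ambiguous = len(scored) > 1 and scored[1][0] == best_score
    return best_rev, ambiguous


# Illustrative data: the tarball ships a generated file absent from the VCS,
# so no revision matches it exactly.
tarball = {"main.c": "v2", "README": "v1", "configure": "generated"}
revisions = {
    "r1": {"main.c": "v1", "README": "v1"},
    "r2": {"main.c": "v2", "README": "v1"},
    "r3": {"main.c": "v3", "README": "v2"},
}

print(guess_release_revision(tarball, revisions))
```

Generated files in release tarballs mean the smallest delta is rarely zero, and two revisions with identical trees are indistinguishable, which illustrates why human confirmation would be needed.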

In addition, a significant subset of Ubuntu packages do not have an upstream VCS, yet still require package maintenance. This means that our solution must be general enough to handle this case, and therefore the proposed solution is believed to provide a useful foundation for future VCS-based work if and when solutions to these problems are found.

Open Issues

  • Package derivation sometimes changes, e.g.:
    • Rebasing from Debian to upstream
    • Rebasing from upstream releases to upstream VCS
    • Merging a new upstream revision which hasn't landed in Debian yet
    • These arrangements can all be represented in this model, but we are missing:
      • A way to record which is the focus of development (the default branch that developers get for the package)
      • A means to rebase third party branches across such a change
  • Some packages are already maintained in revision control. Where do they fit in?
    • Branches which contain only the debian directory

    • Native packages maintained in Bazaar
    • Packages branched from imports of Debian VCS
  • There are some corner cases in the package import process which need to be addressed
    • Mismatched .orig.tar.gz between Debian and Ubuntu

  • Putting the entire .orig.tar.gz under revision control leaves you vulnerable to the timestamp handling of the revision control system, and can result in autostuff rebuilding files you didn't intend it to. (Being fixed in bzr).

  • Make sure the tarball never contains .bzrignore

  • Need to establish which tool is responsible for downloading ../...orig.tar.gz for package builds, or if we can do without it somehow

  • Should branches such as debian-orig vanish when they're unnecessary (e.g. native packages)? This could be hidden by the tools so that you don't need to care whether they're there or not.

  • A procedure for importing a series of Ubuntu uploads (for history) would be useful. import-package could operate on an existing branch if present.

  • Will Ubuntu developers need to upload upstream release tarballs? This seems likely, so should they have upload access to those branches, should they block on uploading their -orig tarballs, or should we allow the -orig tarball to come later?
  • product-release-finder and product-series <-> ubuntu package mapping?

  • branching for ubuntu - branch-package seems no different to 'bzr branch $base'
  • converter logic needs description
  • combining of packaging-only-branches with origs to get full-trees needs explanation

Future Directions

This framework will also provide the basis for future experimentation and development in the following areas:

  • Automatic source package uploads when a new package release is tagged
  • Automatic merging of a new source package version in Debian, or a new direct upload to Ubuntu
  • Integration with upstream VCS
    • Tracking of patches which are extracted from upstream VCS
    • Packages based on VCS snapshots rather than upstream releases
    • Ability to merge (or cherrypick?) directly from upstream VCS representation in Bazaar
  • Management of separated patches (using Bazaar looms?)
  • Migration to full upstream VCS imports at some future date

Debian/Ubuntu Patches

An ideal model would also represent patches (e.g. debian/patches) as first-class objects which can be manipulated. This would provide the following tangible benefits:

  1. More intelligent merging
    • With the approach described herein, there are merge conflict scenarios which Bazaar cannot detect because it is not aware of the patches (changes introduced by upstream may cause a patch in debian/patches to fail to apply).

  2. A standard (and automatable) process for iterating, adding, removing and modifying patches in distro-specific formats
    • This would create the possibility for building higher-level tools which work with these patches

However, the following problems make this impractical for us at present:

  1. There is a proliferation of different patch formats in use in Debian, many of which are difficult or impossible to operate on programmatically
  2. There does not yet exist an equivalent representational facility in Bazaar (though "looms" show some promise here)

In light of further developments here, there should be no reason that such a capability could not be added to the design as an extension in the future.



DistributedDevelopment/Specification (last edited 2009-11-02 14:40:42 by lec67-4-82-230-53-244)