HctAndBazNgConvergence

Converging HCT/Bazaar with BazaarNG

Status

Introduction

HCT will make use of the Bazaar NG APIs. Bazaar NG will deliberately expose whatever APIs are needed to let HCT use its facilities cleanly. Bazaar NG will also update its design based on lessons learnt from HCT.

Rationale

HCT is a revision control tool specific to package management. Like Bazaar, it needs to support Bazaar-NG as Bazaar-NG becomes stable. We need a roadmap with clear steps that identifies the techniques to use and any further specifications needed.

Scope and Use Cases

HCT is a tool built on top of baz used to track upstream, Debian and Ubuntu versions, and patches applied at each version. It can import and unpack Debian source packages, and merge Ubuntu changes with others.

HCT creates a very large number of branches, on the order of 100,000. While the kernel creates a very large tree with many revisions, HCT exercises a different dimension: a large number of branches.

HCT uses an interesting design where patches are represented as descended vertically from the upstream version of the package and horizontally as an evolution of a previous version of that patch. In Arch this is represented by a combination of tags and patch-log updates. Similar mechanisms are present in bzr.
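The two axes described above can be pictured with a small data structure. This is only an illustrative sketch: the class name `PatchBranch`, its fields, and the sample patch name are hypothetical and are not taken from HCT, baz, or bzr.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatchBranch:
    """One patch as applied to one upstream version (hypothetical model)."""
    name: str                                  # identity of the patch across versions
    upstream_version: str                      # "vertical" parent: the upstream release it sits on
    previous: Optional["PatchBranch"] = None   # "horizontal" parent: same patch on an earlier release


def evolution(patch):
    """Follow the horizontal axis and return the patch's history, oldest first."""
    chain = []
    while patch is not None:
        chain.append(patch)
        patch = patch.previous
    chain.reverse()
    return chain


# The same (made-up) patch carried forward from upstream 1.0 to upstream 1.1.
v1 = PatchBranch("fix-manpage", "1.0")
v2 = PatchBranch("fix-manpage", "1.1", previous=v1)
```

Calling `evolution(v2)` walks back through `previous` links, which is the role the text assigns to tags and patch-log updates in Arch.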

Issues to be considered:

  • Performance with very large numbers of related branches; time to create branches, disk usage, and time to find merge basis points.
  • Is it sufficient to represent branches as directories when there are this many?
  • Import performance could be better than in baz.

HCT as a meta-versioning tool

HCT provides a level of meta-versioning on top of simple version control. This pattern is already seen in other systems, including those used by distribution maintainers and kernel hackers: patches evolve in their own right, and are then mixed in or out of the final version. This is arguably harder to understand than simply modifying, branching and merging the source tree, since everything is done at one level of indirection. It may be that when sufficiently good version control tools are available it will no longer be attractive. But there are some reasons to think this practice will persist, since it allows separation of interwoven streams of development.

Although it is currently used for assembling .deb packages, HCT might also be used for assembling collections of patches instead of Quilt.

Bound branches

HCT may provide a good application for BazaarNG bound branches. Use case:

  • Several developers co-operate on a large package such as the Ubuntu Linux kernel. They want to collaborate directly without needing to generate a package, upload it, etc. It is not convenient for them all to work in the same directory or on the same machine.

    BoundBranches allow them to have separate working directories viewing a shared history, as with CVS. When one of them commits a change to a patch branch that change can be integrated by other developers.

This would probably require that the HCT manifest also be versioned. HCT would need to have an option to make bound branches, and possibly a wrapper around other operations.

Manifest versioning

HCT has a "manifest" which describes how the source package is assembled from components such as tarballs and patches. This corresponds to the .dsc or .spec file, or Quilt series file.
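As a rough picture of what such a manifest carries, the sketch below models it as an ordered list of components. The `Manifest` class, its field names, and the sample file and patch names are all hypothetical; the real HCT manifest format is not described in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifest:
    """Hypothetical manifest: one upstream tarball plus an ordered patch list."""
    package: str
    tarball: str
    patches: List[str] = field(default_factory=list)

    def series(self):
        """Render the patch order one name per line, in the style of a Quilt series file."""
        return "".join(name + "\n" for name in self.patches)


# Sample values are invented for illustration only.
manifest = Manifest(
    package="hello",
    tarball="hello_2.1.orig.tar.gz",
    patches=["debian-changes", "ubuntu-fix-build"],
)
```

The order of `patches` matters, because each patch is applied on top of the result of the previous one.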

At the moment this is maintained in the LaunchPad database and stored locally in an {hct} directory. The file within this directory is named so that Bazaar 1.0 treats it as a precious file and never allows it to be committed to revision control; the directory itself is named to fit the same pattern used for the {arch} directory. The purpose of this directory is to act as a place to collect the branches that make up the source package, and it can be initially empty.

Looking ahead to Bazaar-NG, this top-level directory has less importance. The branch/directory duality means that we could just remember the locations of the directories on the disk and use the top-level ~/.hct directory to record where the user was last working on a branch. This has the deliberate bonus that if the user moves or copies the branch to a different location, the copy is not used for assembly (as it would be in the current system, where it sits within the source directory).

Manifests could then also be stored in the top-level ~/.hct and version-controlled.
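One way to picture the location record is a small index file inside the ~/.hct directory. The sketch below is an assumption made for illustration: the file name `locations.json`, the JSON layout, and the function names are hypothetical and do not describe an existing HCT feature.

```python
import json
import os


def load_locations(hct_dir):
    """Return the recorded branch-name -> directory mapping (empty if none yet)."""
    index = os.path.join(hct_dir, "locations.json")  # hypothetical file name
    if not os.path.exists(index):
        return {}
    with open(index) as f:
        return json.load(f)


def record_location(hct_dir, branch, path):
    """Remember where the user was last working on a branch."""
    os.makedirs(hct_dir, exist_ok=True)
    locations = load_locations(hct_dir)
    locations[branch] = os.path.abspath(path)
    with open(os.path.join(hct_dir, "locations.json"), "w") as f:
        json.dump(locations, f, indent=2, sort_keys=True)


def is_registered(hct_dir, branch, path):
    """True only for the recorded directory; a moved or copied branch is ignored."""
    return load_locations(hct_dir).get(branch) == os.path.abspath(path)
```

Because only the recorded path matches, a copy of the branch made elsewhere on disk would not be picked up for assembly until the user works in it and it is recorded.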

Implementation Plan

Much of what HCT needs in the version control system is already present in BazaarNG. HCT does not at present make use of the version control system's merge functions. This is because it already has to do much of the heavy lifting to work out which merge operations it would need to perform, so it is easier (and far faster) to simply create a delta and apply it.

With Bazaar-NG, because HCT can utilise bzrlib directly, it could use the code there to perform both pieces of heavy lifting; in effect, using the bzr merge functionality but in two parts.

bzrlib makes some distinction between internal and external interfaces and this should be continued.

Data Preservation and Migration

The main data migration issue is making sure Bazaar branches can be losslessly converted to BazaarNG, which is already a priority for both Bazaar and Bazaar-NG. See BazAndBzrConvergence.

Lessons from HCT

Some design ideas from HCT can be reused in BazaarNG (others are already the same):

  • Command objects, rather than command functions, as a way to attach other class properties such as the command grammar and help.
  • More-developed multi-level API: command-line API, high-level commands for use by GUIs, large operations and then manipulation of internal objects. There is some separation but it should be increased.
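The first point can be sketched as follows. This is a minimal illustration of the command-object pattern, not HCT's or bzrlib's actual classes: `Command`, `Unpack`, `COMMANDS` and `dispatch` are hypothetical names.

```python
class Command:
    """Base class: grammar and help live on the class, next to the behaviour."""
    name = ""
    takes_args = ()   # positional argument names, i.e. the command grammar

    @classmethod
    def usage(cls):
        """Build a usage line from the grammar and append the class docstring as help."""
        grammar = " ".join((cls.name,) + tuple(arg.upper() for arg in cls.takes_args))
        return grammar + "\n" + (cls.__doc__ or "").strip()

    def run(self, *args):
        raise NotImplementedError


class Unpack(Command):
    """Unpack a source package into a working directory."""
    name = "unpack"
    takes_args = ("package",)

    def run(self, package):
        # A real command would do the work here; the sketch only reports it.
        return "unpacking %s" % package


# Command-line layer: look the command up by name and check the grammar.
COMMANDS = {cls.name: cls for cls in (Unpack,)}


def dispatch(argv):
    cls = COMMANDS[argv[0]]
    args = argv[1:]
    if len(args) != len(cls.takes_args):
        raise SystemExit("usage: " + cls.usage())
    return cls().run(*args)
```

A GUI could call `Unpack().run("hello")` directly and skip `dispatch`, which is the kind of layering the second point asks for.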

Performance

A few performance problems can be fixed by switching to BazaarNG:

  • Because everything can be done in-process in Python, there will be less repeated startup cost of forking baz and having it reread the inventory. (Some parts of baz have already been reimplemented inside HCT in Python to avoid this.)
  • A major performance problem of HCT is repeated fetches of related trees. This may improve with BazaarNG because we only need to fetch the bulk of the history once, and then apply the updates to get each version. These trees can be hardlinked and BazaarNG will do the right thing.
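The hardlinking idea can be illustrated with the standard library alone. The sketch assumes that "doing the right thing" means replacing a changed file rather than writing through the shared link. That reading is an assumption, and the helper names `hardlink_tree`, `replace_file` and `demo` are hypothetical, not bzrlib functions.

```python
import os
import tempfile


def hardlink_tree(src, dst):
    """Create dst as a copy of src in which every file is a hard link."""
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for filename in filenames:
            os.link(os.path.join(dirpath, filename),
                    os.path.join(target_dir, filename))


def replace_file(path, data):
    """Write new content to a fresh file and rename it over path.

    The old inode is left untouched, so other trees linked to it keep
    their content.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo():
    """Link a one-file tree, change the copy, and report what happened."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "upstream")
        os.makedirs(os.path.join(src, "debian"))
        original_path = os.path.join(src, "debian", "rules")
        with open(original_path, "wb") as f:
            f.write(b"old\n")

        dst = os.path.join(root, "patched")
        hardlink_tree(src, dst)
        copy_path = os.path.join(dst, "debian", "rules")
        shared = os.path.samefile(original_path, copy_path)

        replace_file(copy_path, b"new\n")
        with open(original_path, "rb") as f:
            return shared, f.read()
```

Until a file is changed, both trees share one copy on disk; after `replace_file`, only the modified tree pays for the new content.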

Outstanding Issues

HCT currently uses diff/patch to assemble all the diffs into a final assembly. This fits the way Debian packages work. It might be useful to also offer the option of assembling using BazaarNG merge or changesets.

Warning: this isn't an outstanding issue for this spec, but a question for the future. That is, it can remain open, and this spec can still be approved. (SteveAlexander)

LaunchPad dependency

HCT currently depends on LaunchPad to run. It might be useful to make this an optional dependency which would ease adoption by e.g. kernel developers who want to do meta-versioning of patches but with the manifest only stored in their local tree. (Or perhaps they can be persuaded to store this in LaunchPad.)

Warning: need to see Mark's opinion on this. I suspect that if we don't add this to HCT eventually, someone else will. (SteveAlexander)



UbuntuDownUnder/BOFs/HctAndBazNgConvergence (last edited 2008-08-06 16:41:01 by localhost)