BazaarImports

Differences between revisions 13 and 14
Revision 13 as of 2005-04-26 01:08:55
Size: 7529
Editor: intern146
Comment: update some bits
Revision 14 as of 2005-04-26 01:18:01
Size: 7595
Editor: intern146
Comment: schduling issues

Bazaar Imports

Status

Introduction

This specification discusses improving and accelerating the Bazaar import process for CVS, SVN and BitKeeper repositories, so that HCT can leverage upstream changes on an ongoing basis.

Rationale

Baz is emerging as an excellent tool for distributed software development. In order to accelerate the adoption of baz, we are making available imports synchronised on a daily basis with upstream non-Bazaar repositories. The current software works, but the process takes too long. We need to identify what needs to be corrected and what pieces are missing, and plan that work.

Scope and Use Cases

On August 1st 2005, 1000 upstream version control archives will have been imported as Bazaar archives and new upstream commits will be translated daily into new Bazaar revisions.

ScottJamesRemnant has also committed that by this date, both release tarball and Ubuntu source package imports will have been performed and made available in the same system.

Bazaar imports will enable the full functionality of HCT, improving the productivity of the Ubuntu team and derived distributions.

Regularly updated Bazaar archives tracking upstream repositories in centralised version control systems will make it easy for community members to benefit from version control even if they do not have commit access to the repository.

"Ahead of time" archive conversion will make it easy for established projects to evaluate Bazaar as a replacement for CVS or Subversion.

Implementation Plan

To meet this target we need to import, on average, 15 new source repositories per working day. The entire Bazaar team will be employed to reduce the workload on each member.
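The 15-per-day figure can be checked with a rough calculation. The sketch below is only an illustration under stated assumptions: it takes 2005-04-26 (the date of this revision, not a start date given by the plan) as the first working day, assumes no repositories have been imported yet, and treats every Monday to Friday as a working day with no holidays. The function names are hypothetical.

```python
from datetime import date, timedelta
import math


def working_days(start, end):
    """Count Monday-to-Friday days in the half-open range [start, end)."""
    days = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            days += 1
        current += timedelta(days=1)
    return days


def required_daily_rate(total, start, deadline):
    """Average number of imports needed per working day to reach `total`."""
    return total / working_days(start, deadline)


# START is an assumption; the deadline and the total come from the specification.
START = date(2005, 4, 26)
DEADLINE = date(2005, 8, 1)

rate = required_daily_rate(1000, START, DEADLINE)
print(working_days(START, DEADLINE), math.ceil(rate))
```

Under those assumptions there are 69 working days before the deadline, which gives roughly 14.5 imports per working day and rounds up to the 15 quoted above.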

In order to meet the target for release tarballs, we will need:

  • A process to examine upstream FTP/HTTP sites and populate the Product, ProductSeries and ProductRelease tables; placing the source files themselves in the Librarian.

  • A job to take the manifest-less ProductRelease records and import them using Sourcerer.

In order to meet the target for source package imports, we will need:

Computing Bottlenecks

SSH connection caching

The current archive publication scheme involves the creation of one SSH session for each revision, even if it was already imported, between the machine performing the sync and arch.ubuntu.com.

That bottleneck can be removed by changing the SSH configuration to perform session caching.
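One way to get this effect with OpenSSH is connection multiplexing through the ControlMaster, ControlPath and ControlPersist options. Whether this is the mechanism the team had in mind is an assumption, and ControlPersist needs a reasonably recent OpenSSH. The sketch below only builds the command line; the helper name, control directory and persistence time are hypothetical.

```python
import os


def multiplexed_ssh_command(host, remote_command,
                            control_dir="~/.ssh", persist="10m"):
    """Build an ssh argv that reuses one master connection per host."""
    # %r, %h and %p are expanded by ssh to remote user, host and port,
    # so each destination gets its own control socket.
    control_path = os.path.join(os.path.expanduser(control_dir),
                                "cm-%r@%h:%p")
    return [
        "ssh",
        "-o", "ControlMaster=auto",            # start or reuse a master
        "-o", "ControlPath=" + control_path,   # where the socket lives
        "-o", "ControlPersist=" + persist,     # keep master alive when idle
        host,
        remote_command,
    ]


argv = multiplexed_ssh_command("arch.ubuntu.com", "true")
print(" ".join(argv))
```

With options like these, the first invocation opens the session and later invocations for the same host reuse it instead of negotiating a new one per revision. The same three options can equally be set once in the SSH client configuration file.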

Huge logs

Buildbot has a performance bug in handling big text logs. Combined with the high volume of logging produced by CSCVS it makes it impossible to import big repositories like OpenOffice or XOrg.

This performance problem can be fixed in two ways:

  • Do not accumulate logs in memory, but use file storage instead.
  • Truncate logs to keep only the last few thousand lines.

The latter solution would be simple to implement.

Although it discards information, that should not be a problem, since failures are generally diagnosed by examining the tail of the conversion log. More difficult conversion problems will be diagnosed by driving imports with a command line tool instead of Buildbot.
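The truncation approach can be sketched with a bounded queue. This is a minimal illustration, not Buildbot's logging API: the class name, the default limit and the truncation marker are assumptions, and the sketch assumes text arrives in whole lines.

```python
from collections import deque


class TailLog:
    """Keep only the most recent `max_lines` lines of a log in memory."""

    def __init__(self, max_lines=5000):
        # A deque with maxlen silently discards the oldest entry when full.
        self._lines = deque(maxlen=max_lines)
        self.dropped = 0

    def add_lines(self, text):
        """Append text, assumed to end on a line boundary."""
        for line in text.splitlines():
            if len(self._lines) == self._lines.maxlen:
                self.dropped += 1  # the append below evicts one old line
            self._lines.append(line)

    def getvalue(self):
        """Return the retained tail, prefixed by a note if lines were lost."""
        header = []
        if self.dropped:
            header.append("[... %d earlier lines truncated ...]" % self.dropped)
        return "\n".join(header + list(self._lines))


log = TailLog(max_lines=3)
log.add_lines("checkout\nrevision 1\nrevision 2\nrevision 3\nerror: conflict")
print(log.getvalue())
```

Memory use stays bounded by the line limit regardless of how much CSCVS writes, and the tail of the conversion, where failures are usually diagnosed, is always retained.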

Hardware

Each repository must be updated daily. To keep 1000 repositories in sync with only one machine, we need an average sync time of under 90 seconds (86,400 seconds per day divided by 1000 repositories is about 86 seconds), and that does not account for the cost of new imports.
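The time budget per sync can be worked out as follows. The sketch assumes each machine runs syncs one after another around the clock with no idle time and no initial imports competing for the same hardware; the function name is hypothetical. The three-machine case corresponds to the import and sync systems listed under Hardware.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sync_budget_seconds(repositories, machines=1, syncs_per_day=1):
    """Average seconds available per sync, with machines working serially."""
    return SECONDS_PER_DAY * machines / (repositories * syncs_per_day)


one_machine = sync_budget_seconds(1000)
three_machines = sync_budget_seconds(1000, machines=3)
print(one_machine, three_machines)
```

One machine leaves about 86 seconds per repository, and three machines about 259 seconds, before any capacity is set aside for new imports.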

The initial import of a repository is an expensive operation in terms of CPU and I/O bandwidth. To achieve the target of 15 new imports per working day (on average) alongside the daily syncs, additional hardware will be provided.

  • One system to run the botmaster.
  • Three systems to run imports and syncs.

Baz vs. Buildbot

Bazaar development will be deprioritized in favour of source imports.

However, a level of activity must be maintained to animate the user community.

  • Integration of community contributions.
  • Bug fixing.

That is especially important since the Arch/Bazaar community has been repeatedly damaged by a lack of responsiveness from the former main developer. The current level of activity has been crucial in building community trust and enticing users to switch from GNU Arch to Bazaar. If the Bazaar project became unresponsive, that would also damage the company's image in the community.

Product Data

Each version control repository stored in Launchpad is associated with a Product, which includes a human-written product description. Writing original, high-quality product descriptions for projects one does not know is a time-consuming and unrewarding activity.

To allow source imports to proceed, the description of newly created Products will be copied from the Debian package description.

Certifying Imports

At this volume of imports, we cannot afford manual sanity checking of each import.

The current conversion system does some sanity checks during the import process and compares the end result of the import to the HEAD in the upstream repository.

However, that does not guarantee that the annotation is correct. It would be possible, but non-trivial, to compare the annotated form of HEAD; that would provide a stronger guarantee of correctness than a simple comparison.

But well... it does not seem all that important...
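The simple end-result comparison described above can be sketched as a file-by-file digest check between an upstream HEAD checkout and the imported tree. This is not the existing conversion system's code: the function names and the list of metadata directories to skip are assumptions, and the sketch ignores symlinks, permissions and keyword expansion.

```python
import hashlib
import os

# Assumed set of version-control metadata directories to ignore.
METADATA_DIRS = {"CVS", ".svn", "{arch}", ".arch-ids"}


def tree_digests(root):
    """Map each file's path relative to `root` to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into metadata directories.
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def compare_trees(upstream_head, imported_head):
    """Return (missing, extra, changed) relative paths between two checkouts."""
    upstream = tree_digests(upstream_head)
    imported = tree_digests(imported_head)
    missing = sorted(set(upstream) - set(imported))
    extra = sorted(set(imported) - set(upstream))
    changed = sorted(path for path in set(upstream) & set(imported)
                     if upstream[path] != imported[path])
    return missing, extra, changed
```

An import would pass this check when all three lists are empty. As noted above, that only validates the final tree, not the per-line history that annotation depends on.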

Ongoing Operation issues

A small percentage of syncs fail, because

  • upstream modified the repository in a destructive manner
  • CSCVS bugs

Need to implement better error reporting to identify sync failures.

Import initiation process

The process from "this repository has full RCS information" to "this repository is in sync" still requires many manual operations. A new process must be implemented.

Data Preservation and Migration

In-place conversion of Bazaar archives to newer archive formats as Bazaar converges with Bazaar-NG.

It is important to provide a visible migration path for users, where the milestones are new revisions of the archive format.

Packages Affected

The 1,000 source packages selected are those in the Ubuntu main repository which have upstream revision control systems.

Eventually all of universe will also be imported, but that is outside the scope of this specification.

User Interface Requirements

Requirement for a user-interface for the Bazaar team to manage imports.

Outstanding Issues

Manual handling of CVS module aliases.

Telling server outages from conversion failures.

Implementation of error notification.

Sync scheduling, broken by slave bouncing.

Testing environment is messy.

Fire and Forget imports.

Hitting a wall (e.g. out of memory) in an import breaks the slave in a way that needs a manual restart.

Updating master from mirror when migrating a job across slaves.

Mark dangerous things (e.g. Xorg) DONTSYNC.

Scheduling issues with three people operating the same system.

UbuntuDownUnder/BOFs/BazaarImports (last edited 2008-08-06 16:30:07 by localhost)