Bazaar Imports
Status
Created: 23/04/05 by RobertCollins
Priority: HighPriority
People: DavidAlloucheLead, ScottJamesRemnantSecond
- Contributors:
Interested: MarkShuttleworth, RobertCollins, JamesBlackwell, FabioDiNitto
Status: BrainDump, UduBof, BazaarSpecification, BazaarDevel
- Branch:
- Malone Bug:
- Packages:
Depends: UbuntuDevel/BazaarLaunchpadStrategy
UduSessions: 3
Introduction
This specification discusses improving and accelerating the Bazaar import process for CVS, SVN and BitKeeper repositories, so that HCT can leverage upstream changes on an ongoing basis.
Rationale
Baz is emerging as an excellent tool for distributed software development. To accelerate its adoption, we are making available imports that are synchronised daily with upstream non-Bazaar repositories. The current software works, but the process takes too long. We need to identify what must be corrected and which pieces are missing, and plan that work.
Scope and Use Cases
By August 1st 2005, 1000 upstream version control archives will have been imported as Bazaar archives, and new upstream commits will be translated daily into new Bazaar revisions.
ScottJamesRemnant has also committed that by this date, both release tarball and Ubuntu source package imports will have been performed and made available in the same system.
Bazaar imports will enable the full functionality of HCT, improving the productivity of the Ubuntu team and derived distributions.
Regularly updated Bazaar archives tracking upstream repositories in centralised version control systems will make it easy for community members to benefit from version control even if they do not have commit access to the repository.
"Ahead of time" archive conversion will make it easy for established projects to evaluate Bazaar as a replacement for CVS or Subversion.
Baz vs. Buildbot
Bazaar development will be deprioritized in favour of source imports.
However, a minimum level of activity must be maintained to keep the user community engaged:
- Integration of community contributions.
- Bug fixing.
That is especially important since the Arch/Bazaar community has been repeatedly damaged by a lack of responsiveness from the former main developer. The current level of activity has been crucial in building community trust and enticing users to switch from GNU Arch to Bazaar. If the Bazaar project became unresponsive, that would also damage the company's image in the community.
Implementation Plan
Staff Resources
To meet this target we need to import, on average, 17 new source repositories per working day. The entire Bazaar team will be employed to reduce the workload on each member.
David Allouche, James Blackwell and Rob Weir will each be responsible for completing 5 new imports per working day on average.
Scott James Remnant will be responsible for completing 1 import per working day on average. The 17th import is lost due to rounding errors.
Progress will be assessed weekly. Any backlog will have to be cleared.
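The staffing arithmetic above can be checked with a short sketch. It assumes, purely for illustration, that all 1000 imports are still to be done and that each person meets their stated average exactly; the function name `working_days_needed` is hypothetical.

```python
import math

TARGET_REPOSITORIES = 1000   # target from the Scope section
REQUIRED_PER_DAY = 17        # average rate stated in this section

# Per-person daily averages as assigned in this section.
ASSIGNED_PER_DAY = {
    "David Allouche": 5,
    "James Blackwell": 5,
    "Rob Weir": 5,
    "Scott James Remnant": 1,
}


def working_days_needed(total, per_day):
    """Working days needed to finish `total` imports at `per_day` (rounded up)."""
    return math.ceil(total / per_day)


assigned_total = sum(ASSIGNED_PER_DAY.values())
print(assigned_total)                                                 # 16
print(working_days_needed(TARGET_REPOSITORIES, REQUIRED_PER_DAY))    # 59
print(working_days_needed(TARGET_REPOSITORIES, assigned_total))      # 63
```

Under these assumptions, the assigned rate of 16 per day adds about four working days compared with the required rate of 17 per day, which is why any backlog has to be tracked weekly.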
Hardware resources
Each repository must be updated daily. To keep 1000 repositories in sync with only one machine, we need an average sync time of about 90 seconds (86,400 seconds per day divided by 1000 repositories), and that does not account for the cost of new imports.
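The time budget per sync can be worked out as follows. This is a rough sketch that assumes syncs run back to back around the clock with no overhead; the function name `sync_budget_seconds` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sync_budget_seconds(repositories, machines=1):
    """Average time available per sync if every repository syncs once a day."""
    return SECONDS_PER_DAY * machines / repositories


print(sync_budget_seconds(1000))              # 86.4
print(sync_budget_seconds(1000, machines=3))  # 259.2
```

With three machines sharing the load evenly, the average budget rises to a little over four minutes per repository, again before counting new imports.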
The initial import of a repository is an expensive operation in terms of CPU and I/O bandwidth. To achieve the target of 17 imports per day (on average), additional hardware will be provided.
- One system to run the botmaster.
- Three systems to run imports and syncs.
SSH connection caching
The current archive publication scheme opens one SSH session between the machine performing the sync and arch.ubuntu.com for each revision, even if that revision was already imported.
That bottleneck can be removed by changing the SSH configuration to perform session caching.
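One way to get this effect is OpenSSH connection multiplexing, where later sessions reuse an already authenticated master connection. The sketch below assumes that this is what "session caching" refers to and that the installed OpenSSH supports the `ControlMaster`, `ControlPath` and `ControlPersist` options; the helper name, socket location and ten-minute lifetime are illustrative choices, not part of the existing publication scripts.

```python
import os


def cached_ssh_command(host, remote_command, control_dir="~/.ssh"):
    """Build an ssh argv that reuses a shared master connection to `host`.

    Assumption: OpenSSH multiplexing is available. %r, %h and %p are
    expanded by ssh to the remote user, host and port.
    """
    control_path = os.path.join(os.path.expanduser(control_dir), "cm-%r@%h:%p")
    return [
        "ssh",
        "-o", "ControlMaster=auto",            # create the master on first use
        "-o", f"ControlPath={control_path}",   # socket shared by later sessions
        "-o", "ControlPersist=10m",            # keep the master alive when idle
        host,
        remote_command,
    ]


print(cached_ssh_command("arch.ubuntu.com", "true"))
```

The returned list can be passed to `subprocess.run(..., check=True)`. The same three options can instead be placed in a `Host arch.ubuntu.com` block of the SSH client configuration, which leaves the publication scripts unchanged.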
DONTSYNC
The conversion system still has some bugs that prevent the conversion of very large repositories. The bugs induce two failure modes:
- Memory errors in a slave, blocking all the imports in that slave.
- CPU and memory overload in the botmaster, blocking all imports.
These problematic packages must be marked DONTSYNC before starting mass imports:
- XFree86
- XOrg
Multiple Slaves
TODO: specify association of jobs with slaves
Associating ProductSeries to different slaves.
Hardware
Certifying Imports
At this volume of imports, we cannot afford manual sanity checking.
The current conversion system does some sanity checks during the import process and compares the end result of the import to the HEAD in the upstream repository.
However, that does not guarantee that the annotation is correct. It would be possible, but non-trivial, to compare the annotated form of HEAD; that would provide a stronger guarantee of correctness than a simple comparison.
Certified consistency with HEAD was considered a good enough guarantee of correctness. Certified consistency with the annotated form has been deferred as a future development.
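The HEAD consistency check described above can be sketched as a comparison of file digests between two checked-out trees. This is not the conversion system's actual code: it assumes both trees are available as local directories, and the set of ignored control directories is an illustrative guess at the metadata each tool leaves behind.

```python
import hashlib
import os

# Assumption: version control metadata directories to skip in either tree.
IGNORED_DIRS = {"CVS", ".svn", "{arch}", ".arch-ids"}


def tree_digests(root, ignored_dirs=IGNORED_DIRS):
    """Map each file's path relative to `root` to the SHA-256 of its content."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def trees_match(upstream_head, imported_head):
    """True if both trees hold the same files with identical content."""
    return tree_digests(upstream_head) == tree_digests(imported_head)
```

A check of this kind only certifies the final tree. It says nothing about whether individual revisions, and therefore annotations, were reconstructed correctly.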
Data Preservation and Migration
Bazaar archives must be convertible in place to newer archive formats as Bazaar converges with Bazaar-NG.
It is important to provide a visible migration path for users, where milestones are new revisions of the archive format.
Packages Affected
The 1,000 source packages selected are those in the Ubuntu main repository which have upstream revision control systems.
Eventually all of universe will also be imported, but that is outside the scope of this specification.
User Interface Requirements
The Bazaar team requires a user interface to manage imports.
The process from "this repository has full RCS information" to "this repository is in sync" still requires many manual operations. A new process must be implemented concurrently with the production of imports.
Product Data
Each version control repository stored in Launchpad is associated with a Product, which includes a human-written product description. Writing original, high-quality product descriptions for unfamiliar projects is a time-consuming and unrewarding activity.
To allow source imports to proceed, the description of newly created Products will be copied from the Debian package description.
A product is associated with source packages, but Debian source packages do not have a description; only binary packages do. Here is a simple algorithm to pick one description:
- if the source package only creates one binary package, use the description of that binary package.
- if there is a binary package with the same name as the source package, use the description of that binary package.
- otherwise, use the description of the first binary package defined in the control file.
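The selection rules above can be sketched as a small function. It assumes the binary packages have already been parsed from the control file into `(name, description)` pairs in file order; the function name and input shape are illustrative, not an existing Launchpad API.

```python
def pick_description(source_name, binaries):
    """Pick a product description from a source package's binary packages.

    `binaries` is a list of (name, description) tuples in control file order.
    """
    if not binaries:
        raise ValueError("source package defines no binary packages")
    # Rule 1: a single binary package supplies the description.
    if len(binaries) == 1:
        return binaries[0][1]
    # Rule 2: prefer the binary package named after the source package.
    for name, description in binaries:
        if name == source_name:
            return description
    # Rule 3: fall back to the first binary package in the control file.
    return binaries[0][1]


print(pick_description("foo", [("libfoo1", "Foo library"), ("foo", "Foo tools")]))
# Foo tools
```

Note that rule 3 also covers rule 1, so the single-package case is kept only to mirror the list above.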
Outstanding Issues
Autotesting or Not?