Bazaar Imports
Status
Created: 23/04/05 by RobertCollins
Priority: HighPriority
People: DavidAlloucheLead, ScottJamesRemnantSecond
- Contributors:
Interested: MarkShuttleworth, RobertCollins, JamesBlackwell, FabioDiNitto
Status: BrainDump, UduBof, BazaarSpecification, BazaarDevel
- Branch:
- Malone Bug:
- Packages:
Depends: UbuntuDevel/BazaarLaunchpadStrategy
UduSessions: 3
Introduction
This specification discusses improving and accelerating the Bazaar CVS, SVN and BitKeeper import process so that HCT can leverage upstream changes on an ongoing basis.
Rationale
Baz is emerging as an excellent tool for distributed software development. To accelerate the adoption of baz, we are publishing imports of upstream non-Bazaar repositories, synchronised daily. The current software works, but the process is too slow. We need to identify what must be corrected and which pieces are missing, and plan the work.
Scope and Use Cases
On August 1st 2005, 1000 upstream version control archives will have been imported as Bazaar archives and new upstream commits will be translated daily into new Bazaar revisions.
ScottJamesRemnant has also committed that, by this date, both release tarball imports and Ubuntu source package imports will have been performed and made available in the same system.
Bazaar imports will enable the full functionality of HCT, improving the productivity of the Ubuntu team and derived distributions.
Regularly updated Bazaar archives tracking upstream repositories in centralised version control systems will make it easy for community members to benefit from version control even if they do not have commit access to the repository.
"Ahead of time" archive conversion will make it easy for established projects to evaluate Bazaar as a replacement for CVS or Subversion.
Baz vs. Buildbot
Bazaar development will be deprioritized in favour of source imports.
However, a minimum level of activity must be maintained to keep the user community engaged:
- Integration of community contributions.
- Bug fixing.
This is especially important because the Arch/Bazaar community has been repeatedly damaged by a lack of responsiveness from the former main developer. The current level of activity has been crucial in building community trust and enticing users to switch from GNU Arch to Bazaar. If the Bazaar project became unresponsive, that would also damage the company's image in the community.
Implementation Plan
Staff Resources
To meet this target we need to import, on average, 17 new source repositories per working day. The whole Bazaar team will share this work to reduce the load on each member.
David Allouche, James Blackwell and Rob Weir will each be responsible for completing 5 new imports per working day on average.
Scott James Remnant will be responsible for completing 1 import per working day on average. The 17th import is lost due to rounding errors.
Progress will be assessed weekly. Any backlog will have to be cleared.
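The weekly assessment can be reduced to simple arithmetic on the quotas above. The sketch below is a hypothetical tracker, not an existing tool: the quota table restates the staffing plan, and `weekly_backlog` assumes a five-day working week and that finishing more than the quota does not carry over.

```python
# Daily import quotas from the staffing plan (imports per working day).
DAILY_QUOTA = {
    "David Allouche": 5,
    "James Blackwell": 5,
    "Rob Weir": 5,
    "Scott James Remnant": 1,
}
TEAM_TARGET = 17  # required average number of new imports per working day


def planned_shortfall():
    """Imports per day that the individual quotas do not cover."""
    return TEAM_TARGET - sum(DAILY_QUOTA.values())


def weekly_backlog(completed, working_days=5):
    """Return, per person, how many imports are still owed for the week.

    `completed` maps a name to the number of imports finished that week;
    names that are missing count as zero. Surplus work is not carried over.
    """
    return {
        person: max(0, quota * working_days - completed.get(person, 0))
        for person, quota in DAILY_QUOTA.items()
    }
```

For example, `planned_shortfall()` returns 1, which is the "17th import" lost to rounding, and `weekly_backlog({"David Allouche": 25, "James Blackwell": 20})` reports a backlog of 5 for James Blackwell, 25 for Rob Weir and 5 for Scott James Remnant.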
Hardware resources
Each repository must be updated daily. To keep 1000 repositories in sync with only one machine, we need an average sync time of about 90 seconds, and that does not account for the cost of new imports.
The initial import of a repository is an expensive operation in terms of CPU and I/O bandwidth. To reach the target of 17 new imports per day (on average), additional hardware will be provided:
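The sync-time figure follows from dividing a day's worth of machine time by the number of repositories. The helper below is a minimal sketch of that calculation. The `reserved_seconds_per_machine` parameter is a hypothetical allowance for time spent on new imports rather than syncs, since the spec does not quantify that cost.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sync_budget_seconds(repositories, machines=1, reserved_seconds_per_machine=0.0):
    """Average seconds available per sync if every repository is synced daily.

    `reserved_seconds_per_machine` (hypothetical) is time each machine spends
    on new imports instead of syncs.
    """
    usable = machines * (SECONDS_PER_DAY - reserved_seconds_per_machine)
    return usable / repositories
```

With one machine, `sync_budget_seconds(1000)` gives 86.4 seconds, the "about 90 seconds" quoted above. Spreading the work over three import systems raises the budget to about 259 seconds per sync, before time for new imports is deducted.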
- One system to run the botmaster.
- Three systems to run imports and syncs.
SSH connection caching
The current archive publication scheme opens one SSH session between the machine performing the sync and arch.ubuntu.com for each revision, even if that revision was already imported.
This bottleneck can be removed by configuring SSH to reuse a cached session (connection sharing) instead of opening a new one each time.
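One way to enable session caching is OpenSSH connection multiplexing through the `ControlMaster`, `ControlPath` and `ControlPersist` options in `ssh_config(5)`. The sketch below renders such a `Host` block. It assumes that the publishing step spawns the system `ssh` client, so that the client reads `~/.ssh/config`. The control socket path and the 10-minute persistence are illustrative values. `ControlPersist` requires a newer OpenSSH than the other two options, so it can be turned off.

```python
def connection_sharing_stanza(host, control_path="~/.ssh/control-%r@%h:%p", persist="10m"):
    """Render an ssh_config Host block that reuses one master connection.

    ControlMaster/ControlPath let later ssh runs multiplex over an existing
    connection to the same user, host and port. ControlPersist keeps the
    master alive in the background after the first session ends; pass
    persist=None to leave it out on older OpenSSH versions.
    """
    lines = [
        f"Host {host}",
        "    ControlMaster auto",
        f"    ControlPath {control_path}",
    ]
    if persist is not None:
        lines.append(f"    ControlPersist {persist}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(connection_sharing_stanza("arch.ubuntu.com"), end="")
```

Without `ControlPersist`, sharing only helps while the first ssh process is still running. Strictly sequential per-revision sessions would then each open a new connection, so the persistent master is what actually removes the per-revision handshake.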
Certifying Imports
At this volume of imports we cannot afford manual sanity checking.
The current conversion system does some sanity checks during the import process and compares the end result of the import to the HEAD in the upstream repository.
However, that does not guarantee that the annotations (line-by-line history) are correct. It would be possible, though non-trivial, to compare the annotated form of HEAD; that would provide a stronger guarantee of correctness than a plain content comparison.
Certified consistency with HEAD was considered a good enough guarantee of correctness. Certifying consistency with the annotated form has been deferred to future work.
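The HEAD check amounts to comparing two file trees while ignoring version-control metadata. The sketch below is a minimal illustration, not the conversion system's own check. It assumes both the upstream HEAD and the imported Bazaar revision are checked out locally, and the set of metadata directory names to skip is an assumption. Like the check described above, it compares content only and says nothing about annotations.

```python
import hashlib
import os
from pathlib import Path

# Assumed metadata directories for CVS, Subversion and Arch/Bazaar checkouts.
METADATA_DIRS = {"CVS", ".svn", "{arch}", ".arch-ids"}


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-1 of its content."""
    root = Path(root)
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in METADATA_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha1(path.read_bytes()).hexdigest()
    return digests


def compare_trees(upstream_checkout, bazaar_checkout):
    """Report files missing from, extra in, or changed in the Bazaar tree."""
    up = tree_digest(upstream_checkout)
    bz = tree_digest(bazaar_checkout)
    return {
        "missing": sorted(up.keys() - bz.keys()),
        "extra": sorted(bz.keys() - up.keys()),
        "changed": sorted(k for k in up.keys() & bz.keys() if up[k] != bz[k]),
    }
```

An import passes the check when all three lists are empty.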
DONTSYNC
The conversion system has some bugs that prevent the conversion of very large repositories. These bugs cause two failure modes:
- Memory errors on a slave, blocking all the imports on that slave.
- CPU and memory overload on the botmaster, blocking all imports.
These problematic products must be marked DONTSYNC before starting mass imports:
- XFree86
- XOrg
Other products of similar size will probably also have to be held back from the start of mass imports until the performance bugs are fixed.
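Holding products back amounts to a status flag that the scheduler checks before it queues a job. The sketch below is hypothetical: `ImportStatus`, `ProductSeries` and the `upstream_revisions` size measure are illustrative names, and the spec does not say how "similar size" is measured. XFree86 and XOrg would still be marked by hand.

```python
from dataclasses import dataclass
from enum import Enum


class ImportStatus(Enum):
    # Hypothetical subset of import states; only the two needed here.
    SYNCING = "SYNCING"
    DONTSYNC = "DONTSYNC"


@dataclass
class ProductSeries:
    product: str
    upstream_revisions: int  # hypothetical measure of repository size
    status: ImportStatus = ImportStatus.SYNCING


def mark_oversized(series, max_revisions):
    """Mark every series larger than the threshold as DONTSYNC."""
    for s in series:
        if s.upstream_revisions > max_revisions:
            s.status = ImportStatus.DONTSYNC


def jobs_to_schedule(series):
    """Return the series the botmaster may hand out to slaves."""
    return [s for s in series if s.status is not ImportStatus.DONTSYNC]
```

Once the performance bugs are fixed, resetting the status to `SYNCING` returns a series to the schedule without any other change.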
Multiple Slaves
Buildbot does not provide load balancing, but it can statically assign jobs to specific Buildbot slaves. To take advantage of the additional hardware, we need a way to specify which jobs are assigned to which slaves.
This association is a one-to-many relationship: one slave to many ProductSeries. We implement it as a foreign key in the productseries table.
importdslave table:
    int id
    # name used by the slave to log in to the botmaster
    string name
    # purpose of this slave
    string description

productseries table:
    # slave processing the import of this productseries
    foreign_key importdslave
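The schema above can be tried out with an in-memory SQLite database. This is a sketch under assumptions: Launchpad itself does not use SQLite, and the column types, the `productseries.name` column and the helper functions are illustrative. `ON DELETE RESTRICT` makes the database refuse to delete a slave that still has assigned series. That restriction matches the "remove slaves only if they have no assigned source" rule for the admin page.

```python
import sqlite3

SCHEMA = """
CREATE TABLE importdslave (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,   -- name used by the slave to log in
    description TEXT             -- purpose of this slave
);
CREATE TABLE productseries (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- slave processing the import of this productseries (NULL = unassigned)
    importdslave INTEGER REFERENCES importdslave(id) ON DELETE RESTRICT
);
"""


def connect():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.executescript(SCHEMA)
    return db


def assign(db, series_name, slave_name):
    """Point a productseries at the slave with the given name."""
    db.execute(
        "UPDATE productseries SET importdslave ="
        " (SELECT id FROM importdslave WHERE name = ?) WHERE name = ?",
        (slave_name, series_name),
    )


def series_by_slave(db):
    """Group assigned productseries names by slave name."""
    rows = db.execute(
        "SELECT s.name, p.name FROM productseries p"
        " JOIN importdslave s ON p.importdslave = s.id"
        " ORDER BY s.name, p.name"
    )
    grouped = {}
    for slave, series in rows:
        grouped.setdefault(slave, []).append(series)
    return grouped
```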
Add a dropdown to the +sourceadmin page to associate a source with a slave name.
Add a bazaar/slaveadmin page that allows:
- adding slaves
- renaming slaves
- removing slaves (only if they have no assigned source)
- grouping sources by slave
- visually identifying sources that were not synced in time and should therefore be moved to another slave (depends on proper job scheduling)
- reassigning multiple jobs to other slaves (depends on job migration)
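Identifying sources that were not synced in time requires comparing each source's last successful sync with the daily deadline, grouped by slave. The sketch below is hypothetical. The spec does not define a `last_synced` field. Series that have never been synced are reported as late here, which is a design choice that may not suit freshly approved imports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

SYNC_INTERVAL = timedelta(days=1)  # every repository must be synced daily


@dataclass
class SeriesSync:
    name: str
    slave: str
    last_synced: Optional[datetime]  # hypothetical: time of last successful sync


def late_series(series: List[SeriesSync], now: datetime,
                interval: timedelta = SYNC_INTERVAL) -> Dict[str, List[str]]:
    """Group, by slave, the series whose last sync is older than `interval`."""
    late: Dict[str, List[str]] = {}
    for s in series:
        if s.last_synced is None or now - s.last_synced > interval:
            late.setdefault(s.slave, []).append(s.name)
    return late
```

A slave that keeps showing up in this report is a candidate for having some of its sources reassigned.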
Data Preservation and Migration
Bazaar import archives will be converted in place to newer archive formats as Bazaar converges with Bazaar-NG. See BazAndBzrConvergence.
Packages Affected
The 1,000 source packages selected are those in the Ubuntu main repository which have upstream revision control systems.
Eventually all of universe will also be imported, but that is outside this specification.
User Interface Requirements
This section documents the procedure to process new imports.
Picking a ProductSeries
RobertCollins will create a Wiki page listing all pending productseries. This page will include one section for each import operator and one section for unowned productseries.
The specifics of how this page will be used, how it will be refilled with new productseries, and how that process will interact with the product description approval process are still undecided.
If the products did not need approval, we could use the following process:
- move ONE productseries from the unowned section to your ownership section
- if the productseries is associated with the unassigned product, create a product for it and fill in its summary and description
- set the Arch namespace details
- approve the productseries for sync
Creating Products
Most productseries are associated with a product that does not belong to a project. Exceptions to this rule are defined on an ad hoc basis. Some notable existing exceptions are:
- GNOME
- KDE
- Mozilla products (mozilla, firefox, thunderbird, etc.)
- libggi
- bluez
Products that clearly belong to these projects should be created as part of the project.
Some cases are ambiguous. As a general rule, if a productseries is hosted in the same CVS repository as the rest of a project, it is considered part of that project.
Product Description
Each version control repository stored in Launchpad is associated with a Product, which includes a human-written product description. Writing original, high-quality product descriptions for projects one does not know is a time-consuming and unrewarding activity.
To allow source imports to proceed, the description of newly created Products will be copied from the Debian package description.
A product is associated with source packages, but Debian source packages do not have a description; only binary packages do. Here is a simple algorithm to pick one description:
- if the source package only builds one binary package, use the description of that binary package.
- if there is a binary package with the same name as the source package, use the description of that binary package.
- otherwise, use the description of the first binary package defined in the control file.
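The three rules above can be applied directly to a `debian/control` file. The sketch below assumes the control file is available as text. It uses a deliberately small parser for the deb822 stanza format (blank-line-separated stanzas, `Field: value` lines, continuation lines starting with whitespace) instead of a full Debian library. It returns the whole `Description` field, whose first line is the short synopsis.

```python
def parse_control(text):
    """Split a debian/control file into a list of {field: value} stanzas."""
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            if current:
                stanzas.append(current)
                current, last_key = {}, None
            continue
        if line[0] in " \t" and last_key is not None:
            current[last_key] += "\n" + line  # continuation of previous field
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def pick_description(control_text):
    """Choose a product description from a source package's binary packages."""
    stanzas = parse_control(control_text)
    source = stanzas[0].get("Source")
    binaries = [s for s in stanzas[1:] if "Package" in s]
    if not binaries:
        return None
    if len(binaries) == 1:
        chosen = binaries[0]  # rule 1: only one binary package
    else:
        same_name = [b for b in binaries if b["Package"] == source]
        # rule 2: binary named like the source; rule 3: first binary listed
        chosen = same_name[0] if same_name else binaries[0]
    return chosen.get("Description")
```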
Setting the Arch Details
The general pattern for setting the Arch branch details of a productseries is the following:
archive is set to the project name, or the product name if the productseries does not belong to a project.
category is set to the product name.
branch and version are set respectively to the textual and numeric parts of the CVS (or Subversion) branch name. For example, the MAIN branch of a CVS repository, which has no numeric part, is imported as MAIN--0.
When in doubt about whether a productseries should have an archive named after its product or after a potential project, use the product name.
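The naming pattern above can be expressed as a small function. This is a sketch under assumptions. The spec only gives the MAIN to MAIN--0 example, so the rule below is a hypothetical convention. It treats a trailing run of numbers, optionally separated by `-`, `_` or `.`, as the numeric part and joins it with dots, so `gnome-2-10` would become branch `gnome`, version `2.10`. Other punctuation is mapped to hyphens, because Arch names roughly allow only letters, digits and single hyphens, starting with a letter.

```python
import re

# Roughly Arch's rule for category/branch names: a letter, then letters or
# digits, with single hyphens allowed between them.
ARCH_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9]|-[A-Za-z0-9])*")

# Hypothetical convention: textual prefix, optional separator, trailing numbers.
NUMERIC_TAIL = re.compile(r"(.*?)[-_.]?(\d+(?:[-_.]\d+)*)")


def split_branch(name):
    """Split a CVS/Subversion branch name into (Arch branch, Arch version)."""
    m = NUMERIC_TAIL.fullmatch(name)
    if m and m.group(1):
        text, version = m.group(1), re.sub(r"[-_.]", ".", m.group(2))
    else:
        text, version = name, "0"  # no numeric part, e.g. MAIN -> MAIN--0
    text = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")
    if not ARCH_NAME.fullmatch(text):
        raise ValueError(f"{name!r} does not map to a valid Arch branch name")
    return text, version


def arch_details(product, branch_name, project=None):
    """Archive from the project (else the product), category from the product."""
    branch, version = split_branch(branch_name)
    return {
        "archive": project or product,
        "category": product,
        "branch": branch,
        "version": version,
    }
```

For instance, `arch_details("gnome-panel", "MAIN", project="gnome")` yields archive `gnome`, category `gnome-panel`, branch `MAIN`, version `0`. Leaving `project` unset falls back to the product name, as the rule above recommends when in doubt.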
Outstanding Issues
Searching for a string on the bazaar/series page causes an error.
The productseries ownership process and how it will interact with product approval is still unspecified.