Created: 23/04/05 by RobertCollins
- Malone Bug:
This specification defines the plan to import 1,000 upstream branches and have them syncing by August 1st 2005. The team will share the load as follows:
DavidAllouche: 5 successful imports per weekday
JamesBlackwell: 5 "
RobWeir: 5 "
ScottJamesRemnant: 1 "
RobertCollins: 1 "
ScottJamesRemnant has also committed that on this date, both release tarball and Ubuntu source package imports will have been performed and made available in the same system.
These targets have been agreed by the team.
Baz is emerging as an excellent tool for distributed software development. In order to accelerate the adoption of baz, we are making available imports synchronised on a daily basis with upstream non-Bazaar repositories. The current software works but the process is too long. We need to identify what needs to be corrected, what pieces are missing, and plan to do that.
Scope and Use Cases
On August 1st 2005, 1000 upstream version control archives will have been imported as Bazaar archives and new upstream commits will be translated daily into new Bazaar revisions.
Bazaar imports will enable the full functionality of HCT, improving the productivity of the Ubuntu team and derived distributions.
Regularly updated Bazaar archives tracking upstream repositories in centralised version control systems will make it easy for community members to benefit from version control even if they do not have commit access to the repository.
"Ahead of time" archive conversion will make it easy for established projects to evaluate Bazaar as a replacement for CVS or Subversion.
Baz vs. Buildbot
Bazaar development will be deprioritised in favour of source imports.
However, a level of activity must be maintained to animate the user community.
- Integration of community contributions.
- Bug fixing.
That is especially important since the Arch/Bazaar community has been repeatedly damaged by a lack of responsiveness from the former main developer. The current level of activity has been crucial in building community trust and enticing users to switch from GNU Arch to Bazaar. If the Bazaar project became unresponsive, that would also damage the company's image in the community.
Each repository must be updated daily. To keep 1000 repositories in sync with only one machine, we need an average sync time of about 90 seconds, and that does not account for the cost of new imports.
The initial import of a repository is an expensive operation in terms of CPU and I/O bandwidth. To achieve the target of 17 new imports per day (on average), additional hardware will be provided.
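The capacity figures above follow from simple arithmetic; here is a quick check (an illustrative sketch, not part of the production system):

```python
# Rough capacity check for the single-machine sync target.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400
REPOSITORIES = 1000

budget = SECONDS_PER_DAY / REPOSITORIES  # seconds available per sync
print(budget)  # 86.4 -- the spec rounds this to "about 90 seconds"

# The 17 new imports per weekday follow from the per-person quotas:
quotas = {"DavidAllouche": 5, "JamesBlackwell": 5, "RobWeir": 5,
          "ScottJamesRemnant": 1, "RobertCollins": 1}
print(sum(quotas.values()))  # 17
```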
One system to run the botmaster.
Three systems for slaves running imports and syncs.
In the short term, in the absence of adequate load balancing and job migration support, production imports and syncs will be run on a single slave.
- One slave for imports and syncs
- Two slaves for autotesting
SSH connection caching
The current archive publication scheme involves the creation of one SSH session for each revision, even if it was already imported, between the machine performing the sync and arch.ubuntu.com.
That bottleneck can be removed by changing the SSH configuration to perform session caching.
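One way to enable session caching (a sketch of an OpenSSH client configuration, assuming OpenSSH 3.9 or later on the syncing machine; the socket path is illustrative) is a ControlMaster block in ~/.ssh/config:

```
# Reuse one master SSH session for all connections to the archive host,
# instead of negotiating a new session for every revision published.
Host arch.ubuntu.com
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
```

With "ControlMaster auto", the first connection becomes the master and subsequent connections multiplex over its socket, eliminating the per-revision session setup cost.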
For this volume of imports, we cannot afford manual sanity checking.
The current conversion system does some sanity checks during the import process and compares the end result of the import to the HEAD in the upstream repository.
However, that does not guarantee that the annotation would be correct. It would be possible, but non-trivial, to compare the annotated form of HEAD; that would provide a stronger guarantee of correctness than a simple comparison.
Certified consistency with HEAD was considered a good enough guarantee of correctness. Certified consistency with the annotated form has been deferred as a future development.
The conversion system has some bugs that prevent the conversion of really big repositories. The bugs induce two failure modes:
- Memory errors in a slave, blocking all the imports in that slave.
- CPU and memory overload in the botmaster, blocking all imports.
These problematic products must be marked DONTSYNC before starting mass imports:
Other products of similar size will probably have to be delayed between the start of mass imports and the point where the performance bugs are fixed.
Reviewer's comments: manual assignment of jobs to slaves is unsustainable. All assignment of work to slaves must be automatic.
In the absence of useful load sharing and job migration infrastructure, all production imports and syncs, which need persistent state, will be run on a single slave.
Autotest imports, which do not have persistent state, will be shared equally between two slaves using round-robin load sharing: when the autotest master (Roomba) loads jobs, it will assign every other job to the first and second slaves alternately.
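The round-robin assignment can be sketched as follows (function and slave names are illustrative, not the actual botmaster code):

```python
from itertools import cycle

def assign_round_robin(jobs, slaves):
    """Assign every other job to each slave, in loading order."""
    assignment = {slave: [] for slave in slaves}
    for job, slave in zip(jobs, cycle(slaves)):
        assignment[slave].append(job)
    return assignment

# Four autotest jobs split over two slaves:
print(assign_round_robin(["apt", "arts", "gimp", "bugs"],
                         ["slave1", "slave2"]))
# {'slave1': ['apt', 'gimp'], 'slave2': ['arts', 'bugs']}
```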
Data Preservation and Migration
Bazaar import archives will be converted in place to newer archive formats as Bazaar converges with Bazaar-NG. See BazAndBzrConvergence.
The 1,000 source packages selected are those in the Ubuntu main repository which have upstream revision control systems.
Eventually all of universe will also be imported, but that is outside this specification.
User Interface Requirements
This section documents the procedure to process new imports.
RobertCollins will create a Wiki page listing all pending productseries. This page will include one section for each import operator, with "preparation", "import", "sync" and "complete" subsections, and one section for unowned productseries.
Using these subsections allows operators to requeue their jobs after they are interrupted or flushed from the queue by system maintenance. Interruption occurs when the slave is restarted while a job is running; flushing occurs when the botmaster is restarted.
The unowned section will be regularly refilled with productseries that passed autotesting.
If the products did not need approval, we could use the following process:
- Move your quota of productseries from the unowned section to your ownership section, "preparation" subsection.
- For each productseries in your "preparation" subsection:
- If the productseries CVS is a module alias, create a productseries for each of the CVS modules pointed to by this alias, and tell Robert Collins and David Allouche so they can figure out a standard way to provide the config file.
If the productseries is associated with the unassigned product:
- Create a product.
- Fill in the product summary and description.
- Reassign the productseries to the proper product.
- Set the source admin details:
- Find the Launchpad sourceadmin page for the productseries.
- Fill in the target Arch namespace details.
- Promote the productseries to PROCESSING and submit the form.
- For each productseries in your "import" section, examine the job status.
- If the import was interrupted, or failed because of a transient remote service outage, leave it here.
- If the import failed because of an import error
- Demote the productseries to TESTING in the Launchpad sourceadmin page.
- Remove the productseries from your "import" subsection on the wiki.
- If the import succeeded
- Promote the productseries to SYNCING in the Launchpad sourceadmin page.
- Move the productseries to your "sync" subsection on the wiki.
- If the import does not appear to have run
- Examine the waterfall page for that job.
- If the last event is "queued", complain that we need to run imports on multiple slaves.
- If the last event is "Connected", the slave was bounced since you last queued it. Do nothing, yet.
- If the waterfall history is empty, that means the persistent state of the botmaster has been deleted (assuming you are following this procedure correctly, you should have previously queued it for initial import). Do nothing, yet.
- Move productseries in your "preparation" subsection to your "import" subsection.
- Reload the botmaster.
- For each productseries in your "import" subsection, "force build" the job, to queue the initial import.
- For each productseries in your "sync" subsection, check the job status.
- If the sync was interrupted or failed because of transient remote service outage, force build.
- If the sync failed because of a conversion error
- Demote the productseries to TESTING
- Remove the productseries from your ownership section on the wiki page.
- If the last event is a successful mirrorTarget, the initial sync succeeded, record the date and add it to your "complete" section.
- If the last event is a successful runJob, the initial sync has not been done yet. Check the waterfall page for the job.
- If the last item is not "Queued", force-build.
Identifying CVS module aliases
Use the following command:
cvs -d <root> co -c
That gives the listing of all modules in the CVS repository. The beginning of the listing for the KDE CVS repository reads:
admin kde-common/admin
arts arts &admin &libltdl
bugs bugs
In that repository, we can directly import "admin" and "bugs", but importing "arts" will not produce the same tree as a CVS checkout.
- You must create productseries for "admin" and "libltdl"
- Configs are the Arch equivalent of CVS module aliases. A config file will have to be created that associates "arts" with "admin" and "libltdl".
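A module definition is an alias when its expansion references other modules with "&". Spotting aliases in the "cvs co -c" output can be sketched like this (a simplified parser for the line format shown above, not a full CVS modules-file parser):

```python
def find_module_aliases(listing):
    """Return a map of module name -> list of modules referenced with '&'."""
    aliases = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # '&name' entries pull another module into the checkout.
        refs = [f[1:] for f in fields[1:] if f.startswith("&")]
        if refs:
            aliases[fields[0]] = refs
    return aliases

kde_modules = """\
admin kde-common/admin
arts arts &admin &libltdl
bugs bugs"""
print(find_module_aliases(kde_modules))  # {'arts': ['admin', 'libltdl']}
```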
Most productseries are associated with a product without a project. Exceptions to the rule are defined on an ad-hoc basis. Some notable existing exceptions are:
- Mozilla products (mozilla, firefox, thunderbird, etc.)
Products that clearly belong to these projects should be created as part of the project.
Some products are questionable; generally, if a productseries is hosted on the same CVS repository as the rest of the project, it is considered part of the project.
Some products have many associated sub-products, like Nautilus, and should probably have a project of their own.
When in doubt, ask the distribution maintainer for their opinion.
Each version control repository stored in Launchpad is associated with a Product, which includes a human-written product description. Editing original, high-quality product descriptions for projects one does not know is a time-consuming and unrewarding activity.
To allow source imports to proceed, the description of newly created Products will be copied from the Debian package description.
A product is associated with source packages, but Debian source packages do not have a description; only binary packages do. Here is a simple algorithm to pick one description:
- if the source package only creates one binary package, use the description of that binary package.
- if there is a binary package with the same name as the source package, use the description of that binary package.
- otherwise, use the description of the first binary package defined in the control file.
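The steps above can be sketched as follows (the representation of the control file as ordered name/description pairs is an assumption; real code would parse Debian control files properly):

```python
def pick_description(source_name, binaries):
    """Pick a product description from a source package's binaries.

    binaries: list of (name, description) pairs, in control-file order.
    """
    if len(binaries) == 1:
        return binaries[0][1]          # only one binary package: use it
    for name, description in binaries:
        if name == source_name:        # a binary named after the source
            return description
    return binaries[0][1]              # fall back to the first binary defined

print(pick_description("apt", [
    ("apt", "advanced front-end for dpkg"),
    ("libapt-pkg-dev", "development files for APT"),
]))  # advanced front-end for dpkg
```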
Setting the Arch Details
The general pattern for setting the Arch branch details of a productseries is the following:
archive is set to the project name, or the product name if the productseries does not belong to a project.
category is set to the product name.
branch and version are set respectively to the textual and numeric parts of the CVS (or subversion) branch name. For example:
The MAIN branch of a CVS repository is imported as MAIN--0.
The experimental v0_6 branch of APT is imported as experimental--0.6. Some understanding of the purpose of a CVS branch may be necessary to pick a meaningful Arch branch name.
When in doubt on whether a productseries should have an archive named after its product name or a potential project name, use the product name.
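The naming pattern can be illustrated like this (the helper function is hypothetical; the archive/category--branch--version composition follows the rules and examples above):

```python
def arch_branch_name(product, project, branch_text, branch_numeric):
    """Compose archive/category--branch--version per the rules above."""
    # archive: the project name, or the product name without a project.
    archive = project if project is not None else product
    category = product
    return "%s/%s--%s--%s" % (archive, category, branch_text, branch_numeric)

# The MAIN branch of a product with no project:
print(arch_branch_name("bugs", None, "MAIN", "0"))
# bugs/bugs--MAIN--0

# The experimental v0_6 branch of APT:
print(arch_branch_name("apt", None, "experimental", "0.6"))
# apt/apt--experimental--0.6
```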
The sourceadmin page
You can only set the Arch details in the sourceadmin page for a productseries. These pages are accessible through launchpad.ubuntu.com/bazaar/series.
In this page, the "ready" check box restricts search results to productseries that are associated with a product whose description was approved by sabdfl, i.e. products for which you can actually start the import.
There is not currently a user interface to demote productseries.importstatus. But you can promote a productseries through this web page:
- to PROCESSING: by checking the "processing approved" box
- to SYNCING: by checking the "syncing approved" box
The buildbot web UI is accessible at this location.
This screen gives a listing of all jobs currently present in the botmaster. Hoover only creates jobs for productseries that are PROCESSING or SYNCING.
When you change a productseries.importstatus, you have to reload the botmaster to update the jobs. It is particularly important that you reload the botmaster after promoting a job to SYNCING; otherwise, forcing it will rerun the import and not the initial sync. Reload the botmaster by loading this page:
There is also a hyperlink on the status page.
There is no easy way to know whether a given job is associated to a productseries in PROCESSING or SYNCING mode.
The status page displays the latest events for the jobs. The color of the event is the most important information:
white means the job was not run since it was first loaded.
green means the last run of the job was successful.
yellow means the job is currently running.
red means the last run of the job failed.
If a job appears with an estimated time to completion, that means the botmaster is broken and has to be fixed by restarting it without persistent state (thus losing past log data).
You can restrict the set of jobs displayed using the search form at the top of the page. The search pattern is a regular expression that is matched on job names.
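The filtering behaves like an ordinary regular-expression search over job names, e.g. (a sketch using Python's re module; the exact matching semantics of the buildbot form may differ slightly):

```python
import re

jobs = ["apt--main", "arts--main", "mozilla-firefox--main"]
pattern = re.compile("moz")

# Keep only jobs whose name matches the pattern:
print([job for job in jobs if pattern.search(job)])
# ['mozilla-firefox--main']
```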
To "force build", click on the hyperlink that reads as the job name, then press the "force build" button.
To examine the log for a job, click on the "waterfall" hyperlink on the job's line in the status table, then click on a "log" hyperlink in the waterfall page.
In the waterfall page, you can distinguish:
- import jobs start with a nukeTarget step.
- sync jobs end with a mirrorTarget step.
You can easily tell that a job was interrupted: its event box in the status view reads "Interrupted" on a red background.
That means that the slave running the job had been shut down while the job was running.
If the job is past its initial sync, it will be run again automatically.
If the job has not yet completed its initial sync, and you own it (it should be in your "import" or "sync" subsection on the wiki page), you have to force-build it.
Transient RunJob Failures
Errors occurring when running "cvs version" are generally transient (the CVS server is down).
But they can also be caused by a missing SSH host key if the CVS method is :ext:, aka SSH. In that case, notify Robert Collins or David Allouche so they can run "cvs version" on the server to accept the host key.
Fatal RunJob Failures
The most common fatal conversion failures translate into:
- RuntimeError: attempted to patch non-extant file
- MemoryError (that one is bad, and means the slave must be restarted)
Note that MemoryError may be caused by another job eating all the memory, so be careful.
Attempts to patch a non-extant file and ValidationError can occur when the CVS repository was manually modified. Some projects do that in order to rename files in CVS while retaining version control history. That situation should not happen on an initial sync, and if it does, it is safe to demote the productseries to TESTING.
When such an error happens on a job that has already passed the initial sync, the problem must be diagnosed and manually fixed on the production system. Once the initial sync has been performed, a productseries must never be demoted to TESTING or PROCESSING.
Failures in mirrorTarget have been known to happen mainly for the following reasons:
- ssh configuration problem
- excessive simultaneous unauthenticated connections
- baz bugs causing multiple "archive-mirror" runs, which cause failures on revisions containing a cachedrev.
- lost database connection
- messed up database connection
In all cases, such a failure is not actually a conversion problem, and should not cause the package to be demoted from SYNCING.
The process will drive you crazy.
The productseries ownership process and how it will interact with product approval is still unspecified.
No user interface to demote productseries.importstatus.
Automatic creation and update of the config files for CVS module aliases.