Importer

This page explains in some depth how the importer works, and some of the things that you may see it do.

Mainloop

The importer is essentially a while loop that looks at recent uploads to Debian and Ubuntu and triggers imports based on them, with several imports running in parallel. The check runs frequently, so the importer should start on your upload within a few minutes of you completing it. The import itself is also reasonably fast: the build time plus the publication delay should normally greatly exceed the time taken by the importer, so if the binaries are published and you don't see any change in bzr, it is quite likely that something went wrong.

To find out about uploads, the importer uses the Launchpad APIs. This means that for Debian it only sees uploads once Launchpad has imported them. Launchpad does this four times a day (the same frequency as Debian's dinstall), so it should only take a few hours on average for a Debian upload to become available in bzr. If you are waiting for a particular upload, that can be too long. If this happens to you frequently then we can look at alternative solutions, so speak up if you are hit by this.

If there is nothing for the importer to work on (the usual case) then it will do integrity checks instead, making sure it is up to date and all the data is correct. This doesn't delay any imports though.

If the queue fills up (see below for how to find out), perhaps after an autosync run that produces a few hundred uploads to import, there can be a delay of up to a few hours while the backlog is processed. If you are waiting for something that is stuck in that backlog, ask an importer admin to increase the priority of that package (you just need to tell them the package name).
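The scheduling described in this section can be sketched roughly as follows. This is only an illustration, not the importer's actual code: the `ImportQueue` class, the `run_once` function, the numeric priorities, and the callback parameters are all hypothetical names chosen for the example, and the sketch is single-threaded whereas the real importer runs several imports in parallel.

```python
import heapq
import itertools


class ImportQueue:
    """Hypothetical priority queue of package names (lower number = sooner)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order
        self._queued = {}  # package name -> current priority

    def add(self, package, priority=10):
        # Re-adding with a lower number models an admin raising the priority.
        current = self._queued.get(package)
        if current is not None and current <= priority:
            return
        self._queued[package] = priority
        heapq.heappush(self._heap, (priority, next(self._counter), package))

    def pop(self):
        while self._heap:
            priority, _, package = heapq.heappop(self._heap)
            # Skip stale heap entries left behind by a priority change.
            if self._queued.get(package) == priority:
                del self._queued[package]
                return package
        return None


def run_once(queue, recent_uploads, import_package, integrity_check):
    """One pass of the loop: queue new uploads, then import one or check."""
    for package in recent_uploads:
        queue.add(package)
    package = queue.pop()
    if package is None:
        integrity_check()  # nothing to import, so use the idle time
        return None
    import_package(package)
    return package
```

In this model, bumping a package is just `queue.add("somepackage", priority=1)`: the package then comes out ahead of the rest of the backlog.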

Import job

When the mainloop triggers an import it acts on the whole history of the package at once. It looks at everything it should have imported and compares it to what it has. If there are any un-imported uploads it starts the process to import them to bzr. The import itself is based on "bzr import-dsc" from bzr-builddeb.
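The comparison step can be pictured as a simple difference between what has been published and what is already in the branch. The function below is a minimal sketch under that assumption; `unimported_uploads` and its arguments are hypothetical, and the real importer works with Launchpad publication records and branch tags rather than plain version strings.

```python
def unimported_uploads(published, imported):
    """Return the published versions that are not yet in the branch.

    `published` is an ordered list of version strings, oldest first.
    `imported` is any collection of versions already imported.
    The result keeps the order of `published`, so uploads would be
    imported oldest first.
    """
    done = set(imported)
    return [version for version in published if version not in done]
```

Because the whole history is considered each time, a package that missed an earlier import is caught up on the next run rather than being left with a gap.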

If there are any problems with the import, the importer records a failure for that package and puts it on hold. This generally means that it won't act on that package again until a human has reviewed the failure, so that it doesn't dig itself into a hole by repeatedly trying something that causes it to crash. There are currently a few hundred failures, so that is the usual reason why a package isn't imported, and we are working to reduce that number.

However, not all failures require human intervention. There are often transient problems, usually network issues, and for these the importer will retry the package after a couple of hours, which works around most issues. If you find that a package has failed with what looks like a network problem, and the failure doesn't say that it will be automatically retried, speak to an importer admin and suggest that the package has a "spurious failure". They can take a look and retry the package in such a way that similar errors will be automatically retried in future.

There is, however, a cap on automatic retries, to prevent an endless loop when a problem appears to be transient but is actually caused by the actions of the importer. In these cases human intervention is once again required.
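The failure handling above amounts to a small decision rule, sketched here. The names `Failure` and `next_action`, the cap of 3 retries, and the exact two-hour delay are assumptions made for the example; the importer's real values and data structures may differ.

```python
from dataclasses import dataclass

MAX_AUTO_RETRIES = 3  # assumed cap, not the importer's real value
RETRY_DELAY_SECONDS = 2 * 60 * 60  # "a couple of hours"


@dataclass
class Failure:
    package: str
    transient: bool  # True for failures known to be spurious, e.g. network errors
    retries: int = 0  # automatic retries already attempted


def next_action(failure, seconds_since_failure):
    """Decide what happens to a failed package: 'hold', 'wait' or 'retry'."""
    if not failure.transient:
        return "hold"  # needs review by a human
    if failure.retries >= MAX_AUTO_RETRIES:
        return "hold"  # cap reached, so stop looping and ask a human
    if seconds_since_failure < RETRY_DELAY_SECONDS:
        return "wait"
    return "retry"
```

Marking a failure as spurious corresponds to setting `transient` to true for that kind of error, which is why similar errors are then retried without anyone asking.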

Finding the status

That's great, but how do you find out what is going on?

You can start at the status overview page: http://package-import.ubuntu.com/status/ which lists the currently running jobs, the queue, and the failures.

Clicking on one of the package links for a failure will take you to a page showing the gory details. Feel free to ask for an interpretation if it is a package that you care about.

Collisions

Because it is still possible to upload source packages without pushing to the Bazaar branches, the branch and the archive can disagree about the current state of the package. Every time the importer imports a package, it checks the branch to see whether there are any revisions since the one corresponding to the previous package upload. If there are, there are two possible cases:

Someone pushed and uploaded, and either tagged using "bzr mark-uploaded" or forgot.
Someone pushed and someone else uploaded something different.

The latter is what is known as a "collision". Because the archive is authoritative in these cases, the importer makes the branch match the uploaded package, moves the pushed changes to a different branch, and then proposes that those changes be merged back.

First, though, it has to try to detect which of the above cases it has found. To do this it imports the upload onto a temporary branch and then tries to merge the current state of the branch on Launchpad into it. If there are no changes then it declares the collision to be "clean" and assumes that the same thing was pushed as was uploaded. If that happens it just ensures the branch on Launchpad is tagged for that upload and moves on.

If there are changes then it pushes the current state of the Launchpad branch to a new branch on LP, rewinds the original branch back to the last known-good state, and imports the package on top. It then proposes that the new branch be merged into the original so that the changes aren't lost.
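The two outcomes can be summarised in a short sketch. It deliberately simplifies the detection step: instead of performing a real bzr merge, it compares two trees represented as plain dictionaries of path to content, which is an assumption made purely for illustration. The function names `classify` and `plan` are hypothetical.

```python
def classify(upload_tree, branch_tree):
    """Return 'clean' if the branch matches the upload, else 'collision'.

    Both trees are dicts mapping file path to file content. This stands
    in for "merge the branch into the imported upload and look for
    changes".
    """
    return "clean" if upload_tree == branch_tree else "collision"


def plan(upload_tree, branch_tree):
    """List, in order, the steps taken for the detected case."""
    if classify(upload_tree, branch_tree) == "clean":
        return ["tag the Launchpad branch for this upload"]
    return [
        "push the current Launchpad branch to a new branch",
        "rewind the original branch to the last known-good state",
        "import the uploaded package on top",
        "propose merging the new branch into the original",
    ]
```

A dictionary comparison is far cruder than the heuristics the importer actually uses, but it shows why the archive always wins: in the collision case the pushed work is preserved only through the new branch and the merge proposal.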

Occasionally the heuristics used to do this fail. If you can see how the importer could have detected the case correctly, please file a bug on the 'udd' project suggesting how that could be done.

Importer admins

Who are the admins?

jam
james_w
StevenK
vila
wgrant

If you speak to one and they don't know how to do what you are asking, check whether the task is described on DistributedDevelopment/UnderTheHood/Importer/Operational.

Hacking

You can run the importer locally to test bug fixes and the like. See DistributedDevelopment/UnderTheHood/Importer/Hacking.

DistributedDevelopment/UnderTheHood/Importer (last edited 2013-05-06 04:44:17 by wgrant)