AutopkgtestInfrastructure
trigger tests on multiple architectures or multiple Ubuntu releases. == Retrying tests == [[https://git.launchpad.net/autopkgtest-cloud/tree/tools/retry-github-test|retry-github-test]] is a self-contained script to emulate what Git``Hub does for running a PR test request on autopkgtest.ubuntu.com. You need to supply the Git``Hub API URL of the PR, the exact same test payload URL as you specified in the webhook above, and a file path that contains the web hook secret. For example, this retries PR #123 of systemd on amd64: {{{ $ tools/retry-github-test https://api.github.com/repos/systemd/systemd/pulls/123 \ 'https://autopkgtest.ubuntu.com/request.cgi?release=xenial&arch=amd64&build-git=https%3A%2F2Fgit.launchpad.net%2F~pitti%2F%2Bgit%2Fsystemd-debian&env=CFLAGS%3D-O0%3BDEB_BUILD_PROFILES%3Dnoudeb%3BTEST_UPSTREAM%3D1&package=systemd-upstream&ppa=pitti%2Fsystemd-semaphore' \ path/to/github-secret-systemd.txt }}} = Administration = == Deploy updated SSL certificates == {{{ $ juju set autopkgtest-web ssl-cert="$(cat autopkgtest.ubuntu.com.crt autopkgtest.ubuntu.com_chain.crt)" ssl-key="$(cat *.key)" }}} == Show current tests/requests == [[http://autopkgtest.ubuntu.com/running|http://autopkgtest.ubuntu.com → Running]] shows the currently running and queued tests. Alternatively, you can use some shell commands: * Show queue lengths: {{{ ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \ juju ssh rabbitmq-server/0 sudo rabbitmqctl list_queues }}} * Show currently running tests: {{{ ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \ juju ssh autopkgtest-cloud-worker/0 pgrep -af runner/autopkgtest }}} == Re-running tests == * Britney's [[http://people.canonical.com/~ubuntu-archive/proposed-migration/update_excuses.html|excuses.html]] has retry symbols ♻ after "Regression"s, which submit a test request via [[https://git.launchpad.net/autopkgtest-cloud/tree/webcontrol|autopkgtest-cloud's webcontrol]]. * Requesting individual manual runs can also be done with britney's [[http://bazaar.launchpad.net/~ubuntu-release/britney/britney2-ubuntu/view/head:/run-autopkgtest|run-autopkgtest]] script on snakefruit. Due to firewalling this currently can only be run on snakefruit, so define this shell alias: {{{ alias run-autopkgtest='ssh snakefruit.canonical.com sudo -i -u ubuntu-archive run-autopkgtest' }}} Then you can run `run-autopkgtest --help` to see the usage. E. g. {{{ # specific architecture run-autopkgtest -s xenial -a armhf --trigger glib2.0/2.46.1-2 libpng udisks2 # all configured britney architectures (current default: i386, amd64, ppc64el, armhf) run-autopkgtest -s xenial --trigger glibc/2.21-0ubuntu4 libpng udisks2 }}} Note that you must always submit a correct "trigger", i. e. the package/version on excuses.html that caused this test to run. This is necessary so that britney can correctly map results to requests and as we only use packages from -proposed for the trigger (via apt pinning). This apt pinning can be disabled with the `--all-proposed` option. 
If `--all-proposed` is too broad, you can alternatively just specify `--trigger` multiple times, for all packages in -proposed that need to be tested and landed together: {{{ run-autopkgtest -s xenial --trigger php-foo/1-1 --trigger php-foo-helpers/2-2 php-foo }}} * [[https://code.launchpad.net/~ubuntu-archive/ubuntu-archive-tools/trunk|lp:ubuntu-archive-tools]] contains a script [[http://bazaar.launchpad.net/~ubuntu-archive/ubuntu-archive-tools/trunk/view/head:/retry-autopkgtest-regressions|retry-autopkgtest-regressions]] which will build a series of `request.cgi` URLs for re-running all current regressions. It has options for picking a different series, running for a bileto PPA, or for a different test state (e. g. `--state=RUNNING` is useful to requeue lost test requests). You can also limit the age range. See `--help` for details and how to run it efficiently. == re-queueing all outstanding test requests == If rabbitmq has an issue and ends up dumping all of the pending test requests, you can get proposed-migration to requeue them. Ensure it is not running, and as `ubuntu-archive@snakefruit`, remove `~ubuntu-archive/proposed-migration/data/RELEASE-proposed/autopkgtest/pending.json`. Then on the next run, proposed-migration will have forgotten that it queued any tests and will re-request them all. (This will include any which are currently running - if that is a concern, stop britney and wait until these jobs finish and next time the result will be fetched and the test request not duplicated.) == Worker administration == * '''Autopkgtest controller access:''' Most workers (for i386, amd64, ppc64el) are running in a ProdStack instance of juju service `autopkgtest-cloud-worker/0`: {{{ ssh -t wendigo.canonical.com sudo -H -u prod-ues-proposed-migration juju ssh autopkgtest-cloud-worker/0 }}} Consider defining a shell alias for this for convenience. You can see which workers are running with {{{ systemctl list-units autopkgtest\*|sort }}} * '''Rolling out new worker code/config:''' * Adjust the [[https://git.launchpad.net/autopkgtest-cloud/tree/worker-config-production|worker-config-production/*.conf|worker*.conf]] configuration files, commit them. * Run `git pull` in the `autopkgtest-cloud/` checkout on `autopkgtest-cloud-worker/0` * Run `pkill -e -HUP worker`. This will signal the workers to finish their currently running test and then cleanly exit; the `autopkgtest@.service` units will then restart after five minutes. * '''Stopping all workers:''' For general cloud/worker administration or other debugging you might want to stop all workers. Run `pkill -ef worker/worker`; this signals the workers to finish their currently running test and then cleanly exit; contrary to `SIGHUP` the workers will then ''not'' auto-restart. If you want/need to stop all workers immediately and thus kill running tests (they will be auto-retried once workers come back online), run `pkill -ef runner/autopkgtest; pkill -9 -ef worker/worker` instead. * '''External LXC workers:''' The [[https://code.launchpad.net/~auto-package-testing-dev/auto-package-testing/trunk|lp:auto-package-testing]] branch has some [[http://bazaar.launchpad.net/~auto-package-testing-dev/auto-package-testing/trunk/files/head:/slave-admin/|scripts in the slave-admin dir]] which help with maintaining the external servers which run LXC autopkgtests. On these there are a system units `autopkgtest-lxc-worker@N.service` which run the LXC workers. 
You can see their status and which test they are currently running with: {{{ ./cmd s390x systemctl status "'autopkgtest*'" }}} `./cmd` is just a thin wrapper around `parallel-ssh`, which is a convenient way to mass-admin these boxes. * '''Rolling out new LXD hosts:''' armhf tests are currently run on arm64 guests in ScalingStack's bos02 region. The tests are dispatched from a second instance of `autopkgtest-cloud-worker`, called `autopkgtest-lxd-worker`. The instances themselves are managed directly from `prod-ues-proposed-migration@wendigo`. To deploy a new instance, run: * `. ~/.scalingstack/bos02.rc` * `NET_ID=$(nova network-show net_ues_proposed_migration | awk '/[[:space:]]id[[:space:]]/ { print $4 }')` * `IMAGE=$(nova image-list | grep auto-sync/ubuntu-$(distro-info --lts)-$(distro-info --lts -r | cut -f1 -d' ')-arm64 | tail -n1 | awk '{ print $4 }')` * `nova boot --poll --image $IMAGE --flavor m1.large --nic net-id=$NET_ID --key_name wendigo --security-groups default,lxd --user-data autopkgtest-cloud/tools/armhf-lxd-slave.userdata -- lxd-armhfNEXT_FREE_NUMBER` Once this is booted, you can SSH in (`ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -l ubuntu IP`) and watch `/var/log/cloud-init-output.log`. The userdata script will build lxd images and reboot the machine once setup is done. Once this is finished, edit `credentials/lxd-remotes.conf` and add `armhf IP 3` (3 is the number of parallel tasks). Finally commit this to the worker by running `juju set autopkgtest-lxd-worker lxd-remotes="$(cat credentials/lxd-remotes.conf)"`. Log into the autopkgtest-lxd-worker and check that the LXD remotes are configured correctly (`lxc remote list`). If they are not, try `systemctl restart autopkgtest-lxd-socat@lxd-armhf-IP.service`. You can then follow the journal `journalctl -fe -u autopkgtest@lxd-armhf-IP*` to make sure that jobs are being processed correctly. * '''Creating new LXD images before official ones are available:''' on each LXD remote host machine, run `RELEASE=<new release> autopkgtest/tools/autopkgtest-build-lxd images:ubuntu/<old release>/armhf` * '''Journal log analysis''': Logs starting 2019-02-22 can be analyzed using journal fields `ADT_PACKAGE`, `ADT_ARCH`, `ADT_RELEASE`, and `ADT_PARAMS`, though the latter might be useless. For example, `journalctl ADT_PACKAGE=autopkgtest` shows all worker logs for tests of autopkgtest. == Opening up a new series == * Run `seed-new-release <old_release> <new_release> autopkgtest.db` on wendigo. * Update all worker configs in lp:autopkgtest-cloud to include the new series in their `releases = ` config entry * Make sure a new distro-info with the new series is available (if not, temporarily hack in the new series into the distro-info-data ubuntu.csv on all worker and web nodes). * Build new lxd images on each lxd-armhf node `RELEASE=cosmic autopkgtest/tools/autopkgtest-build-lxd images:ubuntu/bionic/armhf` * Build cloud images `MIRROR=http://ftpmaster.internal RELEASE=cosmic autopkgtest-cloud/tools/build-adt-image-all-clouds autopkgtest/setup-commands/setup-testbed --bootstrap` * Make sure britney config is in place on snakefruit (in ~ubuntu-archive/proposed-migration/code/b2/britney.conf.ubuntu.cosmic). = Bug reporting = Please report any bugs against [[https://bugs.launchpad.net/auto-package-testing/]]. |
This describes the machinery we use to run autopkgtests for gating uploaded packages into the development series.
Architecture Overview
Test result store
Swift
The swift object store is used as the central API for storing and querying results. This ensures that logs are kept safe in redundant, non-SPOF storage and that we do not keep primary data in any cloud instance. Thus we can completely re-deploy the whole system (or any part of it can fatally fail) without losing test results and logs. Swift also provides a flexible API for querying particular results so that consumers (like web interfaces for result browsing, report builders, or proposed-migration) can easily find results based on releases, architectures, package names, and/or time stamps. For this purpose the containers are all publicly readable and browsable, so no credentials are needed.
Container Names
Logs and artifacts are stored in one container autopkgtest-release for every release, as we want to keep the logs throughout the lifetime of a release and thus it's easy to remove them after EOLing. Results for PPAs are stored in the container autopkgtest-release-lpuser-ppaname (e. g. autopkgtest-wily-pitti-systemd).
Container Layout
In order to allow efficient querying and polling for new results, the logs are stored in this (pseudo-)directory structure:
- /release/architecture/prefix/sourcepkg/YYYYMMDD_HHMMSS@/autopkgtest_output_files
"prefix" is the first letter (or first four letters if it starts with "lib") of the source package name, as usual for Debian-style archives. Example: /trusty/amd64/libp/libpng/20140321_130412@/log.gz
The '@' character is a convenient separator for using with a container query's delimiter=@ option: With that you can list all the test runs without getting the individual files for each run.
The result files are by and large the contents of autopkgtest's --output-directory plus an extra file exitcode with autopkgtest's exit code; these files are grouped and tar'ed/compressed:
- result.tar contains the minimum files/information which clients like proposed-migration or web result browsers need to enumerate test runs and see their package names/versions/outcome: exitcode, testpkg-version, duration, and testbed-packages. All of these are very small (typically ~ 10 kB), thus it's fine to download and cache this information locally for fast access.
- log.gz is the compressed log from autopkgtest. Clients don't need to download and parse this, but it's the main thing developers look at, so it should be directly linkable/accessible. These have a proper MIME type and MIME encoding so that they can be viewed inline in a browser.
- artifacts.tar.gz contains testname-{stdout,stderr,packages} and any test specific additional artifacts. Like the log, these are not necessary for machine clients making decisions, but should be linked from the web UI and be available to developers.
Due to Swift's "eventual consistency" property, we can't rely on a group of files (like exitcode and testpkg-version) to be visible at exactly the same time for a particular client, so we must store them in result.tar to achieve atomicity instead of storing them individually.
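For example, fetching and inspecting a single run's summary needs nothing more than curl and tar. This is a minimal sketch, using the public production URL given under "Example queries" below and the libpng run path from the example above:

SWIFT_URL=https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
curl -fsO "$SWIFT_URL/autopkgtest-trusty/trusty/amd64/libp/libpng/20140321_130412@/result.tar"
tar xf result.tar         # unpacks exitcode, testpkg-version, duration, testbed-packages
cat exitcode testpkg-version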
Example queries
Please read the Swift container API for the precise meaning of these queries. The current public Swift URL for the production infrastructure is
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
(abbreviated as SWIFTURL in the examples below).
- List all available files for Ubuntu trusty: SWIFTURL/autopkgtest-trusty/?format=plain
Note that swift returns at most 10,000 results at a time; retrieve the next batch by giving the last result path in marker=, until you get no further results (see the paging sketch after this list): SWIFTURL/autopkgtest-trusty/?format=plain&marker=trusty/ppc64el/p/php5/20150911_123149@/result.tar
- List all runs of udisks2 on amd64 by using a prefix match, and without the individual files by using delimiter=@ (see above): SWIFTURL/autopkgtest-trusty/?format=plain&delimiter=@&prefix=trusty/amd64/u/udisks2/
- If you poll swift for results regularly, you should remember the last timestamp per release/package/arch, and then just ask for newer results. E. g. the same query as above, but results newer than 2015-09-01 01:02:10: SWIFTURL/autopkgtest-trusty/?format=plain&delimiter=@&prefix=trusty/amd64/u/udisks2/&marker=trusty/amd64/u/udisks2/20150901_010210
- List results for tests run with the pitti/systemd-semaphore PPA enabled, in JSON format instead of plain text: SWIFTURL/autopkgtest-trusty-pitti-systemd-semaphore?format=json
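The paging can be scripted in a few lines of shell. This sketch assumes curl and the production URL above; it passes the marker value unencoded, which is fine for these paths but is not a general-purpose URL encoder:

SWIFT_URL=https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
marker=""
while batch=$(curl -fs "$SWIFT_URL/autopkgtest-trusty/?format=plain&marker=$marker") && [ -n "$batch" ]; do
    echo "$batch"
    marker=$(printf '%s\n' "$batch" | tail -n 1)   # continue after the last path returned
done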
AMQP queues
RabbitMQ server
AMQP (we use the RabbitMQ server implementation) provides a robust and simple-to-use job distribution system for coordinating test requests amongst an arbitrary number of workers. We use explicit ACKs, and ACK only after a test request has been fully processed and its logs stored in swift. Should a worker or a test run fail anywhere in between and the request does not get ACK'ed, it will just be handed to the next worker. This ensures that we never lose test requests in the event of worker failures.
RabbitMQ provides failover with mirrored queues to avoid a single point of failure. This is not currently being used, as RabbitMQ is very robust and runs in its own cloud instance (Juju service rabbitmq-server).
Queue structure
We want to use a reasonably fine-grained queue structure so that we can support workers that serve only certain releases, architectures, virtualization servers, real hardware, etc. For example: debci-wily-amd64 or debci-trusty-armhf. As test requests are not long-lived objects, we remain flexible here and can introduce further granularity as needed; e. g. we might want a trusty-amd64-laptop-nvidia (i. e. running on bare metal without virtualization) queue in the future.
Test request format
A particular test request (i. e. a queue message) has the format srcpkgname <parameter JSON>.
The following parameters are currently supported:
- triggers: List of trigsrcpkgname/version strings of packages which caused srcpkgname to run (i. e. triggered the srcpkgname test). Ubuntu test requests issued by proposed-migration should always contain this, so that a particular test run for srcpkgname can be mapped to a new version of trigsrcpkgname in -proposed. In case multiple reverse dependencies trigsrc1 and trigsrc2 of srcpkgname get uploaded to -proposed around the same time, the trigger list can contain multiple entries.
- ppas: List of PPA specification strings lpuser/ppaname. When given, ask Launchpad for the PPAs' GPG fingerprints and add setup commands to install the GPG keys and PPA apt sources. In this case the result is put into the container "autopkgtest-release-lpuser-ppaname" for the last entry in the list; this is fine-grained enough for easy lifecycle management (e. g. remove results for old releases wholesale) and still predictable to the caller for polling results.
- env: List of VAR=value strings. These get passed verbatim to autopkgtest's --env option. This can be used to influence a test's behaviour from a test request.
- test-git: A single URL or URL branchname. The test will be git cloned from that URL (if given, a non-default branch will be checked out) and run from the checkout. This will not build binary packages from the branch and run tests against those; the test dependencies will be taken from the archive, or the PPA if given. The srcpkgname will only be used for the result path in swift and is irrelevant for the actual test.
- build-git: Like test-git, except that this will first build binary packages from the branch and run tests against those.
- test-bzr: A single URL. The test will be checked out with bzr from that URL. Otherwise this has the same behaviour as test-git.
- all-proposed: If this is set to 1, apt pinning to only use the trigger package from -proposed will be disabled, and the test will run against all of -proposed. This is sometimes necessary when several packages need to land in lockstep but don't declare versioned Depends:/Breaks: on each other, but it might cause mis-blaming if some package other than the trigger is broken in -proposed.
- testname: If given, this gets forwarded to autopkgtest's --testname option to run a single test only.
Examples:
A typical request issued by proposed-migration when a new glib2.0 2.20-1 is uploaded and we want to test one of its reverse dependencies gedit:
gedit {"triggers": ["glib2.0/2.20-1"]}
Run the systemd package tests against the packages in the pitti/systemd PPA:
systemd {"ppas": ["pitti/systemd"]}
Run the unity8 package tests against the packages in the stable phone overlay PPA and the ci-train-ppa-service/landing-001 silo PPA:
unity8 {"ppas": ["ci-train-ppa-service/stable-phone-overlay", "ci-train-ppa-service/landing-003"]}
Run the gedit tests under a different env variable:
gedit {"env": ["XDG_SESSION_DESKTOP=xfce"]}
Juju service
This uses the standard charm store RabbitMQ charm with some customizations:
- Remove almighty "guest" user
- Create user for test requests with random password and limited capabilities (nothing other than creating new messages); these are the credentials for clients like proposed-migration (see the sketch below)
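Roughly speaking, the customization amounts to commands like the following sketch; the user name, password generation, and exact permission regexes are illustrative assumptions, not the charm's actual code:

rabbitmqctl delete_user guest
rabbitmqctl add_user test_request "$(pwgen -N 1 32)"
# no configure/read rights; write (i. e. publish) only via the default exchange
rabbitmqctl set_permissions -p / test_request "^$" "^amq\.default$" "^$"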
As usual with the charm, worker services create a relation to the RabbitMQ service, which creates individual credentials for them.
The rabbitmq-server Juju service is exposed on a "public" IP (162.213.33.228), but accessible only within the Canonical VPN and firewalled to only be accessible from snakefruit.canonical.com (the proposed-migration host running britney) and any external workers.
Workers
worker process and its configuration
The worker script is the main workhorse which consumes one AMQP request at a time, runs autopkgtest, and uploads the results/artifacts into swift. Configuration happens in worker.conf; the options should be fairly self-explanatory. The virtualization server is configured in the [virt] section and can use various $VARIABLE substitutions.
Worker service in the cloud
The autopkgtest-cloud-worker Juju charm sets up a cloud instance which runs several parallel worker instances for each cloud instance (a few less than the maximum allowed number of instances). This is done through the autopkgtest.target systemd unit which pulls in some instances of autopkgtest@.service.
The workers use the config files in worker-config-production/worker*.conf. The macros like #SWIFT_PASSWORD# are filled in by autopkgtest@.service. If you change the configs, you need to pkill -HUP worker to restart the worker processes. They will gracefully handle SIGHUP and finish running the current test before they restart.
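A typical roll-out of a worker code/config change therefore looks roughly like this (assuming the autopkgtest-cloud checkout lives in the worker instance's home directory):

# on the cloud worker instance, e. g. via: juju ssh autopkgtest-cloud-worker/0
cd autopkgtest-cloud && git pull     # pick up the updated code/config
pkill -e -HUP worker                 # each worker finishes its current test, then restarts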
The cloud workers are being used for running i386/amd64/arm64/ppc64el tests in the Canonical Scalingstack cloud. They use the autopkgtest-virt-ssh runner with the nova ssh setup script. This assumes that the nova credentials are already present in the environment ($OS_*).
Note that we currently use two cloud instances to control all parallel worker and autopkgtest processes. This is reasonably reliable as on that instance autopkgtest effectively just calls some nova/lxc commands and copies the results back and forth. The actual tests are executed either in ephemeral VMs in ScalingStack or within lxc containers managed via lxd which is also running on ScalingStack VMs.
External workers
These work quite similarly to the cloud ones. You can run one (or several) worker instances with an appropriate worker.conf on any host that is allowed to talk to the RabbitMQ service and swift; i. e. this is mostly just an exercise in sending RT tickets to Canonical to open the firewall accordingly. But remember that all workers need to be within the Canonical VPN.
We currently have such a setup on two zVMs for s390x which use the autopkgtest-virt-lxc virtualization server. Once ScalingStack supports these architectures these will go away.
Web Results Browser
The autopkgtest-cloud webcontrol module contains the components for presenting the test results:
- download-results downloads all new result.tar files from Swift and puts their information into an SQLite database. This script gets called by cron every 5 minutes.
- amqp-status-collector listens to the teststatus.fanout AMQP queue and updates /tmp/running.json with the currently running tests and their logtails.
- browse.cgi is a simple Flask app that renders results and statistics from the above database, currently running tests from /tmp/running.json, and queued tests from introspecting the debci-* AMQP queues.
- request.cgi provides a Launchpad SSO authenticated CGI API for (re)triggering test requests for Ubuntu and GitHub. Pages like britney's excuses.html link to it for retrying regressions (see the example below).
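For illustration, an Ubuntu retry link of the kind excuses.html generates looks roughly like this; the parameter names release, arch, package and trigger are an assumption based on those links and are not documented elsewhere on this page:

https://autopkgtest.ubuntu.com/request.cgi?release=xenial&arch=amd64&package=libpng&trigger=glib2.0%2F2.20-1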
The autopkgtest-web charm has a relation to RabbitMQ for listening to the teststatus.fanout AMQP queue and submitting test requests via request.cgi. Aside from that it's entirely independent from britney, the workers, and all other components.
Deployment
Production deployment from wendigo
Everything that's necessary to deploy and configure all services into a freshly bootstrapped Juju environment is contained in deploy.sh:
prod-ues-proposed-migration@wendigo:~$ autopkgtest-cloud/deployment/deploy.sh ~/.scalingstack/ ssl-autopkgtest.ubuntu.com/
The first argument is the directory of all nova RC files that you want to use to run actual tests (should be the various ScalingStack regions). Note that their names must end in .rc.
The second argument is the directory with the SSL *.crt and *.key for https://autopkgtest.ubuntu.com.
You can also use deploy.sh for re-deploying a single service after you juju destroy-service'd it.
deploy.sh deploys basenode/ksplice/landscape into all instances, deploys the above RabbitMQ, worker, and web charms, and does the necessary public IP attachments and exposes. At the end it prints credentials to be used by britney (or other entities requesting tests): These credentials can only be used to publish new test requests, not for consuming them or doing any other queue administration. This needs to be copied to britney.conf on snakefruit.canonical.com.
The first time after Scalingstacks get set up, you need to add a firewall rule to allow ssh access from Prodstack:
nova secgroup-add-rule default tcp 22 22 162.213.33.179/32
Run this on every ScalingStack region you are going to use (lcy01, lgw01, bos02).
Local deployment
For developing this infrastructure you can deploy with juju-local into containers. deploy.sh works with that, and will also not try to install the landscape/ksplice charms.
This still needs a cloud for storing the test results in swift. For running the actual tests you can choose between:
- running them in e. g. Canonistack (which is not very reliable, though) or another cloud. Do this if you want to work on the autopkgtest nova backend, PPAs, or cloud configuration/quirks.
- running them with autopkgtest's null runner. This only really works for very few tests, but is by far the fastest and does not need any extra setup. Do this if your work does not depend on particular tests or testbed configuration. Useful tests are "gzip" (practically instant, will pass) and "coreutils" (will take some 40 seconds, useful for e. g. developing the "Currently running tests" page).
Create a cloud config directory, and copy your cloud config rc file as either null.rc or canonistack.rc depending on your choice above. Import the cloud config into the environment (that's where the swift logs will be stored), and call deploy.sh:
mkdir /tmp/testrc
cp path/to/canonistack/novarc /tmp/testrc/null.rc
. /tmp/testrc/null.rc
autopkgtest-cloud/deployment/deploy.sh /tmp/testrc/
These *.rc names correspond to the worker-null.conf and worker-canonistack.conf production configurations.
The second argument to deploy.sh for the autopkgtest-web SSL certificate is optional. If not given (like above), SSL will be disabled. Otherwise you can point it to a directory with an SSL *.crt and *.key as in above "production deployment".
Integration with proposed-migration (britney)
Debian's britney2 does not integrate with autopkgtests, so Ubuntu's fork modifies it to do so. All the logic for determining the set of tests to run for a particular package, submitting the requests, and collecting the results is contained in the autopkgtest.py Policy module. Tests for a lot of scenarios and bug reproducers are in tests/test_autopkgtest.py, which you can just run without further setup (it creates a temporary config and archive for every test case).
Interfacing with the cloud happens via AMQP for requesting a test (e. g. sending a message firefox [params] to the debci-trusty-armhf queue) and by downloading new result.tar results from swift on each run. Thus britney only directly depends on the RabbitMQ service and swift, no other services in the cloud. Of course there must be some workers somewhere which actually process the requests, otherwise the triggered tests will stay "in progress" forever.
Integration with GitHub and GitLab pull/merge requests
autopkgtest-cloud can be used as a GitHub or GitLab web hook for triggering tests on PR/MR creation/changes.
Preparing the test
You need to have an autopkgtest for your project that is in some git branch. This can be in the actual GitHub project repo, but it's also possible and plausible to re-use the existing autopkgtest in the Ubuntu packaging git and just adjust it a little to work for upstream PR tests. For example, you might want to disable dh_install --fail-missing or strict dpkg-gensymbols checking when testing an upstream PR so that you don't always need to adjust the packaging for these. This can be controlled through environment variables which get defined in the GitHub web hook and passed to your test. autopkgtest-cloud itself always provides $UPSTREAM_PULL_REQUEST with the PR number.
If the tests live in the actual GitHub repo, this is all that is needed. If the tests live in the Debian/Ubuntu packaging repo, then your downstream debian/rules must ensure that, before it starts the package build, it replaces the downstream code from its own checkout with an upstream checkout of the pull request (and also drops all local patches). Look at systemd's debian/rules for an example, search for TEST_UPSTREAM.
However you want to structure your test, ensure that it works locally with a command like
autopkgtest --apt-upgrade https://coolcode.projects.org/foo.git \
    --env UPSTREAM_PULL_REQUEST=1234 --env TEST_UPSTREAM=1 -- \
    qemu autopkgtest-xenial-amd64.img
Web hook setup
The GitHub project admin and a maintainer of the autopkgtest infrastructure need to exchange a webhook password for triggering tests and an auth token for sending status notifications back to GitHub.
On the GitHub project side:
- Go to the project's Settings → Webhooks → Add webhook
- The payload URL is a call to request.cgi with the desired parameters:
- release and arch determine the Ubuntu image in which you want to run the test.
- build-git is the git clone URL of the repo that provides the autopkgtest (debian/tests/). If it's a Debian/Ubuntu packaging repo, that must check out the corresponding upstream code from the PR by itself (look at systemd's debian/rules for an example, search for TEST_UPSTREAM). If the GitHub project to be tested contains the autopkgtest by itself, then don't specify this parameter at all; it will be dynamically generated as clone_url#refs/pull/<PR number>/head.
- package is merely an identifier for the project name/test which will be used for the results in swift. It is not related to Ubuntu package names at all, as the test will come from a git branch. Use the project name, possibly with some suffix like -master if you have several different kinds of tests.
- ppa specifies a launchpaduser/ppaname. This must always be present so that the results don't land in the Ubuntu results Swift containers. The PPA is being added during the test run; it may be empty, but it is commonly used to provide some package backports when running tests on older releases. The PPA must publish indexes for the target release, so you must have copied/published at least one package to that series (it is okay to delete it again afterwards, Launchpad will keep the indexes for that series).
- env can specify one or multiple (separated with ;) environment variables which are passed to the test. You can use that to speed up builds (CFLAGS=-O0) or change the behaviour of your tests (TEST_UPSTREAM=1).
Note that the entire payload URL must be properly escaped as GitHub is very picky about it (see the escaping example after this list). Example:
https://autopkgtest.ubuntu.com/request.cgi?release=xenial&arch=amd64&build-git=https%3A%2F%2Fgit.launchpad.net%2F~pitti%2F%2Bgit%2Fsystemd-debian&env=CFLAGS%3D-O0%3BDEB_BUILD_PROFILES%3Dnoudeb%3BTEST_UPSTREAM%3D1&package=systemd-upstream&ppa=pitti%2Fsystemd-semaphore
- Generate a random password (e. g. `pwgen -N 1 15`) for the "Secret".
- In the "Which events" section, select "individual events" and in there "Push" and "Pull request".
- Leave the other settings at their defaults and press "Add webhook".
- Create the access token for test status updates:
- Go to the user mugshot at the top right → Settings → Developer settings → Personal access tokens → Generate new token
- Use something like "get status updates for PR test requests from autopkgtest.ubuntu.com" as the description
- Select only `repo:status` as the scope.
- Press "Generate", and note down the token value; you will never be able to retrieve it again from that page later.
On the autopkgtest side, as `prod-ues-proposed-migration` on `wendigo.canonical.com`:
- Add the new project name and webhook password to `credentials/github-secrets.json`. Make double sure not to break the JSON formatting (e. g. no trailing commas).
- Add the new developer name and token for the chosen `package` from above (i. e. the project name) to `credentials/github-status-credentials.txt`.
- Deploy the updated credentials to autopkgtest-web/0 with
{{{
juju set autopkgtest-web github-status-credentials="$(cat credentials/github-status-credentials.txt)" \
    github-secrets="$(cat credentials/github-secrets.json)"
}}}
You can debug what's going on with `tail -f /var/log/apache2/{access,error}.log` on `juju ssh autopkgtest-web/0`.
Test the setup with some dummy PR that changes a README or similar. You can then re-trigger new tests by force-pushing to the branch. Once everything works, you can add more web hooks with different test parameters to e. g. trigger tests on multiple architectures or multiple Ubuntu releases.
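For instance, one lightweight way to force-push and thus re-trigger the hook (the branch name is illustrative):

{{{
# Re-trigger the PR web hook by rewriting the branch tip and force-pushing
git commit --amend --no-edit
git push -f origin my-pr-branch
}}}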
== Retrying tests ==
`retry-github-test` is a self-contained script to emulate what GitHub does for running a PR test request on autopkgtest.ubuntu.com. You need to supply the GitHub API URL of the PR, the exact same test payload URL as you specified in the webhook above, and the path of a file that contains the webhook secret.
For example, this retries PR #123 of systemd on amd64:
{{{
$ tools/retry-github-test https://api.github.com/repos/systemd/systemd/pulls/123 \
    'https://autopkgtest.ubuntu.com/request.cgi?release=xenial&arch=amd64&build-git=https%3A%2F%2Fgit.launchpad.net%2F~pitti%2F%2Bgit%2Fsystemd-debian&env=CFLAGS%3D-O0%3BDEB_BUILD_PROFILES%3Dnoudeb%3BTEST_UPSTREAM%3D1&package=systemd-upstream&ppa=pitti%2Fsystemd-semaphore' \
    path/to/github-secret-systemd.txt
}}}
= Administration =
== Deploy updated SSL certificates ==
{{{
$ juju set autopkgtest-web ssl-cert="$(cat autopkgtest.ubuntu.com.crt autopkgtest.ubuntu.com_chain.crt)" \
    ssl-key="$(cat *.key)"
}}}
== Show current tests/requests ==
http://autopkgtest.ubuntu.com → Running shows the currently running and queued tests. Alternatively, you can use some shell commands:
- Show queue lengths:
{{{
ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \
    juju ssh rabbitmq-server/0 sudo rabbitmqctl list_queues
}}}
- Show currently running tests:
{{{
ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \
    juju ssh autopkgtest-cloud-worker/0 pgrep -af runner/autopkgtest
}}}
== Re-running tests ==
Britney's excuses.html shows a retry symbol ♻ after each "Regression"; clicking it submits a test request via autopkgtest-cloud's webcontrol.
Individual manual runs can also be requested with britney's `run-autopkgtest` script. Due to firewalling it can currently only be run on snakefruit, so define this shell alias:
{{{
alias run-autopkgtest='ssh snakefruit.canonical.com sudo -i -u ubuntu-archive run-autopkgtest'
}}}
Then you can run `run-autopkgtest --help` to see the usage. E. g.
{{{
# specific architecture
run-autopkgtest -s xenial -a armhf --trigger glib2.0/2.46.1-2 libpng udisks2
# all configured britney architectures (current default: i386, amd64, ppc64el, armhf)
run-autopkgtest -s xenial --trigger glibc/2.21-0ubuntu4 libpng udisks2
}}}
Note that you must always submit a correct "trigger", i. e. the package/version on excuses.html that caused this test to run. This is necessary so that britney can correctly map results to requests, and because only the trigger package is taken from -proposed (via apt pinning). The apt pinning can be disabled with the `--all-proposed` option.
If `--all-proposed` is too broad, you can alternatively just specify `--trigger` multiple times, for all packages in -proposed that need to be tested and landed together:
{{{
run-autopkgtest -s xenial --trigger php-foo/1-1 --trigger php-foo-helpers/2-2 php-foo
}}}
lp:ubuntu-archive-tools contains a script `retry-autopkgtest-regressions` which builds a series of `request.cgi` URLs for re-running all current regressions. It has options for picking a different series, running against a bileto PPA, or selecting a different test state (e. g. `--state=RUNNING` is useful to re-queue lost test requests). You can also limit the age range. See `--help` for details and how to run it efficiently.
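As a rough, unverified sketch of how that output might be consumed (assuming the script prints one `request.cgi` URL per line, and that a valid autopkgtest.ubuntu.com session cookie is saved at the hypothetical path below):

{{{
# Hedged sketch only; check --help for the real options and workflow.
retry-autopkgtest-regressions > requests.txt
# Review requests.txt, then fire off the requests, e. g.:
xargs -rn1 wget --load-cookies ~/.cache/autopkgtest.cookie -O- < requests.txt
}}}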
== Re-queueing all outstanding test requests ==
If rabbitmq has an issue and ends up dropping all pending test requests, you can get proposed-migration to re-queue them. Ensure it is not running, and as `ubuntu-archive@snakefruit` remove `~ubuntu-archive/proposed-migration/data/RELEASE-proposed/autopkgtest/pending.json`. On its next run, proposed-migration will have forgotten that it already queued any tests and will re-request them all. (This includes tests that are currently running; if that is a concern, stop britney and wait until those jobs finish, so that on the next run their results are fetched and the test requests are not duplicated.)
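A minimal sketch of those manual steps, assuming xenial as the affected release:

{{{
# On snakefruit, with proposed-migration/britney not running:
sudo -i -u ubuntu-archive
rm ~/proposed-migration/data/xenial-proposed/autopkgtest/pending.json
# The next proposed-migration run will re-request all outstanding tests.
}}}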
== Worker administration ==
Autopkgtest controller access: most workers (for i386, amd64, ppc64el) run in a ProdStack instance as the juju service `autopkgtest-cloud-worker/0`:
{{{
ssh -t wendigo.canonical.com sudo -H -u prod-ues-proposed-migration juju ssh autopkgtest-cloud-worker/0
}}}
Consider defining a shell alias for this for convenience; a possible one is sketched below. You can see which workers are running with `systemctl list-units 'autopkgtest*' | sort`.
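For example (the alias name is arbitrary):

{{{
alias autopkgtest-cloud-worker='ssh -t wendigo.canonical.com sudo -H -u prod-ues-proposed-migration juju ssh autopkgtest-cloud-worker/0'
}}}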
Rolling out new worker code/config:
- Adjust the `worker-config-production/*.conf` configuration files and commit them.
- Run `git pull` in the `autopkgtest-cloud/` checkout on autopkgtest-cloud-worker/0.
- Run `pkill -e -HUP worker`. This will signal the workers to finish their currently running test and then cleanly exit; the `autopkgtest@.service` units will then restart after five minutes.
Stopping all workers: For general cloud/worker administration or other debugging you might want to stop all workers. Run `pkill -ef worker/worker`; this signals the workers to finish their currently running test and then cleanly exit; contrary to SIGHUP the workers will then not auto-restart. If you want/need to stop all workers immediately and thus kill running tests (they will be auto-retried once workers come back online), run `pkill -ef runner/autopkgtest; pkill -9 -ef worker/worker` instead.
External LXC workers: The lp:auto-package-testing branch has some scripts in the slave-admin dir which help with maintaining the external servers which run LXC autopkgtests. On these there are systemd units `autopkgtest-lxc-worker@N.service` which run the LXC workers. You can see their status and which test they are currently running with:
{{{
./cmd s390x systemctl status "'autopkgtest*'"
}}}
`./cmd` is just a thin wrapper around `parallel-ssh`, which is a convenient way to mass-administer these boxes.
Rolling out new LXD hosts: armhf tests are currently run on arm64 guests in ScalingStack's bos02 region. The tests are dispatched from a second autopkgtest-cloud-worker instance, called `autopkgtest-lxd-worker`. The instances themselves are managed directly from `prod-ues-proposed-migration@wendigo`. To deploy a new instance, run:
{{{
. ~/.scalingstack/bos02.rc
NET_ID=$(nova network-show net_ues_proposed_migration | awk '/[[:space:]]id[[:space:]]/ { print $4 }')
IMAGE=$(nova image-list | grep auto-sync/ubuntu-$(distro-info --lts)-$(distro-info --lts -r | cut -f1 -d' ')-arm64 | tail -n1 | awk '{ print $4 }')
nova boot --poll --image $IMAGE --flavor m1.large --nic net-id=$NET_ID --key_name wendigo \
    --security-groups default,lxd --user-data autopkgtest-cloud/tools/armhf-lxd-slave.userdata \
    -- lxd-armhfNEXT_FREE_NUMBER
}}}
Once this is booted:
- SSH in (`ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -l ubuntu IP`) and watch `/var/log/cloud-init-output.log`. The userdata script will build lxd images and reboot the machine once setup is done.
- Once that is finished, edit `credentials/lxd-remotes.conf` and add `armhf IP 3` (3 is the number of parallel tasks).
- Commit this to the worker by running `juju set autopkgtest-lxd-worker lxd-remotes="$(cat credentials/lxd-remotes.conf)"`.
- Log into the autopkgtest-lxd-worker and check that the LXD remotes are configured correctly (`lxc remote list`). If they are not, try `systemctl restart autopkgtest-lxd-socat@lxd-armhf-IP.service`.
- You can then follow the journal with `journalctl -fe -u autopkgtest@lxd-armhf-IP*` to make sure that jobs are being processed correctly.
Creating new LXD images before official ones are available: on each LXD remote host machine, run `RELEASE=<new release> autopkgtest/tools/autopkgtest-build-lxd images:ubuntu/<old release>/armhf`.
Journal log analysis: logs from 2019-02-22 onwards can be analyzed using the journal fields `ADT_PACKAGE`, `ADT_ARCH`, `ADT_RELEASE`, and `ADT_PARAMS` (though the last one is of limited use). For example, `journalctl ADT_PACKAGE=autopkgtest` shows all worker logs for tests of autopkgtest.
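Field matches combine as a logical AND and can be mixed with normal journalctl options; for instance (the package, release, and time window are only examples):

{{{
journalctl ADT_PACKAGE=systemd ADT_RELEASE=focal ADT_ARCH=amd64 --since "2 days ago"
}}}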
== Opening up a new series ==
- Run `seed-new-release <old_release> <new_release> autopkgtest.db` on wendigo.
- Update all worker configs in lp:autopkgtest-cloud to include the new series in their `releases =` config entry.
- Make sure a new distro-info with the new series is available (if not, temporarily hack in the new series into the distro-info-data ubuntu.csv on all worker and web nodes).
- Build new lxd images on each lxd-armhf node
{{{
RELEASE=cosmic autopkgtest/tools/autopkgtest-build-lxd images:ubuntu/bionic/armhf
}}}
- Build cloud images
{{{
MIRROR=http://ftpmaster.internal RELEASE=cosmic autopkgtest-cloud/tools/build-adt-image-all-clouds autopkgtest/setup-commands/setup-testbed --bootstrap
}}}
- Make sure britney config is in place on snakefruit (in ~ubuntu-archive/proposed-migration/code/b2/britney.conf.ubuntu.cosmic).
= Bug reporting =
Please report any bugs against https://bugs.launchpad.net/auto-package-testing/.