AutopkgtestInfrastructure

Revision 10 as of 2015-08-12 15:55:17


This describes the machinery we use to run autopkgtests for gating uploaded packages into the development series.

Architecture Overview

[Figure: autopkgtest-cloud-architecture.svg (Dia source)]

Swift result store and layout

The Swift object store serves as the central API for storing and querying results. This keeps logs safe in redundant storage with no single point of failure, and no primary data lives in any cloud instance. We can therefore completely re-deploy the whole system, or have any part of it fail fatally, without losing test results and logs. Swift also provides a flexible API for querying particular results, so that consumers (like web interfaces for result browsing, report builders, or proposed-migration) can easily find results by release, architecture, package name, and/or time stamp. For this purpose all containers are publicly readable and browsable, so no credentials are needed.

Logs and artifacts are stored in one container adt-release per release. We want to keep the logs throughout the lifetime of a release, and a per-release container makes them easy to remove once the release reaches EOL. To allow efficient querying and polling for new results, the logs are stored in this (pseudo-)directory structure:

  • /release/architecture/prefix/sourcepkg/YYYYMMDD_HHMMSS@/autopkgtest_output_files

"prefix" is the first letter (or first four letters if it starts with "lib") of the source package name, as usual for Debian-style archives. Example: /trusty/amd64/libp/libpng/20140321_130412@/log.gz

The '@' character is a convenient separator for use with a container query's delimiter=@ option: with it you can list all the test runs without getting the individual files for each run.
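As a minimal sketch of the layout and the delimiter query described above: the following builds the listing URL for one package and extracts the run time stamps from a plain-text container listing. The base URL is a hypothetical placeholder, the container name assumes that "adt-release" expands to e. g. adt-trusty, and the object prefix is assumed to be passed without a leading slash; none of these details are confirmed by this page.

```python
from typing import List
from urllib.parse import urlencode


def archive_prefix(source_package: str) -> str:
    """Debian-style prefix: first letter, or first four letters for lib*."""
    if source_package.startswith("lib"):
        return source_package[:4]
    return source_package[:1]


def run_listing_url(swift_base: str, release: str, arch: str,
                    source_package: str) -> str:
    """Build a container listing URL that returns one entry per test run."""
    # Assumption: object names carry no leading slash inside the container.
    prefix = "{}/{}/{}/{}/".format(
        release, arch, archive_prefix(source_package), source_package)
    # delimiter=@ rolls everything after the '@' up into a single entry.
    query = urlencode({"prefix": prefix, "delimiter": "@"})
    # Assumption: one container per release, named adt-<release>.
    return "{}/adt-{}?{}".format(swift_base, release, query)


def run_ids(listing: str) -> List[str]:
    """Extract YYYYMMDD_HHMMSS run IDs from a plain-text listing."""
    ids = []
    for line in listing.splitlines():
        line = line.strip()
        if line.endswith("@"):
            # Keep only the last path component, without the '@'.
            ids.append(line.rsplit("/", 1)[-1][:-1])
    return ids
```

For example, run_listing_url("https://swift.example.com/v1/AUTH_example", "trusty", "amd64", "libpng") queries the prefix trusty/amd64/libp/libpng/ in the container adt-trusty.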

The result files are by and large the contents of autopkgtest's --output-directory plus an extra file exitcode with adt-run's exit code; these files are grouped and tar'ed/compressed:

  • result.tar contains the minimum files/information which clients like proposed-migration or debci need to enumerate test runs and see their package names/versions/outcome: exitcode, testpkg-version, duration, and testbed-packages. All of these are very small (typically ~ 10 kB), thus it's fine to download and cache them all in e. g. the debci frontend for fast access.

  • log.gz is the compressed log from autopkgtest. Clients don't need to download and parse this, but it's the main thing developers look at, so it should be directly linkable/accessible. These have a proper MIME type and MIME encoding so that they can be viewed inline in a browser.

  • artifacts.tar.gz contains testname-{stdout,stderr,packages} and any test specific additional artifacts. Like the log, these are not necessary for machine clients making decisions, but should be linked from the web UI and be available to developers.

Because of Swift's "eventual consistency" property, we can't rely on a group of files (like exitcode and testpkg-version) becoming visible at exactly the same time for a particular client. We therefore bundle them in result.tar, which is a single object and thus appears atomically, instead of storing them individually.
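A client that has downloaded result.tar can read the small metadata files straight from memory. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and the content of testpkg-version is returned as-is because its exact format is not described on this page.

```python
import io
import posixpath
import tarfile


def read_result_tar(data: bytes) -> dict:
    """Return exitcode (int) and testpkg-version (str) from result.tar bytes."""
    wanted = ("exitcode", "testpkg-version")
    result = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar:
            # Tolerate members stored as "./exitcode" or in a subdirectory.
            name = posixpath.basename(member.name)
            if name in wanted and member.isfile():
                text = tar.extractfile(member).read().decode("utf-8")
                result[name] = text.strip()
    # A missing exitcode raises KeyError: the archive is then unusable.
    result["exitcode"] = int(result["exitcode"])
    return result
```

Because all fields come from the same object, a client either sees a complete result or none at all.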

AMQP queues

RabbitMQ server

AMQP (we use the RabbitMQ server implementation) provides a very robust and simple to use job distribution system, i. e. to coordinate running test requests amongst an arbitrary number of workers. We use explicit ACKs, and ACK only after a test request has been fully processed and its logs stored in swift. Should a worker or a test run fail anywhere in between and the request does not get ACK'ed, it will just be handed to the next worker. This ensures that we never lose test requests in the event of worker failures.
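The ACK-after-store rule can be sketched as a consumer callback. This is an illustration only: it assumes the pika client library (version 1.0 or later), which this page does not name, and run_test and store_results are hypothetical callables standing in for the real test runner and the Swift upload.

```python
def make_request_handler(run_test, store_results):
    """Build a consumer callback that ACKs only after results are stored."""
    def on_request(channel, method, properties, body):
        source_package = body.decode("utf-8").strip()
        results = run_test(source_package)
        store_results(source_package, results)
        # ACK last: if anything above raises or the worker dies, the
        # message stays unacknowledged and the broker redelivers it.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return on_request


def consume(amqp_url, queue, handler):
    """Consume requests from one queue with explicit ACKs."""
    import pika  # assumption: pika >= 1.0 is installed on the worker

    connection = pika.BlockingConnection(pika.URLParameters(amqp_url))
    channel = connection.channel()
    # Assumption: the queue is durable; adjust to match the real queue.
    channel.queue_declare(queue=queue, durable=True)
    # Hand each worker one request at a time.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue=queue, on_message_callback=handler,
                          auto_ack=False)
    channel.start_consuming()
```

With auto_ack=False the broker keeps each request until basic_ack is called, so an exception in run_test or store_results leaves the request available to the next worker.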

RabbitMQ provides failover with mirrored queues to avoid a single point of failure. This is not currently being used, as RabbitMQ is very robust and runs in its own cloud instance (Juju service rabbitmq-server).

Queue structure

We want to use a reasonably fine-grained queue structure so that we can support workers that serve only certain releases, architectures, virtualization servers, real hardware, etc. For example: debci-wily-amd64 or debci-trusty-armhf. As test requests are not long-lived objects, we remain flexible here and can introduce further granularity as needed; e. g. we might want a trusty-amd64-laptop-nvidia (i. e. running on bare metal without virtualization) queue in the future.

A particular test request (i. e. a queue message) currently just consists of the source package name. Additional fields, such as "PPA name" or perhaps version constraints may be added in the future.
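To make the naming concrete, here is a small sketch that derives the queue name and message body for one request. The function names are hypothetical, and the "debci-release-architecture" pattern is inferred from the two example queue names above rather than from a documented rule.

```python
def queue_name(release: str, arch: str) -> str:
    """Queue name following the debci-<release>-<arch> examples."""
    return "debci-{}-{}".format(release, arch)


def build_request(release: str, arch: str, source_package: str):
    """Return (queue, body) for one test request."""
    # The message currently consists of just the source package name.
    return queue_name(release, arch), source_package.encode("utf-8")
```

For example, build_request("wily", "armhf", "udisks2") targets the queue debci-wily-armhf with the body b"udisks2".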

Juju service

This uses the standard charm store RabbitMQ charm with some customizations:

  • Remove the almighty "guest" user
  • Create a user for test requests with a random password and limited capabilities (nothing other than publishing new messages); these are the credentials for clients like proposed-migration

As usual with the charm, worker services create a relation to the RabbitMQ service, which creates individual credentials for them.

The rabbitmq-server Juju service is exposed on a "public" IP (162.213.33.228), but that address is reachable only from within the Canonical VPN, and a firewall further restricts access to snakefruit.canonical.com (the proposed-migration host running britney) and any external workers.

Workers

Integration with proposed-migration (britney)

debci results browser

Deployment

Administration

  • Requesting manual runs is done with britney's run-autopkgtest script on snakefruit. Due to firewalling, the script can currently only be run on snakefruit itself, so define this shell alias:

     alias run-autopkgtest='ssh snakefruit.canonical.com sudo -i -u ubuntu-archive run-autopkgtest'

    Then you can run run-autopkgtest --help to see the usage. E. g.

     # specific architecture
     run-autopkgtest -s wily -a armhf libpng udisks2
     # all configured britney architectures (current default: i386, amd64)
     run-autopkgtest -s wily libpng udisks2

  • Show queue lengths, until that gets shown in debci:

     ssh -t wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \
         juju ssh rabbitmq-server/0 sudo rabbitmqctl list_queues

  • Show currently running tests:

     ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \
         juju ssh autopkgtest-cloud-worker/0 "'ps ux|grep runner/adt-run'"

  • Show all "temporary testbed failure" results, which in most cases are infrastructure bugs; there should be zero. (Brandon is working on showing them in debci.)

     ssh wendigo.canonical.com sudo -H -u prod-ues-proposed-migration \
         juju ssh debci-web-swift/0 grep -l tmpfail 'debci/data/packages/*/*/*/*/latest.json'
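For scripted monitoring, the output of the queue length command above can be turned into a dictionary. This is a rough sketch with a hypothetical function name; it assumes that rabbitmqctl list_queues prints one "name count" pair per line and that any other lines (banners, column headers) can be ignored.

```python
def parse_queue_lengths(output: str) -> dict:
    """Map queue name to message count from rabbitmqctl list_queues output."""
    lengths = {}
    for line in output.splitlines():
        fields = line.split()
        # Data lines have exactly two fields, the second a number; banner
        # and header lines do not match and are skipped.
        if len(fields) == 2 and fields[1].isdigit():
            lengths[fields[0]] = int(fields[1])
    return lengths
```

Queues whose count stays high over time are then easy to spot by sorting the result by value.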