AutomatedTesting

Revision 25 as of 2005-11-01 20:10:31

Status

Introduction

Discuss ways to automatically check certain package properties which we regard as essential.

Rationale

Currently it is possible to upload packages which do not work at all, can disrupt the packaging system, are uninstallable, or have a broken build system. We want to introduce a set of universally applicable tests that reject such packages before they irritate anyone. To the extent possible, a package should also be able to check its own functionality to make regressions immediately visible.

Scope and Use Cases

  • Check validity of binary packaging (all packages)
  • Check (re)buildability of source packages (all packages)
  • Check for library and linking errors (many packages)
  • Check for functionality regressions (where applicable, i. e. for non-interactive programs)

Design

We will create the following new machinery:

  1. Tester core. This:
    • Interprets test metadata
    • Knows how to enumerate tests and determine which test(s) are possible under the current circumstances
    • Knows how to invoke tests and collect results
    • Knows how to request test virtualisation services from the virtualisation regime
    • Provides a convenient interface for both manual use and building into automation (e.g. Launchpad)
  2. Generic tests
    • Provides a set of tests (including metadata) which are supposed to be applicable to any package. These tests will typically involve building, installing, removing, etc. the package
    • Provides appropriate metadata about these tests
  3. Virtualisation regime
    • Encapsulates the invocation of tests
    • Insulates the host that is performing the tests from the effects of the tests (insofar as this regime is able to)
    • Provides a standard interface to the tester core
    • There may be (eventually, will be) several virtualisation regimes; initially we will provide one that is simple to implement
  4. Package-specific tests
    • Test scripts and test metadata are found in a specified location in the unpacked source package. These tests test the installed version of the package.

Initially, all but the package-specific tests will be in a single test package.

There are two main new interfaces here:

  • Test/metadata interface: a standard way of describing the tests available and their important properties. Provided by packages and by the generic tests; used by the tester core.
  • Virtualisation interface: provided by virtualisation regimes and used by the tester core.

Implementation Plan

Data Preservation and Migration

None of the tests should alter runtime behaviour or touch actual data files.

Virtualisation

To test a package, it must be installed (and often, removed again). Its installation or operation might alter the system or data in other ways.

For other than the most ad-hoc testing by a knowledgeable expert, there has to be a testbed for this purpose. Ideally, the testbed would be virtualised.

There are a fair variety of virtualisation systems which differ in maturity, intrusiveness into the hosting system/hardware, features, etc. We are interested in the following features:

  • Ability to set a checkpoint or make a snapshot, so that changes to the testbed filesystem can be undone fairly efficiently. (Must have, for virtualisation to be at all useful.)
  • Defence of the host system from the virtual environment (ie, security). (Must have for automated testing of possibly-untrusted packages, but optional in many cases for developers' use on their own systems.)
  • Ability to efficiently determine what changes were made to the testbed filesystem (as filenames and contents, not disk block changes).
  • Obviously, ability to run commands in the testbed (perhaps as root) and get the output and exit status, and copy data back and forth.

Approaches or part-approaches that seem plausible include:

  • chroot. Has the virtue of being well-known and simple to implement - so we will do this first, with either unionfs or lvm snapshots.
  • Xen. Looks very promising, and we hope to get it running soon in Breezy. That would require running Breezy on the test server, but since we don't need public access to this server, we could perhaps live with this for a limited time.
  • UML
  • Union-fs
  • CPU emulators (Qemu, Bochs, PearPC, Faumachine?)
  • LVM snapshots

There is a lot of activity in many of these projects, so their capabilities are changing. And, different approaches make sense in different contexts (local testing, Launchpad autotest, etc.). So we introduce an abstraction interface, of which we'll provide at least one low-impact sample implementation.
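As a rough illustration only, the abstraction interface might look like the following sketch. All class and method names here are hypothetical placeholders, not a committed API; a real regime would implement revert() via its snapshot mechanism (unionfs, LVM, Xen, etc.).

```python
import subprocess
from abc import ABC, abstractmethod

class VirtRegime(ABC):
    """Hypothetical virtualisation-regime interface used by the tester core."""

    @abstractmethod
    def open(self):
        """Prepare the testbed and set a checkpoint."""

    @abstractmethod
    def revert(self):
        """Undo all changes to the testbed since the checkpoint."""

    @abstractmethod
    def execute(self, argv):
        """Run a command in the testbed; return (exit_status, stdout)."""

    @abstractmethod
    def close(self):
        """Tear the testbed down."""

class NullRegime(VirtRegime):
    """Trivial sample regime: runs commands directly on the host.

    Offers no insulation and no checkpointing -- only useful for
    ad-hoc testing by a knowledgeable expert, as noted above.
    """
    def open(self):
        pass

    def revert(self):
        pass  # nothing to undo: this regime cannot snapshot

    def execute(self, argv):
        result = subprocess.run(argv, capture_output=True, text=True)
        return result.returncode, result.stdout

    def close(self):
        pass
```

A chroot-based regime would implement the same four methods, which is what lets the tester core stay ignorant of the virtualisation technology in use.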

Check validity of binary packaging

Test installability:

  1. Start with a sandbox with only required packages.
  2. Install all dependencies.
  3. Create a list of all files in the sandbox.
  4. Install the package.
  5. Run functional self tests of the package (see below).
  6. Reinstall the package to check that (degraded) upgrading works.
  7. Remove the package.
  8. Remove all dependencies of the package.
  9. Purge the package. If this fails, then the purging code depends on non-required packages, which is invalid.
  10. Create a list of all files in the sandbox and report any differences against the first list.
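Steps 3 and 10 amount to snapshotting the sandbox file list and diffing the two snapshots. A minimal sketch of that bookkeeping (function names are illustrative, not part of any existing tool):

```python
import os

def snapshot(root):
    """Return the set of all file and directory paths below root,
    relative to root (step 3 / step 10: list all files in the sandbox)."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths

def leftover_files(before, after):
    """Paths present after purge that were not in the pristine sandbox:
    cruft that the package failed to clean up."""
    return sorted(after - before)
```

In practice the comparison would also want to flag files that *disappeared* (a package deleting files it does not own), which is just `before - after`.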

Test conflicts:

  1. Create a mapping installed file → package from package contents lists.
  2. Create the union of all installed files.
  3. Remove all entries from that set whose file appears only once.
  4. Remove all pairs where the associated packages declare a conflict with each other.
  5. Ideally the remaining set should be empty; report all package names that are left.

(Note that apparently some Debian folks already do this, so there might be some scripts around).
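The steps above reduce to a set computation over the contents lists. A sketch, assuming contents and declared conflicts have already been extracted from the archive (the data structures here are illustrative):

```python
from itertools import combinations

def undeclared_sharers(contents, conflicts):
    """contents: package name -> set of installed files (step 1).
    conflicts: set of frozenset({a, b}) pairs that declare Conflicts.
    Returns file -> sorted owner list for every file shipped by two or
    more packages with no Conflicts declared between them (steps 2-5)."""
    owners = {}
    for pkg, files in contents.items():
        for path in files:
            owners.setdefault(path, []).append(pkg)
    bad = {}
    for path, pkgs in owners.items():
        if len(pkgs) < 2:
            continue  # step 3: file appears only once
        for a, b in combinations(pkgs, 2):
            if frozenset((a, b)) not in conflicts:  # step 4
                bad[path] = sorted(pkgs)
                break
    return bad  # step 5: ideally empty
```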

Test debconf:

  • Install the packages using the non-interactive frontend.
  • Intercept mails sent by the non-interactive frontend to collect the questions the package would ask.
  • Ideally there should be no questions.

Test package contents:

  • Compare package contents list with latest version in the archive; notify the uploader if the number of files changed considerably (we had such errors in the past).
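What counts as "considerably" is a policy choice; as a sketch, one could flag any relative change in file count above a configurable threshold (the 20% default below is an arbitrary assumption, not a decided value):

```python
def contents_changed_considerably(old_count, new_count, threshold=0.2):
    """True if the number of shipped files changed by more than
    `threshold` relative to the previous upload in the archive.
    The default threshold of 20% is a placeholder, not policy."""
    if old_count == 0:
        return new_count > 0  # any files at all is a change from empty
    return abs(new_count - old_count) / old_count > threshold
```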

Check validity of source package

Buildability is already tested on the buildds. However, many packages have broken clean rules which leave the package in an unbuildable state. We should fix all packages where this is the case.

  1. Unpack the source package.
  2. Run dpkg-buildpackage.
  3. Rename the resulting diff.gz to diff.gz-first.
  4. Run dpkg-buildpackage again; if this fails, the packaging is broken.
  5. Compare the new diff.gz to diff.gz-first; if there is any difference, report this as a potentially broken package. However, many packages update config.{guess,sub}, so these files should be excluded from the comparison.
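The comparison in step 5, once both diffs have been unpacked into path → content mappings, might look like this sketch (the unpacking itself, e.g. via zcat and patch parsing, is omitted):

```python
import fnmatch

def significant_diff_members(first, second,
                             ignore=("*/config.guess", "*/config.sub")):
    """first/second: dict of path -> content for the two diff.gz builds.
    Return paths whose content differs between builds, skipping the
    config.{guess,sub} churn that step 5 says to exclude."""
    def ignored(path):
        return any(fnmatch.fnmatch(path, pat) for pat in ignore)
    paths = set(first) | set(second)
    return sorted(p for p in paths
                  if not ignored(p) and first.get(p) != second.get(p))
```

A non-empty result means the clean rule did not restore the tree, so the package should be reported as potentially broken.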

Package self tests

Build time self tests:

  • Many packages already come with test suites which are run at build time.
  • Add debian/rules check which runs those self tests and exits with 0 if all of them are successful.

  • check should not be a dependency of binary, since some test suites take a fair amount of time. Instead, we modify dpkg-buildpackage to check for the existence of the check target and call it if it exists. dpkg-buildpackage should get a new option --no-check to disable the invocation of the test suite.

  • If check fails, this should generally make the build fail to prevent publishing regressions to the archive. There are some exceptions like gcc where many tests are expected to fail and it is unreasonable to modify the package to disregard them; in these cases check should exit with a zero exit status if appropriate.

  • Idea: Export the results of regression tests in a tarball and publish it somewhere so package maintainers do not need to rebuild the package to evaluate the reason for failures.

Run time self tests:

  • Call ldd on all ELF binaries and libraries and check for unresolved libraries.

  • dlopen() all ELF libraries and report failures.

  • Change packages to install runtime self tests into /usr/lib/selftests/packagename/; run all binaries in this directory and ensure that all of them exit with 0.
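The /usr/lib/selftests runner in the last bullet is little more than "execute everything in a directory and collect the non-zero exits". A sketch (the directory layout is the proposal above, not an existing convention):

```python
import os
import subprocess

def run_selftests(directory):
    """Run every executable file in `directory`
    (e.g. /usr/lib/selftests/<packagename>/) and return the names of
    those that did not exit with status 0."""
    failures = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            if subprocess.run([path]).returncode != 0:
                failures.append(name)
    return failures
```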

We really want to be able to run a package's tests on the installed version of the package. This requires a standardised way for a package to provide an interface to running the tests and finding the results. Also, this interface should be more than just "invoke some script called check". In particular, tests need to have interesting properties like:

  • modifies global data
  • needs package X Y Z (>=4) installed

  • needs to run as root
  • needs an X display (and uses some gui replay tool?)

which need to be expressible per test, or at least per batch of tests. This needs to be extensible, so that new tests in old environments can be skipped with a message like `test environment does not support "blames-canada" property of test "simpsons"'.
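That skipping logic is a straightforward capability check. As a sketch (representing each test's properties and the environment's capabilities as sets of strings, which is an assumption about the eventual metadata format):

```python
def runnable_tests(tests, supported):
    """tests: test name -> set of required properties
    (e.g. {"needs-root", "modifies-global-data"}).
    supported: the properties this test environment can provide.
    Returns (to_run, skipped), where skipped maps each unrunnable test
    to the sorted list of properties the environment lacks."""
    to_run, skipped = [], {}
    for name, needs in tests.items():
        missing = needs - supported
        if missing:
            skipped[name] = sorted(missing)
        else:
            to_run.append(name)
    return to_run, skipped
```

Because unknown properties simply show up as missing capabilities, a new test property degrades gracefully in old environments: the test is skipped with a message, not mis-run.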

Outstanding Issues

  • Is this a good overall design? We've tried to keep the amount of new code to a minimum - just glue, basically - while still encouraging development into a sophisticated automated testing framework.
  • Are the details of the test/metadata interface right?
  • Are the details of the virtualisation regime interface right?
  • What will we choose for the initial virtualisation regime to implement?
  • Which generic tests will we implement ?

UDU BOF Agenda

Pre-Work

Tasks for the future

UBZ BoF notes

  • Install/uninstall (http://packages.debian.org/unstable/devel/piuparts):

    • check for cruft left over on the filesystem after install/uninstall
    • check for double-installs
    • perhaps a package black-list (to avoid testing stuff which won't live nicely in the virtualization environment -- perhaps networking stuff)
  • One alternative to testing is to change the package to actually test right after building:
    • run a "make check" after building
    • enforce its usage: fail builds if they don't pass
    • test depends would need to be listed as build-depends
    • this tests the package as-built, but not as-installed
  • Testing as-installed would require standardizing a testing entry point, per-package
    • Information on test-dependencies would be included (so we could install them under virtualization)
    • A basic test could be simply running the binary and checking the result status (or other variants of this)
    • Every package would need to be changed to include a test
  • Sharing test infrastructure with Debian and other distros would allow us not to bear the burden of writing all the tests, forever.
  • chroots are what we use in buildds -- we could use chroots as poor-man's virtualization.
  • We should design the solution to allow moving to a proper virtualization environment later.
  • Upgrading
  • Test functionality (where existent)
  • Limit scope: this is testing per-package, not integration testing. Other specifications may cover integration testing.
  • Example interface would be:
    • debian/test/info: RFC822 file with stanzas like
        Tests: fred bill bongo
        Restrictions: needs-root breaks-computer
      meaning: execute debian/test/fred, debian/test/bill, etc., and expect exit status 0 and no stderr.
    • Additional possibilities:
        Depends: ...
        Tests: filenamepattern*
      etc., which would make writing test/info easier but make the test-harness script harder.
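A parser for the proposed debian/test/info format could start as simply as the sketch below. It handles only the basic stanza shape from the example above; RFC822 continuation lines, comments, and the filename-pattern extension are deliberately omitted, and the format itself is still only a proposal.

```python
def parse_test_info(text):
    """Parse the proposed debian/test/info format: RFC822-style stanzas
    separated by blank lines, with space-separated values in fields
    such as Tests: and Restrictions:."""
    stanzas, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:          # blank line ends the current stanza
                stanzas.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip().split()
    if current:
        stanzas.append(current)
    return stanzas
```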


CategoryUdu CategorySpec