Created: 2006-06-22 by SimonLaw
Packages affected: none
This specification describes a distributed cluster that supports automated integration testing. This cluster would facilitate the implementation of automated test plans that replace manual tests. It is distributed so that machines worldwide can be contributed to the infrastructure.
Inside Ubuntu, there are plenty of automated test systems. From the unit tests within make check to piuparts, these tests all check for poor implementations. However, all of these systems only test the package itself. There is no facility for higher-level testing.
Since integration testing is not currently automated, it must be done by hand. As Ubuntu grows, with more packages and more supported platforms, this becomes more and more infeasible.
This is meant to be read as a requirements specification. There is little chance that all of this can be implemented in four months. Rather, this will be an ongoing project, broken into phases.
- Andrew tests whether Apache works by installing the package, setting up a dynamic website, and using it through his web browser. He wants to write an automated test script that does this all for him.
- Biella wants to test file sharing between her Ubuntu server and her Windows workstation. She also wants to write an automated test script that does this.
- Charles wants to test that some new machines work well with Ubuntu. He wants to run the entire test suite on this new hardware.
- Dora is a system administrator at a university which makes an Ubuntu-derived distribution. She wants to run the entire test suite against a test lab of computers, whenever she rolls out a new version of her distro. Some of these test results may be private, so it must be possible to run an isolated instance of the system.
- Eric is an Ubuntu developer who has made a large library transition. He wants to confirm that he hasn't introduced any regressions into the distro before uploading his new packages.
- Fiona is a system administrator who wants to help Ubuntu by donating hardware resources. These should be added to the resource pool for the automated test system.
- Greg is a release manager for an Ubuntu-derived distribution. He looks at a nightly report generated by this system to determine that no regressions have been introduced.
- Hania is a porter who wants to port Ubuntu from one architecture to the next. She wants to get a nightly report of which features still need work for this new architecture.
- Ivan fixes a bug and wants to prevent it from regressing. He writes a test script and uploads it into the testing infrastructure as part of uploading his fix.
- Joanna notices that bugs keep on cropping up in a certain area that she's interested in. She'll write a suite of test scripts that exercise this functionality, some of which may fail, and uploads them into the infrastructure.
- Kyle wants to look at a summary of automated testing results in Launchpad. He would be able to look at why a certain test has passed or failed.
- Linda is working in Malone on a bug when she realizes that it is related to a bug discovered by an automated test. It should be trivial to link the Malone report to the test results.
This specification covers the design and implementation of the automated testing infrastructure. It does not specify any specific behaviour for Launchpad.
The system is based around the idea of resource pools. These are groups of networked machines, which may or may not be computers running Ubuntu, that can be reserved and used. These resource pools are controlled by a scheduler, which ensures that tests are run, allocates resources to these tests, and ensures that no test hogs any resource. The scheduler decides on which tests to run based on the work queue, and tests report their results back to a scoreboard.
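The relationships between these components can be sketched in miniature. This is purely illustrative: all class and method names below are assumptions for the sake of the sketch, not part of the specification.

```python
# Minimal sketch of the component relationships described above.
# Class and method names are illustrative, not part of the spec.

class Scoreboard:
    def __init__(self):
        self.results = []

    def record(self, test_id, result):
        self.results.append((test_id, result))


class WorkQueue:
    def __init__(self, tests):
        self.tests = list(tests)

    def next_test(self):
        # hand out the next test plan, or None if the queue is empty
        return self.tests.pop(0) if self.tests else None


class Scheduler:
    def __init__(self, pool, queue, scoreboard):
        self.pool = pool            # resources this scheduler controls
        self.queue = queue
        self.scoreboard = scoreboard

    def run_one(self):
        test = self.queue.next_test()
        if test is None:
            return None
        # allocate resources, run the test, report the result
        result = "PASS"             # placeholder for a real test run
        self.scoreboard.record(test, result)
        return test


board = Scoreboard()
sched = Scheduler(pool=["ubuntu-i386"],
                  queue=WorkQueue(["apache-smoke"]),
                  scoreboard=board)
sched.run_one()
print(board.results)  # [('apache-smoke', 'PASS')]
```

The key property is that the scheduler sits between the queue and the scoreboard: tests flow in from one side, results flow out the other.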
This design assumes that more than one system will eventually exist. Some of these may be run by private organizations, but many of them should be run as community resources.
A resource pool could be populated by:
- computers, of varying architectures, which have Ubuntu installed,
- computers which have other operating systems installed (Red Hat, SuSE, Debian, Windows, Mac OS X),
- virtual machines instead of real computers,
- appliances, like DSL routers,
- any other device that can be controlled remotely.
Each resource in the pool will be connected to a private network that will not be connected to the Internet. This is to protect the Internet from poorly written tests.
Each resource in the pool will be connected to a VLAN capable switch, so that various resources can be connected to their own private test networks.
A resource pool will provide a simulated Internet connection that provides minimal services such as DNS and resolvable hosts. It may provide a cache of the Ubuntu archive. This allows machines inside the resource pool to connect to an Internet-style network without being exposed to the Internet itself. This is not required for the initial implementation.
Each resource in the pool will be plugged into a networked power bar. This would allow it to be rebooted without manual intervention. This is not required for the initial implementation.
Each resource in the pool should have some mechanism to reset it to a known-good configuration. For instance, on modern servers, it should be possible to netboot into a bootloader that can decide if it should boot directly off the hard disk. If it shouldn't, it would boot off the network to an image that would wipe the existing hard disks and load a clean image.
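Combining the networked power bar with this netboot mechanism, a reset could look like the following sketch. `PowerBar` and `netboot_flags` are hypothetical stand-ins for whatever hardware and netboot configuration store a real deployment would use.

```python
# Hypothetical sketch of resetting a resource to a known-good state,
# combining the networked power bar and netboot mechanism described
# above. PowerBar and netboot_flags are illustrative, not a real API.

class PowerBar:
    """Stand-in for a networked power bar with one outlet per resource."""
    def __init__(self):
        self.log = []

    def power_cycle(self, outlet):
        self.log.append(("off", outlet))
        self.log.append(("on", outlet))


netboot_flags = {}   # resource name -> what the netboot loader should do


def reset_resource(name, outlet, powerbar):
    # Arrange for the netboot bootloader to wipe and reimage the disk,
    # then reboot the machine without manual intervention.
    netboot_flags[name] = "reimage"
    powerbar.power_cycle(outlet)


bar = PowerBar()
reset_resource("test-box-1", outlet=3, powerbar=bar)
print(netboot_flags, bar.log)
```

On its next boot the machine would see the "reimage" flag, wipe its disks, load a clean image, and clear the flag so subsequent boots go straight to the hard disk.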
The scheduler is responsible for continually running tests, which it picks up from the work queue. To support a test, it will:
- Allocate resources out of its resource pool,
- Configure these resources so the test can run, which may involve cleaning up a resource or configuring a VLAN,
- Command the resources to run a test,
- Monitor the test and terminate it if it runs for too long,
- Collect the results of the test and feed them to the scoreboard,
- Clean up the resources so they are ready for the next test run.
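The "monitor and terminate" step above is the one with the most moving parts. A per-test lifecycle with a deadline might look like this sketch, where the command-dispatch step is a stand-in for commanding a real resource:

```python
# Sketch of the per-test lifecycle above, with the "terminate it if it
# runs for too long" step modelled as a deadline check between commands.
# The command handling is a stand-in, not a real remote-execution API.
import time

TIMEOUT = 2.0   # seconds; real test plans would carry their own limits


def run_test(commands, clock=time.monotonic):
    deadline = clock() + TIMEOUT
    logs = []
    for step in commands:
        if clock() > deadline:
            return "TIMEOUT", logs     # scheduler kills the run
        logs.append(f"ran {step}")     # stand-in for commanding a resource
    return "PASS", logs


status, logs = run_test(["install apache2", "fetch http://server/"])
print(status)   # PASS
```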
The scheduler will not execute any test plans itself, but delegates this task to the resources desired by the test. This reduces the risk of the scheduler becoming a bottleneck in the testing framework.
Each distributed instance should be able to run its own scheduler, which does not depend on the existence of any other scheduler.
Schedulers must be able to pick up tests off of an arbitrary work queue.
Schedulers must be able to send test results to an arbitrary scoreboard.
In order to communicate with test runs, there should be a wire protocol that describes passing and failing test cases within an automated test plan. This wire protocol should also allow the test to send log files back to the scheduler. These log files must be sent back even if the automated test script dies unrecoverably.
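A consumer of such a protocol could be as simple as the following sketch. The line format here is modelled loosely on the subunit proposal referenced at the end of this spec, but the exact tokens are assumptions, not a final wire format.

```python
# Illustrative parser for a subunit-style line protocol. The
# "test:/success:/failure:" tokens are assumptions modelled on the
# subunit proposal referenced below, not a finalized format.

def parse_stream(lines):
    results = {}
    for line in lines:
        if line.startswith("success: "):
            results[line[len("success: "):]] = "PASS"
        elif line.startswith("failure: "):
            results[line[len("failure: "):]] = "FAIL"
    return results


stream = [
    "test: apache-basic",
    "success: apache-basic",
    "test: samba-share",
    "failure: samba-share",
]
print(parse_stream(stream))  # {'apache-basic': 'PASS', 'samba-share': 'FAIL'}
```

A real protocol would also carry log attachments and cope with a stream that is cut off mid-test, per the requirement above.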
Schedulers must be able to kill a test run that is hogging resources. For example, a test that takes too long will have all its resources turned off and restarted.
Schedulers request tests from the work queue if there are free resources available. To prevent polling, it should be possible to request notifications of new tests from the work queue.
When a test completes, a scheduler must inform the work queue that the test plan has been run.
An authorized administrator must be able to describe which resources the scheduler has control over.
The work queue is an ordered set of tests. Each test needs to express the following:
- Declare the properties of the resources it needs. For instance, a test may express that it needs a Windows 2000 Server and an Ubuntu machine.
- Declare the way the resources are connected. For instance, the Windows and Ubuntu machine should be connected with a VLAN.
- Declare any files to send to each resource. For instance, an automated test script should be uploaded to the Ubuntu machine.
- Declare which commands are run on each resource. For instance, the automated test script should be executed.
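The four declarations above might be expressed as pure data, which also satisfies the later requirement that no user-submitted code runs on the scheduler. All key names here are illustrative only:

```python
# One possible declarative encoding of a test plan covering the four
# declarations above. Key names are illustrative, not a defined schema.
test_plan = {
    "name": "samba-file-sharing",
    "resources": {                     # properties of the needed resources
        "server": {"os": "ubuntu", "arch": "i386"},
        "client": {"os": "windows-2000-server"},
    },
    "network": [["server", "client"]],          # one private VLAN
    "files": {"server": ["smb-test.py"]},       # uploaded before the run
    "commands": {"server": ["python smb-test.py client"]},
}
print(sorted(test_plan["resources"]))  # ['client', 'server']
```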
A scheduler must be able to request a new test plan to run. The work queue is used to prevent starvation of test plans by feeding schedulers with unique test plans. To request a test plan, the scheduler informs the work queue of which free resources it has, and the work queue provides the scheduler with a plan that fits those resources.
When a test plan has been completed, it gets added back to the end of the work queue. If a test plan has run for too long without completing, it also gets added back to the end of the work queue. This guarantees that all test plans will be run, even if a scheduler has crashed.
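One way to implement this guarantee is a lease on each handed-out plan: completed plans go to the back of the queue, and plans leased for too long are reclaimed so a crashed scheduler cannot lose them. The names and the lease mechanism below are illustrative assumptions:

```python
# Sketch of the requeue behaviour described above, using a lease to
# reclaim plans from crashed or overlong schedulers. Names and the
# lease mechanism are illustrative assumptions.
from collections import deque

class WorkQueue:
    def __init__(self, plans, lease_limit=3600):
        self.queue = deque(plans)
        self.leased = {}              # plan -> seconds elapsed on lease
        self.lease_limit = lease_limit

    def take(self):
        plan = self.queue.popleft()
        self.leased[plan] = 0
        return plan

    def complete(self, plan):
        del self.leased[plan]
        self.queue.append(plan)       # continuous plans run again later

    def tick(self, seconds):
        # reclaim plans whose scheduler has run too long or crashed
        for plan in list(self.leased):
            self.leased[plan] += seconds
            if self.leased[plan] > self.lease_limit:
                del self.leased[plan]
                self.queue.append(plan)


q = WorkQueue(["plan-a", "plan-b"], lease_limit=10)
p = q.take()             # 'plan-a' is leased to some scheduler
q.tick(11)               # lease expires: plan-a returns to the queue
print(list(q.queue))     # ['plan-b', 'plan-a']
```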
An authorized user must be able to add new test plans to the queue. This plan can either be:
- Continuously running, so it will be scheduled at regular priority,
- Once-off, so it will be scheduled as soon as possible.
When a new test plan has been added to the queue, all connected schedulers are informed so that they may request a new plan, if they have free resources.
An authorized administrator must be able to manage the queue. Some of these operations include, but are not limited to, disabling tests, removing tests, and user management.
The scoreboard records the results of each run. It needs to track the following:
- Timestamp for each run,
- Identifier for the test plan that was run,
- Work queue that supplied this test plan,
- Scheduler that allocated the resources for the test,
- Final result of each run,
- Links to log files extracted from each run.
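A sketch of a record holding these fields, using an in-memory sqlite3 table purely for illustration; the actual database schema is still listed as an open issue later in this spec.

```python
# Illustrative table holding the scoreboard fields listed above.
# sqlite3 is used only for the sketch; the real schema is still an
# open issue in this spec.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE result (
        run_at     TEXT,     -- timestamp for the run
        plan_id    TEXT,     -- test plan that was run
        queue_id   TEXT,     -- work queue that supplied it
        scheduler  TEXT,     -- scheduler that allocated resources
        outcome    TEXT,     -- final result of the run
        log_url    TEXT      -- link to captured log files
    )
""")
db.execute("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?)",
           ("2006-06-22T12:00:00", "apache-basic", "main", "sched-1",
            "PASS", "http://example.org/logs/1"))
row = db.execute(
    "SELECT plan_id, outcome FROM result WHERE outcome = 'PASS'").fetchone()
print(row)   # ('apache-basic', 'PASS')
```

Keeping the results in a relational store like this is what makes the later "can be mined for information" requirement straightforward to meet.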
The scoreboard must provide an interface to query it for test run information.
The scoreboard should present a web interface for constructing queries and displaying results that would be useful to developers, testers, release managers, and infrastructure administrators.
The scoreboard should integrate some subset of this information into Launchpad. Which subset is currently unspecified, as it requires further discussion.
An authorized scheduler must be able to add test results to the scoreboard.
The resource pool is on a private network that is only connected to the outside world through the scheduler.
The optional simulated Internet is used for tests that require Internet access. However, for security reasons, we do not allow test scripts to affect anything outside their sandboxed environments. This feature is not required for the first implementation.
The simulated Internet must present at least one public IP address which can be pinged. It must also provide forward and reverse DNS for that address.
The simulated Internet should present at least one public IP address which offers:
- a HTTP and HTTPS server,
- an SMTP server,
- a POP3 and IMAP server,
- an NTP server,
- an FTP server,
- any other common Internet services.
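A smoke check for the pingable-address requirement above could verify that forward and reverse DNS agree. The lookup tables below are hypothetical stand-ins for the simulated Internet's DNS server:

```python
# Hypothetical consistency check for the simulated Internet's DNS
# requirement. The tables stand in for the pool's DNS server; the
# names and addresses are illustrative.
forward = {"www.sim.test": "198.51.100.10"}
reverse = {"198.51.100.10": "www.sim.test"}


def dns_is_consistent(name):
    addr = forward.get(name)
    return addr is not None and reverse.get(addr) == name


print(dns_is_consistent("www.sim.test"))  # True
```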
Every resource on the network must have some way of restoring itself to a clean state. For computers, there should be some form of network boot capability.
The scheduler should be a networked server that knows how to kick the power on any resource that it controls. We can use networked power bars for that.
The scheduler needs to have a network connection to the resource pool, and to some public network so that it can be controlled by authorized administrators. This must be done using a human-readable protocol. It may have a web interface.
The scheduler needs to provide TFTP so that machines in the resource pool can boot off the network.
The work queue should be a network server that responds to scheduler requests.
The work queue must be on a network where it can be controlled by authorized users and administrators. This may be done using a web interface.
The scheduler must be able to contact the work queue to issue requests. This must be done using a human-readable protocol.
The work queue must be implemented as an ordered list of references to test plans.
Test plans should be expressed in a declarative manner, so that no user-submitted code runs on the scheduler.
Test scripts may be written in any appropriate language, although we will strongly prefer the use of Python.
Test plans should be stored in a version control system, which must be bzr. This is not necessary for the first implementation.
The work queue may be on the same machine as the scheduler. This is recommended for the first implementation.
The scoreboard must keep test results in a database which can be mined for information.
The scoreboard must be on a public network where users can see the results of test runs. This may be done using a web interface. It is only required to have simple queries in the first implementation.
The scoreboard must store log files captured from test runs. These may be stored on the database or as files in the Librarian.
The scoreboard may be on the same machine as the scheduler or work queue. This is recommended for the first implementation.
I don't follow the scheduler needing TFTP - if we have distributed machines, the scheduler is likely not TFTP-reachable for all of them; our 'must be able to reboot to reinstall clean' requirement can be solved by simply requiring the subnet the machine is on to offer an appropriate service, and having an install source network available on a known IP address. - RobertCollins
There is one scheduler per resource pool, because the scheduler has to control them. Once a resource has booted off the network, it can contact external install sources. - SimonLaw
The data representation for work queues has not yet been defined.
The database schema for the scoreboard has not yet been designed.
The UIs for the scheduler, work queue, and scoreboard are yet to be designed.
We need to consider how much space test results will require, based on incoming data flow. We should decide if we want to store results indefinitely, or to implement some kind of expiry algorithm.
We need to be aware of data privacy issues for the logs that are captured by the scoreboard. We also need to comply with various local laws that deal with privacy and data retention.
- http://www.robertcollins.net/unittest/subunit/ — a proposed wire protocol for tests to return results to the scheduler.