Dev Week -- Effectively testing for regressions -- sbeattie -- Thu Sep 3rd, 2009
21:04 - 22:00 UTC (Reformatted from original IRC chat.)
Hi, I'm Steve Beattie, on the Ubuntu QA team, here to talk about the regression testing we do.
We essentially do regression testing in three different situations: when doing a security update, when verifying a post-release regular update to a package, and during the milestones for development releases.
We have a few different tools we use for testing within Ubuntu.
There's checkbox. You may know this as the "System Testing" menu item under the System -> Administration menu. In addition to helping to do hardware testing, it's meant to be sort of a meta test framework, in that it can encapsulate other frameworks.
There's Mago, which Ara Pulido talked about earlier today. It's meant to be an automated desktop testing framework, and is a joint initiative that we're pushing with Gnome.
And finally, there's the qa-regression-testing tree. It's located on Launchpad, aka lp:qa-regression-testing (warning, the tree is over 500MB!). It initially started out as a project by the Ubuntu Security team, to help them test out their security updates, but the QA team has also adopted it for some of our testing.
The qa-regression-testing tree is what I'm going to talk about.
As I said, the bzr tree itself is about 500MB, but I've made a very small subset (80kB) available. With this, we try to cover functional tests, exercising the program(s) in the package we're interested in to ensure they function properly, or verifying that default configs are sensible and that we haven't lost critical ones over time.
Sometimes these tests are destructive; we attempt to make them not be, but there are no guarantees. So it's best to run them in a non-essential environment, either a virtual machine or a chroot.
Looking over the tree, there are a few different toplevel directories:
- build_testing/ covers notes and scripts related to invoking (typically) build tests from the upstream package itself
- results/ holds saved results from running such upstream tests, to use as a comparison baseline
- notes_testing/ is a collection of notes about testing various packages
- install/ is a post OS install sanity check script, along with saved results
- scripts/ contains the actual set of testcases, organized by package, along with helper libraries and test programs
- data/ is saved data that can also be used in one of the scripts/ testcases
scripts/ is where we'll focus our attention.
We'll start with a trivial example.
As I said, the scripts are organized by package; each package that we've worked on so far will have a script named test-PACKAGE.py.
If we look in scripts/ we'll see there's no test-coreutils.py script; that seems like an oversight, so we'll add a very simple one. If you pull down the tree subset, you'll find a subset of the bzr tree, along with toplevel directories named 1, 2, and 3. In directory 1/ there's a test-coreutils.py. You can also see it at http://pastebin.com/f5d7510be
Our scripts are all extensions of python-unit (so you'll want that installed). Yes, we're using a unit test framework, despite doing a bunch of functional tests; essentially we're using python as a smart scripting language. (See the documentation on python-unit.)
Our first test that I've written will test if /bin/true actually runs and returns 0 as expected. So some important points as we look at it.
class CoreutilsTest is a subclass of testlib.TestlibCase. testlib is a module we've added which both extends unittest.TestCase and provides additional utility functions that make it easier to do common tasks. The testcase itself is the test_true() method of the CoreutilsTest class.
python-unit's unittest will run all methods on our class whose names begin with "test". testlib.cmd(['/bin/true']) is where /bin/true gets executed; testlib.cmd is an improved version of the various system() and popen() calls provided by python.
We then throw an assert if the result from running /bin/true does not equal what we expect. Asserts are the way one causes a testcase to fail in python-unit; other types of exceptions will cause py-unit to consider the test as an error. py-unit provides a wide variety of assert test functions.
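To make the shape of such a script concrete, here is a minimal sketch of a /bin/true test. It is not the pastebin script: it does not import testlib, so it subclasses unittest.TestCase directly, and cmd() is a hypothetical stand-in for testlib.cmd() built on subprocess. The (return code, output) return value is an assumption made for this sketch.

```python
import subprocess
import unittest


def cmd(command):
    """Hypothetical stand-in for testlib.cmd(): run a command and
    return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


class CoreutilsTest(unittest.TestCase):
    def test_true(self):
        '''Test /bin/true returns 0'''
        rc, output = cmd(['/bin/true'])
        # A failed assert marks the testcase as failed.
        self.assertEqual(rc, 0, "unexpected exit code %d: %s" % (rc, output))


def run_tests(case_class):
    """Run every test* method of case_class in verbose mode."""
    suite = unittest.TestLoader().loadTestsFromTestCase(case_class)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == '__main__':
    run_tests(CoreutilsTest)
```

In verbose mode the runner prints the docstring of test_true() next to its result, which is why each test method carries a short docstring.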
So, to run the test, we cd into 1 and do ./test-coreutils.py. The output from running should look like http://paste.ubuntu.com/264590/ Note that the output string (in verbose mode, which our script turned on) is the docstring from the test_true() method.
So what does a failed test look like?
To see, we'll change our expected result to be 1 instead of 0. This is the version in the 2/ directory, also visible at http://pastebin.com/f9c5be05. Again, we run ./test-coreutils.py, and should see output like http://paste.ubuntu.com/264591/
Okay, we did /bin/true, let's add another testcase, one for /bin/false.
That's what our example in 3/test-coreutils.py does; we've added a second method, test_false() (also at http://pastebin.com/m5e8cc2d0), and if we run it, we should see output like http://paste.ubuntu.com/264593/
Looking at our results, we notice it ran the false test first; pyunit runs the test methods in alphabetical order. Order generally shouldn't matter, but sometimes test authors will prefix testcase method names with a number to sort them in a more logical order from a human's perspective.
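The ordering can be checked without running anything, by asking unittest's loader for the method names it would run. This is a small sketch; the class names and the numbered method names are made up for illustration.

```python
import unittest


class Unordered(unittest.TestCase):
    def test_true(self):
        pass

    def test_false(self):
        pass


class Ordered(unittest.TestCase):
    # Numeric prefixes force the order a human would expect.
    def test_01_true(self):
        pass

    def test_02_false(self):
        pass


loader = unittest.TestLoader()
print(loader.getTestCaseNames(Unordered))  # ['test_false', 'test_true']
print(loader.getTestCaseNames(Ordered))    # ['test_01_true', 'test_02_false']
```

Zero-padding the numbers (01, 02, ... 10) matters, because the sort is on the method name as a string.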
So that's a very simple example, but real tests are likely to be much more complicated. We might need to do some configuration setup, create datafiles, etc. before running our tests.
More complicated tests
Both unittest and our testlib provide help for writing more complex tests. As a simple example, let's look at test-apt.py: in lines 33-49, we have two methods, setUp() and tearDown().
These will get automatically invoked by python-unit before and after each testcase (i.e. each test*() method). These functions give us a point where we can change our environment to match what we want to test, or set up a non-default config in an alternate location, so we aren't destructive to the default system settings.
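As a rough illustration of setUp() and tearDown(), the sketch below builds a throwaway config in a temporary directory for every test and removes it afterwards. It is not taken from test-apt.py; the class name, file name, and config contents are invented for this example.

```python
import os
import shutil
import tempfile
import unittest


class ScratchConfigTest(unittest.TestCase):
    def setUp(self):
        '''Runs before each test*() method: build a private config'''
        self.tempdir = tempfile.mkdtemp(prefix='qrt-demo-')
        self.config = os.path.join(self.tempdir, 'demo.conf')
        with open(self.config, 'w') as handle:
            handle.write('option = default\n')

    def tearDown(self):
        '''Runs after each test*() method, even if the test failed'''
        shutil.rmtree(self.tempdir)

    def test_01_modify_config(self):
        '''Tests may scribble on the scratch config freely'''
        with open(self.config, 'w') as handle:
            handle.write('option = changed\n')
        with open(self.config) as handle:
            self.assertEqual(handle.read(), 'option = changed\n')

    def test_02_config_is_fresh(self):
        '''The next test gets a fresh copy, not the modified one'''
        with open(self.config) as handle:
            self.assertEqual(handle.read(), 'option = default\n')


if __name__ == '__main__':
    unittest.main()
```

Because the config lives in an alternate location, nothing under /etc is touched and the script does not need root.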
There are other ways of modifying configs in (hopefully) safe ways. testlib provides these:
- config_replace() lets you replace or append the contents of a config file
- config_comment(), config_set(), and config_patch() modify configs in certain ways
- config_restore() restores whatever configs were modified to their original saved state
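The general idea behind this kind of helper is "back up once, modify, restore later". The sketch below shows that idea only; it is not testlib's implementation, and replace_config(), restore_configs(), and the backup suffix are hypothetical names that do not match testlib's actual functions or signatures.

```python
import os
import shutil
import tempfile

# Hypothetical bookkeeping: original path -> backup path.
_saved = {}


def replace_config(path, contents, append=False):
    """Replace (or append to) a config file, backing it up the first time."""
    if path not in _saved:
        backup = path + '.qrt-demo-backup'
        shutil.copy2(path, backup)
        _saved[path] = backup
    with open(path, 'a' if append else 'w') as handle:
        handle.write(contents)


def restore_configs():
    """Put every modified config back to its saved original state."""
    for path, backup in list(_saved.items()):
        shutil.move(backup, path)
        del _saved[path]


if __name__ == '__main__':
    # Demonstrate on a scratch file rather than a real system config.
    workdir = tempfile.mkdtemp(prefix='qrt-demo-')
    conf = os.path.join(workdir, 'demo.conf')
    with open(conf, 'w') as handle:
        handle.write('original\n')
    replace_config(conf, 'modified\n')
    restore_configs()
    with open(conf) as handle:
        print(handle.read(), end='')
    shutil.rmtree(workdir)
```

Backing up only on the first modification means several changes to the same file still restore to the true original.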
An example where this is used is in the test-dash.py script, in section main, lines 69 onward. Basically, 3 different shell config files are modified and then restored.
Another thing to notice in this example is line 77, which contains test_user = testlib.TestUser(). The testlib.TestUser class creates a new (randomly-named) user on the system. Obviously, this requires the script to be run as root (as does modifying global configs). The destructor for the TestUser class does the cleanup work of removing the user. This lets you add a user to test out various privilege changes, and avoids messing with the state of the user that you're running the tests as.
Config munging and system state changing can be quite complex. The test-openldap.py script is a nice complex example. I won't go through it, but at a high level, there's a variety of Server* classes that extend the ServerCommon class. The setup for each of these classes creates an openldap config to test a specific aspect or feature of openldap: different backends, different auth methods (SASL), different types of connections (TLS).
Sometimes, tests are dependent on specific versions of Ubuntu. testlib provides a way to run tests conditionally based on the version, by testing the value of "self.lsb_release['Release']"; e.g. "self.lsb_release['Release'] < 8.10" will only be true for Hardy Heron (8.04) or older.
This is used quite extensively in the test-kernel-security.py script (e.g. lines 511-533). This is done because various kernel config names have changed over time, didn't exist in older releases, or different features weren't enabled. And sometimes, sadly, because there was a bug in an older release that we're not likely to fix, for whatever reason. When a config or something else in a test changes its identity conditionally based on version, it's useful to change the reported (verbose) docstring via self.announce().
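A comparison like "< 8.10" only works if the release is stored as a number. The sketch below assumes that, and shows one way such a dictionary could be built from lsb_release-style output. parse_lsb_release() and the sample text are made up for illustration and are not testlib's code.

```python
def parse_lsb_release(text):
    """Turn 'Key: value' lines into a dict, with Release as a float.

    Storing Release as a float is an assumption of this sketch.
    """
    info = {}
    for line in text.splitlines():
        if ':' in line:
            key, value = line.split(':', 1)
            info[key.strip()] = value.strip()
    info['Release'] = float(info['Release'])
    return info


# Sample output in the style of `lsb_release -a` on Hardy Heron.
SAMPLE = """Distributor ID:\tUbuntu
Description:\tUbuntu 8.04.3 LTS
Release:\t8.04
Codename:\thardy
"""

lsb_release = parse_lsb_release(SAMPLE)

if lsb_release['Release'] < 8.10:
    print("Hardy or older: use the old config name")
else:
    print("Intrepid or newer: use the new config name")
```

Float comparison is safe here only because Ubuntu releases are numbered YY.04 and YY.10, so 8.04 < 8.10 < 9.04 holds numerically; a scheme with releases such as X.9 and X.10 would sort incorrectly this way.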
Tests aren't limited to python code; sometimes we need to do things in other languages to exercise something specific. For example, triggering some kernel issues may require writing a C program. scripts/SOURCEPACKAGE can contain a tree of helper programs if needed.
Also, we'd annotate the existence of this directory by adding "# QRT-Depends: PACKAGE" as meta-info. We do this because, as I mentioned, the full bzr tree is very large, and it's a pain for us to copy around the full tree when we're typically only interested in testing one package (when doing an update). scripts/make-test-tarball will collect up just the relevant bits into a tarball, making a much smaller blob to copy around, e.g. ./make-test-tarball test-kernel-security.py
Also, other helper testlibs are available, all named testlib_*.py in the scripts/ directory. Anyway, that's a brief overview of what we have available in that tree.
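To show how a "# QRT-Depends:" annotation could drive the packaging step, here is a hypothetical sketch. It is not the real make-test-tarball: read_depends() and make_tarball() are invented names, and the sketch assumes the annotation sits in the test script and lists names relative to the script's own directory.

```python
import os
import re
import tarfile

# Matches lines such as "# QRT-Depends: kernel-security"
DEPENDS = re.compile(r'^#\s*QRT-Depends:\s*(.+)$')


def read_depends(script_path):
    """Return the names listed on QRT-Depends lines of a test script."""
    names = []
    with open(script_path) as handle:
        for line in handle:
            match = DEPENDS.match(line)
            if match:
                names.extend(match.group(1).split())
    return names


def make_tarball(script_path, tarball_path):
    """Pack the script plus whatever it depends on into a .tar.gz."""
    base = os.path.dirname(os.path.abspath(script_path))
    members = [os.path.basename(script_path)] + read_depends(script_path)
    with tarfile.open(tarball_path, 'w:gz') as tar:
        for name in members:
            full = os.path.join(base, name)
            if os.path.exists(full):
                # Directories are added recursively by tarfile.
                tar.add(full, arcname=name)
    return members
```

Calling make_tarball('scripts/test-kernel-security.py', '/tmp/qrt-test.tar.gz') would, under these assumptions, produce a tarball holding just that script and its helper directory instead of the whole 500MB tree.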
So how can you help and what work do we want to do going forward? More testcases! More testscripts for packages we don't have tests for! Extending our coverage would be great. Tests do need to be somewhat scriptable, mechanisable. Tests of GUI apps are probably better off being directed at the Mago project.
Be careful to ensure you're testing what you think you're testing. It's not a lot of fun debugging a test failure that turns out to be a bug in the test itself.
We also need to do the work of encapsulating these tests in, and integrating them with, checkbox.
Feel free to ask questions in #ubuntu-testing (where the QA team hangs out) or in #ubuntu-hardened (where the security team hides itself).
That's all I've got, thanks!