AutomatedServerTestingSpec

Summary

With our next LTS coming up, this is a great time to focus on stability and QA. As such, we'll work on setting up automated testing of as much of Ubuntu Server as possible. This includes (but is not necessarily limited to) daily runs of the security team's regression test suite, enabling upstreams' test suites at build time, performance regression tests, and automating ISO testing.

Release Note

A considerable amount of time has been spent on setting up automated testing of the server product. It is our hope that this will provide a more solid release of Ubuntu Server than ever before.

Rationale

Regression testing can be a repetitive job. Thankfully, a lot of things can be done automatically. Many packages have test suites that we're not running (for one reason or another), we have qa-regression-tests, and lots of other means for testing things with minimal day-to-day effort.

User stories

  1. Soren uploads a new version of dovecot, but is worried he might break compatibility with some IMAP client. However, he sleeps well that evening, knowing that the automatic regression testing suite will raise an alert overnight if something breaks.
  2. Jamie notices a bug in libvirt and works on a patch. While test-building the package locally, the test suite informs him that he broke a little-used feature in an obscure corner of the lxc driver in libvirt. He promptly fixes it, the test suite is happy again, and Jamie can proceed to upload the package to Ubuntu.
  3. Matthias uploads an update to eglibc which triggers a bug in php5. The next morning, the server team receives a report from the automated test system telling them there's a regression in php5. Looking at versions of the packages in the dependency chain from the last successful test run and this new one, they quickly pinpoint the culprit and start working on a fix.
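
The version comparison in the third story could be sketched roughly as follows. This is only an illustration: it assumes each test run records a mapping of package name to installed version, and the function name and the version strings are made up for the example.

```python
def changed_packages(last_good, current):
    """Return {package: (old_version, new_version)} for every difference.

    A version of None means the package was absent from that run.
    """
    changes = {}
    for name in sorted(set(last_good) | set(current)):
        old, new = last_good.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Made-up version strings, for illustration only.
last_good = {"eglibc": "2.10-1", "php5": "5.2-1", "apache2": "2.2-1"}
current = {"eglibc": "2.11-1", "php5": "5.2-1", "apache2": "2.2-1"}
print(changed_packages(last_good, current))
```

With these inputs only eglibc is reported, which is the kind of short list the server team would start from.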

Assumptions

Design

The goal is to detect as many problems as early as possible.

  • Many packages ship test suites.
    • Some of these can be run at build time. We should make them do so.
      • We should not only run these at the time of upload, but daily as well (to catch bugs introduced by something in the dependency chain).
    • Other packages ship test suites that can't easily be run at build time. We should arrange for them to be run daily "somewhere" and somehow get alerted about failures (regressions).
  • The security team and QA team have a series of tests they use to ensure they don't introduce regressions in stable releases.
    • We should use these during development as well. This should be done daily and we should get a report back about failures.
  • We want to be alerted about performance regressions as well.
  • Automate ISO testing as much as possible.
    • MathiasGug already has a setup that automates much of the ISO testing by using preseeding, followed by a script that logs in over ssh to do the last bits of the test cases. This should ideally be fully automated.

    • KVM-autotest is a framework for testing kvm. However, assuming kvm is functional, it's perfect for emulating interactivity, thus allowing us to do end-to-end ISO testing like a normal user typing and clicking.

Implementation

qa-regression-testing scripts

We will integrate the security team's qa-regression-test collection into checkbox, and have it run on a daily basis. Feedback will be collected by the QA team and turned into bugs for the server team to deal with.
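
Independently of checkbox, a daily run of such scripts could be sketched like this. It is a minimal illustration and not how checkbox does it: it assumes the scripts are Python files named test-*.py in one directory and that a non-zero exit status means failure; the function name and the timeout value are made up.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts(script_dir, timeout=3600):
    """Run every test-*.py in script_dir and map script name to a result string."""
    results = {}
    for script in sorted(Path(script_dir).glob("test-*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, script.name],
                cwd=script.parent,  # run each script from its own directory
                capture_output=True,
                timeout=timeout,
            )
            # Assumption: exit status 0 means every test in the script passed.
            results[script.name] = "PASS" if proc.returncode == 0 else "FAIL"
        except subprocess.TimeoutExpired:
            results[script.name] = "TIMEOUT"
    return results
```

The resulting dictionary is the sort of raw data the QA team could turn into bug reports.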

Performance testing

The Phoronix test suite seems to be reasonably comprehensive. We will run it on a daily basis and keep an eye on performance regressions. Of course this needs to run on the same hardware every time.
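
Keeping an eye on regressions could be as simple as comparing each day's numbers with the recent average. The sketch below is not tied to the Phoronix result format; it assumes the scores have already been extracted into plain numbers where higher is better, and the test names, values and the 10% tolerance are invented for the example.

```python
from statistics import mean


def find_regressions(history, today, tolerance=0.10):
    """Return {test: (baseline, today's score)} for tests that dropped too far.

    history maps a test name to earlier scores, today maps it to the newest
    score. Scores are assumed to be "higher is better" (e.g. requests/second).
    """
    regressions = {}
    for name, score in today.items():
        previous = history.get(name)
        if not previous:
            continue  # no baseline yet, nothing to compare against
        baseline = mean(previous)
        if score < baseline * (1 - tolerance):
            regressions[name] = (baseline, score)
    return regressions


# Invented numbers, for illustration only.
history = {"apache-rps": [1000.0, 1020.0, 980.0], "pgbench-tps": [500.0, 510.0]}
today = {"apache-rps": 700.0, "pgbench-tps": 498.0}
print(find_regressions(history, today))
```

Averaging over several earlier runs is a design choice to dampen day-to-day noise; it only makes sense if, as noted above, every run happens on the same hardware.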

Upstream test suites

A number of server packages are known to provide test suites:

  • Postgresql test suite (already runs during the build)
  • Puppet has a separate package, puppet-testsuite, which provides the test suite.
  • php5 test suite (already runs during the build)
  • Apache2
  • libvirt
  • MySQL test suite (already runs during the build)
  • OpenLDAP test suite (already runs during the build)
  • CUPS test suite already runs during the build, but has a concept of other levels (e.g. smbtorture).
  • samba
    • 'make test', seems to require samba to be built with --enable-socket-wrapper
  • There's an imap test suite we can use to test dovecot: http://imapwiki.org/ImapTest

The packages that provide a build time test suite will be rebuilt in a PPA every day to catch regressions introduced by things further down the dependency chain.

ISO testing

  • We should attempt to make MathiasGug's existing ISO testing setup completely automatic. Currently, the install is done using preseeding. Once the install is complete, the operator has to invoke the appropriate script on his client, which then connects to the VM and performs the tests.

  • We should embrace KVM-autotest and use it for our ISO tests. This involves packaging KVM-autotest and providing so-called step files corresponding to each of the test cases in the ServerWhole list.
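
One way to remove the manual step in the preseeded setup would be to poll the VM until its SSH port answers and only then start the test script. The sketch below shows just the polling part. It assumes SSH becomes reachable only once the installed system has booted, and the function name, timeout and polling interval are made up.

```python
import socket
import time


def wait_for_port(host, port, timeout=3600.0, interval=10.0):
    """Poll until a TCP connection to host:port succeeds; return True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            # Not reachable yet (still installing or rebooting); try again later.
            time.sleep(interval)
    return False
```

Once this returns True for port 22 of the VM, the script the operator currently starts by hand could be launched from the same program.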

BoF agenda and discussion

Automated testing is a great way to prevent/detect regressions.

Security team qa-regression-testing scripts:

  • currently integrating into checkbox - aim at 80% - the easiest ones.
  • cr3 could run the tests in the data centre

What we want:

  • every day a report is generated covering which tests have been run and their results
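
Such a report could be rendered from a simple mapping of test name to status. The following is only a sketch; the report layout, the status strings and the function name are assumptions and not an existing checkbox format.

```python
from collections import Counter
from datetime import date


def format_report(results, day=None):
    """Render {test_name: status} as a plain-text daily summary."""
    day = day or date.today()
    counts = Counter(results.values())
    lines = [f"Test report for {day.isoformat()}"]
    # One summary line such as "1 FAIL, 2 PASS".
    lines.append(", ".join(f"{counts[s]} {s}" for s in sorted(counts)))
    lines.append("")
    for name in sorted(results):
        lines.append(f"{results[name]:8} {name}")
    return "\n".join(lines)
```

Calling format_report({"test-cups.py": "PASS", "test-samba.py": "FAIL"}) yields a header, a one-line summary and one line per test, which could be mailed out or attached to the milestone reports.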

Running tests in EC2.

Test results reporting:

  • leverage checkbox.
  • checkbox supports different submission plugins. What to use to track the results and generate reports?

Inclusion in milestone reports presented during the release team meeting.

QA team: easy to run the tests and process the test results internally (black box).

  • tests are run and failures are reported as bugs by the QA team.

How are test suites updated because of changes in the system? Who?

  • QA team finds out about the failure and reports the bug
  • QA team fixes the test and writes tests.

What needs testing?

Integration list:

  1. qa-regression-testing scripts
    1. enable selected phoronix tests
  2. upstream test suites
    • integrate postgresql test suite
    • integrate puppet-testsuite package
    • integrate dovecot imap test suite (not packaged)
    • apache tests (has a framework, use documented in QRT)
    • libvirt test suite (not run during build, but could be), also tests in QRT (but not python-unit)
    • mysql test suite runs during the build
    • openldap test suite runs during the build
    • cups test suite runs in the build, but has concept of other levels (eg smbtorture)
    • samba
      • 'make test', but needs to be built with --enable-socket-wrapper
      • smbtorture
    • php5
  3. integrate iso testing tests in checkbox:
  4. review all the packages on the server CD
  5. Multi-system environments: documentation.
    • pacemaker
    • drbd

What sort of testing do we want to perform?

  • Stress/performance testing?
    • E.g. check if Apache suddenly can handle much fewer requests per second than the previous day?
    • leverage phoronix test suite?
  • Functional testing?
    • E.g. use different mail clients to talk to a mail server?
    • Try a suite of different configuration combinations that we know used to work?
  • Upgrade testing?
    • Do a very fat hardy install (all sorts of different servers, clients, and other stuff), and upgrade it to Lucid and see how it breaks?
    • Repeat for different configurations?
    • mvo testing infrastructure: only looking at package upgrade failure. How to test that services are working correctly after the upgrade? Marjo to figure it out.
  • Enabling packages' test suites where they have one

Misc

chat with Steve Beattie on 2010-06-09

2010-06-09T15:04:35

hggdh

sbeattie: so now it is us...

2010-06-09T15:04:48

hggdh

brb

2010-06-09T15:05:14

sbeattie

hggdh: no worries, I need a beverage refill.

2010-06-09T15:12:29

hggdh

sbeattie: I am back

2010-06-09T15:13:46

sbeattie

hggdh: moi aussi.

2010-06-09T15:15:35

hggdh

sbeattie: ça va. So... on qa-r-t: you were saying some of the tests are potentially complex/impossible to set up

2010-06-09T15:16:02

sbeattie

Yes, digging up my notes now.

2010-06-09T15:17:55

sbeattie

hggdh: here's what I had, last updated around beta 1 or so in lucid: http://paste.ubuntu.com/447390/

2010-06-09T15:20:09

hggdh

cool. Are they all under checkbox (those committed)?

2010-06-09T15:20:45

sbeattie

hggdh: committed means I'd committed to a local bzr tree and was awaiting merger into checkbox trunk; I'm updating my checkbox checkout to see if I'd gotten the committed ones merged.

2010-06-09T15:21:20

hggdh

sbeattie: ah, OK

2010-06-09T15:25:19

hggdh

sbeattie: another Q -- I see coreutils there. Upstream delivers coreutils with an extensive test suite, which is run every time we build it

2010-06-09T15:26:11

hggdh

so, do we need it in qa-r-t? or can we just run a build (say) every day with updated packages?

2010-06-09T15:26:57

sbeattie

hggdh: heh, our coreutils test is very weak; it's basically an example test of /bin/{true,false} I used in a presentation to demonstrate how to write qa-r-t tests.

2010-06-09T15:27:09

hggdh

oh, OK

2010-06-09T15:27:16

hggdh

I had not yet looked at it

2010-06-09T15:27:29

sbeattie

hggdh: their testsuite is not included in a package?

2010-06-09T15:27:52

sbeattie

is it run during our coreutils package build?

2010-06-09T15:27:58

hggdh

sbeattie: no, it is not packaged as coreutils-tests, say. But it is run on every build

2010-06-09T15:28:23

hggdh

I had a brief look at it, and it is fully immersed into their makefile environment

2010-06-09T15:29:00

hggdh

also, I remember one of the maintainers stating that the utilities we run some few thousands of times during the tests

2010-06-09T15:29:23

hggdh

s/we run/were run/

2010-06-09T15:30:10

sbeattie

hggdh: I think build-time is sufficient for testing to ensure coreutils is okay; if you're hoping to catch bugs that coreutils depends on (glibc, kernel) then kicking off a frequent/daily rebuild may make sense.

2010-06-09T15:30:31

sbeattie

(all assuming package build fails if some threshold of tests fail)

2010-06-09T15:30:51

hggdh

sbeattie: yes, build fails on a test error (I know, had them myself ;-)

2010-06-09T15:30:59

sbeattie

hggdh: awesome!

2010-06-09T15:31:18

hggdh

sbeattie: I will add them on the regression builds we currently do daily

2010-06-09T15:31:19

sbeattie

okay, looks like cups got merged, you can cross that one off.

2010-06-09T15:33:21

sbeattie

( http://bazaar.launchpad.net/~checkbox-dev/checkbox/trunk/annotate/head:/jobs/qa_regression.txt.in is the reference for what's been already merged)

2010-06-09T15:38:06

sbeattie

hggdh: okay, based on review, all the ones that are listed as COMMITTED have been merged and are in fact DONE

2010-06-09T15:40:00

hggdh

sbeattie: OK. I am updating my local copy of your list with a :1,$s/COMMITTED/DONE/

2010-06-09T15:40:38

sbeattie

hggdh: yep, now reviewing the list of tasks you have on the blueprint

2010-06-09T15:43:11

sbeattie

hggdh: ao cups, cyrus-sasl2, and mysql tasks are already done.

2010-06-09T15:43:15

sbeattie

s/ao/so/

2010-06-09T15:44:27

sbeattie

clamav used to have a need to wait between startup and the tests running, requiring manual intervention; this may have been fixed and needs exploration.

2010-06-09T15:44:58

sbeattie

fetchmail: don't recall the issues, needs exploration

2010-06-09T15:46:36

sbeattie

libvirt starts virtual machines (as you might expect); I had passed on that because I was using ESX guests as a testrun environment (to have an accurate idea of the limitations of the test network)

2010-06-09T15:47:03

sbeattie

... and thus I wasn't going to be able to kick off kvm guests

2010-06-09T15:47:05

hggdh

and it does not make sense to run libvirt on virt...

2010-06-09T15:47:24

sbeattie

yeah

2010-06-09T15:47:30

hggdh

OK. updating the ones done on the blueprint (and crediting you)

2010-06-09T15:48:06

sbeattie

net-snmp: the test script took arguments of some kind, and thus needs reworking before it can be integrated.

2010-06-09T15:49:11

sbeattie

apache2: IIRC, the same script was used to test the various flavors of apache (worker, threaded, etc.) and needs some thought before integration can occur.

2010-06-09T15:49:51

sbeattie

dhcp3: sets up a dhcp server; needs re-work to bind this to a fake interface or somesuch.

2010-06-09T15:50:39

sbeattie

dnsmasq: my note is unclear to me, needs exploration (sorry)

2010-06-09T15:50:46

hggdh

heh

2010-06-09T15:51:26

sbeattie

freeradius: our lucid packages appear to have some breakage.

2010-06-09T15:52:26

sbeattie

ipsec-tools: needs a setup environment of hosts/networks to test setting up vpns.

2010-06-09T15:53:51

sbeattie

httpd tests: qa-r-t doesn't have a script named that, not sure if it's a copy/waste error with lighttpd (which is also there)

2010-06-09T15:54:20

sbeattie

http://bazaar.launchpad.net/~ubuntu-bugcontrol/qa-regression-testing/master/files/head:/scripts/ is the listing of the test scripts

2010-06-09T15:56:20

sbeattie

libnet-dns-perl: my note isn't helpful, my guess is that errors may have been related to networking restrictions in the datacenter, needs exploration

2010-06-09T15:56:45

hggdh

sbeattie: those are the ones already integrated, correct?

2010-06-09T15:58:08

sbeattie

hggdh: the scripts in that directory? Some are, some aren't; the tree was mostly developed by the security team to test their updates and they run them manually on the packages they're working on.

2010-06-09T15:59:48

sbeattie

our goal here is to run as many of these as we can going forward to catch regressions in the development release/milestones.

2010-06-09T16:01:30

sbeattie

lighttpd: requires apache is not running, which is tricky if we enable the apache test script, as checkbox installs everything at once, and apache's postinstall starts it up.

2010-06-09T16:03:06

sbeattie

nagios3: I didn't explore this much because of the existence of nagios1 and nagios2 tests; we could probably get away with just enabling the nagios3 test. Needs exploration.

2010-06-09T16:03:40

sbeattie

nfs-utils: needs external to the host nfs clients and servers.

2010-06-09T16:04:15

hggdh

huh... thunderstorm arriving...

2010-06-09T16:04:56

sbeattie

ntp: needs access to external ntp servers.

2010-06-09T16:05:21

sbeattie

hggdh: heh, good luck. :-)

2010-06-09T16:07:35

sbeattie

we don't get many thunderstorms out west, though I heard one rumble this morning; I miss a good thunderstorm.

2010-06-09T16:09:14

sbeattie

nut: had unknown failures, needs exploration with the test script. Though I don't recall how useful the tests are for systems without a UPS attached.

2010-06-09T16:10:17

sbeattie

ah, nut has a dummy driver that the test script uses.

2010-06-09T16:11:27

sbeattie

pptpd: test has some hardcoded networking assumptions that cause failures, I think.

2010-06-09T16:13:12

sbeattie

python: script needs a little re-working as it takes an argument to specify which version of python (2.4, 2.5, 2.6) to test.

2010-06-09T16:13:47

sbeattie

ruby: similar issues as python

2010-06-09T16:14:40

sbeattie

samba: needs working external clients and servers in its environment

2010-06-09T16:16:16

sbeattie

squid: test requires multiple protocol (http, https, ftp) access to various ubuntu.com hosts.

2010-06-09T16:16:42

sbeattie

hggdh: I think that covers all the ones on your task list.

2010-06-09T16:17:24

hggdh

sbeattie: thank you. I am updating the blueprint with your notes (so that we have a reference)

2010-06-09T16:17:54

hggdh

sbeattie: is python 2.4 still in use?

2010-06-09T16:18:49

sbeattie

hggdh: looks like it got purged in lucid.

2010-06-09T16:19:29

sbeattie

(it's in main for dapper, hardy, jaunty, and karmic, which is why the security team cares)

2010-06-09T16:19:42

hggdh

K, so it stays

2010-06-09T16:20:10

sbeattie

well, for checkbox integration, we can possibly drop it.

2010-06-09T16:21:01

sbeattie

and just focus on the "current" supported python.

2010-06-09T16:21:18

sbeattie

python2.5 also got dropped in lucid, if rmadison is to be believed.

2010-06-09T16:21:24

hggdh

so, look at 2.6 only right now

2010-06-09T16:22:01

sbeattie

hggdh: that would be the short-term approach I'd take.

2010-06-09T16:22:38

hggdh

sbeattie: thank you. I will probably have Qs later on, if you do not mind

2010-06-09T16:23:25

sbeattie

hggdh: happy to answer what I can. I've been meaning to document this more, both for our internal uses and to encourage community members to contribute testcases.

2010-06-09T16:27:02

sbeattie

I, heh, do have a work item to add; late in the lucid cycle, zul added a mysql-testsuite which contains upstreams test infrastructure (and, AFAIK, he didn't test it's packaging at all); integrating it into our mysql test script has not made it to the top of my todo list.

2010-06-09T16:27:27

sbeattie

woo; grammer/english fail.

2010-06-09T16:27:42

hggdh

sbeattie: heh. I will check with zul

2010-06-09T16:30:08

*

sbeattie needs to step away for a bit

2010-06-09T16:30:14

--

sbeattie is now known as sbeattie-afk


CategorySpec

AutomatedServerTestingSpec (last edited 2010-07-14 15:14:23 by pool-71-252-251-234)