== Dev Week -- QA: Automated Testing and Jenkins -- hggdh -- Tue, Jan 31st, 2012 ==
{{{#!irc
[20:01] hello. My name is Carlos de-Avillez, and I am a member of the Ubuntu QA team. You can reach me either on IRC (nick: hggdh) or via email to hggdh2@ubuntu.com.
[20:02] Since the beginning of 2010 we have been working on test automation. This meant we had to create our own lab, and prepare it for all we need.
[20:02] this is still a Work In Progress ;-)
[20:02] today I will talk about our usage of Jenkins
[20:02] http://jenkins-ci.org/
[20:03] we implemented (er, *are* implementing) Jenkins in two places: an internal one, where the tests are actually executed (in our lab),
[20:03] and an external one, where results of the tests are visible for all
[20:04] the internal Jenkins is access-controlled, and out of scope right now
[20:04] the external one can be reached at https://jenkins.qa.ubuntu.com/
[20:04] this is, pretty much, what we see internally, but read-only.
[20:05] We are adding tests as we go, and all Ubuntu development teams are collaborating in making the tests more (and more) inclusive
[20:06] Jenkins can be seen as a scheduler (it is not only a scheduler, but that is the simplest way to get acquainted with it)
[20:07] if you look at the public instance, you will see it has a not-so-big-yet-but-growing list of tabs
[20:07] in each tab you will see the tests -- in Jenkins parlance, the *jobs* that we currently publish
[20:08] the very first tab -- "all" -- shows ALL the tests; each other tab gives us a view restricted to an area of interest
[20:09] so, for example, https://jenkins.qa.ubuntu.com/view/Precise/ shows all tests currently being run for the Precise Pangolin Ubuntu version
[20:10] (BTW, we will be relying on https://wiki.ubuntu.com/QATeam/AutomatedTesting, https://wiki.ubuntu.com/QATeam/AutomatedTesting/UnderstandingJenkins,
[20:10] and https://wiki.ubuntu.com/QATeam/AutomatedTesting/UnderstandingJenkinsResults for this talk)
[20:11] each Jenkins job can have one of four possible final states:
[20:11] 1. successful -- represented by a green ball
[20:11] 2. failed -- represented by a red ball
[20:12] 3. unstable -- represented by a yellow ball
[20:12] 4. not-yet-run (or no record surviving in Jenkins) -- represented by a gray ball
[20:13] apologies to those of you who are -- as I am -- colour-confused
[20:13] tests are written in a mix of code (usually Python, or shell scripts) and Jenkins setup
[20:14] Pretty much all the code we use for the tests can be found at https://launchpad.net/ubuntu-server-iso-testing
[20:15] this is a bazaar branch; commit access to it is, as usual, restricted to members of the Ubuntu Server Iso Testing Developers, a team on Launchpad
[20:15] https://launchpad.net/~ubuntu-server-iso-testing-dev
[20:16] please do contribute. You can tweak the current code, and propose changes, via a bazaar merge request
[20:17] anyway
[20:17] if we look at the precise-desktop-amd64_default job (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/)
[20:17] we will see that it failed
[20:18] on the Build History (at the left) we can see the last runs, about 30 of them; we can also see that the last two runs failed
[20:19] run 223 was successful, and 222 was... unstable
[20:19] if we click on the last run -- 225, url https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/
[20:21] we will be able to see some more links to data. The most important, usually, is the raw console output (link at the left). This will give us all output to stdout that was generated by the tests
[20:21] this does not mean *all* output of the job, just what was sent to stdout
[20:23] looking at it, we see that the first 130K of data was not shown, and the rest seems to be a system running and just -- pretty much -- renewing IP address via DHCP
[20:24] looking near the end, we see this message:
[20:24] DEBUG:root:Test e40d4cb6-bc3d-4c6d-b618-a30826e5c26e failed to execute within 120 minutes
[20:24] so... the test fails
[20:25] (going back to run 225 summary)
[20:25] at the middle of this screen -- again, I am at https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/
[20:26] we see a "Build artifacts" link and, under it, a series of other links.
[20:26] what we are usually interested in is *not* these other links, but what is under the "Build artifacts"
[20:27] so, drilling down on it (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/artifact/)
[20:28] we see a 225/test-results link, and some others. It is the test-results we usually want...
[20:29] under it we only see a d-i-syslog.log.gz file (er, link). This sort of tells us that the test indeed failed ;-)
[20:30] (an example of a successful run is at https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/223/artifact/223/test-results/ )
[20:31] diwic asked: You asked for contributions, but assume I've made a change to the branch, how do I go ahead and test it locally before proposing it to be merged? Do I have to set up a local jenkins instance...etc?
[20:32] hum
[20:33] yes, you *might*. It would be better to, I mean. We are considering how to set up a test Jenkins, where code changes can be tested without needing the whole shebang
[20:34] gang65 asked: How could I look at current job configuration? - I would like to see which scripts it is using...
[20:38] (sorry for the delay, was battling a login)
[20:39] yes, this is a good question. There is no way, right now, just tested it. You have to have access to the internal Jenkins to see the configuration. We will work on that
[20:39] sorry
[20:40] so, back to looking at errors
[20:40] the only way to find out what happened on run 225...
[20:42] go back to build 225 summary page (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/ )
[20:43] and look at the 'default' link in the middle of the page, at the 'run_test' on the same page, and at the syslog output
[20:44] the 'default' shows you the actual test code that is (or would be) executed
[20:44] the 'run_test' is the driver
[20:44] and the syslog is all that was sent to stdout.
[20:45] we know the test did not even execute, in this case
[20:45] so I would start with the syslog, looking for "abnormal" messages
[20:46] and in the syslog, around line 978, we see a python stacktrace
[20:47] it may be part of the problem. We would need to review all syslog, and talk with the developers to really find out
[20:48] an example of an *unstable* execution is in precise-server-i386_dns-server
[20:48] https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76
[20:49] drilling down on the build artifacts, we see an interesting link:
[20:49] (actually two)
[20:49] https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76/artifact/76/test-results/TEST-dns-server.xml
[20:49] and
[20:49] https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76/artifact/76/test-results/dns-server.stderr
[20:50] the first one, the XML file, is the results Jenkins looks for to determine final status
[20:50] There are 10 minutes remaining in the current session.
[20:50] looking at it, we find that the tests *did* run, but failed
[20:51] for example,
[20:51] Unable to perform external DNS resolution from the QA Lab
[20:52] looking at the stderr one, we see that a 'netstat -ltnp' showed *NO* DNS server running
[20:52] ergo, DNS failed
[20:52] so we would need to look at the full syslog to try to find the error
[20:52] (I happen to know, it was a package install failure, already corrected)
[20:53] you can also subscribe to the mailing list that announces Jenkins job results
[20:54] https://lists.ubuntu.com/mailman/listinfo/ubuntu-testing-notifications
[20:54] you should receive an email for all Jenkins runs that caused a change of status *from* successful, and all failed runs
[20:55] this usually runs to about 30 emails per day; a really bad day may have around 100 of them
[20:55] There are 5 minutes remaining in the current session.
[20:55] If you subscribe, I suggest filtering to what you want to know only (either procmail, or local email client filtering)
[20:56] but we are already out of time...
[20:57] so. Please go to #ubuntu-testing for any questions you may have, or email ubuntu-qa@lists.ubuntu.com. we all hang out in the #ubuntu-testing channel, and we all subscribe to the ubuntu-qa ML
[20:57] thank you,
[20:57] diwic asked: This all seems quite complex (inspecting log files etc). What are the advantages of Jenkins compared to other testing frameworks?
[20:58] jenkins allows us to set up different environments, from bare-metal to EC2 and KVMs, and gives us a consolidated view of all tests
[20:59] also, firing off tests can be automated -- checking a bzr branch and firing off on new code, firing off on availability of a new ISO, etc
[21:00] thank you all again
}}}
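
The session notes that the XML file under test-results is what Jenkins reads to decide a run's final status, and that run 76 of the dns-server job was unstable because the tests ran but failed. The sketch below shows one way to summarise such a file locally. It assumes a JUnit-style layout (`testsuite`, `testcase`, `failure`, `error` elements); the real contents of TEST-dns-server.xml are not shown in the session, so the sample document and its test names are invented for illustration. Only the failure message is taken from the talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in JUnit-style XML. The element layout and the test
# names are assumptions; only the failure message comes from the session.
SAMPLE_XML = """<testsuite name="dns-server" tests="2" failures="1" errors="0">
  <testcase classname="dns-server" name="test_daemon_running"/>
  <testcase classname="dns-server" name="test_external_resolution">
    <failure message="Unable to perform external DNS resolution from the QA Lab"/>
  </testcase>
</testsuite>
"""


def summarize_results(xml_text):
    """Return (number of test cases, list of (name, message) for failed ones)."""
    root = ET.fromstring(xml_text)
    total = 0
    failed = []
    # iter() also finds test cases nested under a <testsuites> wrapper.
    for case in root.iter("testcase"):
        total += 1
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failed.append((case.get("name"), problem.get("message", "")))
    return total, failed


def job_state(xml_text):
    """Tests that ran but failed give 'unstable'; all passing gives 'successful'."""
    _, failed = summarize_results(xml_text)
    return "unstable" if failed else "successful"


if __name__ == "__main__":
    total, failed = summarize_results(SAMPLE_XML)
    print(f"{total} test cases, {len(failed)} failed -> {job_state(SAMPLE_XML)}")
    for name, message in failed:
        print(f"  {name}: {message}")
```

The "failed" (red) state is not modelled here: in the run 225 example no results file was produced at all, so there would be nothing to parse.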