AutomatedTestingAndJenkins

Dev Week -- QA: Automated Testing and Jenkins -- hggdh -- Tue, Jan 31st, 2012

   1 [20:01] <hggdh> hello. My name is Carlos de-Avillez, and I am a member of the Ubuntu QA team. You can reach me either on IRC (nick: hggdh) or via email to hggdh2@ubuntu.com.
   2 [20:02] <hggdh> Since the beginning of 2010 we have been working on test automation. This meant we had to create our own lab, and prepare it for all we need.
   3 [20:02] <hggdh> this is still a Work In Progress ;-)
   4 [20:02] <hggdh> today I will talk about our usage of Jenkins
   5 [20:02] <hggdh> http://jenkins-ci.org/
   6 [20:03] <hggdh> we implemented (er, *are* implementing) Jenkins in two places: an internal one, where the tests are actually executed (in our lab),
   7 [20:03] <hggdh> and an external one, where the results of the tests are visible to all
   8 [20:04] <hggdh> the internal Jenkins is access-controlled, and out of scope right now
   9 [20:04] <hggdh> the external one can be reached at https://jenkins.qa.ubuntu.com/
  10 [20:04] <hggdh> this is, pretty much, what we see internally, but read-only.
  11 [20:05] <hggdh> We are adding tests as we go, and all Ubuntu development teams are collaborating in making the tests more (and more) inclusive
  12 [20:06] <hggdh> Jenkins can be seen as a scheduler (it is not only a scheduler, but that is the simplest way to get acquainted with it)
  13 [20:07] <hggdh> if you look at the public instance, you will see it has a not-so-big-yet-but-growing list of tabs
  14 [20:07] <hggdh> in each tab you will see the tests -- in Jenkins parlance, the *jobs* -- that we currently publish
  15 [20:08] <hggdh> the very first tab -- "all" -- shows ALL the tests; each other tab gives us a view restricted to an area of interest
  16 [20:09] <hggdh> so, for example, https://jenkins.qa.ubuntu.com/view/Precise/ shows all tests currently being run for the Precise Pangolin Ubuntu version
  17 [20:10] <hggdh> (BTW, we will be relying on https://wiki.ubuntu.com/QATeam/AutomatedTesting, https://wiki.ubuntu.com/QATeam/AutomatedTesting/UnderstandingJenkins,
  18 [20:10] <hggdh> and https://wiki.ubuntu.com/QATeam/AutomatedTesting/UnderstandingJenkinsResults for this talk)
  19 [20:11] <hggdh> each Jenkins job can have one of four possible final states:
  20 [20:11] <hggdh> 1. successful -- represented by a green ball
  21 [20:11] <hggdh> 2. failed -- represented by a red ball
  22 [20:12] <hggdh> 3. unstable -- represented by a yellow ball
  23 [20:12] <hggdh> 4. not-yet-run (or no record surviving in Jenkins) -- represented by a gray ball
  24 [20:13] <hggdh> apologies for those of you that are -- as I am -- colour-confused
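For readers who prefer text over coloured balls, here is a minimal, purely illustrative Python sketch that lists the jobs in a view together with their status through Jenkins' standard JSON API (appending /api/json to most Jenkins URLs works); the exact colour names returned depend on how the instance is configured:

{{{
#!/usr/bin/env python3
# List the jobs in a Jenkins view and their status "ball" colour via the
# standard JSON API. Illustrative only; colour names depend on the instance
# ("blue" means success by default, "green" with the Green Balls plugin,
# "red" failed, "yellow" unstable, "grey"/"notbuilt" not yet run).
import json
import urllib.request

VIEW_URL = "https://jenkins.qa.ubuntu.com/view/Precise/api/json"

with urllib.request.urlopen(VIEW_URL) as response:
    data = json.loads(response.read().decode("utf-8"))

for job in data["jobs"]:
    print("%-55s %s" % (job["name"], job["color"]))
}}}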
  25 [20:13] <hggdh> tests are written in a mix of code (usually Python, or shell scripts) and Jenkins setup
  26 [20:14] <hggdh> Pretty much all the code we use for the tests can be found at https://launchpad.net/ubuntu-server-iso-testing
  27 [20:15] <hggdh> this is a bazaar branch; commit access to it is, as usual, restricted to members of the Ubuntu Server Iso Testing Developers, a team on Launchpad
  28 [20:15] <hggdh> https://launchpad.net/~ubuntu-server-iso-testing-dev
  29 [20:16] <hggdh> please do contribute. You can tweak the current code, and propose changes, via a Bazaar merge proposal
  30 [20:17] <hggdh> anyway
  31 [20:17] <hggdh> if we look at the precise-desktop-amd64_default job (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/)
  32 [20:17] <hggdh> we will see that it failed
  33 [20:18] <hggdh> on the Build History (at the left) we can see the last runs, about 30 of them; we can also see that the last two runs failed
  34 [20:19] <hggdh> run 223 was successful, and 222 was... unstable
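The Build History can also be read programmatically; as an illustrative sketch, the same standard JSON API (with its tree filter) returns each build's number and result:

{{{
#!/usr/bin/env python3
# List recent builds of one job with their result (SUCCESS, FAILURE,
# UNSTABLE, or None while still running). Illustrative only.
import json
import urllib.request

JOB_URL = ("https://jenkins.qa.ubuntu.com/view/Precise/job/"
           "precise-desktop-amd64_default/api/json"
           "?tree=builds[number,result]")

with urllib.request.urlopen(JOB_URL) as response:
    job = json.loads(response.read().decode("utf-8"))

for build in job["builds"]:
    print("build %d: %s" % (build["number"], build["result"]))
}}}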
  35 [20:19] <hggdh> if we click on the last run -- 225, url https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/
  36 [20:21] <hggdh> we will be able to see some more links to data. The most important, usually, is the raw console output (link at the left). This will give us all output to stdout that was generated by the tests
  37 [20:21] <hggdh> this does not mean *all* output of the job, just what was sent to stdout
  38 [20:23] <hggdh> looking at it, we see that the first 130K of data was not shown, and the rest seems to be a system running and just -- pretty much -- renewing IP address via DHCP
  39 [20:24] <hggdh> looking near the end, we see this message:
  40 [20:24] <hggdh> DEBUG:root:Test e40d4cb6-bc3d-4c6d-b618-a30826e5c26e failed to execute within 120 minutes
  41 [20:24] <hggdh> so... the test fails
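That raw console output is also served at Jenkins' standard /consoleText endpoint, so it can be searched from a script rather than the browser; a minimal sketch (the build number and search string below are just examples taken from this run):

{{{
#!/usr/bin/env python3
# Fetch a build's raw console log via the standard /consoleText endpoint and
# print only the lines mentioning the failure. Illustrative only.
import urllib.request

CONSOLE_URL = ("https://jenkins.qa.ubuntu.com/view/Precise/job/"
               "precise-desktop-amd64_default/225/consoleText")

with urllib.request.urlopen(CONSOLE_URL) as response:
    console = response.read().decode("utf-8", errors="replace")

for line in console.splitlines():
    if "failed to execute" in line:   # e.g. the 120-minute timeout message
        print(line)
}}}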
  42 [20:25] <hggdh> (going back to run 225 summary)
  43 [20:25] <hggdh> at the middle of this screen -- again, I am at https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/
  44 [20:26] <hggdh> we see a "Build artifacts" link and, under it, a series of other links.
  45 [20:26] <hggdh> what we are usually interested in is *not* these other links, but what is under the "Build artifacts"
  46 [20:27] <hggdh> so, drilling down on it (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/artifact/)
  47 [20:28] <hggdh> we see a 225/test-results link, and some others. It is the test-results we usually want...
  48 [20:29] <hggdh> under it we only see a d-i-syslog.log.gz file (er, link). This sort of tells us that the test indeed failed ;-)
  49 [20:30] <hggdh> (an example of a successful run is at https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/223/artifact/223/test-results/ )
  50 [20:31] <ClassBot> diwic asked: You asked for contributions, but assume I've made a change to the branch, how do I go ahead and test it locally before proposing it to be merged? Do I have to set up a local jenkins instance...etc?
  51 [20:32] <hggdh> hum
  52 [20:33] <hggdh> yes, you *might*. It would be better if you did, I mean. We are considering how to set up a test Jenkins, where code changes can be tested without needing the whole shebang
  53 [20:34] <ClassBot> gang65 asked: How could I look at the current job configuration? - I would like to see which scripts it is using...
  54 [20:38] <hggdh> (sorry for the delay, was battling a login)
  55 [20:39] <hggdh> yes, this is a good question. There is no way right now -- I just tested it. You have to have access to the internal Jenkins to see the configuration. We will work on that
  56 [20:39] <hggdh> sorry
  57 [20:40] <hggdh> so, back to looking at errors
  58 [20:40] <hggdh> the only way to find out what happened on run 225 is to...
  59 [20:42] <hggdh> go back to the build 225 summary page (https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/225/ )
  60 [20:43] <hggdh> and look at the 'default' link in the middle of the page, at the 'run_test' link on the same page, and at the syslog output
  61 [20:44] <hggdh> the 'default' link shows you the actual test code that is (or would be) executed
  62 [20:44] <hggdh> the 'run_test' is the driver
  63 [20:44] <hggdh> and the syslog is all that was sent to stdout.
  64 [20:45] <hggdh> we know the test did not even execute, in this case
  65 [20:45] <hggdh> so I would start with the syslog, looking for "abnormal" messages
  66 [20:46] <hggdh> and in the syslog, around line 978, we see a python stacktrace
  67 [20:47] <hggdh> it may be part of the problem. We would need to review the whole syslog, and talk with the developers to really find out
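A small, purely illustrative helper for that "look for abnormal messages" step; it assumes the syslog has been downloaded locally, here as the d-i-syslog.log.gz artifact mentioned earlier:

{{{
#!/usr/bin/env python3
# Print the line numbers where Python tracebacks start in a (possibly
# gzipped) syslog. Illustrative only; the file name is an example.
import gzip

def find_tracebacks(path):
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if "Traceback (most recent call last)" in line:
                print("traceback starting at line %d" % number)

if __name__ == "__main__":
    find_tracebacks("d-i-syslog.log.gz")
}}}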
  68 [20:48] <hggdh> an example of an *unstable* execution is in precise-server-i386_dns-server
  69 [20:48] <hggdh> https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76
  70 [20:49] <hggdh> drilling down on the build artifacts, we see an interesting link:
  71 [20:49] <hggdh> (actually two)
  72 [20:49] <hggdh> https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76/artifact/76/test-results/TEST-dns-server.xml
  73 [20:49] <hggdh> and
  74 [20:49] <hggdh> https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-i386_dns-server/76/artifact/76/test-results/dns-server.stderr
  75 [20:50] <hggdh> the first one, the XML file, is the results file Jenkins looks at to determine the final status
  76 [20:50] <ClassBot> There are 10 minutes remaining in the current session.
  77 [20:50] <hggdh> looking at it, we find that the tests *did* run, but failed
  78 [20:51] <hggdh> for example, <testcase classname="test.DnsServerTest" name="testResolveTcp" time="0.000">
  79 [20:51] <hggdh> <skip>Unable to perform external DNS resolution from the QA Lab&#xA;</skip>
  80 [20:51] <hggdh> </testcase>
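For illustration, a short Python sketch that summarises a JUnit-style results file such as TEST-dns-server.xml; the element names below follow the common JUnit XML layout (testcase entries with failure/error/skip children), so the real files may differ in detail:

{{{
#!/usr/bin/env python3
# Summarise a JUnit-style XML results file: print each test case and whether
# it passed, failed, or was skipped. Illustrative only.
import xml.etree.ElementTree as ET

def summarise(path):
    root = ET.parse(path).getroot()
    for case in root.iter("testcase"):
        name = "%s.%s" % (case.get("classname"), case.get("name"))
        if case.find("failure") is not None or case.find("error") is not None:
            status = "FAILED"
        elif case.find("skip") is not None or case.find("skipped") is not None:
            status = "SKIPPED"
        else:
            status = "OK"
        print("%-60s %s" % (name, status))

if __name__ == "__main__":
    summarise("TEST-dns-server.xml")
}}}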
  81 [20:52] <hggdh> looking at the stderr one, we see that a 'netstat -ltnp' showed *NO* DNS server running
  82 [20:52] <hggdh> ergo, DNS failed
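As a rough Python equivalent of that 'netstat -ltnp' check (illustrative only, not the lab's actual test code), one can simply try to connect to the DNS port:

{{{
#!/usr/bin/env python3
# Check whether anything is listening on the local DNS TCP port.
import socket

def port_open(host="127.0.0.1", port=53):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(2)
    try:
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()

if __name__ == "__main__":
    print("DNS server listening" if port_open() else "no DNS server running")
}}}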
  83 [20:52] <hggdh> so we would need to look at the full syslog to try to find the error
  84 [20:52] <hggdh> (I happen to know, it was a package install failure, already corrected)
  85 [20:53] <hggdh> you can also subscribe to the mailing list that announces Jenkins job results
  86 [20:54] <hggdh> https://lists.ubuntu.com/mailman/listinfo/ubuntu-testing-notifications
  87 [20:54] <hggdh> you should receive an email for every Jenkins run that caused a change of status *from* successful, and for every failed run
  88 [20:55] <hggdh> this usually runs to about 30 emails per day; a really bad day may have around 100 of them
  89 [20:55] <ClassBot> There are 5 minutes remaining in the current session.
  90 [20:55] <hggdh> If you subscribe, I suggest filtering down to only what you want to know (using either procmail or your local email client's filters)
  91 [20:56] <hggdh> but we are already out of time...
  92 [20:57] <hggdh> so. Please go to #ubuntu-testing for any questions you may have, or email ubuntu-qa@lists.ubuntu.com. We all hang out in the #ubuntu-testing channel, and we all subscribe to the ubuntu-qa ML
  93 [20:57] <hggdh> thank you,
  94 [20:57] <ClassBot> diwic asked: This all seems quite complex (inspecting log files etc). What are the advantages of Jenkins compared to other testing frameworks?
  95 [20:58] <hggdh> jenkins allows us to set up different environments, from bare-metal to EC2 and KVMs, and gives us a consolidated view of all tests
  96 [20:59] <hggdh> also, firing off tests can be automated -- checking a bzr branch and firing off on new code, firing off when a new ISO becomes available, etc
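As a very rough sketch of that kind of trigger (the URLs and job name are assumptions, and a real instance will usually also require authentication and a CSRF crumb): poll the checksum file for a daily ISO and start a Jenkins job through the standard /build endpoint when it changes:

{{{
#!/usr/bin/env python3
# Poll the checksum file of a daily ISO and trigger a Jenkins job when it
# changes. Illustrative only; URLs and job name are examples.
import hashlib
import time
import urllib.request

SUMS_URL = "http://cdimage.ubuntu.com/ubuntu-server/daily/current/SHA256SUMS"
BUILD_URL = "https://jenkins.example.com/job/precise-desktop-amd64_default/build"

def current_sums():
    with urllib.request.urlopen(SUMS_URL) as response:
        return hashlib.sha256(response.read()).hexdigest()

last_seen = current_sums()
while True:
    time.sleep(15 * 60)            # poll every 15 minutes
    latest = current_sums()
    if latest != last_seen:        # a new ISO has been published
        last_seen = latest
        urllib.request.urlopen(BUILD_URL, data=b"")   # POST starts the job
}}}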
  97 [21:00] <hggdh> thank you all again

MeetingLogs/devweek1201/AutomatedTestingAndJenkins (last edited 2012-02-01 12:54:14 by dholbach)