DebuggingJenkinsISOInstallations

Warning /!\ You might want to read the general presentation to understand jenkins results first.

This document describes how to troubleshoot ISO installation failures reported by QA's daily iso testing system.

Understanding the statuses

There are 4 different statuses reported by Jenkins:

  1. Green: The test passed. Nothing special to do.
  2. Grey: The test has been cancelled. This is usually the result of a manual action so the user who cancelled the build will know hy s/he did it. Alternatively, the test has never been executed yet.
  3. Yellow: Unstable. For ISO installation, it means that the ISO installed correctly but the post-installation test failed.
  4. Red: Failed. The ISO didn't install either because of an infrastructure problem or a problem with the ISO itself.

We'll review each of these cases.

If you're interested in understanding how the testing system itself works, check the project page on Launchpad

Red Ball : Installation Failed

There are 2 reasons for a installation to fail:

  1. A problem with the testing infrastructure
  2. A problem with the image

Infrastructure Issues

Usually, the infrastructure is the cause of the failed build there is no attachment to the run. In this case, the only source of information is the console output.

To view the console output, go to the build that failed and click on console.

For example, the following run 69 of precise-server-amd64_dns-server failed because it was unable to do DNS resolution:

INFO:root:Checking/Downloading current build for variant server from http://cdimage.ubuntu.com/ubuntu-server/daily
Traceback (most recent call last):
  File "/usr/bin/download-latest-test-iso", line 239, in <module>
    main()
  File "/usr/bin/download-latest-test-iso", line 169, in main
    fh = urllib2.urlopen(sha256sum_url)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 394, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 412, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1201, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1171, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Archiving artifacts
Recording test results
Sending e-mails to: ubuntu-testing-notifications@lists.ubuntu.com
Build was marked for publishing on https://jenkins.qa.ubuntu.com/
Finished: FAILURE

The testing framework being written in python, system error will often be python traces, which are self-explanatory. If it is not, then file a bug with the trace against the project or read the code and provide a patch.

Image Issues

The second type of red bullet is when the image failed to install because it is broken. The most common causes of failures are:

  1. Test timeout (installer hangs).
  2. Unmet dependencies
  3. Broken package

Test Timeout

The console and syslog may help to investigate why it hanged, but this must be reproduced and investigated manually.

https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/206/ shows this time of timeout which is very difficult to understand without testing locally.

There is a syslog file but it contains no evidence of what actually failed. In this case, that was an error in livecd-rootfs and ubiquity was not on the image.

Unmet dependencies

This happens with the debian-installer, less frequently with Ubiquity unless additional packages are installed during installation.

Desktop (Ubiquity based) images are can not be build is there are unmet/broken dependencies because package dependency resolution is done at build time, while it's done at installation time for debian-installer (server, alternate and netboot) based images

When this type of installation failure happens, the details of the error will be available from the artefact d-i-syslog.gz.

Here is an exemple of an installation that failed because of a broken dependency.

And the full trace from syslog

Jan 13 07:59:29 in-target: Some packages could not be installed. This may mean that you have
Jan 13 07:59:29 in-target: requested an impossible situation or if you are using the unstable
Jan 13 07:59:29 in-target: distribution that some required packages have not yet been created
Jan 13 07:59:29 in-target: or been moved out of Incoming.
Jan 13 07:59:29 in-target: The following information may help to resolve the situation:
Jan 13 07:59:29 in-target: 
Jan 13 07:59:29 in-target: The following packages have unmet dependencies:
Jan 13 07:59:29 in-target:  libebook-1.2-12 : Breaks: libebook1.2-12 but 3.2.2-0ubuntu2 is to be installed
Jan 13 07:59:29 in-target:  libedataserver-1.2-15 : Breaks: libedataserver1.2-15 but 3.2.2-0ubuntu2 is to be installed
Jan 13 07:59:29 in-target: E
Jan 13 07:59:29 in-target: : 
Jan 13 07:59:29 in-target: Unable to correct problems, you have held broken packages.
Jan 13 07:59:29 in-target: 
Jan 13 07:59:29 in-target: debconf (developer): <-- PROGRESS SET 800
Jan 13 07:59:29 debconf: --> PROGRESS SET 800
Jan 13 07:59:29 in-target: debconf (developer): ----> PROGRESS SET 800
Jan 13 07:59:29 debconf: <-- 0 OK
Jan 13 07:59:29 in-target: debconf (developer): <---- 0 OK
Jan 13 07:59:29 in-target: debconf (developer): --> 0 OK
Jan 13 07:59:29 in-target: tasksel: aptitude failed (100)
Jan 13 07:59:29 debconf: --> GET mirror/protocol
Jan 13 07:59:29 debconf: <-- 0 http
Jan 13 07:59:29 debconf: --> GET mirror/http/proxy
Jan 13 07:59:29 debconf: <-- 0 http://192.168.123.1:3142
Jan 13 07:59:29 debconf: --> GET debconf/priority
Jan 13 07:59:29 debconf: <-- 0 critical
Jan 13 07:59:29 debconf: --> GET debian-installer/locale
Jan 13 07:59:29 debconf: <-- 0 en_US.UTF-8
Jan 13 07:59:29 debconf: --> PROGRESS INFO pkgsel/progress/cleanup
Jan 13 07:59:29 debconf: <-- 0 OK
Jan 13 07:59:29 debconf: --> PROGRESS STOP
Jan 13 07:59:29 debconf: <-- 0 OK
Jan 13 07:59:29 main-menu[1721]: WARNING **: Configuring 'pkgsel' failed with error code 1
Jan 13 07:59:29 debconf: --> GET debconf/priority
Jan 13 07:59:29 debconf: <-- 0 critical
Jan 13 07:59:29 main-menu[1721]: WARNING **: Menu item 'pkgsel' failed.
Jan 13 07:59:29 debconf: --> SETTITLE debian-installer/pkgsel/title
Jan 13 07:59:29 debconf: <-- 0 OK
Jan 13 07:59:29 debconf: --> CAPB
Jan 13 07:59:29 debconf: <-- 0 multiselect backup progresscancel align plugin-terminal plugin-detect-keyboard
Jan 13 07:59:29 debconf: --> METAGET debian-installer/pkgsel/title Description
Jan 13 07:59:29 debconf: <-- 0 Select and install software
Jan 13 07:59:29 debconf: --> SUBST debian-installer/main-menu/item-failure ITEM Select and install software
Jan 13 07:59:29 debconf: Adding [ITEM] -> [Select and install software]
Jan 13 07:59:29 debconf: <-- 0
Jan 13 07:59:29 debconf: --> INPUT critical debian-installer/main-menu/item-failure
Jan 13 07:59:29 debconf: <-- 0 question will be asked
Jan 13 07:59:29 debconf: --> GO

The indication of a critical error is the trace

Jan 13 07:59:29 debconf: --> INPUT critical debian-installer/main-menu/item-failure
Jan 13 07:59:29 debconf: <-- 0 question will be asked

The following log entry indicates at which stage of the installation the installer failed

Jan 13 07:59:29 debconf: --> SUBST debian-installer/main-menu/item-failure ITEM Select and install software

If you scroll backward, you'll find the root cause of the failure

Jan 13 07:59:29 in-target: The following packages have unmet dependencies:
Jan 13 07:59:29 in-target:  libebook-1.2-12 : Breaks: libebook1.2-12 but 3.2.2-0ubuntu2 is to be installed
Jan 13 07:59:29 in-target:  libedataserver-1.2-15 : Breaks: libedataserver1.2-15 but 3.2.2-0ubuntu2 is to be installed

When this type of error happens, check the precise changes list for recent uploads and verify if there's a transition in process otherwise, ask on the corresponding IRC channel (in this case #ubuntu-desktop)

There's another very common type of unmet dependency. When a new kernel is uploaded, debian-installer must be rebuild with this new kernel. When kernel and d-i are out-of-sync the following error is displayed in d-i-syslog.gz

Jan  7 07:56:49 anna[7139]: WARNING **: no packages matching running kernel 3.2.0-8-generic in archive
Jan  7 07:56:49 debconf: --> INPUT critical anna/no_kernel_modules
Jan  7 07:56:49 debconf: <-- 0 question will be asked

Usually debian-installer is already build but was not available from the archive when the ISO was built. First check precise changes if there is no new version of debian-installer with the new kernel after some hours then ping #ubuntu-installer.

Broken package

Broken package are like other installation failures on d-i based images. Search for 'A question will be asked' then scroll backward until you find the error. The behavior of d-i is 'if something unexpected happens, stop and ask'

For example alternate amd64 run 232 there was a problem with keyboard setup.

The following error is displayed in the logs

Nov 21 07:57:51 debconf: --> INPUT critical keyboard-configuration/xkb-keymap
Nov 21 07:57:51 debconf: <-- 0 question will be asked

Another exemple with Ubiquity is 894768 . This was a bug in the kernel that randomly crashed when ubiquity was unpacking the files to disk. Opening syslog reveals the error

install.py: Exception during installation:
install.py: Traceback (most recent call last):
install.py: File "/usr/share/ubiquity/install.py", line 656, in <module>
install.py: install.run()
install.py: File "/usr/share/ubiquity/install.py", line 130, in run
install.py: self.copy_all()
install.py: File "/usr/share/ubiquity/install.py", line 423, in copy_all
install.py: install_misc.copy_file(self.db, sourcepath, targetpath, md5_check)
install.py: File "/usr/lib/ubiquity/ubiquity/install_misc.py", line 621, in copy_file
install.py: targetfh.write(buf)
install.py: IOError: [Errno 22] Invalid argument
install.py:

Yellow Ball : Post Installation Test Failed

After an ISO is installed, the VM is rebooted and the post-installation tests run.

The test are python unit test. The test script is attached to the build's artefacts.

The results are also attached to the build's artefact as a jUnit XML file that jenkins can parse to then publish the results and generate fancy graphs.

For example, in case of https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-amd64_dns-server/75 the run 75 failed because bind fails to start after install. The following assertion error is displayed in the result file:

<testcase classname="test.DnsServerTest" name="testBindDaemon" time="0.009">
  <failure type="testtools.testresult.real._StringException">
    _StringException: Text attachment: traceback 
    ------------ 
    Traceback (most recent call last): 
    File "test.py", line 37, in testBindDaemon self.assertNotEquals(output, "") 
    AssertionError: '' == '' 
    ------------
  </failure>
</testcase>

You then need to know what the test does or read the test script attached to the run result to understand what the error means. In this case the output of the command pgrep named is empty while it shouldn't which means named is not running

Another example for a desktop image (desktop amd64 default run 213) the error from TEST-default.xml is

<failure type="testtools.testresult.real._StringException">
  _StringException: Text attachment: traceback 
  ------------ 
  Traceback (most recent call last): 
  File "test.py", line 47, in test_20DBusMachineId self.assertTrue((now - mid_mtime ) < 120 ) 
  AssertionError: False is not true 
  ------------
</failure>

Which means that the machine_id has not been regenerated during the first boot of the system.

Green Ball : Enjoy!

Nothing to do, unless you're paranoid and want to check the logs anyway. Every so often, it is good to be a paranoid: we may have false negatives (a test that succeeded, but should have failed)...

QATeam/phillw/AutomatedTesting/DebuggingJenkinsISOInstallations (last edited 2014-07-22 21:57:52 by host-80-41-221-66)