||<>|| /!\ You might want to read the [[https://wiki.ubuntu.com/QATeam/AutomatedTesting/UnderstandingJenkinsResults|general presentation]] to understand jenkins results first. This document describes how to troubleshoot ISO installation failures reported by [[http://jenkins.qa.ubuntu.com|QA's daily iso testing system]]. = Understanding the statuses = There are 4 different statuses reported by Jenkins: 1. Green: The test passed. Nothing special to do. 1. Grey: The test has been cancelled. This is usually the result of a manual action so the user who cancelled the build will know hy s/he did it. Alternatively, the test has never been executed yet. 1. Yellow: Unstable. For ISO installation, it means that the ISO installed correctly but the post-installation test failed. 1. Red: Failed. The ISO didn't install either because of an infrastructure problem or a problem with the ISO itself. We'll review each of these cases. If you're interested in understanding how the testing system itself works, check the [[https://launchpad.net/ubuntu-server-iso-testing|project page on Launchpad]] = Red Ball : Installation Failed = There are 2 reasons for a installation to fail: 1. A problem with the testing infrastructure 1. A problem with the image == Infrastructure Issues == Usually, the infrastructure is the cause of the failed build there is no attachment to the run. In this case, the only source of information is the console output. To view the console output, go to the build that failed and click on '''console'''. For example, the following [[https://jenkins.qa.ubuntu.com/job/precise-server-amd64_dns-server/69/console|run 69 of precise-server-amd64_dns-server]] failed because it was unable to do DNS resolution: {{{ INFO:root:Checking/Downloading current build for variant server from http://cdimage.ubuntu.com/ubuntu-server/daily Traceback (most recent call last): File "/usr/bin/download-latest-test-iso", line 239, in main() File "/usr/bin/download-latest-test-iso", line 169, in main fh = urllib2.urlopen(sha256sum_url) File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.7/urllib2.py", line 394, in open response = self._open(req, data) File "/usr/lib/python2.7/urllib2.py", line 412, in _open '_open', req) File "/usr/lib/python2.7/urllib2.py", line 372, in _call_chain result = func(*args) File "/usr/lib/python2.7/urllib2.py", line 1201, in http_open return self.do_open(httplib.HTTPConnection, req) File "/usr/lib/python2.7/urllib2.py", line 1171, in do_open raise URLError(err) urllib2.URLError: Archiving artifacts Recording test results Sending e-mails to: ubuntu-testing-notifications@lists.ubuntu.com Build was marked for publishing on https://jenkins.qa.ubuntu.com/ Finished: FAILURE }}} The testing framework being written in python, system error will often be python traces, which are self-explanatory. If it is not, then file a bug with the trace against [[https://bugs.launchpad.net/ubuntu-server-iso-testing|the project]] or read the code and provide a patch. == Image Issues == The second type of red bullet is when the image failed to install because it is broken. The most common causes of failures are: 1. Test timeout (installer hangs). 1. Unmet dependencies 1. Broken package === Test Timeout === The console and syslog may help to investigate why it hanged, but this must be reproduced and investigated manually. [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-desktop-amd64_default/206/||Desktop AMD64 Run #206]] shows this time of timeout which is very difficult to understand without testing locally. There is a syslog file but it contains no evidence of what actually failed. In this case, that was an error in livecd-rootfs and ubiquity was not on the image. === Unmet dependencies === This happens with the debian-installer, less frequently with Ubiquity unless additional packages are installed during installation. Desktop (Ubiquity based) images are can not be build is there are unmet/broken dependencies because package dependency resolution is done at build time, while it's done at installation time for debian-installer (server, alternate and netboot) based images When this type of installation failure happens, the details of the error will be available from the artefact '''d-i-syslog.gz'''. Here is [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-alternate-amd64_default/306/|an exemple]] of an installation that failed because of a broken dependency. And the full trace from syslog {{{ Jan 13 07:59:29 in-target: Some packages could not be installed. This may mean that you have Jan 13 07:59:29 in-target: requested an impossible situation or if you are using the unstable Jan 13 07:59:29 in-target: distribution that some required packages have not yet been created Jan 13 07:59:29 in-target: or been moved out of Incoming. Jan 13 07:59:29 in-target: The following information may help to resolve the situation: Jan 13 07:59:29 in-target: Jan 13 07:59:29 in-target: The following packages have unmet dependencies: Jan 13 07:59:29 in-target: libebook-1.2-12 : Breaks: libebook1.2-12 but 3.2.2-0ubuntu2 is to be installed Jan 13 07:59:29 in-target: libedataserver-1.2-15 : Breaks: libedataserver1.2-15 but 3.2.2-0ubuntu2 is to be installed Jan 13 07:59:29 in-target: E Jan 13 07:59:29 in-target: : Jan 13 07:59:29 in-target: Unable to correct problems, you have held broken packages. Jan 13 07:59:29 in-target: Jan 13 07:59:29 in-target: debconf (developer): <-- PROGRESS SET 800 Jan 13 07:59:29 debconf: --> PROGRESS SET 800 Jan 13 07:59:29 in-target: debconf (developer): ----> PROGRESS SET 800 Jan 13 07:59:29 debconf: <-- 0 OK Jan 13 07:59:29 in-target: debconf (developer): <---- 0 OK Jan 13 07:59:29 in-target: debconf (developer): --> 0 OK Jan 13 07:59:29 in-target: tasksel: aptitude failed (100) Jan 13 07:59:29 debconf: --> GET mirror/protocol Jan 13 07:59:29 debconf: <-- 0 http Jan 13 07:59:29 debconf: --> GET mirror/http/proxy Jan 13 07:59:29 debconf: <-- 0 http://192.168.123.1:3142 Jan 13 07:59:29 debconf: --> GET debconf/priority Jan 13 07:59:29 debconf: <-- 0 critical Jan 13 07:59:29 debconf: --> GET debian-installer/locale Jan 13 07:59:29 debconf: <-- 0 en_US.UTF-8 Jan 13 07:59:29 debconf: --> PROGRESS INFO pkgsel/progress/cleanup Jan 13 07:59:29 debconf: <-- 0 OK Jan 13 07:59:29 debconf: --> PROGRESS STOP Jan 13 07:59:29 debconf: <-- 0 OK Jan 13 07:59:29 main-menu[1721]: WARNING **: Configuring 'pkgsel' failed with error code 1 Jan 13 07:59:29 debconf: --> GET debconf/priority Jan 13 07:59:29 debconf: <-- 0 critical Jan 13 07:59:29 main-menu[1721]: WARNING **: Menu item 'pkgsel' failed. Jan 13 07:59:29 debconf: --> SETTITLE debian-installer/pkgsel/title Jan 13 07:59:29 debconf: <-- 0 OK Jan 13 07:59:29 debconf: --> CAPB Jan 13 07:59:29 debconf: <-- 0 multiselect backup progresscancel align plugin-terminal plugin-detect-keyboard Jan 13 07:59:29 debconf: --> METAGET debian-installer/pkgsel/title Description Jan 13 07:59:29 debconf: <-- 0 Select and install software Jan 13 07:59:29 debconf: --> SUBST debian-installer/main-menu/item-failure ITEM Select and install software Jan 13 07:59:29 debconf: Adding [ITEM] -> [Select and install software] Jan 13 07:59:29 debconf: <-- 0 Jan 13 07:59:29 debconf: --> INPUT critical debian-installer/main-menu/item-failure Jan 13 07:59:29 debconf: <-- 0 question will be asked Jan 13 07:59:29 debconf: --> GO }}} The indication of a critical error is the trace {{{ Jan 13 07:59:29 debconf: --> INPUT critical debian-installer/main-menu/item-failure Jan 13 07:59:29 debconf: <-- 0 question will be asked }}} The following log entry indicates at which stage of the installation the installer failed {{{ Jan 13 07:59:29 debconf: --> SUBST debian-installer/main-menu/item-failure ITEM Select and install software }}} If you scroll backward, you'll find the root cause of the failure {{{ Jan 13 07:59:29 in-target: The following packages have unmet dependencies: Jan 13 07:59:29 in-target: libebook-1.2-12 : Breaks: libebook1.2-12 but 3.2.2-0ubuntu2 is to be installed Jan 13 07:59:29 in-target: libedataserver-1.2-15 : Breaks: libedataserver1.2-15 but 3.2.2-0ubuntu2 is to be installed }}} When this type of error happens, check the [[https://lists.ubuntu.com/archives/precise-changes/|precise changes list]] for recent uploads and verify if there's a transition in process otherwise, ask on the corresponding IRC channel (in this case #ubuntu-desktop) There's another very common type of unmet dependency. When a new kernel is uploaded, debian-installer must be rebuild with this new kernel. When kernel and d-i are out-of-sync the following error is displayed in d-i-syslog.gz {{{ Jan 7 07:56:49 anna[7139]: WARNING **: no packages matching running kernel 3.2.0-8-generic in archive Jan 7 07:56:49 debconf: --> INPUT critical anna/no_kernel_modules Jan 7 07:56:49 debconf: <-- 0 question will be asked }}} Usually debian-installer is already build but was not available from the archive when the ISO was built. First check [[https://lists.ubuntu.com/archives/precise-changes/|precise changes]] if there is no new version of debian-installer with the new kernel after some hours then ping #ubuntu-installer. === Broken package === Broken package are like other installation failures on d-i based images. Search for 'A question will be asked' then scroll backward until you find the error. The behavior of d-i is 'if something unexpected happens, stop and ask' For example [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-alternate-amd64_default/232/|alternate amd64 run 232]] there was a problem with keyboard setup. The following error is displayed in the logs {{{ Nov 21 07:57:51 debconf: --> INPUT critical keyboard-configuration/xkb-keymap Nov 21 07:57:51 debconf: <-- 0 question will be asked }}} Another exemple with Ubiquity is Bug:894768 . This was a bug in the kernel that randomly crashed when ubiquity was unpacking the files to disk. Opening syslog reveals the error {{{ install.py: Exception during installation: install.py: Traceback (most recent call last): install.py: File "/usr/share/ubiquity/install.py", line 656, in install.py: install.run() install.py: File "/usr/share/ubiquity/install.py", line 130, in run install.py: self.copy_all() install.py: File "/usr/share/ubiquity/install.py", line 423, in copy_all install.py: install_misc.copy_file(self.db, sourcepath, targetpath, md5_check) install.py: File "/usr/lib/ubiquity/ubiquity/install_misc.py", line 621, in copy_file install.py: targetfh.write(buf) install.py: IOError: [Errno 22] Invalid argument install.py: }}} = Yellow Ball : Post Installation Test Failed = After an ISO is installed, the VM is rebooted and the post-installation tests run. The test are python unit test. The test script is attached to the [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-alternate-amd64_default/lastSuccessfulBuild/artifact/tests/default|build's artefacts]]. The results are also attached to the build's artefact as a [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-amd64_dns-server/75/artifact/75/test-results/TEST-dns-server.xml|jUnit XML file]] that jenkins can parse to then publish the results and generate fancy graphs. For example, in case of [[https://jenkins.qa.ubuntu.com/view/Precise/job/precise-server-amd64_dns-server/75||Server amd64 DNS test]] the run 75 failed because bind fails to start after install. The following assertion error is displayed in the result file: {{{ _StringException: Text attachment: traceback ------------ Traceback (most recent call last): File "test.py", line 37, in testBindDaemon self.assertNotEquals(output, "") AssertionError: '' == '' ------------ }}} You then need to know what the test does or read the test script attached to the run result to understand what the error means. In this case the output of the command {{{pgrep named}}} is empty while it shouldn't which means named is not running Another example for a desktop image (desktop amd64 default run 213) the error from TEST-default.xml is {{{ _StringException: Text attachment: traceback ------------ Traceback (most recent call last): File "test.py", line 47, in test_20DBusMachineId self.assertTrue((now - mid_mtime ) < 120 ) AssertionError: False is not true ------------ }}} Which means that the machine_id has not been regenerated during the first boot of the system. = Green Ball : Enjoy! = Nothing to do, unless you're paranoid and want to check the logs anyway. Every so often, it is good to be a paranoid: we may have false negatives (a test that succeeded, but should have failed)...