Incident Description


Crisis Response Team

  • Andy Whitcroft
  • Stefan Bader
  • Jane Silber
  • Matt Zimmerman
  • Chris Jones
  • Zaid Al Hamami
  • Colin Watson
  • Robbie Williamson


All times are in UTC.



Master bug 561151 is filed

dholbach brings the issue to the attention of #ubuntu-kernel; Sarvatt helps diagnose, showing that upstream kernels are not affected
kernel-team becomes aware of the issue and assists with ongoing analysis; it appears to affect specific systems
analysis suggests this may be related to an EC (embedded controller) change. As upstream has this change too and testing has shown upstream to be unaffected, we suspect a conflict between this new change and an Ubuntu-specific change to improve boot performance. Preparation and builds of test kernels reverting the Ubuntu changes are started.
Jane notifies Matt of the problem, naming Andy as a point of contact. Pete Graner (kernel team manager) and Robbie Williamson (Pete's manager) are both unavailable due to time zones and travel.
a number of new cases of the issue are reported, escalating the issue
testing of the kernels with the Ubuntu changes reverted shows similar issues

Matt takes managerial responsibility for the issue, reads #ubuntu-kernel to see what's happening

Matt and Andy speak by phone to assess the severity of the issue, agree that blocking the package is appropriate
Matt makes contact with Chris Jones, asks him to stand by

Matt provides Chris with the list of filenames to be blocked, per Andy:

  linux-image-2.6.32-20-generic_2.6.32-20.29_amd64.deb (29.5 MiB)
  linux-image-2.6.32-20-preempt_2.6.32-20.29_amd64.deb (29.9 MiB)
  linux-image-2.6.32-20-server_2.6.32-20.29_amd64.deb (29.5 MiB)
  linux-image-2.6.32-20-386_2.6.32-20.29_i386.deb (29.5 MiB)
  linux-image-2.6.32-20-generic_2.6.32-20.29_i386.deb (29.5 MiB)
  linux-image-2.6.32-20-generic-pae_2.6.32-20.29_i386.deb (29.6 MiB)
Chris confirms that the package "has been blocked on ftpmaster.internal and removed from our archive servers"

Matt enters #canonical-support and notifies the Canonical support team of the issue:

<mdz> hello all
 I'd like to make you aware of a serious-looking regression in lucid which is likely to affect Canonical staff
<zaid_h> mdz - please go ahead
<mdz> zaid_h, the kernel package version 2.6.32-20.29 includes a regression which is known to affect some/many ThinkPads
 IS has already blocked further downloads of the package
 the kernel team is working on a fix
 anyone who has installed that version already could potentially find that their system won't boot
 in which case they will need to select an older kernel using GRUB
<zaid_h> mdz - okie doke. Think pads mainly? Is there a bug # we could follow for updates?
<zaid_h> MagicFab, pmatulis, EtienneG:^^
<mdz> zaid_h, I'm gathering those details now and creating an incident report at https://wiki.canonical.com/IncidentReports/2010-04-12-Lucid-Kernel
<EtienneG> understood, thanks mdz
<zaid_h> mdz, thx
Matt creates this incident report page

Matt polls #ubuntu-devel for an archive administrator to remove the package, per UbuntuPlatform/DealingWithCrisis

Scott Kitterman responds, but does not know whether he has the necessary privileges to run lp-remove-package.py
Andy: test kernels carrying the reversion of the suspect EC change prepared
Colin Watson responds
50+ confirmed cases via Launchpad reports

Colin removes the affected packages:

lp_archive@cocoplum:~/syncs$ lp-remove-package.py -u cjwatson -m 'temporarily remove due to bug 561151 affecting ThinkPad users' -b linux-image-2.6.32-20-386 linux-image-2.6.32-20-generic-pae linux-image-2.6.32-20-generic linux-image-2.6.32-20-preempt linux-image-2.6.32-20-server
2010-04-12 13:51:54 INFO    creating lockfile
2010-04-12 13:52:00 INFO    Removing candidates:
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-386 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-generic-pae 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-generic 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-generic 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-preempt 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO        linux-image-2.6.32-20-server 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO    Removed-by: Colin Watson
2010-04-12 13:52:00 INFO    Comment: temporarily remove due to bug 561151 affecting ThinkPad users
2010-04-12 13:52:00 INFO    6 packages successfully removed.
Confirm this transaction? [yes, no] yes
2010-04-12 13:52:14 INFO    Transaction committed.
2010-04-12 13:52:14 INFO    The archive will be updated in the next publishing cycle.
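
The six filenames blocked earlier and the five package names passed to lp-remove-package.py describe the same set of binaries. The following is a minimal sketch of that mapping, assuming only the standard Debian `name_version_arch.deb` filename layout; the helper names (`parse_deb_filename`, `group_by_package`) are hypothetical and are not part of any Launchpad tooling.

```python
from collections import defaultdict

# Filenames exactly as given in the block request (sizes omitted).
BLOCKED_FILES = [
    "linux-image-2.6.32-20-generic_2.6.32-20.29_amd64.deb",
    "linux-image-2.6.32-20-preempt_2.6.32-20.29_amd64.deb",
    "linux-image-2.6.32-20-server_2.6.32-20.29_amd64.deb",
    "linux-image-2.6.32-20-386_2.6.32-20.29_i386.deb",
    "linux-image-2.6.32-20-generic_2.6.32-20.29_i386.deb",
    "linux-image-2.6.32-20-generic-pae_2.6.32-20.29_i386.deb",
]


def parse_deb_filename(filename):
    """Split 'name_version_arch.deb' into a (name, version, arch) tuple.

    Hypothetical helper: relies on the convention that neither package
    names nor versions contain underscores.
    """
    suffix = ".deb"
    if not filename.endswith(suffix):
        raise ValueError("not a .deb filename: %r" % filename)
    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 3:
        raise ValueError("unexpected filename layout: %r" % filename)
    return tuple(parts)


def group_by_package(filenames):
    """Map (name, version) to the sorted list of architectures seen."""
    archs = defaultdict(list)
    for filename in filenames:
        name, version, arch = parse_deb_filename(filename)
        archs[(name, version)].append(arch)
    return {key: sorted(value) for key, value in archs.items()}


if __name__ == "__main__":
    grouped = group_by_package(BLOCKED_FILES)
    for (name, version), arch_list in sorted(grouped.items()):
        print(name, version, ",".join(arch_list))
```

Grouping this way yields five distinct package names, with linux-image-2.6.32-20-generic present for both amd64 and i386, which is consistent with the six removal candidates listed in the transcript above.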

testing confirms EC change as the culprit:

 * (pre-stable) ACPI: EC: Allow multibyte access to EC
     - LP: #526354
Robbie Williamson comes online, and Matt notifies him of the incident in progress
Matt hands off responsibility to Robbie
Daniel Holbach confirms the test kernel (2.6.32-20-generic #30~lp561151v201004121418) resolves the issue
Andy has 3 confirmations and estimates that updated binaries will be in the archive in 4 hours (18:30)

Andy will continue to investigate the cause of the bug, as the released fix will cause some Dell machines to regress to http://bugs.launchpad.net/bugs/526354

apw: additional testing shows 6/6 thinkpads resolved, also this seems to fix 2/2 mac books
apw: uploaded updated kernel 2.6.32-20.30, builds started for i386 and amd64

Packages for i386/amd64 complete building.

Packages are pushed out onto main archive machines and top-tier mirrors.

Tue, 13 Apr 2010 00:01:03 +0100: External archive mirror triggers completed.


<Identify positive things that happened. What went right in the course of our response?>

  • Early and direct reporting of the issue to the kernel team by affected employees meant that work on a resolution started several hours sooner than it otherwise would have. If you are affected by a problem and wonder if you should tell someone, tell someone. (AndyWhitcroft)


<Identify problems with the events. What went wrong in the course of our response?>

  • We were slow to realise this was a generic issue affecting all ThinkPads, leading to the affected kernel being available for much longer than it should have been. (AndyWhitcroft)

  • The particular response chosen represented one side of a trade-off between minimising the number of affected users and enabling developers to work effectively. Once the kernel was made non-downloadable, all network-only upgrades and netboot installation tests became impossible, which may be a serious problem for the relevant developers in the crunch time immediately before FinalFreeze. (ColinWatson)

  • Because of changes to preserve a certain boot experience, some users had a hard time booting into older, installed kernels. (RobbieWilliamson)


<Suggest changes to process to minimize problems in the future. These should correspond to the problems identified above.>

  • Better testing of kernel bug fixes post KernelFreeze (RobbieWilliamson)

  • I think the decision to block the download was the right one. With that said, the decision to block the download of packages should always take into account where we are in the release cycle, as it could potentially cause more harm than good. (RobbieWilliamson)

  • Investigate how we can provide access to the boot loader, while preserving the overall boot experience for the user (RobbieWilliamson)

IncidentReports/2010-04-12-Lucid-Kernel (last edited 2010-04-28 09:53:34 by robbie.w)