Crisis Response Team
- Andy Whitcroft
- Stefan Bader
- Jane Silber
- Matt Zimmerman
- Chris Jones
- Zaid Al Hamami
- Colin Watson
- Robbie Williamson
All times are in UTC.
Master bug 561151 is filed
- dholbach brings the issue to the kernel team's attention in #ubuntu-kernel; Sarvatt helps diagnose it, showing that upstream kernels are not affected
- The kernel team becomes aware of the issue and assists with the ongoing analysis; it appears to affect specific systems
- Analysis suggests the issue may be related to an EC change. Upstream carries this change too, and testing has shown upstream is unaffected, so we suspect a conflict between this new change and an Ubuntu-specific change made to improve boot performance. Preparation and building of test kernels with the Ubuntu changes reverted begins.
- Jane notifies Matt of the problem, naming Andy as a point of contact. Pete Graner (kernel team manager) and Robbie Williamson (Pete's manager) are both unavailable due to time zones and travel.
- A number of new cases are reported, escalating the issue
- Testing of the kernels with the Ubuntu changes reverted shows the same problem
Matt takes managerial responsibility for the issue and reads #ubuntu-kernel to see what is happening
- Matt and Andy speak by phone to assess the severity of the issue, agree that blocking the package is appropriate
- Matt contacts Chris Jones and asks him to stand by
Matt provides Chris with the list of filenames to be blocked, per Andy:
    linux-image-2.6.32-20-generic_2.6.32-20.29_amd64.deb (29.5 MiB)
    linux-image-2.6.32-20-preempt_2.6.32-20.29_amd64.deb (29.9 MiB)
    linux-image-2.6.32-20-server_2.6.32-20.29_amd64.deb (29.5 MiB)
    linux-image-2.6.32-20-386_2.6.32-20.29_i386.deb (29.5 MiB)
    linux-image-2.6.32-20-generic_2.6.32-20.29_i386.deb (29.5 MiB)
    linux-image-2.6.32-20-generic-pae_2.6.32-20.29_i386.deb (29.6 MiB)
- Chris confirms that the package "has been blocked on ftpmaster.internal and removed from our archive servers"
Matt enters #canonical-support and notifies the Canonical support team of the issue:
<mdz> hello all I'd like to make you aware of a serious-looking regression in lucid which is likely to affect Canonical staff
<zaid_h> mdz - please go ahead
<mdz> zaid_h, the kernel package version 2.6.32-20.29 includes a regression which is known to affect some/many ThinkPads IS has already blocked further downloads of the package the kernel team is working on a fix anyone who has installed that version already could potentially find that their system won't boot in which case they will need to select an older kernel using GRUB
<zaid_h> mdz - okie doke. Think pads mainly? Is there a bug # we could follow for updates?
<zaid_h> MagicFab, pmatulis, EtienneG:^^
<mdz> zaid_h, I'm gathering those details now and creating an incident report at https://wiki.canonical.com/IncidentReports/2010-04-12-Lucid-Kernel
<EtienneG> understood, thanks mdz
<zaid_h> mdz, thx
- Matt creates this incident report page
Matt polls #ubuntu-devel for an archive administrator to remove the package, per UbuntuPlatform/DealingWithCrisis
- Scott Kitterman responds, but does not know whether he has the necessary privileges to run lp-remove-package.py
- Andy: test kernels with the suspect EC change reverted are prepared
- Colin Watson responds
- 50+ cases confirmed via Launchpad reports
Colin removes the affected packages:
lp_archive@cocoplum:~/syncs$ lp-remove-package.py -u cjwatson -m 'temporarily remove due to bug 561151 affecting ThinkPad users' -b linux-image-2.6.32-20-386 linux-image-2.6.32-20-generic-pae linux-image-2.6.32-20-generic linux-image-2.6.32-20-preempt linux-image-2.6.32-20-server
2010-04-12 13:51:54 INFO creating lockfile
2010-04-12 13:52:00 INFO Removing candidates:
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-386 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-generic-pae 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-generic 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-generic 2.6.32-20.29 in lucid i386
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-preempt 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO linux-image-2.6.32-20-server 2.6.32-20.29 in lucid amd64
2010-04-12 13:52:00 INFO Removed-by: Colin Watson
2010-04-12 13:52:00 INFO Comment: temporarily remove due to bug 561151 affecting ThinkPad users
2010-04-12 13:52:00 INFO 6 packages successfully removed.
Confirm this transaction? [yes, no] yes
2010-04-12 13:52:14 INFO Transaction committed.
2010-04-12 13:52:14 INFO The archive will be updated in the next publishing cycle.
Testing confirms the EC change as the culprit:
* (pre-stable) ACPI: EC: Allow multibyte access to EC - LP: #526354
- Robbie Williamson comes online, and Matt notifies him of the incident in progress
- Matt hands off responsibility to Robbie
- Daniel Holbach confirms the test kernel (2.6.32-20-generic #30~lp561151v201004121418) resolves the issue
- Andy has 3 confirmations and estimates that updated binaries will be in the archive in 4 hours (18:30)
Andy will continue to investigate the cause of the bug, as the released fix will reintroduce http://bugs.launchpad.net/bugs/526354 on some Dell machines
- apw: additional testing shows the issue resolved on 6/6 ThinkPads; the fix also appears to resolve it on 2/2 MacBooks
- apw: uploaded updated kernel 2.6.32-20.30, builds started for i386 and amd64
- Packages are pushed out to the main archive machines and top-tier mirrors.
Tue, 13 Apr 2010 00:01:03 +0100: External archive mirror triggers completed.
<Identify positive things that happened. What went right in the course of our response?>
Early and direct reporting of the issue to the kernel team by affected employees meant that work on a resolution started several hours sooner than it otherwise would have. If you are affected by a problem and wonder whether you should tell someone, tell someone. (AndyWhitcroft)
<Identify problems with the events. What went wrong in the course of our response?>
The particular response chosen represented one side of a trade-off between minimising the number of affected users and enabling developers to work effectively. Once the kernel was made non-downloadable, all network-only upgrades and netboot installation tests became impossible, which may be a serious problem for the relevant developers in the crunch time immediately before FinalFreeze. (ColinWatson)
Because of changes made to preserve a particular boot experience, some users had difficulty booting into older installed kernels. (RobbieWilliamson)
<Suggest changes to process to minimize problems in the future. These should correspond to the problems identified above.>
I think the decision to block the download was the right one. That said, any decision to block the download of packages should always take into account where we are in the release cycle, as it could cause more harm than good. (RobbieWilliamson)
Investigate how we can provide access to the boot loader while preserving the overall boot experience for the user. (RobbieWilliamson)