Owner: Pete Graner

=== Incident Description ===

A security update to the Lucid kernel included a KVM security fix. This fix prevented all KVM images from being started (they would start to a black screen and then stop) on the majority of Intel-based systems. The fix originated from vendor-sec via the Security Team.

=== Crisis Response Team ===

 * Andy Whitcroft -- Kernel Team
 * Kees Cook -- Security Team
 * Dustin Kirkland -- Server Team

=== Events ===

03-jun-2010 14:24 -- Barry Warsaw (barry) reports he is unable to start KVM instances following an update. He reports that booting the previous -21 kernel on his system allows him to start the instances.

 '''[15:24] did something in last night's kernel update break kvm? with 2.6.32-22, none of my vms start up. rebooting to 2.6.32-21 fixes the problem'''

03-jun-2010 15:00 -- Pete Graner (pgraner) brings the incident to the attention of the Kernel Team and assigns Andy Whitcroft (apw) to aid in diagnosis.

03-jun-2010 15:01 -- Colin Watson (cjwatson) asks if we need to withdraw the update. Discussion suggests that, as only one sub-system and one release are affected, we will wait until we can confirm how widespread the issue is.

03-jun-2010 15:32 -- kirkland reports no issues on 2.6.32-22.33.

03-jun-2010 15:51 -- apw reports he can also reproduce this on the 2.6.32-22.35 kernel; kirkland upgrades and concurs.

03-jun-2010 16:09 -- apw indicates that there are only 8 patches in the security update and only one affects KVM.

03-jun-2010 16:40 -- apw tests kernel packages, confirming that removing the KVM security patch resolves the issue.

03-jun-2010 17:08 -- kirkland confirms that the patched kernel resolves the issue for him.

03-jun-2010 17:20 -- kees confirms the patched kernel resolves the issue for him, and requests a source package for upload.

03-jun-2010 18:06 -- source package for 2.6.32-22.36 provided to the Security Team.

03-jun-2010 18:08 -- kees uploads the new source package to the security build queue; ETA for the finished build, based on prior i386 and amd64 builds, is just over 6 hours.

04-jun-2010 03:00 (approx) -- both amd64 and i386 packages are published to the archive and available for download.

All times are in UTC.

=== Successes ===

 * identified this as a KVM issue quickly and got the Kernel, Server and Security teams engaged quickly
 * testers responsive to requests to test patched kernels, leading to a speedy resolution

=== Problems ===

 * As the original issue was related to a security change, discussions started in private and then continued in private for the remainder of the response. It would have been appropriate to move discussions to a public IRC channel, perhaps #ubuntu-release, once we knew there were no embargo requirements.

=== Recommendations ===

 * This could perhaps have benefited from some burn time in a linux-proposed or linux-security-proposed archive, and would have further benefited from some internal encouragement (as mdz did in the past) for ubuntu-platform@ to track lucid-*updates through the first point release or so (kirkland)
 * one issue here is that this was an embargoed security update, therefore there is no proposed-style location for the uploads to be tested. All testing occurs on a need-to-know basis, and indeed Server were involved in testing Karmic and previous releases as they included more extensive KVM fixes.
   The failure was in not testing Lucid more extensively. (apw)
 * We need to ensure a full matrix of tests is performed regardless of the size of the fixes (apw)
 * Lucid was not as extensively tested as the other releases because the changes there were much more minor (part of the security update had already been accepted in a prior upload and was therefore already tested).
 * More formally list the stake-holders for an update such that the right people are brought in to test components (apw)
 * embargoed security updates are shared on a need-to-know basis, therefore we only engage external testers where their sub-systems (KVM, arm SOC, ecryptfs, etc) are affected. It would be helpful to more formally record the appropriate contacts for each significant sub-system where we have specific testing resources.
 * Evaluate whether we could get QA to do testing within the embargoed setting (apw)
 * We should have a defined location for discussion of on-going incidents, #ubuntu-release perhaps (apw)
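
The test-matrix and stake-holder recommendations above could be sketched roughly as follows. This is only an illustration, not an existing tool: the function names, the data layout and the `UNASSIGNED` placeholder are assumptions. Only the KVM contact (Server Team) is taken from this report; the other contacts are deliberately left unfilled.

```python
from itertools import product

# Releases named in this report; a real list would cover every supported release.
RELEASES = ["karmic", "lucid"]

# Sub-systems named in this report, mapped to a testing contact.
# Only the KVM entry reflects the report (Server tested the KVM fixes);
# the other entries are placeholders to be filled in by the teams.
SUBSYSTEM_CONTACTS = {
    "kvm": "Server Team",
    "arm-soc": "UNASSIGNED",
    "ecryptfs": "UNASSIGNED",
}


def build_test_matrix(releases, affected_subsystems, contacts):
    """Return one entry per (release, sub-system) pair.

    Every pair is listed regardless of how small the change is for
    that release, so no release is skipped because its fix looks minor.
    """
    matrix = []
    for release, subsystem in product(releases, affected_subsystems):
        matrix.append({
            "release": release,
            "subsystem": subsystem,
            "contact": contacts.get(subsystem, "UNASSIGNED"),
            "tested": False,
        })
    return matrix


def untested(matrix):
    """List the (release, sub-system) pairs that still need a test run."""
    return [(e["release"], e["subsystem"]) for e in matrix if not e["tested"]]
```

For an update touching only KVM, `build_test_matrix(RELEASES, ["kvm"], SUBSYSTEM_CONTACTS)` yields one entry per release. An update would be held until `untested()` returns an empty list for it.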