2010-06-03-kernel-kvm-security-update-breaks-lucid-kvm

Owner: Pete Graner

Incident Description

A security update to the Lucid kernel included a KVM security fix. This fix prevented all KVM images from starting (they would start to a black screen and then stop) on the majority of Intel-based systems. The fix originated from vendor-sec via the Security Team.

Crisis Response Team

  • Andy Whitcroft -- Kernel Team
  • Kees Cook -- Security Team
  • Dustin Kirkland -- Server Team

Events

03-jun-2010 14:24 -- Barry Warsaw (barry) reports he is unable to start KVM instances following an update. He reports that booting the previous -21 kernel on his system allows him to start them.

  • [15:24] <barry> did something in last night's kernel update break kvm? with 2.6.32-22, none of my vms start up. rebooting to 2.6.32-21 fixes the problem

03-jun-2010 15:00 -- Pete Graner (pgraner) brings the Incident to the attention of the Kernel Team and assigns Andy Whitcroft (apw) to aid in diagnosis.

03-jun-2010 15:01 -- Colin Watson (cjwatson) asks if we need to withdraw the update. Discussion suggests that, as only one sub-system and one release are affected, we will wait until we can confirm how widespread the issue is.

03-jun-2010 15:32 -- kirkland reports no issues on 2.6.32-22.33.

03-jun-2010 15:51 -- apw reports he can also reproduce this on the 2.6.32-22.35 kernel; kirkland upgrades and concurs.

03-jun-2010 16:09 -- apw indicates that there are only 8 patches in the security updates and only one affects KVM

03-jun-2010 16:40 -- apw tests kernel packages confirming that removing the KVM security patch resolves the issue

03-jun-2010 17:08 -- kirkland confirms that the patched kernel resolves the issue for him

03-jun-2010 17:20 -- kees confirms the patched kernel resolves the issue for him, and requests a source package for upload

03-jun-2010 18:06 -- source package for 2.6.32-22.36 provided to the Security Team

03-jun-2010 18:08 -- kees uploads the new source package to the security build queue. The ETA for the finished build, based on prior i386 and amd64 builds, is just over 6 hours.

04-jun-2010 03:00 (approx) -- both amd64 and i386 packages are published to the archive and available for download.

All times are in UTC.

Successes

  • identified this as a KVM issue quickly and got the Kernel, Server and Security teams engaged quickly
  • testers responsive to requests to test patched kernels, leading to a speedy resolution

Problems

  • As the original issue was related to a security change, discussions started in private and then continued in private for the remainder of the response. It would have been appropriate to move discussions to a public IRC channel, perhaps #ubuntu-release, once we knew there were no embargo requirements.

Recommendations

  • This could have perhaps benefited from some burn time in a linux-proposed or linux-security-proposed archive; and would have further benefited from some internal encouragement (as mdz did in the past) for ubuntu-platform@ to track lucid-*updates through the first point release or so (kirkland)
    • one issue here is that this was an embargoed security update, therefore there is no proposed-style location where the uploads can be tested. All testing occurs on a need-to-know basis, and indeed Server were involved in testing Karmic and previous releases, as they included more extensive KVM fixes. The failure was in not testing Lucid more extensively. (apw)
  • We need to ensure a full matrix of tests is performed regardless of the size of the fixes (apw)
    • Lucid was not as extensively tested as the other releases as the changes were much more minor there (part of the security update had already been accepted in a prior upload and therefore already tested).
  • More formally list the stake-holders for an update such that the right people are brought in to test components (apw)
    • embargoed security updates are shared on a need-to-know basis, therefore we only engage external testers where their sub-systems (KVM, arm SOC, ecryptfs, etc) are affected. It would be helpful to more formally record the appropriate contacts for each significant sub-system where we have specific testing resources.
  • Evaluate whether we could get QA to do testing within the embargoed setting (apw)
  • We should have a defined location for discussion of ongoing incidents, #ubuntu-release perhaps (apw)
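
The "full matrix of tests" recommendation above could be tracked along the following lines. This is only a minimal sketch, not an existing tool: the release, architecture and sub-system names are taken from this report, while the function names and the recorded results are hypothetical and exist purely for illustration.

```python
from itertools import product

# Values mentioned in this report; a real matrix would list every
# supported release and every sub-system touched by the update.
RELEASES = ["lucid", "karmic"]
ARCHES = ["i386", "amd64"]
SUBSYSTEMS = ["KVM", "ecryptfs"]


def build_test_matrix(releases, arches, subsystems):
    """Return every (release, arch, subsystem) combination to be tested."""
    return list(product(releases, arches, subsystems))


def untested(matrix, results):
    """Return the combinations that have no recorded result yet."""
    return [cell for cell in matrix if cell not in results]


matrix = build_test_matrix(RELEASES, ARCHES, SUBSYSTEMS)

# Hypothetical results: only KVM on Karmic has been exercised so far.
results = {
    ("karmic", "i386", "KVM"): "pass",
    ("karmic", "amd64", "KVM"): "pass",
}

missing = untested(matrix, results)
for release, arch, subsystem in missing:
    print(f"UNTESTED: {release} {arch} {subsystem}")
```

With a list like this, an update would only be released once no combination remains untested, however small the change to a given release appears.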

IncidentReports/2010-06-03-kernel-kvm-security-update-breaks-lucid-kvm (last edited 2010-06-04 12:03:31 by apw)