2010-03-15-Launchpad-broke-apport

Incident Description

Apport could not create new bug reports in Launchpad.

Crisis Response Team

  • Martin Pitt
  • Graham Binns
  • Gary Poster

Events

All times are in UTC.

Bug 538097 contains most of the story. Below are the highlights, plus extra information at the end that was left out of the bug report because it is security related.

2010-03-12

Bug is first reported as apport problem.

2010-03-13

Bug is categorized as possibly a Launchpad problem. Rick Spencer highlights issue to Launchpad and Canonical Desktop teams.

2010-03-14

No action.

2010-03-15

  • Martin Pitt (apport et al) conclusively identifies problem as something in Launchpad. Problem is that Launchpad's +storeblob page fails with "500 Internal server error" on production when called from apport.
  • Graham Binns (Launchpad Bugs) diagnoses the problem and, once the root cause is confirmed, hands it off to Gary Poster (Launchpad Foundations). Foundations had attempted to close a cross-site request forgery (CSRF or XSRF) hole in Launchpad, and the new code demanded that browser requests carry a Referer header in many cases (a typical mitigation for this problem). apport does not send a Referer header, so the code needed an exception for the +storeblob page that would let apport's request through.
  • Gary Poster (Launchpad Foundations) makes a branch for the change, gets it reviewed, and sends it through tests to land on Launchpad's production code branch.
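
The shape of the check and its exception can be sketched roughly as follows. This is a minimal illustration, not Launchpad's actual code: the function name, the REFERER_EXEMPT_PAGES and TRUSTED_HOSTS sets, and the rule that only GET and HEAD skip the check are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical names: none of these come from Launchpad's code base.
REFERER_EXEMPT_PAGES = {"+storeblob"}  # pages that non-browser clients post to
TRUSTED_HOSTS = {"launchpad.net"}      # assumed set of acceptable Referer hosts


def is_request_allowed(method, path, headers):
    """Return True if the request passes a simplified Referer-based CSRF check."""
    # Assumption: read-only methods do not change state and are not checked.
    if method.upper() in ("GET", "HEAD"):
        return True
    # Exempt pages skip the Referer requirement entirely.
    page = path.rstrip("/").rsplit("/", 1)[-1]
    if page in REFERER_EXEMPT_PAGES:
        return True
    # Every other state-changing request must carry a Referer from a trusted host.
    referer = headers.get("Referer")
    if not referer:
        return False
    host = urlparse(referer).hostname or ""
    return host in TRUSTED_HOSTS or any(
        host.endswith("." + trusted) for trusted in TRUSTED_HOSTS
    )
```

Without the exemption, a POST to +storeblob with no Referer header would be rejected, which is the failure apport ran into.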

2010-03-16

07:57: Steve McInerney announces that the change has been rolled out to production.

Successes

  • Problem was addressed.
  • Once Launchpad team focused on issue, fix was pushed through to deployment about as quickly as systems allow.

Problems

  • This was a regression: it should not have happened.

Recommendations

Launchpad has automated tests and QA processes for testing browser interactions with the system. We do not have these for other Launchpad clients, except as tests for launchpadlib itself.

The Launchpad team should identify key non-browser clients, such as apport, and integrate them into our automated tests and QA. Production changes should not proceed without QA on these key clients. It would be nice to be able to sometimes assert that this kind of QA is unnecessary, but because some clients forge browser requests, as apport does with +storeblob, it is not safe to assume that a change, even one confined to view code, cannot cause a regression.
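
As a rough illustration of such a client-level regression test, the sketch below posts an apport-style request with no Referer header and expects it to be accepted. Everything here is hypothetical: submit_blob is a stub standing in for the server, and the multipart form content type is an assumption about how the forged request looks. A real test would drive the actual client against a test instance of Launchpad.

```python
import unittest


def submit_blob(headers):
    """Hypothetical stand-in for the server side of +storeblob.

    Returns an HTTP status code. The stub accepts any multipart form post,
    with or without a Referer header.
    """
    if headers.get("Content-Type", "").startswith("multipart/form-data"):
        return 200
    return 400


class NonBrowserClientTest(unittest.TestCase):
    def test_storeblob_without_referer(self):
        # apport-style request: a forged browser form post with no Referer.
        headers = {"Content-Type": "multipart/form-data; boundary=x"}
        self.assertNotIn("Referer", headers)
        self.assertEqual(submit_blob(headers), 200)
```

A test of this kind, run before each production rollout, would flag a change that starts rejecting requests from a key non-browser client.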

IncidentReports/2010-03-15-Launchpad-broke-apport (last edited 2010-04-28 09:08:24 by eth0)