KernelLucidBugHandling

Summary

The volume of incoming kernel bugs remains difficult to manage: the number of incoming bugs continues to outgrow the resources available to handle them. The goal of this spec is to re-evaluate our current bug management workflow and practices and to determine a more effective way to manage kernel bugs.

Release Note

Due to the increasing volume of incoming kernel bugs, an improved and sustainable approach to bug management is being introduced. See KernelTeam/KernelLucidBugHandling and KernelTeamBugPolicies for more information.

Rationale

During the Karmic cycle we started running Kernel Bug Days and writing kernel arsenal scripts to combat the growing volume of kernel bugs. Looking at the numbers, up until Karmic Beta we were doing much better than during Jaunty. In the 5-month span between Karmic opening and Karmic Beta, open bugs increased by ~1000, while bugs in the New state increased by only ~300. By comparison, in the 5-month span between Jaunty opening and Jaunty Beta, open bugs increased by only ~300 in total, but bugs in the New state increased by ~750. In other words, during Karmic far fewer incoming bugs were left untriaged.

However, once Karmic Beta landed we were overwhelmed by the increased bug volume. Again looking at the statistics, between Karmic Beta and Karmic Final (a period of less than 1 month), open bugs increased by ~1500, with bugs in the New state increasing by ~1250. By comparison, between Jaunty Beta and Jaunty Final, open bugs increased by ~900, with bugs in the New state increasing by ~700.

This was compounded by the fact that tools such as kerneloops were not disabled until after Karmic's final release: in the week following the release we saw an additional increase of ~1200 open/New bugs.

Clearly, we need to keep examining our bug management workflow and determine how to deal effectively with this ever-growing volume of kernel bugs.

User stories

  • Bob originally reported a bug against Hardy and hasn't updated it since. However, the bug remains open and adds to the large volume of bugs that must be tracked. It should be confirmed against the latest kernel or closed.
  • John reports a kernel oops, unaware that it has already been reported and that a possible fix exists. John's bug should be detected by a bugpattern or Launchpad script and marked as a duplicate.
  • Sue reports a high-impact bug against the Lucid Alpha release, but it lacks the debug information needed to narrow down the root cause. The bug is set to Incomplete and gets lost among the massive volume of existing kernel bugs.
  • Joe reports a regression he has seen between Karmic and Lucid, but it is never tagged as a regression and goes unnoticed.
  • Sally has reported 3 bugs, all of which have gone untouched. She feels it's pointless to report bugs and stops reporting them.

Design

Some ideas for improving the handling of incoming bugs:

  1. Interactive apport hooks for the kernel.
  2. Apport bug reporting improvements.
  3. LP lib scripts to search for known issues with fixes (e.g. kerneloops)
  4. Improve arsenal scripts
  5. Leverage the HWDB to find bugs on the same hardware/platforms.
  6. Modify the Kernel Bug Day process (http://wiki.ubuntu.com/KernelTeam/BugDay)
  7. Continue to work on building and mentoring the Ubuntu kernel community.
  8. Additional Crack of the Day kernel builds

Implementation

  1. Interactive apport hooks for the kernel. Note: the patch has already been accepted into apport for Lucid; see bug 444672.
    • [DONE] Is the issue confirmed with the upstream kernel?
    • [DONE] Is this a regression?
    • [DONE] If a regression, please note the most recent working kernel
    • [DONE] Is this reproducible?
    • [DONE] If reproducible, please describe the steps to reproduce
    • [DONE] If not reproducible, how often does the issue appear?
    • [ACTION] Should we add something like "Is there a known patch to fix this issue?" so we know which bugs we should be able to close quickly?
  2. Apport bug reporting improvements.
    • [ACTION] Don't allow reporting against linux-meta; report against linux instead
    • [ACTION] Is it possible to tag bugs based on subsystem (networking, audio, sound, wifi)?
  3. LP lib scripts to search for known issues with fixes (e.g. kerneloops)
  4. Improve arsenal scripts
    • [ACTION] Make these smarter so they can run without intervention, at least for bugs reported against Lucid
    • [ACTION] Go through and tag bug reports with release information (without sending e-mail?), e.g. tag bugs as karmic, hardy, etc.
    • [ACTION] Detect and tag bugs against staging drivers (and perhaps inject [Staging] into the bug title?)
    • See lp:arsenal-devel for the current set of scripts and lp:~leannogasawara/arsenal/kernel for an additional merge request
  5. Leverage the HWDB to find bugs on the same hardware/platforms.
  6. Modify the Kernel Bug Day process (http://wiki.ubuntu.com/KernelTeam/BugDay)
    • [ACTION] Don't assign lists to specific developers. Have one common list for anyone to pick from.
    • [ACTION] Look for interesting bugs with videos/pictures, simply because they are more interesting
    • [ACTION] Re-evaluate how we advertise Bug Days
  7. Continue to work on building and mentoring the Ubuntu kernel community.
    • [ACTION] Create/update wiki pages, e.g. "How Can We Help":
      • How to Test
      • How to Triage
      • How to run pre-proposed
      • How to bisect a regression
      • A section listing areas of the kernel (network, audio, etc.) with pointers to the right people
  8. Additional Crack of the Day kernel builds
    • [ACTION] Lucid daily build of tip
    • [ACTION] Build some bisect points between releases
    • [ACTION] Add HEAD to /proc/version_signature for tracking purposes
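The interactive questions in item 1 can be illustrated with a minimal Python sketch. This is not the hook that was merged into apport (see bug 444672); it assumes a dict-like `report` and a `ui` object offering `yesno()` and `information()` methods in the style of apport's hook UI. The `regression-potential` tag and the `ConfirmedUpstream`/`Reproducible` field names are assumptions, and `StubUI` is a hypothetical stand-in so the sketch runs without apport.

```python
class StubUI:
    """Hypothetical stand-in for apport's hook UI, answering from a list."""

    def __init__(self, answers):
        self.answers = list(answers)
        self.messages = []

    def yesno(self, question):
        # Assumed semantics: True, False, or None if the user cancelled.
        return self.answers.pop(0)

    def information(self, text):
        self.messages.append(text)


def add_info(report, ui):
    """Ask the kernel follow-up questions and record the answers in report."""
    tags = report.get("Tags", "").split()

    # Field names below are illustrative, not apport's actual keys.
    report["ConfirmedUpstream"] = str(
        ui.yesno("Is the issue confirmed with the upstream kernel?"))

    if ui.yesno("Is this a regression?"):
        tags.append("regression-potential")  # assumed tag name
        ui.information("Please note the most recent working kernel "
                       "version in the bug description.")

    reproducible = ui.yesno("Is this reproducible?")
    report["Reproducible"] = str(reproducible)
    if reproducible:
        ui.information("Please describe the steps to reproduce "
                       "in the bug description.")
    elif reproducible is False:
        ui.information("Please note how often the issue appears "
                       "in the bug description.")

    if tags:
        report["Tags"] = " ".join(tags)


report = {}
ui = StubUI([True, True, False])
add_info(report, ui)
print(report["Tags"])  # regression-potential
```

Tagging at filing time is what would keep a regression like Joe's from going unnoticed: the tag exists before any triager looks at the bug.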
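For item 2, the two apport improvements (redirecting linux-meta reports to linux, and subsystem tagging) could look roughly like the following sketch. It assumes the hook has the list of loaded module names (for example from lsmod); the prefix table and tag names are hypothetical and far from complete.

```python
# Hypothetical module-name prefixes per subsystem tag; a real hook would
# need a much more careful mapping (or PCI class codes).
SUBSYSTEM_PREFIXES = {
    "networking": ("e1000", "tg3", "r8169", "forcedeth"),
    "audio": ("snd_",),
    "wifi": ("iwl", "ath5k", "ath9k", "mac80211", "cfg80211"),
}


def normalize_package(package):
    """Redirect reports filed against linux-meta to the linux package."""
    return "linux" if package == "linux-meta" else package


def subsystem_tags(modules):
    """Return the sorted subsystem tags matching any loaded module name."""
    return sorted(
        tag
        for tag, prefixes in SUBSYSTEM_PREFIXES.items()
        if any(m.startswith(prefixes) for m in modules)
    )


print(normalize_package("linux-meta"))                      # linux
print(subsystem_tags(["snd_hda_intel", "iwlagn", "ext4"]))  # ['audio', 'wifi']
```

Matching on loaded modules is crude (almost every system loads some sound module), so a real implementation would probably combine it with the symptom the user selected.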
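For item 3 (and John's story), a script searching for known issues needs some signature to compare oopses by. A minimal sketch, assuming the signature is just the faulting function (plus module, if any) taken from the "EIP is at" or "RIP:" line of an x86 oops, and assuming a hand-maintained `known_fixes` mapping; fetching the reports from Launchpad is left out.

```python
import re

# Faulting-function line of an x86 oops, e.g.
#   "EIP is at foo_bar+0x12/0x40 [foo]"  (32-bit)
#   "RIP: 0010:[<ffffffff8123abcd>]  [<ffffffff8123abcd>] foo_bar+0x12/0x40 [foo]"  (64-bit)
_FAULT_RE = re.compile(
    r"(?:EIP is at|RIP: .*\])\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def oops_signature(oops_text):
    """Return 'module:function' (or just 'function' for built-in code)."""
    for line in oops_text.splitlines():
        match = _FAULT_RE.search(line)
        if match:
            function, module = match.groups()
            return f"{module}:{function}" if module else function
    return None


def find_known_fix(oops_text, known_fixes):
    """Look up an oops in a {signature: bug reference} mapping."""
    signature = oops_signature(oops_text)
    return known_fixes.get(signature) if signature else None


oops = (
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008\n"
    "RIP: 0010:[<ffffffff8123abcd>]  [<ffffffff8123abcd>] foo_bar+0x12/0x40 [foo]\n"
)
print(oops_signature(oops))  # foo:foo_bar
```

A single function name can collide between unrelated bugs; including a few frames of the call trace in the signature would make duplicate detection more reliable at the cost of missing some matches.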
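For item 4, release tagging and staging detection could be sketched as follows. The release mapping uses the upstream kernel series each release shipped (Hardy 2.6.24, Intrepid 2.6.27, Jaunty 2.6.28, Karmic 2.6.31, Lucid 2.6.32); the list of staging module names is assumed to be supplied by the caller, and `example_staging_drv` is a hypothetical name.

```python
import re

# Upstream kernel series shipped by each release.
KERNEL_TO_RELEASE = {
    "2.6.24": "hardy",
    "2.6.27": "intrepid",
    "2.6.28": "jaunty",
    "2.6.31": "karmic",
    "2.6.32": "lucid",
}


def release_tag(kernel_version):
    """Map e.g. '2.6.31-14-generic' to 'karmic'; None if unknown."""
    match = re.match(r"\d+\.\d+\.\d+", kernel_version)
    return KERNEL_TO_RELEASE.get(match.group(0)) if match else None


def staging_title(title, modules, staging_modules):
    """Prefix the title with [Staging] if a staging driver is involved."""
    if title.startswith("[Staging]"):
        return title  # already tagged; keep the script idempotent
    if set(modules) & set(staging_modules):
        return "[Staging] " + title
    return title


print(release_tag("2.6.31-14-generic"))  # karmic
print(staging_title("Wireless drops", ["example_staging_drv"],
                    {"example_staging_drv"}))  # [Staging] Wireless drops
```

Keeping the title rewrite idempotent matters if the scripts are to run unattended over the same bugs repeatedly.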
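Item 5 leaves open how the HWDB would be queried. As a stand-in, this sketch assumes each bug carries `lspci -nn` style output, where PCI vendor:device IDs appear as e.g. `[8086:293e]`, and groups bugs that share a device. The bug numbers and devices in the example are hypothetical.

```python
import re
from collections import defaultdict

# "[vvvv:dddd]" vendor:device IDs as printed by `lspci -nn`.
_PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_text):
    """Return the set of 'vendor:device' IDs found in lspci output."""
    return {f"{v}:{d}" for v, d in _PCI_ID_RE.findall(lspci_text.lower())}


def bugs_by_device(bug_lspci):
    """Invert {bug_id: lspci_text} into {pci_id: sorted bug ids}."""
    groups = defaultdict(set)
    for bug_id, text in bug_lspci.items():
        for pci_id in pci_ids(text):
            groups[pci_id].add(bug_id)
    return {pci_id: sorted(bugs) for pci_id, bugs in groups.items()}


reports = {
    101: "00:1b.0 Audio device [0403]: Example Corp Audio [abcd:0001] (rev 02)",
    102: "00:1b.0 Audio device [0403]: Example Corp Audio [abcd:0001] (rev 02)",
    103: "02:00.0 Ethernet controller [0200]: Example Corp NIC [abcd:0002]",
}
print(bugs_by_device(reports)["abcd:0001"])  # [101, 102]
```

The four-digit class codes such as `[0403]` are skipped because the pattern requires a vendor:device pair.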
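Item 8's pre-built bisect points let a reporter bisect without building kernels. Assuming the builds are ordered oldest to newest, the first one is known good, and the last one is known bad, choosing which build to test next is an ordinary binary search. The build names and the `is_good` callback (standing in for "the reporter installs and tests this kernel") are hypothetical.

```python
def next_bisect_point(builds, good, bad):
    """Return the build halfway between good and bad, or None if adjacent."""
    lo, hi = builds.index(good), builds.index(bad)
    if lo >= hi:
        raise ValueError("the good build must be older than the bad build")
    if hi - lo == 1:
        return None  # the regression lies between these two builds
    return builds[(lo + hi) // 2]


def first_bad_build(builds, is_good):
    """Binary-search for the first bad build, testing via is_good(build)."""
    good, bad = builds[0], builds[-1]
    mid = next_bisect_point(builds, good, bad)
    while mid is not None:
        if is_good(mid):
            good = mid
        else:
            bad = mid
        mid = next_bisect_point(builds, good, bad)
    return bad


builds = [f"build-{i}" for i in range(9)]
print(first_bad_build(builds, lambda b: int(b.split("-")[1]) < 5))  # build-5
```

With n pre-built kernels the reporter only has to test about log2(n) of them, which is why a handful of bisect points between releases already narrows a regression considerably.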

BoF agenda and discussion

  • apw suggested looking at how kerneloops works: upstream would like to be able to contact the reporter, so we should provide upstream with a "stream" of kerneloops reports in Launchpad
  • modify apport to consolidate apport-kerneloops bug reports automatically rather than manually
  • consolidate existing apport-kerneloops duplicate reports
  • go through and tag bug reports with release information (without sending e-mail?), e.g. tag bugs as karmic, hardy, etc.
  • enable automatic running of arsenal scripts for bugs reported against Lucid onwards, i.e. from today (does this mean checking the release version? For bugs filed via apport we at least know it)
  • apport should not report bugs against 'linux-meta'; just change the package to linux
  • modify the apport linux hook to tag bugs based on the subsystem (networking, audio, wireless driver)
  • create a "How Can We Help" page which lists
    • how to test
    • how to triage
    • how to run pre-proposed, etc.
    • areas of the kernel (network, audio, etc.) with pointers to the right people
  • can we follow the X team's lead on wiki pages to better help users?
  • can we push users through bisection by providing pre-built kernels for bisect points?
    • we should be building the tip of our development tree as a 'c-o-d' (Crack of the Day) kernel
  • tag bugs as server-related based on the kernel name, etc. (should be done when the bug is filed via apport)
  • can we detect and tag non-Ubuntu bugs?
    • apport could perhaps reject those bugs (bugpatterns)
    • retrospectively detect those bugs
  • HEAD in version_signature
  • identify bugs involving staging drivers and inject something like [Staging] into the title
  • Bug Days
    • look for interesting bugs with videos/pictures, simply because they are more interesting
    • it seems we have an advertising problem
  • packages other than linux
    • linux-firmware
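The BoF items on tagging server bugs by kernel name and detecting non-Ubuntu bugs could both key off /proc/version_signature. A minimal sketch, assuming the signature has the shape "Ubuntu 2.6.31-14.48-generic 2.6.31.4" and that a missing or unrecognised signature means a non-Ubuntu kernel; the `server` and `non-ubuntu-kernel` tag names are hypothetical. Extra trailing fields are ignored, so appending HEAD (whose format is not settled here) would not break the parse.

```python
import re

# Assumed shape: "Ubuntu <abi>.<upload>-<flavour> <upstream> [extra fields]"
_SIG_RE = re.compile(r"^Ubuntu (\d+\.\d+\.\d+-\d+)\.(\d+)-([\w-]+) (\S+)")


def parse_version_signature(text):
    """Split a version_signature into its parts; None if it doesn't match."""
    match = _SIG_RE.match((text or "").strip())
    if not match:
        return None
    abi, upload, flavour, upstream = match.groups()
    return {"abi": abi, "upload": upload, "flavour": flavour,
            "upstream": upstream}


def kernel_tags(text):
    """Return hypothetical tags derived from a report's version_signature."""
    sig = parse_version_signature(text)
    if sig is None:
        return ["non-ubuntu-kernel"]
    return ["server"] if sig["flavour"] == "server" else []


print(kernel_tags("Ubuntu 2.6.31-14.48-server 2.6.31.4"))  # ['server']
print(kernel_tags(None))                                    # ['non-ubuntu-kernel']
```

Flavours containing a hyphen (such as generic-pae) are accepted by the `[\w-]+` group, so they are not misclassified as non-Ubuntu kernels.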



KernelTeam/Specs/KernelLucidBugHandling (last edited 2009-11-30 22:04:31 by c-76-105-148-120)