KernelKarmicBugHandling

Summary

It has become apparent that the volume of incoming kernel bugs is now difficult to manage. The ratio of incoming bugs to available resources will not scale as the volume continues to increase. The goal of this spec is to introduce better bug management workflows and practices to cope with this growing number of bugs.

Release Note

Due to the high volume of incoming kernel bugs, an improved approach to bug management is being introduced. See KernelKarmicBugHandling and KernelTeamBugPolicies for more information.

Rationale

At the time of the Jaunty 9.04 release there were over 5000 open kernel bugs, a net increase of more than 1300 over the number of open bugs when Intrepid was released. The kernel team cannot realistically be expected to close 5000 bugs in a single release cycle. If new kernel bugs continue to arrive faster than existing bugs can be closed, the backlog will only keep growing, so a new approach to handling the incoming bug volume is needed. Otherwise, the risk that a critical or high-priority bug goes unseen by the kernel team keeps increasing. Additionally, Ubuntu users become discouraged when bugs they report are not addressed, which will likely lead them to stop reporting bugs at all.

User stories

  • Bob reported a bug originally against Hardy and hasn't updated his bug since. However, this bug remains open and contributes to the large volume of bugs that must be tracked. This bug should be closed.
  • Sue reports a high-impact bug against the Karmic Alpha release, but it lacks the debug information needed to narrow down the root cause. The bug gets set to Incomplete and is lost amongst the massive volume of existing kernel bugs.
  • Joe reports a regression he has seen going from Intrepid to Jaunty that may already be resolved in the latest upstream mainline kernel. However, Joe is unaware that he should test the upstream kernel, and the bug remains unresolved.
  • Sally has reported three bugs, all of which have gone untouched. She feels it's pointless to report bugs and stops reporting them.

Research

It seems Fedora has also run into this situation and has handled it much as this spec outlines. I was unable to find any documentation on SUSE's bug policy.

Implementation

The biggest issue is making sure that incoming bugs are in an appropriate debug state for developers to begin working on them. It may take weeks of back-and-forth between the reporter and a triager before a bug has all the appropriate logs attached. It has been decided that much of this process can be streamlined with a series of Arsenal scripts; below is a description of how the Kernel Arsenal scripts will work. The end goal is for every bug reported against the Ubuntu kernel in Launchpad to have the appropriate debugging information attached and to have been tested against the latest upstream mainline kernel, to verify whether the issue still exists or is fixed upstream. The Ubuntu Kernel Team's focus will then shift to bugs that are confirmed to exist upstream, or that are fixed upstream but still exist in the Ubuntu kernel. We will update the KernelTeamBugPolicies wiki page to match the changes introduced here; it will document the new policies and procedures more thoroughly.

Kernel Arsenal Scripts

arsenal/contrib/linux/process-new-bugs.py

if status == New
    if has_tag("omit") or has_tag("workflow") or has_tag("review-request")
        exit

    logs_complete = False
    message = None

    if has_tag("apport-bug")
        tag "needs-upstream-testing"
        logs_complete = True
        message = test upstream

    if has_tag("apport-package")
        logs_complete = True

    if Importance == Wishlist
        exit

    if bug_tasks > 3
        tag "review-request"
        exit

    if symptom == workflow
        tag "workflow"
        exit

    if symptom == sound bug
        tag "kernel-sound"

    if attachments == 0
        tag "needs-kernel-logs"
        tag "needs-upstream-testing"
        message = apport-collect
    elif parse_attachments()
        logs_complete = True

    if old bug (i.e. last comment > 120 days ago)
        tag "needs-verification"

    tag "kj-triage"

    if logs_complete
        status = Confirmed
    else
        status = Incomplete

    if message
        post message as a comment to the bug
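The new-bug rules can be sketched in Python as follows. This is a minimal sketch, not the Arsenal script itself: it runs against a hypothetical in-memory `Bug` record instead of the Launchpad API, the `logs_look_complete` callback stands in for `parse_attachments()`, and the symptom string `"sound"` is an assumption. As in the pseudocode, tags added before an early exit are kept.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=120)  # "old bug" threshold from the pseudocode


@dataclass
class Bug:
    """Hypothetical stand-in for a Launchpad bug; field names are illustrative."""
    status: str
    importance: str = "Undecided"
    tags: set = field(default_factory=set)
    symptom: Optional[str] = None
    bug_tasks: int = 1
    attachments: int = 0
    last_comment: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    comments: list = field(default_factory=list)


def process_new_bug(bug, logs_look_complete=lambda bug: False, now=None):
    """Apply the new-bug triage rules to `bug` in place."""
    if bug.status != "New":
        return
    if bug.tags & {"omit", "workflow", "review-request"}:
        return  # already handled or explicitly excluded
    now = now or datetime.now(timezone.utc)
    logs_complete = False
    message = None

    if "apport-bug" in bug.tags:
        bug.tags.add("needs-upstream-testing")
        logs_complete = True
        message = "test upstream"
    if "apport-package" in bug.tags:
        logs_complete = True
    if bug.importance == "Wishlist":
        return
    if bug.bug_tasks > 3:
        bug.tags.add("review-request")  # too complex for automation
        return
    if bug.symptom == "workflow":
        bug.tags.add("workflow")
        return
    if bug.symptom == "sound":  # assumed symptom name for sound bugs
        bug.tags.add("kernel-sound")

    if bug.attachments == 0:
        bug.tags.update({"needs-kernel-logs", "needs-upstream-testing"})
        message = "apport-collect"
    elif logs_look_complete(bug):
        logs_complete = True

    if now - bug.last_comment > STALE_AFTER:
        bug.tags.add("needs-verification")

    bug.tags.add("kj-triage")
    bug.status = "Confirmed" if logs_complete else "Incomplete"
    if message:
        bug.comments.append(message)  # stands in for posting a Launchpad comment
```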

arsenal/contrib/linux/process-incomplete-bugs.py

if status == "Incomplete (with response)" and has_tag("needs-kernel-logs")
    if "apport-collect data" in message_collection
        remove tag "needs-kernel-logs"
        status = Confirmed
    else
        tag "review-request"
elif status == "Incomplete" and no comment for > 120 days
    status = Won't Fix
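A minimal Python sketch of the Incomplete-bug rules, again against a hypothetical `Bug` record rather than real Launchpad objects. The `messages` list, the `APPORT_MARKER` string used to detect apport-collect output, and measuring "no comment for > 120 days" from `last_comment` are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=120)
# Hypothetical marker text taken to mean "apport-collect output was attached".
APPORT_MARKER = "apport-collect data"


@dataclass
class Bug:
    """Hypothetical stand-in for a Launchpad bug; field names are illustrative."""
    status: str
    tags: set = field(default_factory=set)
    messages: list = field(default_factory=list)  # comment bodies, oldest first
    last_comment: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def process_incomplete_bug(bug, now=None):
    """Apply the Incomplete-bug rules to `bug` in place."""
    now = now or datetime.now(timezone.utc)
    if bug.status == "Incomplete (with response)" and "needs-kernel-logs" in bug.tags:
        if any(APPORT_MARKER in m for m in bug.messages):
            bug.tags.discard("needs-kernel-logs")
            bug.status = "Confirmed"
        else:
            # The reporter replied without logs: flag it for a human.
            bug.tags.add("review-request")
    elif bug.status == "Incomplete" and now - bug.last_comment > STALE_AFTER:
        bug.status = "Won't Fix"
```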

arsenal/contrib/linux/process-incomplete-bugs.py

if status == "Confirmed" and has_tag("kj-triage") and not (has_tag("needs-kernel-logs") or has_tag("needs-upstream-testing"))
    status = Triaged
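The promotion rule reads the `&&` in the original pseudocode as "has neither `needs-kernel-logs` nor `needs-upstream-testing`". A minimal Python sketch under that reading, using a hypothetical `Bug` record, expresses the check as an empty set intersection:

```python
from dataclasses import dataclass, field

# Tags meaning a Confirmed bug still needs work from the reporter.
BLOCKING_TAGS = {"needs-kernel-logs", "needs-upstream-testing"}


@dataclass
class Bug:
    """Hypothetical stand-in for a Launchpad bug; not the real API object."""
    status: str
    tags: set = field(default_factory=set)


def process_confirmed_bug(bug):
    """Promote a kj-triage Confirmed bug to Triaged once no blocking tag remains."""
    if (bug.status == "Confirmed"
            and "kj-triage" in bug.tags
            and not (bug.tags & BLOCKING_TAGS)):
        bug.status = "Triaged"
```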

Test/Demo Plan

The Kernel Arsenal scripts should land soon at https://code.edge.launchpad.net/~arsenal-devel. They include a dry-run option so we can confirm that bugs are handled appropriately before the scripts are turned on.

Additionally, we will target small subsets of the overall set of kernel bugs until it is deemed reasonable to run the scripts against every kernel bug. At that point, a cron job will run the Kernel Arsenal scripts regularly.
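One way to combine a dry-run option with subset targeting is sketched below. The `run` helper, the `(bug_id, state)` input shape, and the `limit` parameter are hypothetical and are not the Arsenal scripts' actual options; the idea is that a dry run computes every change on a deep copy and only reports it, so nothing is written until the results look right.

```python
import copy


def run(bugs, process, dry_run=True, limit=None):
    """Apply `process` to the first `limit` bugs (all of them if limit is None).

    `bugs` is a list of (bug_id, state) pairs where `state` is a dict
    (hypothetical shape). `process` receives a copy of a state dict and
    returns the updated dict. Returns (bug_id, before, after) tuples for
    each bug that would change. In dry-run mode live state is never modified.
    """
    changes = []
    for bug_id, state in bugs[:limit]:
        before = copy.deepcopy(state)
        # Deep copy so nested values (e.g. tag sets) are never shared.
        after = process(copy.deepcopy(state))
        if after != before:
            changes.append((bug_id, before, after))
            if not dry_run:
                state.clear()
                state.update(after)
    return changes


if __name__ == "__main__":
    def confirm_new(state):
        # Toy rule standing in for one of the processing scripts.
        if state["status"] == "New":
            state["status"] = "Confirmed"
        return state

    sample = [(1, {"status": "New"}), (2, {"status": "Triaged"})]
    for bug_id, before, after in run(sample, confirm_new, dry_run=True, limit=1):
        print(f"would change bug {bug_id}: {before['status']} -> {after['status']}")
```

Starting with `dry_run=True` and a small `limit`, then widening the subset before handing the job to cron, follows the rollout described above.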

Unresolved issues

BoF agenda and discussion

Below are a few ideas that have been suggested (some more drastic than others):

  • Mark all open bugs as Won't Fix after every release. The reporter must reopen the bug once it's confirmed against the latest development kernel.
    • Consider some exceptions, e.g. don't close bugs tagged regression-*.
    • Alternatively, consider closing out all old Incomplete kernel bugs. These account for 1400+ open bugs.
    • Also consider closing out New, Confirmed, and Triaged bugs that have not had a comment in, say, 2+ months.
  • Only allow Ubuntu-specific kernel bugs to be reported. If a bug exists upstream, it should be reported upstream; Ubuntu kernel devs can help resolve bugs reported upstream.
  • Modify the bug reporting process to incorporate a series of questions to be answered, such as:
    • Is this a regression?
    • Have you tested the upstream kernel?
    • Has this bug been confirmed against the upstream kernel?
    • How reproducible is this bug? What are the steps to reproduce it?
  • Automated bug handling, similar to what xorg does?
    • For example, a bot would ask reporters to run apport-collect if general debug files are missing.
  • Disable (i.e. get rid of) the "Report a Bug" button on https://bugs.launchpad.net/ubuntu/+source/linux
    • Encourage the use of 'ubuntu-bug linux' instead.
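The closing proposals above amount to a filter over open bugs. A minimal Python sketch of the idle-bug variant follows; the dictionary keys, the 60-day reading of "2+ months", and applying that same threshold to Incomplete bugs are assumptions. Tags matching `regression-*` exempt a bug, as the exception idea suggests.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase

IDLE = timedelta(days=60)  # "2+ months" read as 60 days (assumption)
EXEMPT_PATTERNS = ["regression-*"]  # tags that protect a bug from bulk closing
OPEN_STATUSES = {"New", "Incomplete", "Confirmed", "Triaged"}


def is_exempt(tags):
    """True if any tag matches an exemption pattern (case-sensitive glob)."""
    return any(fnmatchcase(tag, pat) for tag in tags for pat in EXEMPT_PATTERNS)


def bulk_close_candidates(bugs, now):
    """Yield ids of open, non-exempt bugs with no comment for longer than IDLE.

    `bugs` is an iterable of dicts with hypothetical keys:
    id, status, tags, last_comment.
    """
    for bug in bugs:
        if is_exempt(bug["tags"]):
            continue
        if bug["status"] in OPEN_STATUSES and now - bug["last_comment"] > IDLE:
            yield bug["id"]
```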

A new kernel bug bankruptcy policy was proposed for handling kernel bugs:

  1. Bugs reported against the linux package in LP must be tested against the latest upstream kernel. If they are still valid, a new upstream bug report must be opened and an upstream bug watch must be added to the LP bug.
  2. If no response is received after 30 days, the bug would be marked as Won't Fix.
  3. This policy would be discussed with Jono in order to get community input.
  4. The upstream bug report must include the kernel version.

A few ideas discussed at the meeting:

  • Launchpad: Ubuntu-specific bugs. Check with the upstream kernel whether the reported issue still exists.
  • apport helps us get more data, but we need still more. LP bugs must be linked to upstream bugs.
  • higher priority if a bug can be tested with an upstream build
  • existing 5000 bugs: bug bankruptcy. Close them automatically after 30 days.
  • educate our users to indicate which kernel they are reporting the bug against, especially upstream.
  • survey Fedora and openSUSE about their bug bankruptcy policies/upstream bug report flow.
  • support for old hw i.e.
  • 95% of time spent looking for patches to solve downstream problems.
  • test the upstream kernel to check whether the bug is still there.
  • Ask for a new bug report if the bug exists in the latest upstream kernel.
  • compat-wireless stack against ... if you report a bug against it, you need the build date. Fast-moving target.
  • encourage people to follow the path...
  • talk to Jono about bug bankruptcy
  • upstream mainline kernel rights.
  • after release, focus on regressions.
  • be careful not to flood upstream with unrelated bug reports.
  • take a sample and run a report, looking for bugs against older releases still present in the latest cycle.


CategorySpec

specs/KernelKarmicBugHandling (last edited 2009-06-12 18:34:10 by c-76-105-148-120)