DuplicateBugConsolidation

Summary

With automated bug reporting from apport, kerneloops and checkbox, we frequently receive duplicate bug reports. These duplicates usually don't provide much value, so there should be mechanisms in place to prevent them from being filed. Additionally, tools for finding and consolidating duplicates already reported in Launchpad should be developed and distributed to bug triaging teams.

Rationale

During the Karmic release cycle a large number of duplicate bugs were reported; for example, bug 422536 has 107 duplicates and bug 429322 has 1097 duplicates. Such a high volume of duplicates consumes extra resources for very little value and results in a poor user experience for bug reporters.

User stories

Arnold is an Ubuntu user who wants to help make Ubuntu better and is testing the Beta version of the next release of Ubuntu. Unfortunately, while he is using pornview it crashes and he receives an apport dialog notifying him of the crash. He proceeds to file the crash report in Launchpad, only to have it marked as a duplicate hours later. Because he is subscribed to the master report, he continues to receive e-mail for every bug that is made a duplicate of it, and becomes rather annoyed.

Franco is an Ubuntu developer and is subscribed to package bug reports for the package pytrainer which he maintains. There happens to be a bug in pytrainer which affects every user of it - and there are lots! Franco becomes inundated with bug mail regarding duplicates of this bug and ends up unsubscribing from bug reports about pytrainer.

Implementation

Code Changes

The apport-retracer shall be modified to subscribe ubuntu-bugcontrol to crashes with more than 10 duplicates. This will act as a notification system so the team knows which bugs should have a pattern written for them. Additionally, in the event that apport finds a matching pattern for an already reported bug, it will 'metoo' that bug report. apport-collect shall also be modified to encourage collectors to report a new bug instead of adding their information to someone else's bug report. Apport will also be modified so that more files are searchable by bug patterns.
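The subscription rule for the retracer can be sketched as follows. This is only an illustration, not the apport-retracer code: the function names are hypothetical, and the bug object is assumed to behave like a launchpadlib bug entry exposing a number_of_duplicates attribute and a subscribe(person=...) method.

```python
# Threshold taken from the spec: "more than 10 duplicates".
DUPLICATE_THRESHOLD = 10


def needs_pattern(number_of_duplicates, threshold=DUPLICATE_THRESHOLD):
    """Return True when a crash has strictly more duplicates than the threshold."""
    return number_of_duplicates > threshold


def subscribe_if_noisy(bug, team, threshold=DUPLICATE_THRESHOLD):
    """Subscribe `team` to `bug` once it has crossed the duplicate threshold.

    Assumption: `bug` looks like a launchpadlib bug entry, i.e. it has a
    `number_of_duplicates` attribute and a `subscribe(person=...)` method.
    Returns True if a subscription was requested, False otherwise.
    """
    if not needs_pattern(bug.number_of_duplicates, threshold):
        return False
    bug.subscribe(person=team)
    return True
```

With this rule a bug with exactly 10 duplicates is left alone, while bug 429322 with its 1097 duplicates would have been flagged.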

Bughugger will be extended to provide an easy way to search for duplicate bug reports and merge them.

The ubuntu-qa-tools bzr branch includes a few scripts for working with duplicate bug reports, but these could use some improvement. The is-duplicate script shall be modified to handle a bug report that itself already has duplicates. A single launchpadlib script shall be written for moving all the duplicates from one bug to another (similar to move_duplicates.py from the examples folder in python-launchpad-bugs). A launchpadlib script that uses searchTasks.findSimilarBugs() will also be written to help find duplicate bug reports.
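A minimal sketch of the "move all duplicates" script might look like the following. It is not the script from ubuntu-qa-tools or python-launchpad-bugs; the function name is hypothetical, and the bug objects are assumed to behave like launchpadlib bug entries (id, duplicates, a writable duplicate_of, and lp_save()).

```python
def move_duplicates(old_master, new_master):
    """Re-point old_master and all of its duplicates at new_master.

    Assumption: both arguments behave like launchpadlib bug entries.
    Returns the ids of the bugs that were re-pointed, in order.
    """
    if old_master.id == new_master.id:
        raise ValueError("old and new master are the same bug")
    if new_master.duplicate_of is not None:
        raise ValueError("new master must not itself be a duplicate")

    moved = []
    # Take a snapshot first: re-pointing a duplicate removes it from the
    # old master's collection on the server side.
    for dup in list(old_master.duplicates):
        dup.duplicate_of = new_master
        dup.lp_save()
        moved.append(dup.id)

    # Finally mark the old master itself as a duplicate.
    old_master.duplicate_of = new_master
    old_master.lp_save()
    moved.append(old_master.id)
    return moved


# Possible usage against Launchpad (not run here; the application name and
# bug numbers are placeholders):
#   from launchpadlib.launchpad import Launchpad
#   lp = Launchpad.login_with("move-duplicates-sketch", "production")
#   move_duplicates(lp.bugs[OLD_NUMBER], lp.bugs[NEW_NUMBER])
```

Moving the duplicates before the old master keeps every intermediate state valid, since a bug that still has duplicates is never itself marked as a duplicate.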

The search-bugs script from ubuntu-bugpatterns will be modified to consolidate duplicate bug reports that it finds.

Documentation / Process Changes

Documentation regarding how to write and test bug patterns will be written in the Ubuntu wiki and linked to from relevant Bug Squad pages. The documentation will also be blogged about to get more people writing bug patterns.

BoF agenda and discussion

With automated bug reporting from apport, kerneloops and checkbox(?) we frequently have duplicate bug reports submitted. These duplicates usually don't provide much value, so patterns should be written to prevent them from being filed. There should also be facilities for identifying which bugs would most benefit from a pattern. Additionally, tools for finding and consolidating duplicates already reported should be developed and deployed.

Finding duplicates

  • apport-retracer will subscribe ubuntu-bugcontrol to crashes with more than 10 duplicates so they are notified of what bugs to write patterns for
  • duplicate finding functionality in apport-retracer could be extended to all apport-originated bugs (not just crashes)
  • also consider bugs with apport-collect data
  • search-bugs from ubuntu-bugpatterns can be used for finding already reported duplicates
    • could provide an option to then go ahead and mark them all (using is-duplicate)

Consolidating duplicates

  • is-duplicate from ubuntu-qa-tools can consolidate bugs to one master bug report
    • could be improved to deal with a duplicate that already has duplicates

Avoiding duplicates

  • have apport do some duplicate detection on the client side and "metoo" the bug already reported
    • for these classes of bug reports e.g. ones that do not need retracing
      • kerneloops
      • python crashes
      • suspend/resume

Issues

  • need a script for moving duplicates of one bug with duplicates to another one
  • could do better with localized error messages in apport-package bug reports on the retracer end
    • encourage loco teams to look at these bug reports and translate the error messages
    • can (should) launchpad lookup the error message and find the English translation for you?
  • finding and consolidating duplicates is a multi-step process and somewhat complicated

ACTIONS

  • apport-retracer will subscribe ubuntu-bugcontrol to crashes with more than 10 duplicates so they are notified of what bugs to write patterns for (Martin Pitt)
  • check with ubuntu-bugcontrol to confirm that bug subscription is okay with them (Brian Murray)
  • find out / document process for getting translator help with bug reports (Brian Murray)
    • useful for localized error messages like apport-package
  • DONE on 2009-11-23 - notify bugsquad / triagers about apport symptoms and particularly the one for storage devices in karmic (Brian Murray)
  • write a launchpadlib script utilizing searchTasks.findSimilarBugs() (Brian Murray)
  • write a tool that will make working with duplicates easier (strongly consider using bughugger for this) (Brian Murray)
    • it should write or facilitate a bug pattern for testing
    • search for duplicates
    • merge or propose for merging bug duplicates
    • mark all the bugs as duplicates
  • work with Launchpad team to ensure that bugs being marked as a duplicate does NOT send an e-mail to subscribers of the master bug report (Brian Murray)
  • update apport to strongly encourage people to report a new bug instead of using apport-collect with a bug that is not theirs (Brian Murray)
    • it already asks for confirmation but this is insufficient
  • update the files that apport searches - https://bugs.edge.launchpad.net/ubuntu/+source/apport/+bug/444975

How to write a pattern

ACTION: document this better for the Ubuntu Bug Control team (and developers) and notify (Brian Murray)

https://wiki.ubuntu.com/Apport/DeveloperHowTo#Bug patterns

  • write an xml file with part of an apport bug report "OopsText.txt" and a string to search for

  • the pattern url points the reporter to either a bug in launchpad or a wiki page
  • bzr branch is at bzr+ssh://bazaar.launchpad.net/~ubuntu-bugcontrol/apport/ubuntu-bugpatterns/

  • anybody in bug control can add a pattern
  • use test-local and your master bug number to see if your pattern works
  • use search-bugs to find existing bug reports that match your pattern
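To make the idea of a pattern concrete, here is a simplified sketch of how a pattern file could be matched against a report. It is not apport's implementation: the XML layout (a pattern element with a url attribute and re children keyed by report field) is assumed from the description above, and the package name, search string and bug URL are invented placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern file: every value below is a placeholder.
EXAMPLE_PATTERNS = """\
<patterns>
  <pattern url="https://launchpad.net/bugs/0000000">
    <re key="Package">^examplepkg </re>
    <re key="OopsText">example error string</re>
  </pattern>
</patterns>
"""


def key_matches(report, key, regex):
    """True if the report has `key` and its value matches `regex` somewhere."""
    value = report.get(key)
    return value is not None and re.search(regex, value) is not None


def match_report(report, patterns_xml):
    """Return the url of the first pattern whose every <re> matches, else None.

    `report` is a plain dict mapping report field names to strings.
    """
    root = ET.fromstring(patterns_xml)
    for pattern in root.findall("pattern"):
        checks = pattern.findall("re")
        if checks and all(
            key_matches(report, check.get("key"), check.text or "")
            for check in checks
        ):
            return pattern.get("url")
    return None
```

A report only matches when all of the regular expressions in a pattern match, which is why combining a package check with a search string keeps a pattern from catching unrelated bugs.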


CategorySpec

QATeam/Specs/DuplicateBugConsolidation (last edited 2009-12-02 21:25:32 by c-24-21-43-9)