Summary

This spec describes a method for categorizing and sorting AutomatedProblemReports to help developers identify similar or identical bugs.

Rationale

CrashReporting can lead to DrinkingFromTheFirehose, because the volume of incoming problem reports is enormous. Developers need a way to work through this flood without getting lost when thousands of copies of the same problem arrive at once.

Use cases

There are many use cases:

In each of these cases, the CrashReporting daemon would report the problem back to Ubuntu. On the server, these reports would be analyzed and categorized, and each one would be tagged with the results.

Scope

The scope of this spec includes all problems reported by CrashReporting.

Tagging aims to identify characteristics of crash reports, such as what caused the crash, which program was running, which libraries were linked in, where the crash occurred, and elements of the program's state such as call traces (if these were not destroyed by stack smashing).
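The characteristics listed above could be held in a per-report record and flattened into searchable strings. The sketch below is a hypothetical schema: the class name `CrashReport`, its fields, and the `key=value` string format are illustrative assumptions, not part of CrashReporting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrashReport:
    """Per-process facts extracted from one crash report (hypothetical schema)."""
    report_id: int
    executable: str                                       # program that was running
    signal: str                                           # e.g. "SIGSEGV"
    libraries: list[str] = field(default_factory=list)    # linked-in libraries
    crash_address: Optional[int] = None                   # where the crash occurred, if known
    # Call trace with the innermost frame first; None if the stack was smashed.
    stack_trace: Optional[list[str]] = None

    def characteristics(self) -> set[str]:
        """Flatten the report into 'key=value' strings that can be searched."""
        chars = {f"executable={self.executable}", f"signal={self.signal}"}
        chars.update(f"library={lib}" for lib in self.libraries)
        if self.crash_address is not None:
            chars.add(f"crash_address={self.crash_address:#x}")
        if self.stack_trace:
            # The innermost frame (where the crash happened) is the most telling.
            chars.add(f"top_frame={self.stack_trace[0]}")
        return chars
```

Keeping every characteristic as a flat string keeps later searches down to simple set operations, at the cost of losing type information.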

The characteristics identified should describe the crashed process alone, not its relation to other crash reports. Within that limit, a range of characteristics as wide as possible is desired. This especially includes the signal delivered at death and, for a SIGSEGV, whether it was caused by a read or an execute, whether it occurred because of insufficient memory protections or because the memory was unmapped, and the address where the access was attempted.
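For a SIGSEGV, the kernel's `si_code` distinguishes an unmapped address (`SEGV_MAPERR`) from a protection violation (`SEGV_ACCERR`). Combined with the process's memory map, this is enough to build the kind of tag described above. In this minimal sketch, the function name `describe_segv` and the tag format are hypothetical. It also assumes the access kind ("read", "exec", ...) has already been worked out elsewhere, for example from the faulting instruction. The mappings are assumed to be parsed from `/proc/<pid>/maps`.

```python
# Linux si_code values for SIGSEGV (SEGV_MAPERR / SEGV_ACCERR).
SEGV_MAPERR = 1  # address not mapped to an object
SEGV_ACCERR = 2  # invalid permissions for a mapped object


def describe_segv(access: str, si_code: int, fault_addr: int,
                  mappings: list[tuple[int, int, str]]) -> str:
    """Build a tag such as 'SIGSEGV exec protection at [stack]'.

    access   -- kind of access attempted, e.g. 'read' or 'exec' (assumed input)
    mappings -- (start, end, name) ranges, end exclusive, as in /proc/<pid>/maps
    """
    if si_code == SEGV_MAPERR:
        cause = "unmapped"
    elif si_code == SEGV_ACCERR:
        cause = "protection"
    else:
        cause = "unknown"
    tag = f"SIGSEGV {access} {cause}"
    # Name the mapping that contains the faulting address, if any.
    for start, end, name in mappings:
        if start <= fault_addr < end:
            return f"{tag} at {name or 'anonymous'}"
    return tag
```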

Automatically identifying bugs as duplicates is out of scope.

Design

This design relies on the crash handler and reporter provided by CrashReporting.

The server handling CrashReporting reports should tag problems that exhibit known characteristics, sorting them into categories. Known characteristics could include termination methods such as "SIGSEGV read unmapped" or "SIGKILL self".
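Server-side categorization could be a list of rules, each pairing a category tag with a predicate over a parsed report. This is a sketch under assumptions: reports are plain dictionaries, and the field names (`signal`, `access`, `si_code`, `pid`, `sender_pid`) and `tag_report` are hypothetical.

```python
from typing import Callable

Report = dict  # hypothetical: a parsed crash report as a plain dictionary

# Each rule pairs a category tag with a predicate over a report.
RULES: list[tuple[str, Callable[[Report], bool]]] = [
    ("SIGSEGV read unmapped",
     lambda r: r.get("signal") == "SIGSEGV"
     and r.get("access") == "read"
     and r.get("si_code") == "SEGV_MAPERR"),
    # "Self" means the process sent the signal to itself.
    ("SIGKILL self",
     lambda r: r.get("signal") == "SIGKILL"
     and r.get("pid") is not None
     and r.get("sender_pid") == r.get("pid")),
]


def tag_report(report: Report, rules=RULES) -> set[str]:
    """Return every known category whose rule matches this report."""
    return {tag for tag, matches in rules if matches(report)}
```

Keeping rules as data means a new category is one more list entry rather than a code change in the tagging loop.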

The interface developers use to view CrashReporting reports should display all characteristics of a report and let developers select a subset of them. This subset would then be checked against all reports to produce a list of matching reports.
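One way to check a selected subset against all reports quickly is an inverted index from each characteristic to the reports that have it, with the selection answered by set intersection. This is a minimal sketch: the function names are hypothetical, and reports are assumed to be reduced to sets of characteristic strings.

```python
from collections import defaultdict


def build_index(reports: dict[int, set[str]]) -> dict[str, set[int]]:
    """Map each characteristic to the ids of the reports that exhibit it."""
    index: dict[str, set[int]] = defaultdict(set)
    for rid, chars in reports.items():
        for c in chars:
            index[c].add(rid)
    return index


def matching_reports(index: dict[str, set[int]], selected: set[str]) -> list[int]:
    """Ids of reports having *all* selected characteristics.

    An empty selection matches nothing here (a design choice for this sketch).
    """
    if not selected:
        return []
    # Start from the rarest characteristic to keep intermediate sets small.
    ordered = sorted(selected, key=lambda c: len(index.get(c, ())))
    result = set(index.get(ordered[0], set()))
    for c in ordered[1:]:
        result &= index.get(c, set())
        if not result:
            break
    return sorted(result)
```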

From such a list, developers should be able to review reports and tag them as related to other reports. For our purposes, the only relation we care about is whether two reports describe the same bug.

Reports marked as the same bug should always point to the earliest reported instance of that bug. The earliest known instance of any bug is marked as being the same bug as "itself". This may complicate marking in some cases, but it makes searching much faster. Because every report in a group points directly at the same instance, listing the reports that are "the same bug as this report" is simply a matter of finding reports whose "same bug as" field equals the current report's.
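The "same bug as" scheme can be sketched as follows. Every report stores the id of the earliest known instance, which is initially itself. Marking two reports as the same bug merges both groups under whichever instance was reported first and repoints every member directly at it. The class and function names here are hypothetical, and reports are assumed to carry a comparable submission time.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: int
    reported_at: float  # submission time; smaller means earlier
    same_bug_as: int    # id of the earliest known instance; initially itself


def mark_same_bug(reports: dict[int, Report], a: int, b: int) -> None:
    """Record that reports a and b are the same bug."""
    root_a = reports[reports[a].same_bug_as]
    root_b = reports[reports[b].same_bug_as]
    if root_a.report_id == root_b.report_id:
        return  # already in the same group
    keep, drop = sorted((root_a, root_b), key=lambda r: r.reported_at)
    # The extra marking work: repoint the whole later group (including its
    # former root) at the earlier instance, so no chains ever form.
    for r in reports.values():
        if r.same_bug_as == drop.report_id:
            r.same_bug_as = keep.report_id


def same_bug_reports(reports: dict[int, Report], rid: int) -> list[int]:
    """All reports whose 'same bug as' field equals rid's: one equality test each."""
    target = reports[rid].same_bug_as
    return sorted(r.report_id for r in reports.values() if r.same_bug_as == target)
```

Marking costs a pass over the reports, but a lookup never has to follow a chain of pointers, which is the trade-off the paragraph above describes.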

Implementation

The Design section needs to be implemented on top of the facilities that CrashReporting provides.

Code

CrashReporting must be working first, with reports submitted automatically to the server.

The server handling CrashReporting must tag problems exhibiting known characteristics, allowing developers to sort and examine them.

Data preservation and migration

No issues exist.

Unresolved issues

Characteristics to tag

There are several characteristics we can use to identify crashes, including:

There are several more I don't discuss here. We need a complete list.

Adding new characteristics later is acceptable, because the only characteristics useful to us are those that can be detected heuristically. When a new characteristic is added, the system can therefore rescan every report ever submitted and tag any matches.
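Because characteristics are detected heuristically from stored report data, a new one can be applied retroactively by running its detector over every stored report. This is a minimal sketch: the store layout (report id mapped to parsed fields and a tag set), `add_characteristic`, and the example tag are all hypothetical.

```python
from typing import Callable

# Hypothetical store: report id -> (parsed report fields, set of tags)
Store = dict[int, tuple[dict, set[str]]]


def add_characteristic(store: Store, tag: str,
                       detector: Callable[[dict], bool]) -> int:
    """Rescan every stored report with a newly added detector.

    Returns the number of reports newly tagged. Safe to re-run: reports
    that already carry the tag are skipped.
    """
    newly_tagged = 0
    for fields, tags in store.values():
        if tag not in tags and detector(fields):
            tags.add(tag)
            newly_tagged += 1
    return newly_tagged
```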

Use of tags

The CrashReporting server would tag reports with information such as the above. Developers could later use these tags as search criteria to find similar bugs. For example, pitti might select a crash and choose "search for similar crashes", then tick check boxes for the similarities he wants to match. Taking the example of a SIGSEGV caused by an attempt to execute the stack, the following options might be available to him:

The range of possible searches is very large; some examples include:

This would be a powerful tool for matching the characteristics of an automatically detected problem against other problems.

The other problems listed may or may not be related. Developers would manually tag the related ones as the same bug, which groups them together and gives quick, easy future reference to all the different reports.

BoF agenda and discussion

Comments



AutomatedProblemReportsTagging (last edited 2008-08-06 16:36:12 by localhost)