RegressionRooter

Differences between revisions 1 and 2
Revision 1 as of 2010-06-03 20:10:24
Size: 1234
Editor: pool-74-107-147-166
Comment:
Revision 2 as of 2010-06-03 20:56:53
Size: 4472
Editor: pool-74-107-147-166
Comment:

This is a preliminary/conceptual design of a tool for assisting with narrowing down regressions in Ubuntu, by doing mechanical analysis of version/date info from /var/log/dpkg.log, and correlating with other users seeing similar symptoms.

Background

Regression bugs are distinctive because they can often be analyzed effectively just by comparing the broken and working cases. In the best case, the bug can be narrowed down to a specific change in a specific package, which gives a developer precise information for solving the issue quickly.

Regression bugs also tend to be very important, since it is extremely undesirable for users to see software that used to work get worse.

Currently, we have a variety of ad hoc methodologies for tracking and analyzing regression bugs. For instance, we might ask the user to do a git bisection search, which is effective but may be intimidating for non-technical users. We can also ask people to downgrade to earlier versions of Ubuntu packages, but that works only if they still have the earlier .deb file somewhere handy. Tracking of regression bugs is done by giving appropriate tags to bug reports; there are several tools, such as apport hooks, which assist with this.

But as a whole, the overall approach is not very systematic or cohesive, and depends on a lot of manual work and user coaching.

Fundamentals

When a user reports a bug, if they identify that it is a regression in Ubuntu, we can assume three things. First, that there existed a state on that user's machine where the regression did not occur, second that the current state of the user's machine exhibits the bug, and third that the packages the user tested were installed via dpkg.

If we're fortunate, the user may remember approximately when they last did not see the regression, thus giving us a more tangible initial date for the first assumption. Without at least a rough guess, it will be hard to apply regression analysis techniques, so to simplify things we'll just exclude regressions where the initial date is unknown.

Now, dpkg maintains a log at /var/log/dpkg.log* which records each package that is updated, the version it was updated to, and when the update happened.
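As a rough illustration of how such a log could be read, here is a minimal Python sketch. It assumes each change line has the shape "YYYY-MM-DD HH:MM:SS action package old-version new-version" and ignores every other line (status and startup entries, for example). The names PackageChange, parse_line, and parse_log are hypothetical and are not part of any existing tool.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed shape of a change line in dpkg.log:
#   2010-06-03 20:10:25 upgrade some-package 1.0-1 1.0-2
CHANGE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(install|upgrade|remove|purge) "
    r"(\S+) (\S+) (\S+)$"
)


class PackageChange(NamedTuple):
    """One package change taken from the log (hypothetical record type)."""
    when: datetime
    action: str
    package: str
    old_version: str
    new_version: str


def parse_line(line: str) -> Optional[PackageChange]:
    """Return a PackageChange for a change line, or None for any other line."""
    match = CHANGE_RE.match(line.strip())
    if match is None:
        return None
    stamp, action, package, old, new = match.groups()
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return PackageChange(when, action, package, old, new)


def parse_log(lines: Iterable[str]) -> List[PackageChange]:
    """Collect every recognised change line, skipping everything else."""
    return [change for change in map(parse_line, lines) if change is not None]
```

To process a real machine, the lines of each /var/log/dpkg.log* file could be passed to parse_log; rotated logs that are compressed would need to be opened with gzip first.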

That is key information, but we can't use it until we know one more thing: which package (or packages) are relevant to the bug.

The log doesn't tell us whether the user was doing a system update or installing particular packages, but we can infer that from the quantity (and type) of packages that were updated within a short period of time.

Given some ability to distinguish updates from individual package installs, we can derive an additional bit of data: the update frequency for the machine.

One other fundamental bit of data we can draw on is the date the given version of the package first became available on the mirrors. This can be useful in correlating different users' bug reports that share similar symptoms. This data can be extracted from Launchpad itself.
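One possible way to make this distinction is sketched below in Python. It groups change timestamps into bursts separated by quiet gaps, labels a burst as a system update when it touches many packages, and averages the time between system updates. The 30-minute gap and the threshold of 10 packages are arbitrary assumptions chosen for illustration, the sketch ignores package type, and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def group_sessions(timestamps: Iterable[datetime],
                   gap: timedelta = timedelta(minutes=30)) -> List[List[datetime]]:
    """Group timestamps into bursts; a new burst starts after a quiet gap."""
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def classify(session: List[datetime], threshold: int = 10) -> str:
    """Guess the kind of burst from the number of package changes in it."""
    if len(session) >= threshold:
        return "system-update"
    return "individual-install"


def mean_update_interval(sessions: List[List[datetime]],
                         threshold: int = 10) -> Optional[timedelta]:
    """Average time between the starts of bursts classified as system updates."""
    starts = [s[0] for s in sessions if classify(s, threshold) == "system-update"]
    if len(starts) < 2:
        return None  # not enough system updates to estimate a frequency
    return (starts[-1] - starts[0]) / (len(starts) - 1)
```

A count-based threshold is only one heuristic; the type of packages involved, as mentioned above, could be folded into classify as a further signal.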

Concept

RegressionRooter is a desktop application for analyzing dpkg.log files to help isolate regression bugs with the goal of making them easier to solve.

To use the tool, the first step is to start a new analysis session. (You may be working on multiple bugs at the same time, so we keep track of the state of each analysis session separately.) The user can input a description of the problem or a bug report number at this point. They also need to select a range of datetimes when they believe they first started noticing the regression; a fuzzy confidence score can be specified for these dates. Optionally they can specify one or more package names or regexes they think may be related; a fuzzy confidence score can be entered here too.

The tool then parses /var/log/dpkg.log* files and builds a table of package changes. At this time it also calculates a rough weighting score for each package based on whether it was updated inside the date range and on the confidence score for that range. Packages which match the regexes the user provided have their score increased by an amount commensurate with the confidence score for those regexes. This table is saved as a file (the format should be textual, such as .txt, .csv, .json, or .xml).
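The weighting itself is not specified here, so the following Python sketch shows only one way it could work. It assumes confidence scores are numbers between 0 and 1, gives a change the date confidence when it falls inside the user's date range, adds the confidence of every matching regex, and writes the sorted table as JSON. The additive formula and the names score_change, build_table, and save_table are assumptions made for illustration.

```python
import json
import re
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


def score_change(package: str, when: datetime,
                 start: datetime, end: datetime, date_confidence: float,
                 patterns: Sequence[Tuple[str, float]]) -> float:
    """Assumed scoring: date confidence if in range, plus each matching regex's confidence."""
    score = date_confidence if start <= when <= end else 0.0
    for pattern, confidence in patterns:
        if re.search(pattern, package):
            score += confidence
    return score


def build_table(changes: Sequence[Tuple[str, datetime]],
                start: datetime, end: datetime, date_confidence: float,
                patterns: Sequence[Tuple[str, float]]) -> List[Dict[str, object]]:
    """Build one row per (package, timestamp) change, highest score first."""
    rows = [
        {
            "package": package,
            "when": when.isoformat(),
            "score": score_change(package, when, start, end,
                                  date_confidence, patterns),
        }
        for package, when in changes
    ]
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows


def save_table(rows: List[Dict[str, object]], path: str) -> None:
    """Save the table in a textual format (JSON in this sketch)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=2)
```

JSON is used only because it is one of the textual formats listed above; CSV would work equally well for a flat table like this.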

Next, the user has a choice of steps to take. RegressionRooter can assist in filing a bug for the user by suggesting a package name to file against, and invoking apport to submit a bug report; along with the other standard info

X/Blueprints/RegressionRooter (last edited 2012-05-29 21:24:53 by static-50-53-79-63)