ErrorTracker

Revision 5 as of 2011-07-05 14:02:48

Rationale

To help Ubuntu reach a standard of quality similar to competing operating systems, developers should spend less time asking for information on individual bug reports, and more time fixing the bugs that affect users most often.

To determine which bugs those are, we should collect crash and hang reports from as many people as possible, before and after release. This means not requiring people to sign in to any Web site, enter any text, submit hundreds of megabytes of data, receive e-mail, or do anything more complicated than clicking a button. An automated system should then analyze which problems are caused by the same bug. If developers need more information about a particular kind of crash, they should be able to configure the system to automatically retrieve that information when the problem next occurs.
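As a rough illustration of the analysis step, the sketch below groups crash reports into buckets and ranks the buckets by how often they occur. The bucketing rule (package name plus the top three stack frames), the field names, and the sample reports are all assumptions made up for illustration; this specification does not define them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashReport:
    """Hypothetical minimal report; field names are illustrative only."""
    package: str
    stack: tuple  # function names, innermost frame first


def signature(report, depth=3):
    """Assumed bucketing rule: package name plus the top `depth` stack frames."""
    return (report.package,) + tuple(report.stack[:depth])


def rank_buckets(reports):
    """Return (signature, count) pairs, most frequent bucket first."""
    return Counter(signature(r) for r in reports).most_common()


# Made-up sample data: two reports share a signature, one does not.
reports = [
    CrashReport("example-app", ("free_buffer", "drop_cache", "main")),
    CrashReport("example-app", ("free_buffer", "drop_cache", "main")),
    CrashReport("other-app", ("read_tag", "import_file", "scan_dir", "main")),
]
ranked = rank_buckets(reports)
for sig, count in ranked:
    print(count, " > ".join(sig))
```

A real system would need a more robust signature than raw frame names, but even this simple rule shows how many reports can collapse into a short, ranked list of bugs.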

Statistics collected by Microsoft show that a bug reported by their Windows Error Reporting system “is 4.5 to 5.1 times more likely to be fixed than a bug reported directly by a human”, that fixing the right 1 percent of bugs addresses 50 percent of customer issues, and that fixing 20 percent of bugs addresses 80 percent of customer issues.

Prior art

Windows Error Reporting is probably the most advanced crash reporting system. As described in “Debugging in the (very) large: Ten years of implementation and experience” (PDF), it uses progressive data collection, where developers can request more than the “minidump” if necessary to understand particular problems. It also automatically notifies users if a software update fixes their problem, and it lets hardware vendors see crash reports specific to their hardware.

[Screenshots: windows-app-progress.png, windows-app.png, windows-os.png]
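The idea of progressive data collection can be sketched in a few lines: the client always sends a small report first, and gathers anything larger only if the server says developers have asked for it. The class, function, and item names below (`Server`, `report`, `core_dump`) are hypothetical and do not describe the Windows Error Reporting protocol or any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server: maps a bucket signature to extra items developers want."""
    wanted: dict = field(default_factory=dict)
    received: list = field(default_factory=list)

    def submit(self, signature, data):
        """Record what was sent; reply with the names of extra items still wanted."""
        self.received.append((signature, sorted(data)))
        return [name for name in self.wanted.get(signature, []) if name not in data]


def report(server, signature, collectors, send_more=True):
    """Send the minimal report first, then extra items only if the server asks."""
    extra = server.submit(signature, {"minidump": collectors["minidump"]()})
    if extra and send_more:
        # Expensive collectors run only when their output was requested.
        payload = {name: collectors[name]() for name in extra if name in collectors}
        server.submit(signature, payload)
    return extra


server = Server(wanted={"sig-1": ["core_dump"]})
collectors = {"minidump": lambda: b"mini", "core_dump": lambda: b"core"}
asked = report(server, "sig-1", collectors)
print(asked, server.received)
```

Because the collectors are called lazily, a client that is never asked for more data never pays the cost of producing it.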

Mac OS X has a CrashReporter system that submits crash data to Apple. As described in Technical Note TN2123, “There is currently no way for third party developers to access the reports submitted via CrashReporter”.

[Screenshots: mac-app.png, mac-plugin.png, mac-os.png, mac-hang.png]

As a result, some Mac software developers have created their own crash tracking systems, such as Adobe’s and Adium’s.

Mozilla uses Breakpad to collect and submit minidumps on the client side, and Socorro to analyze and present them on the server side. Anyone can access crash data at crash-stats.mozilla.com.

Client design

When there is an error, an alert should appear with text and buttons depending on the situation.

The alert depends on whether reporting is allowed:

    The problem can be reported
    Your admin has blocked problem reporting

and on what went wrong:

    Part of the OS crashes
    An application thread crashes (no alert shown)
    An application crashes
    An application hangs
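The choice of alert can be sketched as a small decision function. This is only one reading of the table above: it assumes that a thread crash shows no alert whether or not reporting is blocked, and it picks the alert variant without specifying its wording. The names `Situation`, `Alert`, and `choose_alert` are hypothetical.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Situation(Enum):
    OS_CRASH = "Part of the OS crashes"
    THREAD_CRASH = "An application thread crashes"
    APP_CRASH = "An application crashes"
    APP_HANG = "An application hangs"


class Alert(NamedTuple):
    situation: Situation
    can_report: bool  # False when the admin has blocked problem reporting


def choose_alert(situation: Situation, reporting_blocked: bool) -> Optional[Alert]:
    """Return the alert variant to show, or None when no alert is shown."""
    if situation is Situation.THREAD_CRASH:
        return None  # assumed to apply in both reporting states
    return Alert(situation, can_report=not reporting_blocked)


print(choose_alert(Situation.APP_HANG, reporting_blocked=True))
print(choose_alert(Situation.THREAD_CRASH, reporting_blocked=False))
```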

If you choose “Report…”, a secondary dialog should appear on top of the alert.

If you choose “Send”, the “Send more information automatically if requested” checkbox and the “Send” button should become insensitive, and the “Privacy Policy” button should be replaced by a progress bar extending from the left margin to just before the “Cancel” button. If you have left “Send more information automatically if requested” checked, the progress bar should be divided among three subtasks: sending the initial report, waiting for analysis (advancing asymptotically, since the wait has no known length), and sending any further information requested.
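One way to divide the bar among the three subtasks is sketched below for the case where the checkbox is left checked. The shares (40%, 20%, 40%), the exponential curve used for the waiting phase, and its time constant are assumptions; the specification gives no numbers and does not name a particular curve.

```python
import math

# Assumed shares of the bar for each subtask; the specification gives no numbers.
SHARES = {"send_initial": 0.4, "wait_analysis": 0.2, "send_extra": 0.4}
ORDER = ["send_initial", "wait_analysis", "send_extra"]


def waiting_fraction(elapsed_s, time_constant_s=5.0):
    """Rise quickly at first, then ever more slowly toward 1.0 as the wait grows."""
    return 1.0 - math.exp(-elapsed_s / time_constant_s)


def overall_progress(subtask, fraction):
    """Map progress within one subtask (0..1) to a position on the whole bar (0..1)."""
    done = sum(SHARES[name] for name in ORDER[:ORDER.index(subtask)])
    clamped = min(max(fraction, 0.0), 1.0)
    return min(1.0, done + SHARES[subtask] * clamped)


# Halfway through the upload, then five seconds into the wait for analysis.
print(overall_progress("send_initial", 0.5))
print(overall_progress("wait_analysis", waiting_fraction(5.0)))
```

The waiting phase keeps moving without ever claiming to be finished, so the bar never stalls and never jumps backwards when the next subtask starts.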

Once the problem report is cancelled or completed, the secondary dialog should close. In the primary alert, if the problem report was completed, “You can help fix the problem by submitting an error report.” should change to “Thanks for reporting this problem.” If you subsequently click “Report…” again, the “Send” button should be insensitive.
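The behaviour after a report finishes can be modelled as a small piece of state, independent of any toolkit. The sketch assumes that “Send” becomes insensitive only after a completed report, not after a cancelled one, and it does not cover the period while a report is being sent; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass

PROMPT = "You can help fix the problem by submitting an error report."
THANKS = "Thanks for reporting this problem."


@dataclass
class ReportState:
    dialog_open: bool = False
    completed: bool = False

    def open_dialog(self):
        """The user clicks "Report…" in the primary alert."""
        self.dialog_open = True

    def finish(self, completed):
        """The report was cancelled (False) or completed (True); the dialog closes."""
        self.dialog_open = False
        self.completed = self.completed or completed

    @property
    def primary_text(self):
        return THANKS if self.completed else PROMPT

    @property
    def send_sensitive(self):
        # Assumed reading: only a completed report disables "Send".
        return not self.completed


state = ReportState()
state.open_dialog()
state.finish(completed=True)
print(state.primary_text, state.send_sensitive)
```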

Future work: If a software update is known to fix the problem, replace the primary alert with the software update alert (or progress window, depending on the update policy), with customized primary text.

Server design