FixingCrashes

Revision 1 as of 2009-11-21 21:39:10

When I look at bugs, I often think about the "payoff ratio": how much time a fix will take me versus how much benefit it brings to users. Crash bugs have a really nice payoff ratio. Once you learn the techniques they don't take long to solve, and since X crashing is a major drag for users experiencing it, the fix brings a nice juicy chunk of value to Ubuntu.

It looks like I probably won't have as much time to focus on these going forward since other projects demand attention, but I thought I'd walk you through how the work is done. Maybe you will find these bugs as rewarding to work on as I have, and can help make the contributions that make Ubuntu's X as robust and reliable as it ought to be.

"How much do I need to learn?"

To fix X crashes, you don't really need to know much about X.org, believe it or not. But you do need a working knowledge of C and *nix. Above all, you need a solid understanding of how pointers work in C, and of how to create patches using diff. If you don't, hit Google and come back when you're ready.

In this tutorial, I'll walk you through how to understand backtraces, the common kinds of pointer failures that cause crashes, and where to locate the crashing code in the X codebase. I'll go through how to code up a fix, how to prepare and submit patches to Ubuntu and upstream, and how to put your patch into a PPA for others to test. If you know some of this already, great!

"What's my payoff?"

The main payoff here is intrinsic - you know your effort will directly make X.org more robust, which will make user experience with Linux more pleasurable. From a user perspective, an X crash is just as severe as a kernel crash, and it plants that seed of worry, "Maybe I can't depend on this operating system..." Your fix kills that seed before it sprouts.

In addition, once you've mastered this technique, you'll find it helps directly towards fixing crashes in just about any Ubuntu software. X is a bit more sophisticated than the average piece of software, so if you can fix a crash in X, you can probably learn to fix a crash in anything.

Anatomy of a backtrace

Let's dive in with both feet. The meat and potatoes of crash fixing is backtraces, so let's look at how a typical one is put together.

A backtrace is essentially a snapshot made at the point of the crash. It shows the function the crash occurred in at the top, then the function that called that function, then the function that called *that* function, and so on down to the outermost frame of the call stack. Usually the main() routine is that outermost frame, but not always.
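To see this ordering for yourself without waiting for a real crash, here is a minimal sketch that prints its own call stack using glibc's backtrace() and backtrace_symbols_fd(). The function names (handle_request, render_text, draw_glyph) are made up for illustration and are not taken from the X codebase, and a real crash backtrace would normally come from a debugger or from Apport rather than from code like this.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 32

/* Innermost function. In a real crash report this is where the crash
 * happened, so it is listed first in the backtrace. */
int draw_glyph(void)
{
    void *frames[MAX_FRAMES];

    /* Collect the return addresses of every active call, innermost first. */
    int depth = backtrace(frames, MAX_FRAMES);

    /* Print one line per frame to stderr. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    return depth;
}

/* Called by handle_request(); appears below draw_glyph in the listing. */
int render_text(void)
{
    int depth = draw_glyph();
    return depth;
}

/* Outermost of the three hypothetical functions. */
int handle_request(void)
{
    int depth = render_text();
    return depth;
}
```

If you call handle_request() from main() and build with something like `gcc -O0 -g -rdynamic`, the listing should start with draw_glyph, followed by render_text, handle_request and main. Building with optimization turned on may inline some of these calls and hide their frames, which is also why real backtraces sometimes look like they skip a step.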

Where to find crash bugs

Types of X crashes

For sake of our understanding, let's split all the different types of X crashes into three broad categories:

  1. Naughty pointers
  2. Bad software or hardware configurations
  3. Corrupted X logic

That's roughly the order of "easiness to fix" too. We'll mostly focus on #1 in this guide, since those are the kind of bugs that can be fixed with hardly any X.org know-how. But let's discuss the other two just enough that you can spot the difference and not get in over your head with "hard" crash bugs.
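To make "naughty pointer" concrete, here is a minimal sketch of the most common flavor: dereferencing a pointer that turns out to be NULL. The struct and function names are hypothetical stand-ins, not real X.org types, and returning 0 for the missing case is just an assumption made for this example; in real code the right fallback depends on what the caller expects.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not real X.org structures. */
struct cursor {
    int width;
    int height;
};

struct screen {
    struct cursor *cursor;   /* NULL when no cursor has been loaded */
};

/* Buggy version: segfaults if scr or scr->cursor is NULL. */
int cursor_area_unsafe(const struct screen *scr)
{
    return scr->cursor->width * scr->cursor->height;
}

/* Fixed version: check every pointer before following it. */
int cursor_area(const struct screen *scr)
{
    if (scr == NULL || scr->cursor == NULL)
        return 0;   /* nothing to measure, so report an empty area */

    return scr->cursor->width * scr->cursor->height;
}
```

The fix is only a couple of lines, which is typical for this category. Most of the work is figuring out from the backtrace which pointer was bad, not writing the check.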

Bad software configuration crashes typically happen because the user has some weird combination of software package versions. For instance, maybe they are running a newish xserver with an ancient kernel. A really common case is where they are running VirtualBox and have installed video drivers compiled against the wrong version of X. Another common case is where they have previously installed the -nvidia or -fglrx driver, perhaps uninstalling the driver but not completely purging it; this can result in really weird crashes since they still have some proprietary bits floating around trying to handle function calls they aren't able to handle. In fact, if you see any evidence in the backtrace or bug report of the user having installed a proprietary driver, just move on to another bug report. Even in the best case, where you could find the cause of the crash, the proprietary drivers are closed source, so you wouldn't be able to make a patch to fix it.

Bad hardware configurations include running on extremely old hardware, extremely new hardware, or hardware being operated in extreme conditions like overheating. Often the work to solve these

Here's a checklist of things to look for:

  • Description indicates a proprietary kernel module (-fglrx, -nvidia, -psb video drivers)
  • Description shows an old kernel version, or a non-standard kernel such as -rt, -pae, or the server kernel
  • Description shows VirtualBox in use
  • XorgConf.txt shows any driver in use other than {intel, radeon, ati, nv, nouveau}, or any "weird" settings

While bugs that match the above conditions are certainly legitimate issues, they're going to need some understanding of X.org beyond the scope of this tutorial, so just skip these bugs.
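If you end up triaging a lot of reports, part of that checklist could be roughed out in code. The sketch below is only an illustration: the function names are hypothetical, plain case-sensitive substring matching is an assumption that will miss some reports and misfire on others, and the kernel version and "weird settings" checks are left out because they need a human eye.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Markers from the checklist: proprietary drivers and VirtualBox. */
static const char *const config_markers[] = {
    "fglrx", "nvidia", "psb", "VirtualBox",
};

/* Open-source drivers the checklist treats as in scope. */
static const char *const open_drivers[] = {
    "intel", "radeon", "ati", "nv", "nouveau",
};

/* True if the bug description mentions any of the markers above. */
bool description_suggests_config_crash(const char *description)
{
    if (description == NULL)
        return false;

    for (size_t i = 0; i < COUNT(config_markers); i++) {
        if (strstr(description, config_markers[i]) != NULL)
            return true;
    }
    return false;
}

/* True if the driver name from XorgConf.txt is one of the open drivers.
 * Exact comparison matters here: "nv" must not match "nvidia". */
bool is_open_driver(const char *driver)
{
    if (driver == NULL)
        return false;

    for (size_t i = 0; i < COUNT(open_drivers); i++) {
        if (strcmp(driver, open_drivers[i]) == 0)
            return true;
    }
    return false;
}
```

A report would be a candidate for skipping when description_suggests_config_crash() returns true or is_open_driver() returns false for its configured driver. Treat the result as a hint for where to look first, not as a verdict.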

Understanding the crash

Coding a fix

Preparing the patch

  • Creating the patch
  • Packaging the patch
  • Creating a PPA

Soliciting testing

  • Requesting user test PPA
  • Sending upstream for comment

Appendix A: Apport crash hooks
