Debugging

Differences between revisions 23 and 24
Revision 23 as of 2007-11-23 13:49:11
Size: 26541
Editor: 85-211-145-61
Comment: more sane TOC format
Revision 24 as of 2008-02-02 18:52:41
Size: 31227
Editor: c-67-168-235-241
Comment:

X and Ubuntu

The X Window System is a critical component in the Ubuntu operating system. X is not without its bugs, but fortunately debugging X issues is not rocket science.

The vast majority of Ubuntu X issues fall into one of several distinct categories, and based on the way they manifest, there are several different tactics that can be employed in a nearly paint-by-numbers fashion to isolate them.

Even non-developers can help! The goal of this handbook is to give folks a toolset for rendering these bugs easily solvable. By making Ubuntu's X strong, we can help drive Open Source to world domination!

Bug Reporting

The lifecycle of a bug report begins, unsurprisingly, with the preliminary report. How a bug is initially reported can have a huge effect on how it's handled and how quickly it gets fixed.

Choosing a Good Title

Your title should communicate two things: The symptom you're seeing, and whatever is unique or unusual about your system. Otherwise, your bug may not get proper attention.

Examples:

BAD:

Crazy screen issues on boot

BAD:

Multiple problems with CD today

BAD:

Randomly doesn't work

GOOD:

[Feisty] Screen briefly corrupts during boot with -nv (NVidia 6100)

GOOD:

[Hardy Alpha-3] Alt-CD (only) selected wrong driver (Matrox / BenQ FP91+)

GOOD:

[Gutsy] Periodic crashes w/ high CPU on Dell Latitude D505 (-intel 855GM)

GOOD:

[Dapper,Edgy] Wrong default refresh rates on 16:10 LCD panels

Do's and Don'ts

DON'T:

Assume "they must already know about this"

DO:

Look for existing bug reports that match your problem

DON'T:

Assume a "similar" bug is exactly what you're seeing

DO:

File a new bug, but mention the IDs of all bugs that sound similar. Someone can dupe them together later.

DON'T:

Add "me too" responses. They waste everyone's time.

DO:

Add missing data (photos, logs) to an existing bug's "knowledge base". Or, if you just wish to be notified, subscribe yourself to the bug.

DON'T:

Post bugs with only a brief description of the problem

DO:

Post relevant logs, config files, and data (see table below). ALWAYS ATTACH YOUR /var/log/Xorg.0.log

DON'T:

Assume "everyone" is seeing this same bug

DO:

Consider what is unique about your system

DON'T:

Assume others will "just know" how the bug occurs

DO:

Itemize the exact steps that result in the issue. Can you reproduce it at will?

DON'T:

Fire and forget. Abandoned bugs rarely get fixed.

DO:

Follow up on your bug from time to time, even if it seems ignored. Report whether the issue goes away or remains when new Ubuntu releases come out.

What to Include in Bug Reports

Each problem class below is followed by the things to include.

Problem class: General X bug

  • Description of problem
  • Paste in output of lspci -nn | grep VGA
  • Attach /etc/X11/xorg.conf
  • Attach /var/log/Xorg.0.log
  • Attach output of lspci -vvnn

Problem class: Wrong resolutions, refresh rates, or monitor specs

  • Resolution, rate, or other parameter expected
  • Resolutions, rates, or other parameters actually obtained
  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • output of lspci -vvnn
  • output of sudo ddcprobe
  • output of xrandr

Problem class: Wrong font dpi or size

  • Are you running GNOME, KDE, XFCE, or ...?
  • Affected (and unaffected) applications
  • /var/log/Xorg.0.log
  • output of sudo ddcprobe
  • Screenshot showing font differences

Problem class: X crash, lockup, freeze, exit, or doesn't start/shutdown

  • Detailed description of problem
  • List any versions you tried that did not have this issue
  • Detailed list of steps to reproduce
  • How complete is the X failure?
    • Does ctrl+alt+f1 take you to a console?
    • Does ctrl+alt+backspace restart X?
    • Does mouse pointer still move?
    • Does the keyboard LED come on when hitting the CAPSLOCK key?
  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • /var/log/Xorg.0.log.old
  • ~/.xsession-errors
  • output of lspci -vvnn
  • output of cat /proc/acpi/video/*/DOS
  • output of sudo cat /proc/acpi/dsdt

Problem class: Keyboard, touchpad, and mouse issues

  • Description of the problem
  • /var/log/Xorg.0.log
  • output of xprop -root
  • output of gconftool-2 -R /desktop/gnome/peripherals

Problem class: Screen display corruption

  • Photo of the screen
  • Description of the problem
  • Does it also occur if DRI is disabled?
  • /var/log/Xorg.0.log

Problem class: Bad video playback

  • /etc/X11/xorg.conf
  • /var/log/Xorg.0.log
  • output of lspci -vvnn
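
The attachments listed above for a general X bug can be gathered in one pass. The following is only a minimal sketch, not an official tool: the function name collect_x_info, its arguments, and the output file names are made up for illustration, and anything missing on the system is silently skipped.

```shell
# Hypothetical helper: gather the attachments for a general X bug report.
# Usage: collect_x_info OUTDIR [XORG_CONF] [XORG_LOG]
# The two optional paths default to the locations named in this document.
collect_x_info() {
    outdir=$1
    conf=${2:-/etc/X11/xorg.conf}
    log=${3:-/var/log/Xorg.0.log}

    mkdir -p "$outdir" || return 1

    # Copy the config and log if they exist and are readable.
    [ -r "$conf" ] && cp "$conf" "$outdir/xorg.conf"
    [ -r "$log" ] && cp "$log" "$outdir/Xorg.0.log"

    # Save lspci output only when the tool is installed.
    if command -v lspci >/dev/null 2>&1; then
        lspci -vvnn > "$outdir/lspci-vvnn.txt" 2>/dev/null
        lspci -nn 2>/dev/null | grep VGA > "$outdir/lspci-vga.txt"
    fi
    return 0
}
```

Running collect_x_info ~/xbug would leave the files in ~/xbug, ready to attach to the report one by one.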

Bug Triage

Ubuntu receives a huge number of bug reports, many of which are important and valid issues needing attention. Even so, nearly all X bugs are initially reported without the information necessary for classification and analysis. This is where the bug triaging role comes in.

Bug triaging for Ubuntu's Xorg components does not require any particular expertise with X; regular Linux know-how should be sufficient. As a bug triager, your role is twofold: first, as a coach who helps bug reporters maximize their chances of getting the bug addressed by providing complete information; and second, as a filter who helps developers focus their time on important and/or easy-to-fix bugs.

After a bug is initially reported, the bug triager reviews it and checks several basic things:

1. Is it definitely an X bug? Sometimes things get misfiled, and sometimes reports are just invalid, or are really just support requests and should be directed to Launchpad Answers instead. If unsure, leave it as is.

2. Is it clearly a dupe of an already known bug? Ideally reporters should do a cursory scan of existing bug reports to see if it's obviously already in the system, but not all reporters do. If unsure, don't dupe it - someone can handle this later.

3. Is there at least the basic minimum amount of data present? If not, mark it Incomplete and see the table under What to Include in Bug Reports for the kinds of files and command output that are needed. Once you think the basic required info is present, move it to the Confirmed state.

4. Review log files for error messages or other obvious anomalies. Highlight these in the bug report, and search Launchpad for other reports of that same error message. Mention these as potential dupes, or dupe them where obvious.

5. Tidy up the bug report. This may involve improving the bug's title or wordsmithing the description to clarify it. Also, make sure it has an Importance assigned to it (see below).
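
Step 4 above can be partly automated, since Xorg marks error lines with (EE) and warning lines with (WW). This is a rough sketch; the function name xorg_log_problems is invented here, and it assumes the marker sits at the start of the line, optionally after a bracketed timestamp.

```shell
# Print error (EE) and warning (WW) lines from an Xorg log, with line numbers.
# Usage: xorg_log_problems [LOGFILE]   (defaults to /var/log/Xorg.0.log)
xorg_log_problems() {
    # Match the marker at the start of the line, optionally after a
    # "[  12.345] " timestamp, so that the indented marker legend in the
    # log header is not reported as a problem.
    grep -nE '^(\[[ 0-9.]*\] )?\((EE|WW)\)' "${1:-/var/log/Xorg.0.log}"
}
```

The line numbers make it easy to quote the surrounding context when highlighting an anomaly in the bug report.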

Since step 3 requires waiting on replies from bug reporters, you can't go through all five of the above steps in one bug triaging session. Instead, when doing bulk triage work, you can consider dividing the workflow into two types of sessions:

INITIAL TRIAGE: (steps 1->2->3)

  • Do a query for NEW bugs.
  • For each bug, review according to steps 1, 2, and 3 above
  • Post a request for more information and set to INCOMPLETE

FINAL TRIAGE: (steps 3->4->5)

  • Do a query for INCOMPLETE-WITH-RESPONSE bugs.
  • For each bug, read the reporter's reply and information posted
  • If still insufficient info, ask for more info and leave INCOMPLETE
  • Otherwise, do steps 4 and 5, mark CONFIRMED, and assign an Importance

Once this basic triage work is done, a reviewer (generally a developer or official bug master) reviews CONFIRMED bugs and double-checks that all the necessary steps have been completed. They then set the bug to the TRIAGED state.

Bug Importance

Bug "Importance" is not the same thing as Development Priority. Importance is an indication of the severity of the issue, not an indication of when it will be fixed (although a bug's importance is a factor to consider when prioritizing).

Low: These bugs are merely cosmetic or make things inconvenient, or occur only rarely.

Medium: Most bugs are medium importance. They hamper use of the system in some fashion, sometimes requiring an inconvenient workaround or other unusual steps (like disabling hardware or software features or reverting to older versions) to get around it.

High: These are serious bugs that are preventing users from using the system, either with no known workaround or an extremely cumbersome one.

Critical: This importance level is not often used, and is saved for widespread catastrophic failures, like X failing to start for all Ubuntu users.

A bug that affects a lot of users may deserve one bump up from where it would be otherwise. A bug which was not well reported, that can't be reproduced, or that only occurs in obscure situations may have its importance bumped down one step.

The bug triager should make an attempt to set an appropriate importance, but don't worry about getting it perfect; it can be adjusted later.

Bug Priority

The priority for a bug is determined by the developers themselves, based on a variety of factors, so the bug triager usually does not need to do anything with regard to priority.

One factor that can drive a bug to a high priority is a known, tested fix that simply needs to be integrated into the development version of Ubuntu.

There are two ways priority is indicated in Launchpad:

1. Milestones: Bugs that are assigned to a milestone will gain priority attention during that development cycle. Do not milestone bugs until after they've been fully triaged and have all necessary information available to troubleshoot them.

2. Assignees: Bugs that are assigned to a particular individual will be priorities for them to work on. Generally, ask before assigning bugs to a developer, unless you're that developer's manager.

Bug Research

For many bugs, a little googling and searching in upstream bug trackers can reveal important additional info.

1. Review all attached log files for error messages.

2. Look for other similar/duplicate bug reports to gain additional perspectives and look for obvious commonalities, like same error messages, driver, hardware, etc. Places to search:

  • google.com
  • bugs.launchpad.net/ubuntu
  • bugs.debian.org/
  • bugs.freedesktop.org/
  • ubuntuforums.org

If you find the same issue reported in Launchpad, mark the less complete and/or newer bug as a dupe of the other. If you find the same issue reported in Debian or Xorg, mark it as "Also affects project...". With Google, it often helps to include "ubuntu" in the search string. Also, you can use "site:freedesktop.org" or "site:debian.org" to narrow the search to a specific domain.

3. Try reproducing the issue, especially if you have similar hardware.

4. Look for a newer version of the package, and review its changes to see if there's a fix for this issue.

  • If so, check apt-get update; apt-cache madison $pkgname to see if the new version is already packaged. If not, ask a packager to produce a test package of the new release to test for this bug.

5. Upload any patches you run across directly to Launchpad, and be sure to tick the "patch" checkbox, so patches can be queried for later.

6. Have the reporter try an older Ubuntu Live CD, or have them downgrade a specific package. For example, to downgrade the xserver:

         apt-get install xserver-xorg-core=2:1.3.0.0.dfsg-4ubuntu
  • If an older version fixes the issue, then possibly you can bisect things down to find a specific patch causing the issue. See the Analysis section for how to do this.

7. Unless you've been lucky and found the fix already, finish up the research phase by doing the following:

  • Summarize your findings. Restate the problem, describe progress made, outline remaining suspicions or questions.
  • If appropriate, report the bug upstream to Debian and/or Xorg, attaching all relevant files and a link to the Ubuntu bug report. Summarize the research you did, patches that were tested, and any other details that may be relevant.

Analyzing X Problems

For hard bugs, the analysis phase is the most important and most challenging part of bug work. It often requires both strength of insight and skill with code.

Depending on how the bug is behaving, there are multiple directions from which to investigate the issue. Here are some different approaches:

Problem manifested only recently

If the issue has been narrowed to occur only after (or before) a given point in time or software version, then it is possible to narrow in on the specific cause of the issue through a bisection strategy.

Essentially, if you know the issue occurs in Version 1 but not in Version 8, have a person able to replicate the issue try Version 4. If the issue is present there, the change lies between Versions 4 and 8, so have them try Version 6 next; otherwise it lies between Versions 1 and 4, so have them try Version 2.

If the problem is in the current Ubuntu, but not in the prior Ubuntu, it can be useful to have them test the intermediate Alpha versions of the new release.

Once you have bracketed it down to a specific version of something, you can then go through the individual patches included in that version compared with the prior one. Sometimes the patch descriptions can give a strong clue to this. If there are a number of changes, then rather than trying each patch one-by-one you may want to simply disable the latter half of patches, and bisect that way.

If you've narrowed it to an upstream version change, then you may wish to use git's bisecting functionality to assist with this.
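
The bisection strategy described in this section can be sketched as a small shell function. This is an illustration under stated assumptions rather than an existing tool: bisect_versions and the CHECK command are hypothetical names, the version list must be ordered, the first version must be known good, and the last must be known bad (invert the check to search in the other direction).

```shell
# Hypothetical sketch of bisection over an ordered list of versions.
# Usage: bisect_versions CHECK GOOD ... BAD
#   CHECK is a command run as "CHECK VERSION"; it must exit 0 when the
#   issue is present in VERSION and non-zero when it is absent.
#   The first version listed must be known good, the last known bad.
# Prints the earliest version in which the issue appears.
bisect_versions() {
    check=$1
    shift
    [ $# -ge 2 ] || return 2
    lo=1        # position of the newest version known to be good
    hi=$#       # position of the oldest version known to be bad
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "version=\${$mid}"
        if "$check" "$version"; then
            hi=$mid     # issue present: the culprit is at or before mid
        else
            lo=$mid     # issue absent: the culprit is after mid
        fi
    done
    eval "echo \"\${$hi}\""
}
```

In practice CHECK would install or boot the given version and ask the tester whether the problem shows up. Each round halves the remaining range, so eight versions need only three tests.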

Problem manifests only with specific configuration options

Often, a bug exhibits itself only with a specific configuration setting turned on (or off). Knowing that the bug appears with one option but not another provides a very strong clue for further debugging; it can also provide a short-term workaround for people until it's fixed. Here are a few options that are worthwhile to vary:

Each entry below gives the setting, its alternate(s), and a comment where one applies.

Section "Module"

  • Setting: Load "dri"
    • Alternate: Disable "dri"
    • Comment: Many bugs (esp. on -intel) exhibit only with direct rendering enabled
  • Setting: Load "glx"
    • Alternate: Disable "glx"

Section "Monitor"

  • Setting: HorizSync 28.0 - 80.0 and VertRefresh 48.0 - 75.0
    • Alternate: Comment lines out
    • Comment: Xorg 1.3+ often can figure these out automatically (and usually better). Hardcoding these incorrectly can lead to resolution, dpi, and other mode issues

Section "Device"

  • Setting: Driver "..."
    • Alternate: Install alternate driver
  • Setting: Option "DRI" "true"
    • Alternate: Option "DRI" "false"
  • Setting: Option "VBERestore" "true"
    • Alternate: Option "VBERestore" "false"
  • Setting: Option "XAANoOffscreenPixmaps" "true"
    • Alternate: Option "XAANoOffscreenPixmaps" "false"

Section "ServerLayout"

  • Setting: Option "AIGLX" "true" (or blank)
    • Alternate: Option "AIGLX" "false"
    • Comment: AIGLX is already enabled in the server so is on by default. Turning it off might affect performance or AIGLX-related issues.

Section "Extensions"

  • Setting: Option "Composite" "Enable"
    • Alternate: Option "Composite" "Disable"
    • Comment: Composite is enabled by default starting with Ubuntu. It can cause issues on some systems in some conditions; this is one way to turn it off.
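
As one example of varying a setting, the HorizSync and VertRefresh lines in Section "Monitor" can be commented out in a copy of xorg.conf. The filter below is a hedged sketch: the name comment_out_sync_ranges is invented, and it assumes the keywords are spelled with exactly this capitalization at the start of a line.

```shell
# Hypothetical filter: comment out HorizSync and VertRefresh lines so the
# server falls back to autodetection. Reads an xorg.conf on stdin and
# writes the modified copy to stdout; the original file is not touched.
comment_out_sync_ranges() {
    sed -E 's/^([[:space:]]*)((HorizSync|VertRefresh)[[:space:]])/\1# \2/'
}

# Example (writes a test copy and leaves the real file alone):
#   comment_out_sync_ranges < /etc/X11/xorg.conf > /tmp/xorg.conf.test
#   diff /etc/X11/xorg.conf /tmp/xorg.conf.test
```

Working on a copy and reviewing the diff first means the original configuration is still available if the change makes things worse.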

Problem manifests only with a particular driver

If the research found that most people with this problem were all using the same driver, then obviously it makes sense to explore it from that aspect.

Note that for most graphics hardware, there are at least two different drivers. It can be worthwhile to test the alternate driver to verify it's a driver issue.

  • NVidia: -nv (open) and -nvidia (proprietary)
  • ATI: -ati (open) and -fglrx (proprietary)
  • Intel: -intel (current) and -i810 (legacy).

Each driver has its own source code package, named xserver-xorg-video-<driver>, which can be retrieved with apt-get source. The open source drivers also have git repositories at http://gitweb.freedesktop.org.

Resolving these issues will generally require patching the driver code, although some driver-specific issues end up requiring changes to other pieces of code, like the xserver.

The following can be added to your /etc/X11/xorg.conf to provide additional debug information:

Section "Device"
        ...
        Option "ModeDebug" "true"
        ...
EndSection

Problem manifests only with particular kind of hardware

Many issues are highly specific to a particular kind of hardware, such as only Intel 855, or only a particular monitor model. Sometimes these end up being general bugs, but often they require adding hardware-specific quirks to the driver or to xserver.

Xorg crashes often tend to be hardware specific, as do issues with TV out and resolution detection failures.

Because they are hardware specific, you must provide your hardware identifiers. For video cards, this is the PCI ID, which can be found via lspci -vvnn. For monitors, you should provide the exact model ID and manufacturer; EDID information (from ddcprobe or via read-edid by get-edid | parse-edid) can also sometimes be useful.
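
To pull just the PCI ID out of the lspci listing, a short filter is enough. This is a sketch only: vga_pci_ids is a made-up name, and it assumes the one-line-per-device output of lspci -nn, where the numeric ID appears as [vvvv:dddd].

```shell
# Hypothetical filter: print the vendor:device PCI ID of each VGA device.
# Reads "lspci -nn" output on stdin, e.g.:  lspci -nn | vga_pci_ids
vga_pci_ids() {
    # With -nn, lspci appends the numeric ID as [vvvv:dddd]; the class code
    # in brackets, such as [0300], has no colon and so is not matched.
    grep VGA | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | sed 's/[][]//g'
}
```

The resulting vendor:device pair is what developers need when deciding whether a hardware-specific quirk applies.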

Problem manifests under seemingly random conditions

Few bugs are truly random; usually this just means more data is needed. In any case, it's nearly impossible to "troubleshoot-by-proxy" Xorg bugs that occur randomly, so it's important that the reporter do some extra work to either a) reduce it to a reproducible case, or b) identify the location in the code where the fault occurs by following other recommendations in this document (e.g. if it is crashing, use the directions for obtaining a backtrace).

Here are some tips for turning a random issue into a deterministic one:

  • Does it seem to come on only after running the screensaver? Or perhaps after resuming from suspend, hibernate, or other power savings mode?
  • Examine resource utilization over time - could it be triggered by a high memory or CPU load?
  • When you experience it, write down everything you did over the last 5 minutes or so; then, the next time it occurs, look at the list and see if you were doing some of those same things. Repeat those actions and see if you can trigger the problem.
  • If you boot a newer or older Ubuntu LiveCD on the same hardware, do you see the same issue?
  • If there is an alternate video driver available for your hardware, can you recreate the problem with that driver?
  • Could it be caused by fluctuating network/power conditions?
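
For the resource utilization tip above, a periodic sample written to a log makes it possible to look back at the moments before a failure. The sketch below is one way to do it, assuming a Linux /proc filesystem; sample_usage and the log file name in the example are hypothetical.

```shell
# Hypothetical sampler: print one line with a timestamp, the 1-minute load
# average, and free memory in kB. The two arguments exist only so the
# function can be pointed at saved copies; they default to the live files.
sample_usage() {
    loadavg=${1:-/proc/loadavg}
    meminfo=${2:-/proc/meminfo}
    load=$(cut -d ' ' -f 1 "$loadavg")
    free_kb=$(awk '/^MemFree:/ { print $2 }' "$meminfo")
    echo "$(date '+%Y-%m-%d %H:%M:%S') load=$load memfree_kb=$free_kb"
}

# Example: append a sample every 60 seconds until interrupted.
#   while sleep 60; do sample_usage >> "$HOME/x-usage.log"; done
```

When the problem next occurs, the last few lines of the log show whether load or memory pressure was rising beforehand.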

Problem manifests itself during video playback

Many video bugs have already been reported, are well-known, and require significant architectural work to be done upstream in order to resolve them. So, before reporting a bug, please review existing bug reports to see if the issue is already known. Many of the known issues relate to use of Xv and/or Compiz, so typical workarounds include not using one or the other or both.

If you find that your issue is not already reported, first identify which Xorg extensions are involved. You can isolate the issue by varying some of these (e.g. trying with or without Compiz, comparing XAA vs. EXA, enabling/disabling DRI, etc.) and by testing out different options in your video player (like whether to use Xv).

Problem manifests itself when using 3D software (compiz, games, GL...)

Many of the open source Xorg drivers do better with 2D than 3D, so issues are not uncommon. Unfortunately, this also means that most "solutions" will involve switching to one of the proprietary binary drivers, or to open but experimental 3D drivers.

In general, these issues need to be forwarded upstream. Gather as much supporting evidence as possible, in the form of screenshots/screencasts, log files, lspci -vvnn output, and detailed steps to reproduce the issue.

If you are seeing the issue with a proprietary application such as a commercial game, first try to reproduce the issue with various open source games or tools such as glxgears; your bug will be much more likely to gain attention upstream if it can be demonstrated with software that upstream developers can easily get their hands on.

Problem manifests as a performance degradation issue

For general performance degradation, you can often isolate these issues by experimenting with different driver options, such as switching between XAA and EXA, turning DRI on or off, or using an alternate driver. Many of these issues will already be well-known, so be sure to search the bug tracker before reporting. They're also often issues that are not trivially solved, so you may need to either live with the issue or use a workaround (such as no Compiz) for a while.

For performance regressions in specific situations, first try to identify exactly when the issue started happening; try reverting to older versions of your video driver (perhaps by testing different alpha releases of Ubuntu). If you can narrow down the range of driver versions where the issue occurred, this can help a great deal in identifying the patch that caused the regression.

If the issue has been present for a long while, and is specific to a particular application, consider reporting the issue against that application. Even if it is a driver limitation at core, sometimes the application developers will be able to work around it on their end more easily than it can be handled in the driver.

Problem results in screen display corruption

Nearly all screen corruption issues will be due to a bug in a driver. Identify the driver and the specific steps to produce the corruption. Then run the xserver through gdb to identify the line or lines immediately prior to the corruption.

From here, things to try could include checking for invalid/undefined values, adding usleep() calls to add delay, or even disabling the lines in question.

Once a preliminary patch exists, post it to the upstream xorg list for feedback. Often they can suggest a better patch.

Problem results in X crash, lockup, freeze, or exit

In some cases, an error message will be printed before the fault; these can be used to identify where in the codebase the fault occurred, and often give an explanation as to why.

Otherwise, use gdb to get a backtrace. Once the location of the fault is found, step through the code leading up to the line where the fault occurred. Look for invalid/undefined values or questionable logic. Try disabling the line or lines where the fault occurred, adding usleep() before it, and so on.

Once a preliminary patch exists, post it to the upstream xorg list for feedback. Often they can suggest a better patch.
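
One way to get the gdb backtrace mentioned above is to attach gdb to the running server and wait for the fault. The helper below only prints a command line and does not run it; xorg_backtrace_cmd is a hypothetical name, and ignoring SIGPIPE and SIGUSR1 is an assumption about which signals would otherwise stop the server under gdb.

```shell
# Hypothetical helper: print (not run) a gdb command line that attaches to
# a running X server, lets it continue, and dumps a full backtrace once the
# server stops on a fault. Run the printed command as root from an ssh
# session or another machine, since the local display will freeze.
xorg_backtrace_cmd() {
    pid=$1
    echo "gdb -batch -p $pid -ex 'handle SIGPIPE nostop' -ex 'handle SIGUSR1 nostop' -ex continue -ex 'bt full'"
}

# Example: xorg_backtrace_cmd "$(pidof Xorg)"
```

The backtrace is far more useful when debugging symbols for the server and the video driver are installed.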

Problem involves wrong resolutions, refresh rates, or monitor specs

These sorts of issues used to be quite common in Feisty and earlier versions of Ubuntu, but should be quite rare these days. However, they do happen from time to time.

Today, the issues are generally due to lack of hardware support in Xorg's hardware autodetection algorithms; this usually happens only if your hardware is very new, very old, or very obscure. You can work around this by manually specifying the configuration in xorg.conf, such as the device driver, monitor vrefresh and hsync, modes, and so on. In the worst case, you may also need to generate your own modeline for the hardware.

In any case, report the issue with example Xorg.0.log files and include detailed information including your monitor manufacturer and model number, and your video card details via lspci -vvnn. Also provide the failed xorg.conf configuration and your working xorg.conf after manual configuration. This information can be used to improve Xorg's autodetection, so it'll be able to properly detect and configure your hardware in the future.

Problem involves wrong font dpi or size

TODO

Problem involves buggy EDID from monitor

If the monitor is clearly advertising an incorrect mode (such as not advertising a preferred mode), a quirk can be added to the xserver to prefer a specific mode.

TODO: What's the code change to do this?

Problem occurs on screen mode change (startup/shutdown, hibernate, suspend, screensaver, power save, or tty switch)

A large class of graphics bugs occur when changing modes, such as:

  • On resume from hibernate
  • On resume from suspend
  • On closing (or opening) laptop lid (independently of hibernate/suspend)
  • When screensaver comes on
  • When a GL screensaver comes on
  • Right after typing in password on the login screen
  • When switching to a tty console (e.g. ctrl-alt-f1)

Some reporters see issues in only one of the above situations, but many see them in a combination. If you experience one kind of crash from the above list, try some of the others to identify if you're seeing a single-mode or multiple-mode issue.

As well, the symptoms and characteristics of the failure can vary:

  • Screen is blank (like it's turned off)
  • Screen is black but being drawn (like it's on, but on a black screen)
  • Mouse cursor is visible and can be moved, but clicking does nothing
  • Mouse cursor is visible but won't move
  • Despite the display issue, the system works fine (startup sounds play, can ssh in, etc.)
  • Xorg crashes, returning user to the login screen
  • System seems locked up - can be pinged, but cannot ssh in nor run any programs
  • Xorg seems frozen - Ctrl-alt-backspace does not work
  • System seems frozen - Caps lock does not work
  • Occurs every time the mode changes
  • Occurs every other time the mode changes
  • Occurs only sometimes when the mode changes
  • Occurs only when there is a mode change after the system has been idle some period of time

Typically the failure will involve some combination of the above symptoms. When two reporters have the exact same set of symptoms for a given set of mode change failures, and are using the same chipset family, it's a good bet that they're experiencing the same bug. If the symptoms don't match up exactly, and they have the same hardware, then it's likely they aren't having quite the same bug, and the fix for one will probably not fix the other. On the other hand, sometimes a bug will exhibit different symptoms on different kinds of hardware. So on i915 it may show up as a system freeze, whereas on i945 it just crashes X, but a single bug fix will solve both issues.

Analysis and Workarounds

There are a number of tricks for working around these issues. In some cases they're good enough to address the user's needs. In other cases they simply change the symptoms or decrease the frequency of the issue. Sometimes they make things worse! Regardless, knowing which workarounds affect the issue is very useful evidence, as it gives strong hints as to where the bug lies.

  • Disable Compiz
  • Disable all power management
  • Force a different mode switch
    • Log out of X, and log into the console, and try suspend/resume (or hibernate, or whatever) from there
      • If it does not come up ok, the failure is in the kernel rather than X
      • Test if caps lock works; whether this works determines how bad the kernel failure is
    • Switch virtual consoles and back (alt-ctrl-f1, alt-ctrl-f7 or f8 or f9)
    • If laptop, close lid and reopen
    • Attach another external monitor
  • Try installing an alternate driver (like -i810 instead of -intel), or an older or newer version of the driver
  • Adjust ACPI settings
    • In /etc/default/acpi-support, set SAVE_VBE_STATE=false
    • In /etc/default/acpi-support, set SAVE_VIDEO_PCI_STATE=true
    • In /etc/default/acpi-support, set POST_VIDEO=false
    • In /etc/default/acpi-support, set USE_DPMS=true
    • Add the VBERestore option in the Device section of /etc/X11/xorg.conf
    • Remove the VBERestore option in the Device section of /etc/X11/xorg.conf
  • Adjust kernel ACPI settings in /boot/grub/menu.lst
    • Add to grub kernel parameters: "acpi_sleep=s3_bios,s3_mode"
    • Add to grub kernel parameters: "notsc"
  • Remove ACPI components to narrow down which causes the failure
    • sudo rmmod video

    • Check the contents of cat /proc/acpi/video/*/DOS before and after removing video
    • This can help identify if the bug is in the driver / BIOS
  • BIOS settings
    • Change "Internal Graphics Mode Select" BIOS setting from 1MB to 8MB
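
The /etc/default/acpi-support changes listed above are plain NAME=value assignments, so they can be toggled with a small helper while testing. This is a sketch under assumptions: set_default_var is an invented name, it expects simple values without "|" or "&" characters, and it needs root to write to the real file.

```shell
# Hypothetical helper: set NAME=VALUE in a shell-style defaults file such
# as /etc/default/acpi-support, keeping a backup copy next to it.
# Usage: set_default_var FILE NAME VALUE
set_default_var() {
    file=$1 name=$2 value=$3
    cp "$file" "$file.bak" || return 1
    if grep -q "^$name=" "$file.bak"; then
        # Replace the existing assignment in place.
        sed "s|^$name=.*|$name=$value|" "$file.bak" > "$file"
    else
        # No assignment yet: append one.
        echo "$name=$value" >> "$file"
    fi
}

# Example: set_default_var /etc/default/acpi-support SAVE_VBE_STATE false
```

Changing one setting at a time, and noting the result of each, keeps the evidence about which workaround affects the issue unambiguous.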

Problem involves missing support for some keyboard keys

TODO

Problem involves missing support for mouse or touchpad functions

TODO

Problem involves GUI application that crashes with an X error message

TODO

Problem may be due to prior installs of a binary driver

Try:

 dpkg -l '*fglrx*'

and

 locate fglrx

to see if there are still some proprietary bits around causing problems.
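
The dpkg listing also includes packages that were never installed, so it can help to filter it. The function below is a hedged sketch with a made-up name, leftover_packages; it relies on the dpkg -l status column, where "ii" means installed, "rc" means removed with config files remaining, and "un" means not installed.

```shell
# Hypothetical filter: list packages that dpkg still knows about in any
# state other than "not installed". Reads "dpkg -l" output on stdin, e.g.:
#   dpkg -l '*fglrx*' | leftover_packages
leftover_packages() {
    # Package rows start with a two- or three-letter status such as "ii"
    # or "rc"; "un" rows are skipped, and header rows do not match.
    awk '$1 ~ /^[a-z][a-zA-Z][a-zA-Z]?$/ && $1 != "un" { print $1, $2 }'
}
```

Any "rc" entries point to leftover configuration files from a removed binary driver, which are worth a closer look.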

Debugging Memory Issues

Tools for watching memory use: top and xrestop.

To contain the effect of a memory leak, limit the RAM X may use to 80% via ulimit -m in the X startup script.
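
The 80% figure has to be turned into a number for ulimit. A minimal sketch follows, assuming a Linux /proc/meminfo and a shell whose ulimit -m takes kilobytes; ram_limit_kb is a hypothetical name. Whether the kernel actually enforces a resident-set limit varies, so check that the limit has an effect on your system.

```shell
# Hypothetical helper: print 80% of total RAM in kB, the unit that
# "ulimit -m" expects in common shells.
# Usage: ram_limit_kb [MEMINFO]   (defaults to /proc/meminfo)
ram_limit_kb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 * 8 / 10 }' "${1:-/proc/meminfo}"
}

# Example, placed before the X server is started:
#   ulimit -m "$(ram_limit_kb)"
```

Computing the value at startup, rather than hardcoding it, keeps the limit correct if the amount of installed RAM changes.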

Troubleshooting Common Error Messages

Common Intel Driver Error Messages

   (II) intel(0): [drm] removed 1 reserved context for kernel
   (II) intel(0): [drm] unmapping 8192 bytes of SAREA 0xf89c1000 at 0xb7b65000

These appear only on system shutdown, and generally don't indicate an issue.

   (EE) intel(0): I830 Vblank Pipe Setup Failed 0

This is because the X driver calls the DRM_I915_SET_VBLANK_PIPE ioctl after de-initializing the DRM. It should be harmless.

   (II) AIGLX: Suspending AIGLX clients for VT switch
   Error in I830WaitLpRing(), timeout for 2 seconds

This is a generic error indicating that the GPU locked up. It could be caused by a variety of issues.
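
The messages described in this section can be picked out of a log and labeled automatically. The filter below is an illustrative sketch, not part of any driver tooling: classify_intel_messages and the "harmless" and "gpu-lockup" labels are invented, and the patterns cover only the messages quoted above.

```shell
# Hypothetical filter: tag the Intel driver messages discussed above.
# Reads an Xorg log on stdin and prints "harmless" or "gpu-lockup"
# followed by each matching line; all other lines are dropped.
classify_intel_messages() {
    awk '
        /\[drm\] removed .* reserved context for kernel/ { print "harmless: " $0; next }
        /\[drm\] unmapping .* bytes of SAREA/            { print "harmless: " $0; next }
        /I830 Vblank Pipe Setup Failed/                  { print "harmless: " $0; next }
        /Error in I830WaitLpRing\(\), timeout/           { print "gpu-lockup: " $0; next }
    '
}

# Example: classify_intel_messages < /var/log/Xorg.0.log
```

A "gpu-lockup" line only says that the GPU hung; the cause still has to be tracked down using the analysis steps earlier in this document.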

X/Debugging (last edited 2016-01-10 22:13:08 by penalvch)