Troubleshooting

Differences between revisions 35 and 37 (spanning 2 versions)
Revision 35 as of 2009-03-10 23:06:43
Size: 20513
Editor: WNDR3300
Comment:
Revision 37 as of 2009-03-15 03:00:14
Size: 1223
Editor: pool-71-117-254-52
Comment:
 * [[X/Troubleshooting/SingleColorScreen|Single color (brown/blue/etc.) screen on startup]]
== Problem manifested only recently ==

If the issue is known to occur only after (or only before) a given
point in time or software version, you can home in on the specific
cause with a bisection strategy.

Essentially, if you know it occurred in Version 1, but not Version 8,
have a person able to replicate the issue try Version 4. If it's there,
then have them try Version 6, otherwise Version 2.
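
As a rough sketch of the arithmetic, the next version to test is simply the midpoint of the current bracket. The helper name `next_to_test` is hypothetical, and the sketch assumes the versions can be numbered consecutively:

{{{
# Print the midpoint of two version numbers that bracket the problem.
next_to_test() {
    echo $(( ($1 + $2) / 2 ))
}

next_to_test 1 8    # prints 4: the first version to try
next_to_test 4 8    # prints 6: next step if the issue is present in Version 4
next_to_test 1 4    # prints 2: next step if the issue is absent in Version 4
}}}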

If the problem is in the current Ubuntu, but not in the prior Ubuntu, it
can be useful to have them test the intermediate Alpha versions of the
new release.

Once you have bracketed the problem to a specific version of something,
you can go through the individual patches included in that version
compared with the prior one. Sometimes the patch descriptions give a
strong clue as to the culprit. If there are many changes, then rather
than trying each patch one by one you may want to simply disable the
latter half of the patches, and bisect that way.

If you've narrowed it to an upstream version change, then you may wish
to use git's bisecting functionality to assist with this.
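
A minimal sketch of a git bisect session follows. It assumes the older revision is the good one and that you have a command which exits 0 when the problem is absent and non-zero when it is present. The helper name `bisect_between` and its arguments are illustrative only:

{{{
# Hypothetical helper: bisect between a known-good and a known-bad revision.
# Usage: bisect_between GOOD_REV BAD_REV [TEST_COMMAND...]
bisect_between() {
    good=$1
    bad=$2
    shift 2
    # "git bisect start BAD GOOD" marks both ends and checks out a midpoint.
    git bisect start "$bad" "$good" || return 1
    if [ $# -gt 0 ]; then
        # With a test command, let git drive the whole search:
        # exit 0 means good, 125 means skip, other codes 1-127 mean bad.
        git bisect run "$@"
    fi
}
}}}

Without a test command, build and test each checkout by hand, mark it with `git bisect good` or `git bisect bad`, and finish with `git bisect reset` to return to the original checkout.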

== X starts up with the wrong video driver (e.g. vesa) ==

The current X server attempts to auto-configure itself, which works most of the time, but occasionally will result in an improperly selected video driver. In some cases it may not be able to determine the right driver at all, and will default to -vesa.

The lookup of pci id to driver name is done in videoPtrToDriverName() and chooseVideoDriver() in hw/xfree86/common/xf86AutoConfig.c. This uses pci ids listed in .ids files in /usr/share/xserver-xorg/pci/. These files are generated by the makefile of the appropriate video driver.

If you have a very unusual video driver (like a new one, or one supplied by a 3rd party vendor), you may need to patch videoPtrToDriverName() to add it.

Or, if you have a new-ish video card but are using a fairly standard driver, you may simply need to update the pci id list. Try editing the appropriate file in /usr/share/xserver-xorg/pci/ to add your card's id. If that works, file a bug report requesting that the pci id be added to the driver's source, and a developer can take care of it (for example, -intel keeps its ids in src/common.h, while -ati has them in src/ati_pciids_gen.h).
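
To find the pci id to look for, the numeric vendor:device pair can be pulled out of `lspci -nn` output. This is only a sketch: the helper name `pci_ids` is hypothetical, and since the exact line format of the .ids files is not described here, the suggested search just looks for the device part of the id:

{{{
# Print vendor:device pairs (such as 8086:2a02) found in "lspci -nn" output
# read from stdin. The 4-digit class code, e.g. [0300], is not matched.
pci_ids() {
    grep -oE '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' | tr -d '[]'
}

# On a real system (not run here):
#   lspci -nn | grep -i vga | pci_ids
#   grep -il 2a02 /usr/share/xserver-xorg/pci/*.ids
}}}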


== Problem manifests itself during video playback ==

Many video bugs have already been reported, are well-known, and require significant architectural work to be done upstream in order to resolve them. So, before reporting a bug, please review existing bug reports to see if the issue is already known. Many of the known issues relate to use of Xv and/or compiz, so typical workarounds include not using one or the other or both.

If your issue has not already been reported, first identify which Xorg extensions are involved. You can isolate the issue by varying some of these (e.g. trying with or without Compiz, comparing XAA with EXA, enabling or disabling DRI, etc.) and by testing different options in your video player (like whether to use Xv).


== Problem manifests as a performance degradation issue ==

For general performance degradation, you can often isolate these issues by experimenting with different driver options, such as switching between XAA and EXA, turning DRI on or off, or using an alternate driver. Many of these issues will already be well-known, so be sure to search the bug tracker before reporting. They're also often issues that are not trivially solved, so you may need to either live with the issue or use a workaround (such as no Compiz) for a while.

For performance regressions in specific situations, first try to identify exactly when the issue started happening; try reverting to older versions of your video driver (perhaps by testing different alpha releases of Ubuntu). If you can narrow down the range of driver versions where the issue occurred, this can help a great deal in identifying the patch that caused the regression.

If the issue has been present for a long while, and is specific to a particular application, consider reporting the issue against that application. Even if it is a driver limitation at core, sometimes the application developers will be able to work around it on their end more easily than it can be handled in the driver.

If the performance issue occurs with 3D apps and/or with Compiz enabled, but not with 2D, your system's 3D acceleration (DRI) is probably not working properly. 3D can be rendered either in hardware or in software; if you expect hardware acceleration but your driver is falling back to software, that could be the problem. There have been some cases where EXA attempted partial hardware acceleration, which is worse than doing everything in software; in such cases forcing software rendering will help (though obviously it won't give the best performance your hardware is capable of). See http://cworth.org/talks/lca_2008/ for a thorough discussion.

During Hardy development, we ran into some severe performance problems with EXA on -intel, which were resolved by making 'Option "MigrationHeuristic" "greedy"' the default. That setting seems to work well for everyone, but experimenting with other MigrationHeuristic values may be of use if you see performance loss with EXA but not with XAA. Option "ExaNoComposite" "true"/"false" is another setting to experiment with.
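
Pulling the options from this section together, a Device section for experimenting might look like the sketch below. The Identifier value and the "intel" driver are assumptions for illustration. AccelMethod and DRI are the usual option names for switching the acceleration architecture and DRI on drivers of this era, but check your driver's man page before relying on them. Change one option at a time and restart X between tests.

{{{
Section "Device"
    Identifier "Configured Video Device"
    Driver     "intel"
    # Compare "exa" against "xaa".
    Option     "AccelMethod"        "exa"
    Option     "MigrationHeuristic" "greedy"
    # Try both "true" and "false".
    Option     "ExaNoComposite"     "false"
    # Uncomment to test with DRI turned off.
    #Option    "DRI"                "false"
EndSection
}}}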

== Problem results in screen display corruption ==

Nearly all screen corruption issues will be due to a bug in a driver.
Identify the driver and the specific steps to produce the corruption.
Then run the xserver through gdb to identify the line or lines
immediately prior to the corruption.

From here, things to try could include checking for invalid/undefined
values, adding usleep() calls to add delay, or even disabling the lines
in question.

Once a preliminary patch exists, post it to the upstream xorg list for
feedback. Often they can suggest a better patch.


== Brown screen after entering login credentials ==

The gdm login screen appears and seems to let you log in, but then the system locks up after that, typically with a brown screen.

What this means is that X by itself works (the gdm login screen is a simple X session), but something else caused X to fail. You can take several steps to help narrow in on what component caused the failure:

a) Try disabling 3D Effects (aka compiz). One way is to do `sudo chmod a-x /usr/bin/compiz`. If you can log in properly, then the bug is likely to be an error in the 3D functionality of '''your video driver'''.

b) Try using the "recovery mode" option in grub, then drop to a command line and try launching X via "startx". If that works properly, it may suggest an issue in '''gdm'''.

c) Try creating a new user and log in as them, especially if you've gone through an upgrade. If this works, then possibly something in your config files is causing the issue.

d) Try switching to the "vesa" driver in your xorg.conf. If this works, then the issue is likely to be either your video driver or some advanced functionality in the X server (such as DRI).
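
For step d), a minimal Device section that forces the vesa driver might look like this. The Identifier string is an assumption; keep whatever identifier your existing xorg.conf already uses. The change made in step a) can be undone with `sudo chmod a+x /usr/bin/compiz`.

{{{
Section "Device"
    Identifier "Configured Video Device"
    Driver     "vesa"
EndSection
}}}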


== Problem results in X crash, lockup, freeze, or exit ==

In some cases, an error message will be printed before the fault; these
can be used to identify where in the codebase the fault occurred, and
often give an explanation as to why.
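
One quick way to look for such messages is to pull the error lines out of the X server log. A small sketch, assuming the usual log location /var/log/Xorg.0.log and the standard (EE) error marker; the helper name `x_errors` is hypothetical:

{{{
# Print the error lines from an X server log (default: /var/log/Xorg.0.log).
# Matches lines that start with (EE), optionally preceded by a [timestamp].
x_errors() {
    grep -E '^(\[[ 0-9.]+\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}
}}}

After a crash and restart, the log from the failed session is normally kept as /var/log/Xorg.0.log.old, so `x_errors /var/log/Xorg.0.log.old` may be the more useful call.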

Otherwise, use gdb to get a backtrace. Once the faulting line is found,
step through the code leading up to it.
Look for invalid/undefined values or questionable logic. Try disabling
the line or lines where the fault occurred, adding usleep() before them,
and so on.
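
A rough sketch of attaching gdb to a running X server follows. It should be run from an ssh session or another machine, since the display stops updating while X is stopped in the debugger. The helper name `x_backtrace` is hypothetical, and the sketch assumes the server process is named X or Xorg:

{{{
# Attach gdb to the running X server, let it continue until it faults,
# then print a full backtrace and stay at the gdb prompt.
x_backtrace() {
    pid=$(pidof X Xorg | awk '{print $1}')
    sudo gdb -p "$pid" \
        -ex 'handle SIGPIPE nostop' \
        -ex 'handle SIGUSR1 nostop' \
        -ex 'continue' \
        -ex 'backtrace full'
}
}}}

Start it, reproduce the problem, and save the backtrace output for the bug report.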

Once a preliminary patch exists, post it to the upstream xorg list for
feedback. Often they can suggest a better patch.


== Problem occurs on resume, logout, screensaver, power savings mode, tty switch, etc. ==

A large class of graphics bugs occur when changing modes, such as:

 * On resume from hibernate
 * On resume from suspend
 * On closing (or opening) laptop lid (independently of hibernate/suspend)
 * When screensaver comes on
 * When a GL screensaver comes on
 * Right after typing in password on the login screen
 * When switching to a tty console (e.g. ctrl-alt-f1)

Some reporters see issues on only one of the above situations, but many see it on a combination. If you experience one kind of crash from the above list, try some of the others to identify if you're seeing a single-mode or multiple-mode issue.

As well, the symptoms and characteristics of the failure can vary:

 * Screen is blank (like it's turned off)
 * Screen is black but being drawn (like it's on, but on a black screen)
 * Mouse cursor is visible and can be moved, but clicking does nothing
 * Mouse cursor is visible but won't move
 * Despite the display issue, the system works fine (startup sounds play, can ssh in, etc.)
 * Xorg crashes, returning user to the login screen
 * System seems locked up - can be pinged, but cannot ssh in nor run any programs
 * Xorg seems frozen - Ctrl-alt-backspace does not work
 * System seems frozen - Caps lock does not work
 * Occurs every time the mode changes
 * Occurs every other time the mode changes
 * Occurs only some times when the mode changes
 * Occurs only when there is a mode change after the system has been idle some period of time

Typically the failure will involve some combination of the above symptoms. When two reporters have the exact same set of symptoms for a given set of mode change failures, and are using the same chipset family, it's a good bet that they're experiencing the same bug. If the symptoms don't match up exactly, and they have the same hardware, then it's likely they aren't having quite the same bug, and the fix for one will probably not fix the other. On the other hand, sometimes a bug will exhibit different symptoms on different kinds of hardware. So on i915 it may show up as a system freeze, whereas on i945 it just crashes X, but a single bug fix will solve both issues.

'''Analysis and Workarounds'''

There are a number of tricks for working around these issues. In some cases they're good enough to address the user's needs. In other cases they simply change the symptoms, or decrease the frequency of the issue. Sometimes they make things worse! But regardless, knowing which workarounds affect the issue is very useful evidence, as it gives strong hints as to where the bug lies.

 * Disable Compiz
 * Disable all screen savers and power management; do some or all of the following:
   * Disable/remove screensaver. To remove:
     * sudo apt-get remove gnome-screensaver
     * sudo apt-get remove gnome-power-manager
   * Disable power management
   * Verify apmd is not running
   * xset s off
   * setterm -blank 0 -powersave off -powerdown 0
   * Check BIOS for any power management options
   * Configure X.org to turn off power management
{{{
Section "ServerFlags"
 Option "BlankTime" "0"
 Option "StandbyTime" "0"
 Option "SuspendTime" "0"
 Option "OffTime" "0"
EndSection
}}}
 * Force a different mode switch
   * Log out of X, and log into the console, and try suspend/resume (or hibernate, or whatever) from there
     * Running /etc/acpi/hibernate.sh or /etc/acpi/sleep.sh should do it
     * If it does not come up ok, the failure is in the kernel rather than X
     * Test if caps lock works; whether this works determines how bad the kernel failure is
   * Switch virtual consoles and back (alt-ctrl-f1, alt-ctrl-f7 or f8 or f9)
   * If laptop, close lid and reopen
   * Attach another external monitor
 * Try installing an alternate driver (like -i810 instead of -intel), or an older or newer version of the driver
 * Adjust ACPI settings
   * In /etc/default/acpi-support, set SAVE_VBE_STATE=false
   * In /etc/default/acpi-support, set SAVE_VIDEO_PCI_STATE=true
   * In /etc/default/acpi-support, set POST_VIDEO=false
   * In /etc/default/acpi-support, set USE_DPMS=true
   * Add the VBERestore option in Devices section of /etc/X11/xorg.conf
   * Remove the VBERestore option in Devices section of /etc/X11/xorg.conf
 * Adjust kernel ACPI settings in /boot/grub/menu.lst
   * Add to grub kernel parameters: "acpi_sleep=s3_bios,s3_mode"
   * Add to grub kernel parameters: "notsc"
 * Remove ACPI components to narrow down which causes the failure
   * `sudo rmmod video`
   * Check the output of `cat /proc/acpi/video/*/DOS` before and after removing the video module
   * This can help identify if the bug is in the driver / BIOS
 * BIOS settings
   * Change "Internal Graphics Mode Select" BIOS setting from 1MB to 8MB
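
The /etc/default/acpi-support settings listed above are plain KEY=value lines, so toggling them one at a time can be scripted. A small sketch, where the helper name `set_acpi_option` is hypothetical and GNU sed's -i option is assumed:

{{{
# Set KEY=VALUE in an acpi-support style file, replacing an existing
# uncommented assignment or appending a new one.
# Usage: set_acpi_option KEY VALUE [FILE]
set_acpi_option() {
    key=$1
    value=$2
    file=${3:-/etc/default/acpi-support}
    if grep -q "^$key=" "$file"; then
        sed -i "s/^$key=.*/$key=$value/" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}
}}}

Editing the real file needs root, and it is worth keeping a backup copy first. Change one setting per suspend/resume test so you can tell which one affected the symptoms.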


== Problem involves missing support for mouse, touchpad, or gamepad functions ==

Starting with Intrepid, Ubuntu now uses input-hotplug for most input devices. This section only addresses input-hotplug issues; if you're using an older version of the distro, please look elsewhere for troubleshooting directions.

For most devices you will need both a kernel driver and an X driver, in order for X to interpret its events correctly.

=== Kernel Driver ===
Use {{{dmesg}}} and {{{lsmod}}} to determine if the device is detected and the kernel module loaded. If not, you can load it using {{{modprobe}}}.
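
As a sketch, the check can be scripted so that the module is only loaded when it is missing. The helper name `module_loaded` is hypothetical, and psmouse is just an example module name:

{{{
# Succeed if the named module appears in lsmod-style output read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# On a real system (not run here):
#   lsmod | module_loaded psmouse || sudo modprobe psmouse
#   dmesg | tail -n 20    # look for messages about the device
}}}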

To determine the current kernel support level for your device, and to identify the kernel driver to use, here are some sites which may be of use:

 * [[http://www.qbik.ch/usb/devices/showdev.php?id=4410|Linux USB devices]]
 * [[http://kmuto.jp/debian/hcl/|Debian device driver check page]]
 * [[http://hardware4linux.info/|hardware4linux]]
 * [[http://www.ubuntuhcl.org/|Comprehensive Ubuntu Hardware Database]]

If a kernel driver is not available in Ubuntu for your device, but one does exist somewhere, please file a wishlist bug against the "linux" product requesting it be added.

If a kernel driver is available in Ubuntu, but didn't get loaded automatically when you plugged in your device, file a bug against the "linux" product about this.

=== X Driver ===

For most devices, the {{{evdev}}} or {{{evtouch}}} drivers should be sufficient. X uses these by default if no driver is listed in your xorg.conf, so if you have a different driver listed, try removing it or hardcoding {{{evdev}}} instead.

In some cases, your device may require an X input driver other than {{{evdev}}}. If this is the case (e.g., it works when you hardcode the device in your xorg.conf), but it doesn't work automatically, the next step is to make sure the driver is specified in your HAL configuration.

=== X Input Device Configuration ===

If the right X driver is being selected for your device, but some of the settings are not quite right, you can adjust the settings using the HAL fdi files.

For more information on configuring HAL fdi files for X input devices, please see [[X/Config]]
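
For illustration, a minimal fdi sketch that selects an X driver and sets one driver option is shown below. The match rule and the EmulateWheel option are example values chosen here, not recommendations from this page. Files like this normally go under /etc/hal/fdi/policy/, and HAL needs to re-read them (for example after replugging the device or restarting HAL) before X sees the change.

{{{
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="info.capabilities" contains="input.mouse">
      <merge key="input.x11_driver" type="string">evdev</merge>
      <merge key="input.x11_options.EmulateWheel" type="string">true</merge>
    </match>
  </device>
</deviceinfo>
}}}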


== Problem involves GUI application that crashes with an X error message ==

Some GUI applications print out an X error message when they crash, which can make it seem like the issue is in Xorg. In fact, while the error may indeed be generated by X code, most of the time it's due to the application making the call incorrectly or under invalid conditions. So these bugs need to be reported against the application, not X.

For GNOME applications, the X error can be caught using the following approach:

{{{
   gdk_error_trap_push ();

   /* ... Call the X function which may cause an error here ... */

   /* Flush the X queue to catch errors now. */
   gdk_flush ();
   if (gdk_error_trap_pop ())
     {
       /* ... Handle the error here ... */
     }
}}}

To determine the X error, it is useful to have a backtrace that includes both Xorg and GNOME debug symbols. See DebuggingProgramCrash for info on adding ddeb sources to your config. Then install libgtk2.0-0-dbgsym, and other gnome packages with -dbgsym appended.


== Problem may be caused by gdm/kdm ==

If you suspect the issue may be caused by or in gdm, or for whatever reason you wish to isolate X from gdm, you can disable gdm from running at startup via:

{{{
  mv /etc/rc2.d/S30gdm /etc/rc2.d/disabled-S30gdm
  mv /etc/rc2.d/K30gdm /etc/rc2.d/disabled-K30gdm
}}}

Then start up X manually via:

{{{
  startx
}}}

You can subsequently restore gdm via:

{{{
  mv /etc/rc2.d/disabled-S30gdm /etc/rc2.d/S30gdm
  mv /etc/rc2.d/disabled-K30gdm /etc/rc2.d/K30gdm
}}}

Depending on how severe your login issue is, you may need to log into single user mode first to run the above commands.

A typical case where you might want to do this is when X constantly crashes on bootup before you can log in, or when you can reach the login screen but X crashes when you log in and you can't access the tty consoles for whatever reason. This kind of situation is extremely rare, but unfortunately can happen with the right combination of bugs.

Note: Some docs you can find via Google advocate running {{{update-rc.d -f gdm remove}}}, but this is incorrect: update-rc.d is a packaging utility, and is not designed to be run by a user. Its changes will be undone automatically on the next package upgrade, so even though it may ''seem'' to work, that is accidental.

In the future, we hope to see gdm's startup manageable via upstart, which will provide a much cleaner and easier interface, but obviously such a change will take ample testing to make sure it's not introducing other bad side effects.


== Debugging Memory Issues ==

X maintains a pool of memory for GUI applications. As a consequence, it will often appear that X is using a lot of memory, when in fact it is one or more applications that are consuming the memory. `top` may not indicate which application is to blame, but use of `xrestop` or other X memory display tools may help indicate it.

As a workaround, you can limit the amount of RAM X is allowed to allocate via `ulimit -m` in your X startup script.

== Customizing Keyboard Compose Key Settings ==

The USA International keyboard layout is helpful for people who wish to use a USA keyboard with US English, but need to type in another language. However, since this is a general purpose layout, it doesn't include all key symbols that a given language needs, and it includes other symbols you probably don't want. There are a few ways to configure around this problem:

 * Use a keyboard map that supports compose sequences but doesn't use dead keys by default, e.g., USA Alternative international (former us_intl) or USA International (AltGr dead keys).

 * Enable multiple keyboard layouts for the different use cases using the System -> Preferences -> Keyboard -> Layouts list and switch between them, using the GNOME keyboard applet or a layout-switching hotkey.

 * Override the standard compose map by creating a .XCompose file in your home directory, changing the maps that you want to behave differently, e.g.:

{{{
<dead_acute> <l> : "'l"
<dead_acute> <L> : "'L"
<dead_acute> <m> : "'m"
}}}

Otherwise, get used to typing a space after every ' if you want it to be rendered as an apostrophe, since this is what the use of "dead keys" implies. ;-)
 * [[X/Troubleshooting/Other|Other common issues...]]


For hard bugs, the analysis phase is the most important and most challenging part of bug work. Depending on how the bug behaves, there are several directions in which to investigate. Here are some approaches:


CategoryDebugging

X/Troubleshooting (last edited 2017-11-18 18:25:00 by penalvch)