Some tips for debugging the X server.

Apport - Or debugging the easy way

Since Intrepid Ibex, Apport can usually capture a full X backtrace and attach the other information a bug report needs, such as xorg.conf and Xorg.0.log. One advantage of this approach is that debugging requires neither a second computer nor any extra packages.

By default Apport is disabled in stable Ubuntu releases, so you may need to activate it temporarily (this command works on Karmic and later releases):

sudo service apport start force_start=1

Reproduce the crash while Apport is running. Afterwards, a message should appear in the GNOME/KDE panel (at the latest after logging out and back in), as described on the Apport wiki page. You can create a new bug report from this message.

If Apport isn't able to create a backtrace, or you're running an older Ubuntu version, the following steps are needed:

"Crash".... or "Freeze"?

Some "crashes" are not really crashes (segmentation violations) but are instead what we call "Freezes" (or, "GPU lockups").

In a "freeze", the system stops responding to input. You may see a blank or black screen, corruption, or simply no graphical updates. If you have a freeze rather than a crash, collecting a backtrace won't help. Instead, refer to the troubleshooting guide for freezes for your graphics driver.

In a true crash, X will terminate and drop back to a login screen. You can use the steps on this page to debug or report these kinds of issues.

Debug symbol information

You will likely need to install the packages xserver-xorg-core-dbg and libgl1-mesa-dri-dbg, plus the one for your graphics driver, xserver-xorg-video-<name>-dbg. Often you'll also want dbg packages for other libraries or packages mentioned in your backtrace. Look for lines marked '??', which indicate missing symbols.
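
Once a backtrace has been saved to a file, counting the frames marked '??' gives a quick idea of how many symbols are still missing. The following is only a minimal sketch: the function name count_unresolved_frames is made up for illustration, and gdb-Xorg.txt is the output file used later on this page.

```shell
# Count backtrace lines whose function name gdb could not resolve ('??').
# count_unresolved_frames is a hypothetical helper name, not part of gdb.
count_unresolved_frames() {
    # $1: file containing the saved gdb output
    # -c prints the number of matching lines; -F treats '??' literally.
    # grep exits non-zero when nothing matches, so mask that with '|| true'.
    grep -c -F '??' "$1" || true
}

# Example: count_unresolved_frames gdb-Xorg.txt
```

If the count is not zero, the "from /path/to/library" part of those lines usually hints at which dbg package is missing.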

Log in remotely

You will want to run the commands in a terminal window on another computer, since you will not have access to the local screen and keyboard. This is explained in DebuggingSystemCrash, but essentially you just ssh into the sick machine from a healthy one.

Backtrace with gdb

Logged in remotely on your "sick" machine, you can now run the gdb debugger on the X server process. First, find the process ID (pid) of Xorg:

pgrep Xorg

Then start gdb and attach to that process:

sudo gdb /usr/bin/Xorg 2>&1 | tee gdb-Xorg.txt

(gdb starts up and gives you its (gdb) prompt)

(gdb) attach <the process ID you found above>
(gdb) cont

Now do what you need to make the X server crash. Or, if the problem is that the X server is locked up and doesn't react, stop it with ctrl-C. Now get a backtrace:

(gdb) backtrace full

See also Backtrace for more information on this. Note that if the process is already running, you should use the attach pid command instead of run.

You can now find the output of gdb in /home/<username>/gdb-Xorg.txt
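
If you would rather not type the gdb commands by hand, the same sequence can be written to a command file and replayed with gdb's -x option. This is a sketch for the crash case only, not a tested recipe: write_gdb_commands and xorg.gdb are hypothetical names chosen for this example.

```shell
# Write the gdb commands used above to a command file so they can be
# replayed with 'gdb -x'. write_gdb_commands is a hypothetical helper name.
write_gdb_commands() {
    # $1: Xorg process ID, $2: command file to create
    cat > "$2" <<EOF
attach $1
continue
backtrace full
EOF
}

# Example (on the "sick" machine, over ssh):
#   write_gdb_commands "$(pgrep -o Xorg)" xorg.gdb
#   sudo gdb -batch -x xorg.gdb /usr/bin/Xorg 2>&1 | tee gdb-Xorg.txt
```

With -batch, gdb exits once the last command has run, so the backtrace ends up in gdb-Xorg.txt without any further interaction.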

You can do a lot more with gdb. See the gdb documentation or one of the many gdb tutorials.

If you stopped it with ctrl-C, you can let it run again with the continue command:

(gdb) cont

gdb problems

gdb and Xorg don't always work well together. It may help to start Xorg with the options -keeptty -dumbSched

  • -keeptty allows you to press ^C to get into gdb at any time

  • -dumbSched stops the smart scheduler from interrupting each time you step

For instance, to start Xorg from within gdb (over an ssh connection), start gdb:

sudo gdb /usr/bin/Xorg 2>&1 | tee gdb-Xorg.txt

Inside gdb, start up Xorg:

(gdb) run -keeptty -dumbSched

(posted by Barry Scott on the xorg ML)

Post-mortem backtrace

If the server has died and dumped a core dump, and you're using the current development version of Ubuntu, apport can be used for filing the bug. It should automatically prompt you to file a bug.

Otherwise, if apport doesn't do it automatically, you can get a backtrace manually. Locate the core dump (usually in /etc/X11/core) and run

sudo gdb /usr/bin/Xorg /etc/X11/core

Then run the "backtrace full" command inside gdb.

If you can't find any core files after a crash, look also in /var/crash, where apport (the automatic crash reporter) leaves its reports.
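
A quick way to check for such reports is sketched below. It assumes that Apport names each report after the path of the crashed executable, so that an X server report has "Xorg" in its file name and ends in ".crash". The helper name list_xorg_crash_reports is made up for this example.

```shell
# List Apport crash reports that look like they belong to the X server.
# Assumption: the report file name contains "Xorg" and ends in ".crash".
# list_xorg_crash_reports is a hypothetical helper name.
list_xorg_crash_reports() {
    # $1: crash directory (defaults to /var/crash)
    find "${1:-/var/crash}" -maxdepth 1 -name '*Xorg*.crash'
}

# Example: list_xorg_crash_reports
```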

Another problem can be that the default maximum size of core files is set to 0. To lift this limit, run ulimit before restarting the X server, in the same shell you start it from. Don't restart gdm, as it seems to force the soft core limit back to zero. Use startx instead:

sudo /etc/init.d/gdm stop
ulimit -c unlimited
startx
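
To confirm that the limit really was lifted in the shell you are about to start X from, a small check like the following can help. It is a sketch only, and core_dumps_enabled is a hypothetical helper name.

```shell
# Report whether the current shell allows core files to be written.
# core_dumps_enabled is a hypothetical helper name.
core_dumps_enabled() {
    # 'ulimit -c' prints the soft core file size limit, or "unlimited".
    # A value of 0 means no core file will be written.
    [ "$(ulimit -c)" != "0" ]
}

# Example:
#   ulimit -c unlimited
#   core_dumps_enabled && echo "core dumps are enabled in this shell"
```

The limit is per process and inherited by children, which is why the check has to run in the same shell that later starts the X server.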

Untrap signals

The X server will by default intercept signals and for instance trap its own crashes and dump a stack trace in /var/log/Xorg.0.log. However, this stack trace is modified by the signal handler itself. To disable this signal interception, add this to your /etc/X11/xorg.conf:

Section "ServerFlags"
        Option "NoTrapSignals" "true"
EndSection

and restart your X server. It is sometimes restarted when logging out, but you can also switch to a text console with Ctrl-Alt-F1, log in and run:

sudo /etc/init.d/gdm restart

You can also run this command remotely, in case you have trouble with your text consoles etc.

Debugging Error Exits

Much like a crash, the X server can terminate normally on an error. Since it terminated normally, you can't get a backtrace. However, typically an error will be printed on the console (but not in /var/log/Xorg.0.log). To look for the error message, look at the log files at /var/log/gdm/. If you just reproduced the crash it will be in :0.log; if it was the boot before that, look in :0.log.1.
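
To pull just the error lines out of such a log, a filter like the one below may be enough. It relies on two markers the X server itself prints, "(EE)" for errors and "Fatal server error" before a fatal exit. The helper name x_error_lines is made up for this example.

```shell
# Print the lines of an X server log that report errors.
# x_error_lines is a hypothetical helper name.
x_error_lines() {
    # $1: log file, for example /var/log/gdm/:0.log
    # Matches the "(EE)" error marker and the "Fatal server error" banner.
    grep -E '\(EE\)|Fatal server error' "$1"
}

# Example: x_error_lines /var/log/gdm/:0.log
```

The actual reason is often on the line after "Fatal server error", so it is worth opening the log around any match.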

Alternatively, it is not hard to view the exit messages directly. Log in at a VT console or through ssh and start up X manually without gdm (or kdm):

sudo /etc/init.d/gdm stop
sudo X

Now do whatever triggered the fault, and then look at the console output to see the error message.

Debugging Hangs / Freezes / Lockups

Hangs (aka freezes or lockups) differ from crashes or exits. In a crash, the server terminates at a specific point which can be backtraced. Hangs do not result in server termination, so the spot where the fault occurred is harder to isolate and identify, but with some persistence and gdb-fu you can find it manually.

First, start by finding a point in the code near where the error occurs. If you're lucky, one way to do this is to tail -f /var/log/Xorg.0.log from an ssh session and watch for what prints out immediately before the lockup. Then find the spot in the codebase where that message gets printed.

If you're not lucky, you'll need to make some guesses, or just pick a random spot.

Next, set a breakpoint in gdb:

 (gdb) break <function-name>

Now run X until it hits the breakpoint and then start stepping through it until the fault occurs.

 (gdb) run <args>
...runs until it hits the breakpoint...
 (gdb) step
 (gdb) step

Note that this can be tedious! As you do it, look for additional spots to set breakpoints so you can skip over stepping through code you know isn't involved.

DRI / drm problems

More verbose debugging information can be obtained by enabling the debug option of the drm kernel module:

echo 1 | sudo tee /sys/module/drm/parameters/debug

Note that leaving this option on will generate a lot of messages in your /var/log/kern.log and /var/log/syslog! To turn it off again:

echo 0 | sudo tee /sys/module/drm/parameters/debug
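
With the option enabled, the drm lines are easy to lose among other kernel messages. The sketch below filters them out by the "[drm" prefix the kernel puts on these messages. The helper name drm_messages is made up for this example.

```shell
# Print only the drm-related lines of a kernel log.
# drm_messages is a hypothetical helper name.
drm_messages() {
    # $1: log file, for example /var/log/kern.log
    # -F matches the literal text "[drm", which covers both "[drm]" and
    # "[drm:function_name]" style prefixes.
    grep -F '[drm' "$1"
}

# Example: drm_messages /var/log/kern.log | tail -n 50
```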

Xorg Memory Usage

If you notice Xorg is using large amounts of memory, you can get a better indication of the server-side resource usage of X's client apps via the top-like xrestop program. For reporting issues, the xrestop -b option is handy. For example, xrestop -b -m 5 | grep -A 15 metacity would print 5 samples of resource usage of the window manager, taken 2 seconds apart.

Backtracing Using LiveCD

Generally upstream is most responsive if the bug can be verified in a new version of their code, but you may not be in a position to upgrade to the latest versions and prefer to do the testing using a temporary LiveCD environment. Here are tips for doing this:

1. Burn a CD of the latest development version of Ubuntu, using either:

2. Boot the LiveCD environment

3. If X is failing to run as normal, switch to a virtual terminal (VT), via ctrl-alt-F1 and log in

4. Turn on the ssh server, so you can log in remotely

 $ sudo apt-get install ssh
 $ sudo /etc/init.d/ssh start
  * Starting OpenBSD Secure Shell server sshd [ OK ] 

5. Make any configuration changes needed.

6. Restart X (without doing a full reboot) using any of the following:

  • alt-sysrq-k

  • ctrl-alt-backspace (pre-jaunty only)

  • sudo /etc/init.d/gdm restart

  • sudo pkill -9 -f /usr/bin/X

  • See X/NonGraphicalBoot for more boot options

Continue debugging as normal.

Using Screen to get backtraces for Suspend/Resume crashes

When resuming from suspend, your ssh sessions will terminate, so the normal procedure of running gdb through ssh won't work. Fortunately, you can work around this issue by using a screen session.

Boot the computer and log in to X. openssh-server must be installed and running, and you must know the computer's IP address and be able to reach it from another system on the network.

Switch to tty1. Run:

   screen -S xcrash

You may name the session whatever you want; "xcrash" is used here.

Now inside the screen session, run:

   pgrep Xorg
   sudo gdb /usr/bin/Xorg


   (gdb) attach <the process ID you found above>
   (gdb) handle SIGUSR1 nostop
   (gdb) cont

The handle SIGUSR1 nostop line is required in order to be able to switch back to X. Now detach from the screen session (ctrl-a, then d).

Switch back to X (usually tty7, sometimes tty9). Activate suspend/standby. Wait a few seconds, then bring the system out of suspend. If the crash occurs, the screen stays blank.

From a remote computer, open an ssh session. Run:

   screen -x xcrash

Now you have recovered your screen session and you will see some output in gdb. Enable logging:

   (gdb) set logging on

Get the backtrace:

   (gdb) backtrace full

Press Enter to page through the backtrace until it is complete.

Now open another terminal and grab your log file. This is easiest with scp:

   scp username@ip:gdb.txt .

example (not real name/IP)

   scp connor@ .

You now have the gdb.txt file with the backtrace on the machine you made your remote connection from.

Obtaining the video BIOS

First obtain the PCI address of your video card (for example 00:02.0) by looking at the lspci output.

Next, as root, do the following (replacing the pci bit with your own):

# cd /sys/devices/pci0000\:00/0000\:00\:02.0/
# echo 1 > rom
# cat rom > /tmp/rom.bin
# echo 0 > rom

Then send the resulting rom.bin
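
The two helpers below sketch how the path could be derived from the lspci address and how the dump can be sanity-checked before sending it. They assume PCI domain 0000, which lspci leaves out by default, and they rely on the 0x55 0xAA signature that PCI option ROMs start with. Both function names are made up for this example.

```shell
# Build the sysfs path of a PCI device's rom file from the address shown by
# lspci (for example "00:02.0"). Assumes PCI domain 0000.
# pci_rom_path is a hypothetical helper name.
pci_rom_path() {
    printf '/sys/bus/pci/devices/0000:%s/rom\n' "$1"
}

# Succeed if a dumped ROM file starts with the 0x55 0xAA signature.
# rom_has_signature is a hypothetical helper name.
rom_has_signature() {
    # od prints the first two bytes as hex, e.g. " 55 aa"; strip whitespace.
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "55aa" ]
}

# Example:
#   pci_rom_path 00:02.0
#   rom_has_signature /tmp/rom.bin && echo "rom.bin looks like a video BIOS"
```

A dump that fails the signature check is usually empty or truncated, in which case repeating the steps above as root is the first thing to try.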

More information

X/Backtracing (last edited 2012-03-07 08:16:46 by jtv)