Some tips for debugging the X server.
Apport - Or debugging the easy way
Since Intrepid Ibex (8.10) it should be easy to get a full X backtrace with Apport, which also attaches all the other information needed for a new bug report, such as xorg.conf and Xorg.0.log. One advantage is that debugging this way needs neither a second computer nor any extra packages.
By default Apport is disabled in stable Ubuntu releases, so you may need to activate it temporarily (this command works on Karmic and later releases):
sudo service apport start force_start=1
Once Apport is running, reproduce the crash. Afterwards, a notification should appear in the GNOME/KDE panel (at the latest after logging out and back in), as described on the Apport wiki page. You can use it to create a new bug report.
If Apport isn't able to create a backtrace, or you're running an older Ubuntu version, the following steps are needed:
"Crash"... or "Freeze"?
Some "crashes" are not really crashes (segmentation violations) but are instead what we call "Freezes" (or, "GPU lockups").
In a "freeze", the system will stop responding to input, you may see a blank/black screen or corruption, or just no graphical updates. If you have a freeze rather than a crash, collecting a backtrace won't be of help. Instead refer to the troubleshooting guides for freezes for your graphics driver.
In a true crash, X will terminate and drop back to a login screen. You can use the steps on this page to debug or report these kinds of issues.
Debug symbol information
You will likely need to install the packages xserver-xorg-core-dbg and libgl1-mesa-dri-dbg, plus the one for your graphics driver, xserver-xorg-video-<name>-dbg. Often you will also want -dbg packages for other libraries mentioned in your backtrace: look for frames marked '??', which indicate missing symbols.
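Finding which libraries lack symbols can be partly automated. The sketch below assumes the backtrace has already been saved to a file (for example gdb-Xorg.txt) and that unsymbolized frames end in gdb's usual "in ?? () from <library>" form; the function names are made up for this page, and you still have to pick the matching -dbg package by hand.

```shell
#!/bin/bash
# Hypothetical helper: list the shared libraries named in '??' frames of a
# saved gdb backtrace. Such frames usually look like:
#   #3  0x00007f2b1c0d4e10 in ?? () from /usr/lib/libexample.so.1
missing_symbol_libs() {
    grep ' in ?? ' "$1" \
        | sed -n 's/.* from \(\/[^ ]*\)$/\1/p' \
        | sort -u
}

# Hypothetical helper: print the package that owns each of those libraries;
# its -dbg counterpart is the one to look for. The path is tried as-is first,
# then with symlinks resolved.
owning_packages() {
    missing_symbol_libs "$1" | while read -r lib; do
        { dpkg -S "$lib" 2>/dev/null \
            || dpkg -S "$(readlink -f "$lib")" 2>/dev/null; } | cut -d: -f1
    done | sort -u
}
```

For example, owning_packages gdb-Xorg.txt prints one package name per line. Frames inside Xorg itself show up as plain "in ?? ()" without a library and are skipped; xserver-xorg-core-dbg covers those.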
Log in remotely
You will want to run the commands in a terminal window on another computer, since you will not have access to the local screen and keyboard. This is explained in DebuggingSystemCrash, but essentially you just ssh into the sick machine from a working one.
Backtrace with gdb
Logged in remotely on your "sick" machine, you can now run the gdb debugger on the X server process. First, find the process ID (pid) of Xorg:
pgrep Xorg
Then start gdb and attach to that process:
sudo gdb /usr/bin/Xorg 2>&1 | tee gdb-Xorg.txt
(gdb starts up and gives you its (gdb) prompt)
(gdb) attach <the process ID you found above>
(gdb) cont
Now do what you need to make the X server crash. Or, if the problem is that the X server is locked up and doesn't react, stop it with ctrl-C. Now get a backtrace:
(gdb) backtrace full
See also Backtrace for more information on this. Note that if the process is already running, you should use the attach pid command instead of run.
You can now find the output of gdb in gdb-Xorg.txt in the directory you started gdb from (for example /home/<username>/gdb-Xorg.txt).
If you stopped it with Ctrl-C, you can let it run again with the continue command:
(gdb) continue
gdb and Xorg don't always work well together. It may help to start Xorg with the options -keeptty -dumbSched:
-keeptty allows you to press Ctrl-C to get into gdb at any time
-dumbSched stops the smart scheduler from interrupting each time you step
For instance, to start Xorg from within gdb (over an ssh connection), start gdb:
sudo gdb /usr/bin/Xorg 2>&1 | tee gdb-Xorg.txt
Inside gdb, start up Xorg:
(gdb) run -keeptty -dumbSched
(posted by Barry Scott on the xorg ML)
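The attach-and-backtrace session above can also be run unattended from a gdb command file, which is handy when you don't want to babysit the gdb prompt. This is a sketch, not the official procedure: it assumes gdb's -batch, -x and -p options, and which signals to let through (SIGUSR1 for VT switching, SIGPIPE from clients that go away) is an assumption about what a normally running server receives. The function and file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a gdb command file that lets an attached X server
# run until it stops, then records a full backtrace in gdb-Xorg.txt.
write_xorg_gdb_script() {
    cat <<'EOF'
set pagination off
set logging file gdb-Xorg.txt
set logging on
# Assumption: SIGUSR1 (VT switching) and SIGPIPE (vanished clients) occur in
# normal use; let them through without stopping, so "continue" only returns
# on a real problem such as SIGSEGV.
handle SIGUSR1 nostop noprint pass
handle SIGPIPE nostop noprint pass
continue
backtrace full
EOF
}
```

Usage, from the ssh session:
write_xorg_gdb_script > xorg-bt.gdb
sudo gdb -batch -x xorg-bt.gdb -p "$(pgrep -o Xorg)" /usr/bin/Xorg
Newer gdb versions (12 and later) prefer set logging enabled on, but still accept set logging on.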
If the server has died and left a core dump, and you're using the current development version of Ubuntu, Apport can be used to file the bug; it should prompt you automatically.
Otherwise, if apport doesn't do it automatically, you can get a backtrace manually. Locate the core dump (usually in /etc/X11/core) and run
sudo gdb /usr/bin/Xorg /etc/X11/core
Then run the "backtrace full" command inside gdb.
If you can't find any core files after a crash, look also in /var/crash, where apport (the automatic crash reporter) leaves its reports.
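Apport keeps the core dump inside its .crash report, and apport-unpack can extract it into a directory whose CoreDump and ExecutablePath files are then usable with gdb. The helper below only picks the newest X server report; its name, and the assumption that X report file names contain "Xorg" (they are derived from the executable path, e.g. _usr_bin_Xorg.0.crash), are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print the newest apport report for the X server in a
# crash directory (default /var/crash), or nothing if there is none.
newest_xorg_report() {
    local dir=${1:-/var/crash}
    ls -t "$dir"/*Xorg*.crash 2>/dev/null | head -n 1
}
```

Usage (the target directory must not already contain files):
report=$(newest_xorg_report)
sudo apport-unpack "$report" /tmp/xorg-crash
sudo gdb -batch -ex 'backtrace full' "$(sudo cat /tmp/xorg-crash/ExecutablePath)" /tmp/xorg-crash/CoreDump > gdb-Xorg.txt 2>&1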
Another problem can be that the default maximum size of core files is set to 0. To lift this limit, run ulimit in the same shell before restarting the X server. Don't restart gdm, as it seems to force the soft core limit to zero; use startx instead:
sudo /etc/init.d/gdm stop
ulimit -c unlimited
startx
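Because the limit is inherited by child processes, it has to be raised in the very shell that then runs startx. A small check, as a sketch (the function name is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the soft core-file size limit of the
# current shell is not zero, i.e. processes started from it may dump core.
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}
```

Usage:
ulimit -c unlimited
core_dumps_enabled && startx
If ulimit -c unlimited fails, the hard limit is lower and must be raised as root. cat /proc/sys/kernel/core_pattern shows where the kernel writes core files; a value starting with | means they are piped to a helper such as Apport, which is why they may end up in /var/crash instead.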
By default, the X server intercepts signals; for instance, it traps its own crashes and writes a stack trace to /var/log/Xorg.0.log. However, this stack trace is altered by the signal handler itself. To disable signal interception, add this to your /etc/X11/xorg.conf:
Section "ServerFlags"
    Option "NoTrapSignals" "true"
EndSection
and restart your X server. It is sometimes restarted when logging out, but you can also switch to a text console with Ctrl-Alt-F1, log in and run:
sudo /etc/init.d/gdm restart
You can also run this command remotely, in case you have trouble with your text consoles etc.
Debugging Error Exits
Much like a crash, the X server can terminate normally because of an error. Since it exits normally, you can't get a backtrace. However, an error message is typically printed on the console (but not in /var/log/Xorg.0.log). To find it, look at the log files in /var/log/gdm/: if you just reproduced the problem it will be in :0.log; if it happened during the previous boot, look in :0.log.1.
Alternatively, it is not hard to view the exit messages directly. Log in at a VT console or through ssh and start X manually without gdm (or kdm):
sudo /etc/init.d/gdm stop
startx
Now do whatever triggered the fault, and then look at the console output to see the error message.
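When the X server exits through its fatal-error path it prints a "Fatal server error:" line followed by the reason, so the relevant part of a gdm log can be pulled out with grep. A sketch (function name hypothetical; the five lines of context are an arbitrary choice):

```shell
#!/bin/bash
# Hypothetical helper: show the X server's fatal error message from a gdm
# per-display log. Defaults to the current session's log (:0.log); pass the
# rotated log (:0.log.1) to look at the previous boot instead.
show_fatal_error() {
    local log=${1:-/var/log/gdm/:0.log}
    grep -A 5 'Fatal server error' "$log"
}
```

For example, show_fatal_error '/var/log/gdm/:0.log.1'. If the gdm logs are only readable by root, run the grep command itself with sudo.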
Debugging Hangs / Freezes / Lockups
Hangs (aka freezes or lockups) differ from crashes or exits. In a crash, the server terminates at a specific point which can be backtraced. Hangs do not result in server termination, so the spot where the fault occurred is harder to isolate and identify, but with some persistence and gdb-fu you can find it manually.
First, start by finding a point in the code near where the error occurs. If you're lucky, one way to do this is to tail -f /var/log/Xorg.0.log from an ssh session and watch for what prints out immediately before the lockup. Then find the spot in the codebase where that message gets printed.
If you're not lucky, you'll need to make some guesses, or just pick a random spot.
Next, set a breakpoint in gdb:
(gdb) break <function-name>
Now run X until it hits the breakpoint and then start stepping through it until the fault occurs.
(gdb) run <args>
...runs until it hits the breakpoint...
(gdb) step
(gdb) step
...
Note that this can be tedious! As you do it, look for additional spots to set breakpoints so you can skip over stepping through code you know isn't involved.
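As an alternative to stepping by hand, gdb can run commands automatically every time a breakpoint is hit, leaving a trail of short backtraces; after the lockup, the last one printed shows the last time that code was reached. The sketch below generates such a command file. ExampleFunction is a placeholder for whatever function you picked, the helper name is hypothetical, and a breakpoint that is hit very often will slow the server down considerably.

```shell
#!/bin/bash
# Hypothetical helper: print a gdb command file that breaks on the given
# function and, on every hit, quietly prints the innermost 5 frames and
# continues.
write_trace_script() {
    local func=$1
    cat <<EOF
set pagination off
break $func
commands
  silent
  backtrace 5
  continue
end
EOF
}
```

Usage:
write_trace_script ExampleFunction > trace.gdb
sudo gdb -x trace.gdb -p "$(pgrep -o Xorg)" /usr/bin/Xorg
(gdb) continue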
DRI / drm problems
More verbose debugging information can be obtained by enabling the debug option of the drm kernel module:
echo 1 | sudo tee /sys/module/drm/parameters/debug
Note that leaving this option on will generate a lot of messages in your /var/log/kern.log and /var/log/syslog! To turn it off again:
echo 0 | sudo tee /sys/module/drm/parameters/debug
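To keep the extra logging limited to the reproduction, the two commands above can be wrapped so the previous value is restored afterwards. A sketch, with hypothetical names; the parameter path and the sudo command are variables only so the function can be tried against an ordinary file.

```shell
#!/bin/bash
# Parameter file of the drm module (overridable for testing).
DRM_DEBUG_PARAM=${DRM_DEBUG_PARAM:-/sys/module/drm/parameters/debug}
# Command used to read/write the parameter; set SUDO= to run without sudo.
SUDO=${SUDO-sudo}

# Hypothetical helper: run a command with drm debugging set to 1, then put
# the previous value back and return the command's exit status.
with_drm_debug() {
    local old status
    old=$($SUDO cat "$DRM_DEBUG_PARAM") || return 1
    echo 1 | $SUDO tee "$DRM_DEBUG_PARAM" > /dev/null || return 1
    "$@"
    status=$?
    echo "$old" | $SUDO tee "$DRM_DEBUG_PARAM" > /dev/null
    return "$status"
}
```

Usage:
with_drm_debug read -r -p 'Reproduce the problem, then press Enter '
grep drm /var/log/kern.log | tail -n 50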
Xorg Memory Usage
If you notice Xorg is using large amounts of memory, you can get a better indication of the server-side resource usage of X's client apps via the top-like xrestop program. For reporting issues, the xrestop -b option is handy. For example, xrestop -b -m 5 | grep -A 15 metacity would print 5 samples of resource usage of the window manager, taken 2 seconds apart.
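xrestop shows the server-side resources held on behalf of each client; to see whether the X server process as a whole keeps growing, its resident set size can also be sampled over time. A sketch that reads VmRSS from /proc (the function name and defaults are hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print a timestamp and the resident set size (VmRSS, in
# kB, read from /proc) of one process COUNT times, INTERVAL seconds apart.
sample_rss() {
    local pid=$1 count=${2:-5} interval=${3:-2} i
    for ((i = 0; i < count; i++)); do
        printf '%s %s\n' "$(date +%T)" \
            "$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")"
        # Sleep between samples, but not after the last one.
        if ((i + 1 < count)); then
            sleep "$interval"
        fi
    done
}
```

For example, sample_rss "$(pgrep -o Xorg)" 10 60 > xorg-rss.txt records ten samples one minute apart, which can be attached to a report next to the xrestop -b output.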
Backtracing Using LiveCD
Generally upstream is most responsive if the bug can be verified in a new version of their code, but you may not be in a position to upgrade to the latest versions and prefer to do the testing using a temporary LiveCD environment. Here are tips for doing this:
1. Burn a CD of the latest development version of Ubuntu, using either:
Latest alpha releases images: http://cdimage.ubuntu.com/releases/
Latest daily images: http://cdimage.ubuntu.com/daily-live/current/
2. Boot the LiveCD environment
3. If X is failing to run as normal, switch to a virtual terminal (VT), via ctrl-alt-F1 and log in
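4. Turn on the ssh server, so you can log in remotely (the live session user has no password by default and ssh will not accept an empty one, so set one with passwd first):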
$ sudo apt-get install ssh
$ sudo /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd      [ OK ]
5. Make any configuration changes needed.
6. Restart X (without doing a full reboot) using either of the following:
ctrl-alt-backspace (pre-Jaunty only)
sudo pkill -9 Xorg
See X/NonGraphicalBoot for more boot options
Continue debugging as normal.
Using Screen to get backtraces for Suspend/Resume crashes
When resuming from suspend, your ssh sessions will terminate, so the normal procedure of running gdb through ssh won't work. Fortunately, you can work around this issue by using a screen session.
Boot the computer and log in to X. openssh-server must be installed and running, and you must know the computer's IP address and be able to reach it from another system on the network.
Switch to tty1. Run:
screen -S xcrash
You may call the session whatever you want; I called it "xcrash".
Now inside the screen session, run:
pgrep Xorg
sudo gdb /usr/bin/Xorg
(gdb) attach <the process ID you found above>
(gdb) handle SIGUSR1 nostop
(gdb) cont
The handle SIGUSR1 nostop line is required in order to be able to switch back to X. Now detach from the screen session (Ctrl-a, then d).
Switch back to X (usually tty7, sometimes tty9). Activate suspend/standby, wait a few seconds, then resume the system. The screen will be blank.
From a remote computer, open an ssh session. Run:
screen -x xcrash
Now you have recovered your screen session and you will see some output in gdb. Enable logging:
(gdb) set logging on
Get the backtrace:
(gdb) backtrace full
Press Enter to page through the backtrace.
Now open another terminal and copy the log file, which is easiest with scp. gdb writes gdb.txt to the directory it was started from; replace the user name and address with your own:
scp <user>@<ip-address>:gdb.txt .
You now have the gdb.txt file with the backtrace on the machine you made your remote connection from.
Obtaining the video BIOS
First find the PCI address of your video card in the lspci output.
Next, as root, do the following (replacing the PCI address in the path with your own):
# cd /sys/devices/pci0000\:00/0000\:00\:02.0/
# echo 1 > rom
# cat rom > /tmp/rom.bin
# echo 0 > rom
Then send the resulting /tmp/rom.bin.
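The same steps can be scripted, using the /sys/bus/pci/devices/<address> links instead of spelling out the full sysfs path. This is a sketch under two assumptions: the card is listed by lspci -D as a "VGA compatible controller" (some cards appear as "3D controller" or "Display controller" instead), and dump_vbios is run from a root shell. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the PCI address (with domain, e.g. 0000:00:02.0)
# of the first VGA device in "lspci -D" output read from standard input.
first_vga_address() {
    awk '/VGA compatible controller/ { print $1; exit }'
}

# Hypothetical helper: dump the video BIOS of the device at the given PCI
# address to a file (default /tmp/rom.bin). Must run as root.
dump_vbios() {
    local rom=/sys/bus/pci/devices/$1/rom out=${2:-/tmp/rom.bin} status
    echo 1 > "$rom" || return 1   # make the ROM readable
    cat "$rom" > "$out"
    status=$?
    echo 0 > "$rom"               # disable ROM access again
    return "$status"
}
```

Usage, from a root shell (for example after sudo -i):
dump_vbios "$(lspci -D | first_vga_address)"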