Freeze

Symptoms

  • X stops responding to input (sometimes mouse cursor can still move, but clicking has no effect)
  • The screen displays but does not update. Sometimes there is screen corruption too, sometimes the screen goes black.
  • Often X cannot be killed from the console, or if it can, it won't restart properly; only a reboot clears the state
  • You can still ping the machine and SSH into it remotely
  • Error messages such as "GPU lockup" are (sometimes) present in your dmesg output

Non-Symptoms

  • X crashes and returns you to the login screen - most of the time this indicates a crash, not a freeze. Collect a full backtrace instead.
  • X seems to be working, but the monitor appears to just be "off" (See X/Troubleshooting/BlankScreen instead)
  • The caps lock key blinks - this indicates a kernel panic, not an X problem
  • X's CPU or memory usage is high, making the system sluggish or appear to freeze. This usually indicates a client application is making too many X requests (See X/Troubleshooting/HighCPU).
  • The screen still updates (look at the clock), but can't be interacted with - this could be an input bug, not a GPU freeze

How It Works

The Graphics Processing Unit (GPU) is a hardware component on the video card or integrated into the CPU. It has memory structures and registers which it uses to produce graphical effects. If the video driver gives bad data to the GPU, the GPU can get stuck and must be reset. In some cases the driver can trigger a reset from software and the user might notice only a brief glitch (or nothing at all), but in other cases it cannot, and the graphics system remains in its bad state. This is a GPU lockup.

Some GPU lockups are triggerable and easily reproduced. Others are situational and "tend" to occur with certain programs loaded, certain load levels, or certain periods of time passing. Still others are seemingly random and impossible to tie to any definite set of preconditions. Knowing which of these three classes your bug fits into can provide a clue, since different types of driver errors tend to lead to different classes.

GPU lockups are always handled as driver-specific bugs. Typically the error lies in the handling of memory or command registers, graphics state, or other hardware parameters that the driver is responsible for. With the open source drivers (-intel, -ati, and -nouveau), the fix is often needed in the kernel's drm drivers or in mesa's GLX drivers, so many "X freeze" bugs are technically kernel or mesa bugs. But for bug reporting purposes in Ubuntu, we keep things simple by lumping them together and treating them as X bugs until we have patches in hand.

Reporting GPU lockup Bugs

It's best to file a new bug report rather than joining onto someone else's. The reason is that every GPU lockup bug has the exact same symptom so it's impossible for a layman to tell the difference. A bug report involving dozens of different actual problems is just confusing and noisy for everyone involved. It's best to have a single bug report per specific problem.

For similar reasons, choose a title that is specific and descriptive. Something generic like "Ubuntu freezes randomly" will cause others with vaguely similar problems to post unhelpful "Me too!" comments, which will detract from getting your own issue looked into.

Reproduce the freeze, then, while the system is still frozen, SSH into it from another machine (over Ethernet) and collect:

  • The output of dmesg (dmesg > dmesg.txt)
  • /var/log/Xorg.0.log
  • /sys/kernel/debug/dri/0/i915_error_state [Intel graphics only]
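The collection step can be scripted so nothing is forgotten while the machine is frozen. This is a minimal sketch, assuming you run it as root from your SSH session; the function name collect_freeze_logs and the default directory freeze-logs are hypothetical choices for this example.

```shell
#!/bin/sh
# Sketch: gather the logs listed above into one directory. Run it on the
# frozen machine from your SSH session, as root (e.g. with sudo), because
# dmesg and debugfs are often readable only by root.
# collect_freeze_logs and the default directory name are hypothetical.
collect_freeze_logs() {
    out_dir=${1:-freeze-logs}
    mkdir -p "$out_dir" || return 1

    # Kernel messages; if reading them fails, the error text lands in the file.
    dmesg > "$out_dir/dmesg.txt" 2>&1 || echo "warning: dmesg failed" >&2

    # X server log for display :0
    if [ -r /var/log/Xorg.0.log ]; then
        cp /var/log/Xorg.0.log "$out_dir/"
    fi

    # Intel graphics only: GPU error state recorded by the i915 driver
    err_state=/sys/kernel/debug/dri/0/i915_error_state
    if [ -r "$err_state" ]; then
        cat "$err_state" > "$out_dir/i915_error_state.txt"
    fi

    echo "logs saved in $out_dir"
}

collect_freeze_logs "$1"
```

For example, save it as collect-freeze-logs.sh (a hypothetical name), run sudo sh collect-freeze-logs.sh /tmp/freeze-logs, and copy the directory off the machine with scp. On systems where X runs without root privileges, the X log may instead be in ~/.local/share/xorg/Xorg.0.log.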

Important questions to answer in your report:

  • Have you experienced just one lockup, or have you had a series of these lockups?
    • If you've had several, how often does it occur? Every few hours? Once or twice a day? Couple times a week?
  • When did you first notice it?
    • Shortly after upgrading to a new Ubuntu release?
    • After installing updates?
    • After changing compositing (Desktop Effects) settings?
  • Under what conditions does it seem most likely to reproduce?
    • Only at boot time?
    • When resuming from suspend or hibernate?
    • Only when using compositing (Unity, etc.)?
    • When changing resolution or enabling/disabling monitors?
    • When the screensaver (or power saving mode) kicks in?
    • When visiting particular web pages or loading particular files?
    • When switching between desktops?
    • When performing a specific sequence of actions? (List them!)
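To help answer "how often does it occur?", you can count lockup messages per day in the kernel log. This is a sketch under stated assumptions: syslog-style lines in /var/log/kern.log, and a driver that logs "GPU lockup" (as radeon does) or "GPU HANG" (as i915 does). Not every lockup leaves a message, so treat the counts as a lower bound; lockups_per_day is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count GPU lockup messages per day in the kernel log.
# Assumptions: syslog-style lines such as "Nov 18 10:00:00 host kernel: ...",
# and that your driver logs "GPU lockup" (radeon) or "GPU HANG" (i915);
# adjust the pattern if yours differs. lockups_per_day is a hypothetical name.
lockups_per_day() {
    [ "$#" -eq 0 ] && set -- /var/log/kern.log
    # -h drops file-name prefixes when several logs are given
    grep -h -i -E 'gpu lockup|gpu hang' "$@" |
        awk '{ print $1, $2 }' |   # keep just "Month Day"
        sort | uniq -c             # group identical days, then count them
}
```

For example, lockups_per_day /var/log/kern.log /var/log/kern.log.1 covers the current and previous log; older rotated files are compressed and need zcat first.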

Test Upstream Kernels

If you are able to reproduce the freeze relatively easily, a good place to start is to see whether or not the bug affects the current upstream kernel.

If you find a newer kernel version that works, include this fact in your bug report; it may help the maintainers locate the necessary hardware support patches. If you want to be particularly helpful, you can do a bisection search to find exactly when the needed support landed for your hardware.
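Because this search looks for the commit that fixed the freeze, the usual "good"/"bad" labels of git bisect are inverted. A minimal sketch, assuming a clone of the mainline kernel tree and git 2.7 or newer (which supports custom bisect terms); start_fix_bisect is a hypothetical helper name and the terms "broken"/"fixed" are arbitrary choices.

```shell
#!/bin/sh
# Sketch: start a "reverse" bisection in a clone of the mainline kernel tree
# to find the commit that FIXED the freeze. Assumes git 2.7 or newer (for
# custom bisect terms). start_fix_bisect is a hypothetical helper name.
start_fix_bisect() {
    broken_ref=$1   # older version that still freezes (e.g. a release tag)
    fixed_ref=$2    # newer version that no longer freezes
    # With custom terms, "fixed" plays the role of "bad" (the newer state),
    # so the fixed revision is listed first, as git bisect start expects.
    git bisect start --term-old=broken --term-new=fixed "$fixed_ref" "$broken_ref"
}
```

After building and booting each revision git checks out, try to reproduce the freeze and run git bisect broken or git bisect fixed. When git reports the first "fixed" commit, run git bisect reset and put that commit in your bug report.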

Test Upstream X.org (mesa)

While most GPU freezes are caused by bugs in the kernel code, GPU lockups are occasionally caused by mesa or other components in the X stack. Builds of upstream X.org components are provided via the xorg-edgers PPA.

Although you probably only need mesa, you should install the entire stack when using xorg-edgers; read the directions on the PPA page carefully.
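A sketch of switching to the PPA and back, assuming the PPA is still published as ppa:xorg-edgers/ppa. The commands are the usual Ubuntu tools (add-apt-repository, apt-get, ppa-purge), but the PPA's own instructions take precedence; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: switch to the xorg-edgers builds of the whole X stack, and back.
# Needs root and network access; read the PPA's own instructions first.
# The helper names are hypothetical; the commands are standard Ubuntu tools.
enable_xorg_edgers() {
    sudo add-apt-repository ppa:xorg-edgers/ppa &&
        sudo apt-get update &&
        sudo apt-get dist-upgrade    # upgrades the whole stack, not just mesa
}

# ppa-purge disables the PPA and downgrades its packages to the archive versions.
revert_xorg_edgers() {
    sudo apt-get install ppa-purge &&
        sudo ppa-purge ppa:xorg-edgers/ppa
}
```

Reboot after either step so the new X stack (or the restored one) is actually in use before you try to reproduce the freeze.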

Gathering Register Dumps

Register Dumps for -ati (r5xx and newer chips)

avivotool can help debug issues on the newer generations of ATI chips (Radeon R500 and up; see ATI GPUs for a detailed mapping of Radeon marketing names to series numbers).

avivotool is provided by the radeontool package; to install it, run:

sudo apt-get install radeontool

Run it twice, saving the output to a different file each time. First run it while you have a good, working screen (with any driver, including -vesa):

   sudo avivotool regs all > regdump_good.txt

Then run it again in the broken case (either from the tty console or logged into the affected machine remotely):

   sudo avivotool regs all > regdump_broke.txt

Register Dumps for -ati (pre-r5xx chips)

radeontool (from the same radeontool package) can help debug these kinds of issues on pre-r5xx ATI GPUs where the screen blanks on start. Run it twice, saving the output to a different file each time. First run it while you have a good, working screen (with any driver, including -vesa):

    sudo radeontool regmatch '*' > regdump_good.txt

Then run it again in the broken case (either from the tty console or logged into the affected machine remotely):

    sudo radeontool regmatch '*' > regdump_broke.txt
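Whichever tool you used, a diff of the two dumps makes the registers that changed easy to spot. A minimal sketch: the default file names match the commands above, while the output name regdump.diff and the helper name compare_regdumps are assumptions. Attach both full dumps to the bug report as well; the diff is only a convenience.

```shell
#!/bin/sh
# Sketch: compare the "good" and "broken" register dumps so the registers
# that changed stand out. Defaults match the file names used above; the
# output name regdump.diff and the helper name are assumptions.
compare_regdumps() {
    good=${1:-regdump_good.txt}
    broke=${2:-regdump_broke.txt}
    out=${3:-regdump.diff}

    # Unified diff: "-" lines come from the good dump, "+" lines from the broken one.
    diff -u "$good" "$broke" > "$out"
    status=$?

    # diff exits 0 when identical, 1 when the files differ, and 2 on trouble.
    if [ "$status" -gt 1 ]; then
        echo "diff failed; are both dump files present?" >&2
        return "$status"
    fi
    return 0
}
```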

Problem: Freezes when screensaver or video player changes DPMS settings

Display Power Management Signaling (DPMS) controls the standby, suspend, and off timeouts for your monitor. Various applications use it to do things like prevent the screensaver from turning on while you watch a movie, or power off the monitor after it has been idle for a while.

Freezes that occur when the machine has been idle, and that aren't due to OpenGL screensavers, may indicate bugs in how DPMS is working. Freezes that seem associated with exiting applications, or with switching to or from full screen, can suggest that the application changed DPMS settings and triggered a bug.

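You can manually invoke and control DPMS and the screensaver using the xset command line tool. To activate the screensaver immediately:

 sleep 1; xset s activate

Or to force the monitor off through DPMS:

 sleep 1; xset dpms force off

You can also use standby, suspend, or on instead of off. The leading sleep 1 gives you time to release the Enter key; otherwise that key release can count as input and wake the screen right away.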
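To check whether DPMS transitions trigger the freeze, the xset commands can be repeated in a loop. A minimal sketch, assuming it runs inside your X session (DISPLAY set); dpms_cycle and its arguments (number of cycles, seconds to wait in each state) are hypothetical.

```shell
#!/bin/sh
# Sketch: repeatedly blank and wake the monitor through DPMS to try to
# reproduce a DPMS-related freeze. Run it inside your X session (DISPLAY set).
# dpms_cycle and its arguments (cycle count, seconds to wait) are hypothetical.
dpms_cycle() {
    cycles=${1:-5}
    wait_s=${2:-5}
    sleep 1                      # let the Enter key release pass first
    i=0
    while [ "$i" -lt "$cycles" ]; do
        xset dpms force off      # blank the monitor through DPMS
        sleep "$wait_s"
        xset dpms force on       # wake it again
        sleep "$wait_s"
        i=$((i + 1))
        echo "cycle $i of $cycles done"
    done
}
```

For example, dpms_cycle 10 5 runs ten off/on cycles; if the machine freezes partway, note which cycle it reached in your bug report.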

As a workaround, you can disable DPMS in xorg.conf by adding an option to your Monitor section:

Section "Monitor"
...
        Option  "DPMS" "Off"
EndSection
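Before editing xorg.conf, you may want to try the same workaround for the running session only. This sketch uses standard xset options; the helper name disable_dpms_for_session is hypothetical, and the settings are lost when X restarts.

```shell
#!/bin/sh
# Sketch: try the "no DPMS" workaround for the current X session only,
# before editing xorg.conf. Uses standard xset options; the helper name
# disable_dpms_for_session is hypothetical. Settings reset when X restarts.
disable_dpms_for_session() {
    xset -dpms     # disable DPMS standby/suspend/off for this session
    xset s off     # disable the X screensaver's blanking timer too
    xset q         # print current settings; the DPMS section should report it disabled
}
```

If the freezes stop while these settings are in effect, that is a strong hint for the bug report, and the xorg.conf option makes the workaround permanent.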

Problem: Freeze began after a system update

Regression bugs that freeze the system reliably are ironically the most productive to solve.

If you updated your system and it started to freeze, revert the package updates one at a time, starting with the most recent, until the freezes stop occurring.
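To see what was updated and when, you can list the upgrades dpkg has logged. A minimal sketch, assuming the standard /var/log/dpkg.log line format; recent_upgrades and its optional date-prefix filter are hypothetical.

```shell
#!/bin/sh
# Sketch: list package upgrades recorded by dpkg, to decide which updates to
# revert first. Assumes the standard /var/log/dpkg.log line format:
#   <date> <time> upgrade <package> <old-version> <new-version>
# recent_upgrades and its optional date-prefix filter are hypothetical.
recent_upgrades() {
    log=${1:-/var/log/dpkg.log}
    since=${2:-}    # e.g. "2017-11-18" or "2017-11"; empty means everything
    awk -v since="$since" '
        $3 == "upgrade" && (since == "" || index($1, since) == 1) {
            print $1, $2, $4, $5, "->", $6
        }' "$log"
}
```

For example, recent_upgrades /var/log/dpkg.log 2017-11 lists that month's upgrades, newest at the bottom. Older history is in /var/log/dpkg.log.1 and in compressed dpkg.log.*.gz files (read those with zcat). Given how these bugs usually split between the kernel and mesa, kernel and mesa packages are natural first candidates to revert.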

The first thing to try is booting an earlier kernel. Hold down the left Shift key during boot so that the GRUB bootloader menu comes up. Look through the available kernels for one you used before the update, boot it, and try to reproduce the freeze. If it does not freeze, you now have a "good" and a "bad" kernel and can proceed with a kernel bisection search to isolate the patch that caused the failure. (Yes, compiling kernels sounds intimidating and time-consuming, but stick with it: the process is well documented and has a *very* high likelihood of narrowing the problem to a specific cause!)
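Before rebooting, it can help to check which kernels are installed and which one is currently running. A minimal sketch, assuming Ubuntu's usual /boot/vmlinuz-<version> naming; list_installed_kernels is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the kernels installed in /boot, to help pick one that predates
# the update, and show which one is running now. Assumes Ubuntu's usual
# /boot/vmlinuz-<version> naming; list_installed_kernels is a hypothetical name.
list_installed_kernels() {
    boot_dir=${1:-/boot}
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue          # glob matched nothing
        echo "${image#"$boot_dir"/vmlinuz-}"  # strip the path and prefix
    done
    echo "running: $(uname -r)"
}
```

The version strings it prints should match the entries under "Advanced options for Ubuntu" in the GRUB menu.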

X/Troubleshooting/Freeze (last edited 2017-11-18 18:07:07 by penalvch)