Freeze

Symptoms

  • X stops responding to input (sometimes the mouse cursor can still move, but clicking has no effect)
  • The screen displays but does not update. Sometimes there is screen corruption too, but usually there isn't.
  • Often, X cannot be killed; only a reboot clears the state
  • It can be hard to tie the problem to an exact test case; it seems to occur "randomly"

Non-Symptoms

  • A backtrace appears in Xorg.0.log or elsewhere - most of the time this indicates a crash, not a freeze
  • Screen blanks to a solid color (See X/Troubleshooting/BlankScreen instead)
  • The caps lock LED blinks - this indicates a kernel failure, not X
  • CPU load is high. This indicates a performance issue rather than a freeze
  • Screen still updates (look at the clock), but can't be interacted with - this is probably an input bug, not a GPU freeze
  • System freezes for a period but then comes back. Real freezes never come back.
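
The first non-symptom can be checked quickly by searching the X log for a backtrace. This is a minimal sketch: it assumes the server marks a crash with a line containing `Backtrace:` and that the log is at `/var/log/Xorg.0.log`. The function name `log_has_backtrace` is only illustrative.

```shell
# Succeeds (exit status 0) if the given X log contains a backtrace,
# which points to a crash rather than a freeze.
# Assumption: the server marks backtraces with the text "Backtrace:".
log_has_backtrace() {
    log=${1:-/var/log/Xorg.0.log}
    grep -q 'Backtrace:' "$log"
}
```

For example, `log_has_backtrace && echo "looks like a crash, not a freeze"`. After X restarts, the previous session's log is kept as `/var/log/Xorg.0.log.old`, so pass that path if the machine has already been restarted.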

How It Works

X can freeze for a number of different reasons. Unfortunately, the symptoms are extremely similar, so it can be hard to determine if your freeze is the same as someone else's. Many bugs get incorrectly marked as duplicates as a result.

It is also not unusual to have two or more different freeze bugs. This makes debugging hard. You may find a fix for the first freeze, but since the second freeze still happens, you can't easily tell that you fixed one.

In general, most freezes are due to Graphics Processing Unit (GPU) lockups. The GPU is the hardware chip that does graphics processing. GPUs have registers that the driver interacts with to produce graphical effects; if the driver interacts incorrectly, the GPU can get stuck in an error state that it cannot escape except by power cycling.

GPU layouts vary from model to model, and errors typically occur because a change was not adequately tested across a range of models before it was committed. For instance, suppose a fix solves a bug by poking the Frobnitz register at address 0x11111111 on chipset A, but on chipset B Frobnitz is at address 0x22222222. Unless the fix is restricted to chipset A, it could cause a GPU lockup on B.

Often, freezes seem to occur randomly, but in truth they're hardly ever truly random. A freeze represents a tangible bug in some particular section of code, and that code is executed under certain conditions. Often there are ways to bypass the code in question (which can also give good clues for where to look for the bug), such as disabling DRI or Compiz, turning off DPMS or the screensaver, or avoiding use of Xv when playing video.

Reporting Freeze Bugs

  • When did you first notice it? Did you change any settings (Desktop Effects?) or update your system prior to first noticing it?
  • How frequently does it occur? Just once? Hourly? Daily?
  • Try to determine actions which reproduce it or make it more/less likely to reproduce.
    • Opening lots of applications
    • Heavy switching between desktops or VTs
    • Suspend/resume
    • Visiting particular web pages or loading particular files
  • Include a Batchbuffer Dump. On -intel this is REQUIRED.

How to Get a Batchbuffer Dump (-intel only)

A batchbuffer dump provides invaluable debug data for upstream to troubleshoot X freezes. For this, you need kernel version 2.6.30-rc2 or newer.
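
A quick way to check the kernel requirement before starting is to compare the output of `uname -r` against 2.6.30. This is a minimal sketch: the helper name `kernel_at_least` is hypothetical, and it compares only the numeric major.minor.patch part, so a `-rc1` build of 2.6.30 would be accepted even though `-rc2` is the real minimum.

```shell
# Succeeds if ACTUAL is at least REQUIRED, comparing major.minor.patch only.
# usage: kernel_at_least REQUIRED ACTUAL
# Anything after the first "-" in ACTUAL (e.g. "-17-generic", "-rc2") is ignored.
kernel_at_least() {
    printf '%s %s\n' "$1" "${2%%-*}" | awk '{
        split($1, r, "."); split($2, a, ".")
        for (i = 1; i <= 3; i++) {
            if (a[i] + 0 > r[i] + 0) exit 0
            if (a[i] + 0 < r[i] + 0) exit 1
        }
        exit 0
    }'
}

if kernel_at_least 2.6.30 "$(uname -r)"; then
    echo "kernel is new enough for a batchbuffer dump"
else
    echo "kernel is older than 2.6.30; upgrade it first"
fi
```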

STEPS:

  • 1. sudo apt-get install intel-gpu-tools

  • 2. For xserver-xorg-video-intel version >= 2.10.0, add the option "DebugFlushCaches" "1" to the Device section of /etc/X11/xorg.conf. A minimal xorg.conf with this option looks like this:

Section "Device"
 Identifier "my-self-configured-device"
 Driver "intel"
 Option "DebugFlushCaches" "1"
EndSection
  • (Lucid will most likely ship with 2.10.x, but as of 2010-01-21 this option is only available via xorg-edgers)

  • 3. sudo INTEL_DEBUG=batch /etc/init.d/gdm restart

  • 4. Reproduce your X freeze (This test script may help: http://launchpadlibrarian.net/25683477/repro.sh)

  • 5. ssh into the frozen machine
  • 6. Collect this info:

       sudo mount -t debugfs debugfs /sys/kernel/debug
       datestr=$(date +%Y%m%d)
       mkdir dri_debug-$datestr
       sudo cp -r /sys/kernel/debug/dri/0/i915* dri_debug-$datestr
       sudo intel_gpu_dump > dri_debug-$datestr/intel_gpu_dump.txt
       dmesg > dri_debug-$datestr/dmesg.txt
       cp /var/log/Xorg.0.log dri_debug-$datestr/
       sudo cp /var/log/gdm/\:0.log dri_debug-$datestr/gdm.log
       sudo tar czf dri_debug-$datestr.tgz dri_debug-$datestr/
  • 7. Then attach dri_debug-YYYYMMDD.tgz to your bug report

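REFERENCES:

  • http://intellinuxgraphics.org/VOL_3_display_registers_updated.pdf
  • https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/447892/comments/23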

Narrow Down the Subsystem It Occurs In

Often lockups occur due to code in a specific subsystem within the xserver or video driver. You can sometimes narrow the problem down usefully by testing with different things turned off. This is done via your xorg.conf. Common things to test include (try each one at a time!):

  • Option "AccelMethod" "xxx" - Try "XAA", "EXA" (ignored on -intel > 2.8.0)

  • Option "Accel" "Off" - turns off the 2D acceleration (ignored on -intel except for i810 and i815 chipsets)
  • Option "DRI" "Off" - turns off the 3D acceleration
  • Option "AIGLX" "Off" - turns off OpenGL indirect rendering acceleration
  • Option "PM" "Off" - turns off power management events
  • Option "NoMTRR" - turns off Memory Type Range Register (MTRR) support. MTRR greatly improves performance, so it is usually on, but some hardware has buggy support for it

Other options can be found in man xorg.conf, man intel, man radeon, et al.

Sometimes adjusting settings (such as reducing video memory) can make a freeze more (or less) easily reproduced. This can be instrumental in helping debug the problem.
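
To keep the "one option at a time" discipline, it can help to generate each test Device section rather than hand-edit it. The helper below is a sketch only: `device_section` is a hypothetical name, and the Identifier and Driver values are copied from the batchbuffer example on this page, so adjust the driver for your hardware.

```shell
# Print an xorg.conf Device section containing exactly one test option.
# usage: device_section OPTION [VALUE]
# Assumption: Identifier and Driver match the example used elsewhere on this page.
device_section() {
    printf 'Section "Device"\n'
    printf '\tIdentifier "my-self-configured-device"\n'
    printf '\tDriver "intel"\n'
    if [ -n "${2-}" ]; then
        printf '\tOption "%s" "%s"\n' "$1" "$2"
    else
        printf '\tOption "%s"\n' "$1"
    fi
    printf 'EndSection\n'
}
```

For example, `device_section DRI Off` prints a section that disables 3D acceleration, and `device_section NoMTRR` prints one with the valueless option. Merge the output into /etc/X11/xorg.conf by hand, restart X, and try to reproduce the freeze before moving on to the next option.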

Problem: Freezes right after entering login credentials

By default, Ubuntu is typically configured to use Desktop Effects in your logged in session. It does not enable Desktop Effects for the login screen itself, though. Thus, if you never see freezes with the login screen itself, but always right after logging in, that can suggest that you're experiencing a freeze bug in the 3D system triggered by compiz or kwin coming on.

The standard way to disable Desktop Effects is via the System>Preferences>Appearance menu. However, if your freeze happens 100% of the time, it may be tricky to get to this menu! Also, the setting applies only to that user account.

A crude but effective brute force method is to just make compiz non-executable:

 sudo chmod a-x /usr/bin/compiz

You can always re-enable it later like this:

 sudo chmod a+x /usr/bin/compiz

Note that if you get a compiz update when updating Ubuntu, the update may restore the executable permission. A better long-term work-around in such a case would be to uninstall compiz entirely.

Another approach is to leave compiz as is, and just disable compositing or DRI in xorg.conf (for DRI, see Option "DRI" "Off" above).

Problem: Freezes occur when idle and screensaver is set to random settings or OpenGL

A lot of freezes occur in the 3D code, and go unnoticed by users that don't otherwise use 3D stuff, except when an OpenGL screensaver activates.

One common situation is when the screensavers are set to "random", and allowed to mix in OpenGL 3D screensavers with the regular 2D ones. Not all 3D screensavers will trigger the freeze, and some will trigger it only a portion of the time.

An obvious workaround would be to set the screensaver to blank screen (or avoid the problematic OpenGL screensaver). Alternatively, you can disable DRI support in your xorg.conf (see Option "DRI" "Off" above).

Problem: Freezes when screensaver or video player changes DPMS settings

Display Power Management Signaling (DPMS) allows controlling the standby, suspend, and off times for your video monitor. Various applications use this to do things like prevent the screensaver from turning on while watching a movie, or to power off the monitor after it has been idle a while.

Freezes that occur when the machine has been idle, and that aren't due to OpenGL screensavers, may indicate bugs in how DPMS is working. Freezes that seem associated with exiting applications, or with switching to or from full screen, can suggest that the app tried to poke at DPMS and triggered a bug.

You can manually trigger the screensaver and DPMS using the xset command line tool. To activate the screensaver:

 sleep 1; xset s activate

Or to turn the monitor off:

 sleep 1; xset dpms force off

You can also use standby, suspend, or on instead of off.
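
Before and after such a test it is useful to know whether DPMS is currently enabled. `xset q` reports this. The small filter below is a sketch that assumes the report contains a line of the form "DPMS is Enabled" or "DPMS is Disabled"; the name `dpms_state` is hypothetical.

```shell
# Read `xset q` output on stdin and print "Enabled", "Disabled", or "unknown".
# Assumption: xset reports the state on a line like "  DPMS is Enabled".
dpms_state() {
    awk '/DPMS is / { print $3; found = 1 }
         END { if (!found) print "unknown" }'
}
```

Run it as `xset q | dpms_state` from a terminal inside the X session. To rule DPMS out for the current session only, `xset -dpms` disables it and `xset +dpms` turns it back on.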

Another workaround is to disable DPMS in xorg.conf, by adding an option to your Monitor section:

Section "Monitor"
...
        Option  "DPMS" "Off"
EndSection

Problem: Log shows "[mi] EQ overflowing" and X freezes

This message indicates that the server has noticed that the GPU is locked up.

Problem: Log shows something about ring buffers and I830WaitLpRing (-intel only)

The ring buffer is the chunk of memory that contains commands we send down to the GPU. A WaitLpRing bug is generally a GPU hang, which can be caused by sending the GPU a bad instruction or address.
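
Both of these log messages can be searched for in one pass. The sketch below assumes the default log path `/var/log/Xorg.0.log` and matches only the two strings named in the headings above; `find_hang_markers` is an illustrative name.

```shell
# Print (with line numbers) any log lines that suggest a GPU hang.
# Exit status is non-zero when neither marker is present.
find_hang_markers() {
    log=${1:-/var/log/Xorg.0.log}
    grep -n -e 'EQ overflowing' -e 'WaitLpRing' "$log"
}
```

If the machine has already been restarted since the freeze, pass `/var/log/Xorg.0.log.old` as the argument, since that file holds the previous session's log.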

Problem: Freeze began after a system update

Regression bugs that freeze the system reliably are ironically the most productive to solve.

If you updated your system and it started to freeze, revert the package updates one at a time, most recent first, until the freezes stop occurring. /var/log/dpkg.log lists the packages that were updated, in order. Older versions of debs are often cached for a time in your /var/cache/apt/archives/ directory.
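
To see which packages to revert first, the upgrade entries can be pulled out of the dpkg log, newest first. This sketch assumes each upgrade line has the layout "date time upgrade package old-version new-version"; `recent_upgrades` is a hypothetical helper name.

```shell
# List package upgrades from a dpkg log, most recent first,
# as "date time package old-version -> new-version".
# Assumption: upgrade lines have the word "upgrade" as their third field.
recent_upgrades() {
    log=${1:-/var/log/dpkg.log}
    awk '$3 == "upgrade" { line[++n] = $1 " " $2 " " $4 " " $5 " -> " $6 }
         END { for (i = n; i >= 1; i--) print line[i] }' "$log"
}
```

The old-version column tells you which cached .deb to look for in /var/cache/apt/archives/; if it is still there, it can be reinstalled with `sudo dpkg -i` followed by the path to that file.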

Problem: Freeze began after upgrading from an older version of Ubuntu

Due to the large number of packages updated in an upgrade, it's impractical to revert packages step by step as above. Also, with these bugs it is more likely the error was introduced upstream.

If the freeze occurs whether 3D is enabled or not, then the problem may be in your video driver. If it only occurs when 3D is used, then the bug may be in mesa.

These regressions can be analyzed through bisection. Build and install the older, working version of the video driver or mesa, and verify that the problem goes away when the broken system is downgraded to those versions. From there, use X/Bisecting techniques to narrow in on the specific change that caused the regression.
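
The logic behind bisection is a binary search between a known-good and a known-bad version. The sketch below only illustrates that logic and is not the procedure from X/Bisecting: `first_bad` is a hypothetical helper, and the test command you pass in stands for whatever "install this version and try to reproduce the freeze" means on your system. For a git source tree, `git bisect` automates the same search.

```shell
# Print the first version for which TEST_CMD fails.
# usage: first_bad TEST_CMD VERSION...
# Versions must be ordered oldest to newest; the first is assumed good
# and the last is assumed bad. TEST_CMD is called with one version and
# must succeed for a good version and fail for a bad one.
first_bad() {
    test_cmd=$1
    shift
    lo=1      # index of the newest version known to be good
    hi=$#     # index of the oldest version known to be bad
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if "$test_cmd" "$candidate"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    eval "candidate=\${$hi}"
    printf '%s\n' "$candidate"
}
```

Each round halves the remaining range, so even a few hundred candidate changes need fewer than ten test cycles.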

Stock Reply for "random freeze" bugs

Thanks for reporting this bug and helping to make Ubuntu better. What you have described is a generic X freeze. It could be caused by any number of things, and you need to take some additional steps to provide a complete report.

When did you upgrade to this version of Ubuntu? When did you first notice the freezes occurring?

How frequently do the freezes occur? How many per day would you say you experience?

List the applications you typically have open at the time of the freeze.

Think back to the last few times it froze. What activities were you doing in each of those times?

Do you have compiz enabled? Does the issue go away if you disable it?

If your system is a laptop, do you suspend/resume it? Had you resumed at some point prior to the freezes?

Finally, and most importantly, please collect a GPU dump after reproducing the bug. Packages to install and directions for doing this are available at:

With the GPU dump in hand, we will be able to upstream this bug.

For more tips on troubleshooting freeze bugs, please refer to these links:

X/Troubleshooting/Freeze (last edited 2017-11-18 18:07:07 by penalvch)