IntelPerformance
Symptoms:
- With compiz enabled, some animations such as desktop switching or compiz cube view are not smooth and take longer to complete than previously.
- Scrolling in Firefox is laggy
- Switching to UXA or XAA makes the performance issues go away
- Switching to UXA makes the performance issues worse (the opposite of the above; both have been reported)
- Switching compiz off makes performance significantly better
- 3D video game FPS measures are significantly lower than previously
Non-Symptoms:
- glxgears FPS drops. glxgears is not a benchmark; don't use it!
- CPU usage is high. See X/Troubleshooting/HighCPU instead.
How It Works
Historically, a distinction has been made between 2D and 3D acceleration. 2D acceleration was provided by the venerable XFree86 Acceleration Architecture (XAA), which makes the video card's 2D hardware acceleration available to the X server. In cases where this was broken or unavailable, one could fall back to software rendering (e.g. the NoAccel option).
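For example, one could force the software fallback by disabling acceleration in xorg.conf (an illustrative snippet only; outside of debugging you would not want this):

Section "Device"
	Identifier "Configured Video Device"
	Option "NoAccel" "true"
EndSection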
3D acceleration was provided via the Direct Rendering Infrastructure (DRI), which worked by mapping 3D-rendered pictures on top of the 2D picture.
This had some buggy corner cases, but more or less worked... until compositing (compiz, kwin) entered the picture. This blurs the distinction between 2D things and 3D, as it adds 3D effects into what was once exclusively 2D. Thus the distinction has become the source of a lot of brittleness, bugs, and performance problems.
A better solution is to move to 100% hardware accelerated OpenGL 3D support, with 2D becoming just a subset of 3D rendering. Unfortunately, switching entirely over is not so simple, and not without some major pitfalls.
EXA was introduced as a stopgap measure, to provide better integration with XRender than XAA did. In practice, while this proved quite advantageous in some respects, it also exhibited a number of corner cases and regressions.
The next stage of development, UXA, reimplements the EXA API using the new Graphics Execution Manager (GEM), a memory manager that better leverages the Linux kernel. EXA also now uses GEM (the two share the same render acceleration code).
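To see which acceleration method your X server is actually using, you can grep the Xorg log (the exact wording of the log message varies between driver versions, so treat this as a rough check):

$ grep -i accel /var/log/Xorg.0.log

The matching line should name the method (EXA, UXA, or XAA) the driver selected.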
Current/Ongoing Development
Upstream, Intel is in the midst of a long re-architecture effort. The end goal of this work is to solve a myriad of long-standing issues relating to multi-head support, dynamic configuration, 3D support, and performance.
Part of this architecture work involves moving functionality from its traditional location to places where it can better support new capabilities. For example, graphics mode/resolution setting is moving out of the 2D video driver into the kernel. These changes are significant, and can uncover or introduce bugs that regress previously working behavior; such bugs arise either directly (bugs in the code itself) or indirectly (irregularities in how the pieces are integrated into the overall operating system).
Another part of the work involves introducing more efficient algorithms, experimenting with bypassing low-performance legacy code, and timing events to occur more optimally, in the hope that this will net huge performance benefits. In some cases this is akin to balancing on a pin: when an optimization fails, it can make performance much worse than if it hadn't been attempted in the first place. Sadly, the corner cases often don't appear until the code is deployed into the wild, in real-world situations.
A third factor is hardware variability. In general, recent chips (945, 965, etc.) are thoroughly tested, whereas older chips (8xx) are less widely tested, so their problems often aren't found until the driver ships. Moreover, even within common families like the 945 and 965 there are myriad small variations that aren't always accounted for in testing.
All these things together conspire to make a witch's brew of performance regressions...
Problem: /dev/dri/card0 is missing
The /dev/dri/card0 device file is provided by the Linux kernel. If the file is missing, it prevents hardware acceleration from working. See if reloading the kernel module makes it work:
# Save your work first!
$ sudo rmmod i915
$ sudo modprobe i915
$ lsmod | grep i915
i915                   65412  1
drm                    96424  2 i915
$ sudo pkill X
Problem: /dev/dri/card0 permissions are incorrect
In early Jaunty development, libdrm switched to using udev for creating the /dev/dri/card0 device node. Permissions for this device node are now set with an access-list, using hal and PolicyKit. If the access-list is not set correctly, you will not have the proper permissions to access /dev/dri/card0, which in some cases can have a small but measurable impact on performance with 3D games.
bryce$ ls -l /dev/dri/card0
crw-rw----+ 1 root video 226, 0 Mar 30 14:37 /dev/dri/card0
The basic file permissions show that user bryce has no read/write access to /dev/dri/card0. However, the + sign in the permissions shows that an access-list is in use. You can see the access-list details with the getfacl command:
bryce$ getfacl /dev/dri/card0
getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card0
# owner: root
# group: video
user::rw-
user:bryce:rw-
group::rw-
mask::rw-
other::---
If getfacl shows the rw- permissions for your user, the access-list is working correctly. If it does not, you can debug the problem, for example by following the steps used in bug #306014.
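If the access-list is missing an entry for your user, you can add one by hand as a temporary workaround (substitute your own username for bryce; the change does not survive a reboot, so the underlying hal/PolicyKit problem still needs fixing):

$ sudo setfacl -m u:bryce:rw /dev/dri/card0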
Note that since hal now takes care of these permissions, setting DRI permissions in your xorg.conf is no longer necessary.
Problem: Falling back to OpenGL software rendering
Check if you're using software or hardware 3D rendering via glxinfo | grep render.
1. Is properly using hardware rendering:
bryce$ glxinfo | grep render
direct rendering: Yes
OpenGL renderer string: Mesa DRI Intel(R) 965GM GEM 20090114 x86/MMX/SSE2
2. Is falling back to software rendering:
bryce$ glxinfo | grep render
direct rendering: Yes
OpenGL renderer string: Software Rendering
Having DRI disabled can lead to this. Check if it has been loaded; your Xorg.0.log should show something like this:
bryce$ egrep "(GLX|DRI)" /var/log/Xorg.0.log
(==) AIGLX enabled
(II) Loading extension GLX
(II) Loading extension XFree86-DRI
(II) Loading extension DRI2
(WW) intel(0): DRI2 requires UXA
(II) intel(0): [DRI] installation complete
(II) intel(0): 0x0077f000-0x0cd3ffff: DRI memory manager (202500 kB)
(II) intel(0): direct rendering: XF86DRI Enabled
(II) AIGLX: Screen 0 is not DRI2 capable
(II) AIGLX: enabled GLX_SGI_make_current_read
(II) AIGLX: enabled GLX_MESA_copy_sub_buffer
(II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
(II) AIGLX: enabled GLX_texture_from_pixmap with driver support
(II) AIGLX: Loaded and initialized /usr/lib/dri/i965_dri.so
(II) GLX: Initialized DRI GL provider for screen 0
GLX and DRI need to be loaded and enabled. DRI2 doesn't matter unless you're using UXA.
If you have an i865 graphics card (check lspci | grep VGA), note that in Jaunty we've intentionally disabled DRI due to an X freeze-on-boot bug (see bug #317457).
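For example, on an i865 system the lspci output looks something like this (illustrative; exact strings vary by revision):

bryce$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)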
3. Has no direct 3D rendering:
bryce$ glxinfo | grep render
direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
OpenGL renderer string: Mesa GLX Indirect
Having DRI disabled can cause this, as in #2, if Mesa is unable to fall back to software rendering.
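As the glxinfo output suggests, setting LIBGL_DEBUG=verbose makes libGL explain what it tried (an illustrative run; the messages you see depend on the actual failure):

bryce$ LIBGL_DEBUG=verbose glxinfo 2>&1 | grep libGL
libGL: OpenDriver: trying /usr/lib/dri/i965_dri.so

If the driver fails to load, an additional "libGL error:" line should point at the cause.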
Configuration issues are another common cause. Check that Xgl is not installed, and check your xorg.conf for extraneous settings (these days, -intel can often run with no xorg.conf at all).
Problem: UXA gives better performance than EXA - why not move to UXA?
As described in the preface above, UXA is a follow-on to EXA that promises a much better architecture for 2D/3D compositing, allowing DRI2 and KMS. It promises to solve a range of longstanding issues seen with EXA and provide a foundation for achieving higher performance.
However, UXA is still new and has been found to have some bugs. For Jaunty, our UXA Testing found mixed results. Many find it gives much better performance, but at the cost of increased instability (search Launchpad for "(UXA bug)" to get a list.)
For Jaunty, we decided that stability is more important than performance, so we have opted to stick with EXA. You are encouraged to try out UXA and use it if you find it works acceptably. We will probably move to it in Karmic if the issues get resolved.
To enable it, put this in your xorg.conf:
Section "Device" Identifier "Configured Video Device" # ... Option "AccelMethod" "uxa" EndSection
If your X breaks, remove that option to restore your system. You can do this either by booting into a safe mode and editing xorg.conf again, or by running xfix.
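For instance, from the recovery console or a virtual terminal (this sketch assumes the default GDM display manager on Jaunty):

$ sudo nano /etc/X11/xorg.conf     # delete the Option "AccelMethod" "uxa" line
$ sudo /etc/init.d/gdm restart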
If you run UXA, please be certain to forward bugs you find upstream.
Problem: Huge performance drop with UXA due to tiled rendering disabled
On some pre-965 platforms, tiled rendering gets disabled when using DRI2 instead of DRI1. DRI2 is enabled when UXA is used (it can't be used with EXA).
This is fixed in the 2.6.29 kernel and the 2.7 driver. Since we are not shipping UXA enabled by default due to stability concerns, those changes are not backported, but if you wish to use UXA you may find it better to upgrade to the upstream versions. (We will probably pull these in Karmic, assuming the stability issues get resolved.)
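To check whether the versions you're running already include those fixes, compare against 2.6.29 (kernel) and 2.7 (driver); the uname output below is from a stock Jaunty install, which does not:

$ uname -r
2.6.28-11-generic
$ dpkg -s xserver-xorg-video-intel | grep ^Version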
See jbarnes' message to intel-gfx, subject "[RFC] support tiled rendering on pre-965 chips"
Also see the recently fixed kernel bug #349314 ("allocate MCHBAR space and enable if necessary"), which fixes performance issues due to tiling support.
Problem: UXA slow due to "a17 swizzling" bug
Upstream has been working on a fix for a dual channel memory problem, the so-called "a17 swizzle" bug, which results in a performance slowdown.
A17 ("address bit 17") refers to a memory addressing mode. The Intel GPU and memory controller are on the same chip, and there are some intricate addressing modes used in combination with the CPU to increase performance. A17 is one of the bits that gets set or cleared in certain dual channel configs. Normally that's transparent to the user, but if a page of graphics memory gets swapped out and then back in at a different address, the a17 setting for the new page might be different.
Previously, this swap was not tracked, so tiling had to be disabled altogether.
Look for "drm/i915: Allow tiling of objects with bit 17 swizzling by the CPU." on the dri-devel mailing list.
Problem: EXA performance has regressed since Intrepid
In Jaunty, we are using the GEM memory manager, both for UXA and EXA.
At least one person has reported good results when backing that out via the patch at http://paste.ubuntu.com/142337/. This hasn't been widely tested though, so we're not shipping it in Jaunty at this time.
Problem: Sync to VBlank
VBlank synchronization is an optimization technique that balances the output rate of your graphics card with the rate at which your monitor displays images on the screen. Your monitor's refresh rate -- also called the vertical blanking interval (VBlank) -- is the maximum speed at which your monitor can present images. If your graphics card renders faster than this rate, the extra images are rendered for naught, as you'll never actually see them. So if your monitor's VBlank is 60 Hz and your graphics card is rendering 1000 frames per second, then 940 frames are drawn for no purpose. So in theory, synchronizing the two makes a lot of sense.
HOWEVER, things are not so simple, and vblank syncing can actually make performance worse. For a deeper discussion, see http://www.tweakguides.com/Graphics_9.html
To turn off vblank sync, use any of these three methods:
1. Using compizconfig-settings-manager: go to System -> Preferences -> CompizConfig Settings Manager -> General Options -> Display Settings -> Sync to VBlank, and set it to 'off'.
2. From the compiz gconf configuration (see the example command after this list):
sync_to_vblank = false
3. Via driconf: set 'Synchronization with vertical refresh (swap intervals)' to 'Never synchronize with vertical refresh'.
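For method 2, a minimal sketch using gconftool-2 (the key path assumes compiz is using the gconf backend; it may differ on your system):

$ gconftool-2 --type bool --set /apps/compiz/general/screen0/options/sync_to_vblank false

Method 3 via driconf ends up writing the equivalent setting into ~/.drirc; a hand-written version would look roughly like this (the i965 driver name is an assumption; adjust for your chip):

<driconf>
    <device screen="0" driver="i965">
        <application name="all">
            <option name="vblank_mode" value="0" />
        </application>
    </device>
</driconf>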
Problem: glxgears has low FPS
It is important to remember that glxgears is *not* a benchmark tool. It simply measures how fast the driver blits images to the screen, whereas most 3D applications are limited by render speed, not merely blit speed. Instead, use a 3D game (flightgear, tremulous, etc.) with a real rendering workload to make comparisons.
glxgears is installed by default, easy to run, and generates results that are easy to read, so users like to use it as a rough measure of performance, and thus the question of why these numbers change comes up quite a bit.
For discussion we can categorize the fps changes into four groups:
1. glxgears drops from XXXX to YYY
This is generally nothing to worry about. High glxgears numbers generally indicate excessively high screen repainting, so if this drops from say 1500 fps to 500 fps, it should not produce any noticeable performance impact.
2. glxgears drops from XXX to YYY
As in #1, these drops are *usually* nothing to worry about, especially if you haven't noticed any change in performance. In *some* cases if you notice a performance drop, glxgears numbers may drop as well, and there may indeed be a correlation. However, glxgears isn't a benchmark, so it does little good to post the numbers (indeed, if anything it flags your post as ill-informed on the subject!)
Instead, get before and after FPS measurements using a 3D game, such as sauerbraten, tuxracer, tremulous, or the like.
3. glxgears drops from XXX to YY (usually to around 50-60 fps)
This may indicate that you have vblank syncing turned on. Is the glxgears fps roughly equal to the refresh rate of your monitor? (LCDs tend to operate at 60Hz, so 50-60fps is common in these cases)
If so, see the previous section for a discussion of vblank settings. Normally vblank is a GOOD thing, but it can cause side-effects. If you are experiencing performance impacts, try disabling vblank.
4. glxgears drops to 30fps or below
Any time glxgears returns an fps lower than your monitor's refresh rate, it definitely indicates a performance problem.
If the fps sits at about half your monitor's refresh rate (e.g. 30 fps on a 60 Hz panel), it indicates the graphics card is not synchronizing properly and is missing every other frame request from the monitor.