VideoDriverDetection

Revision 8 as of 2009-11-10 00:04:54

Missing kernel module on boot causes rapidly flashing black screen during boot

In Karmic it came to our attention that a number of people experience a problem during upgrades where X fails to load and flashes a black screen continuously instead of going into the low graphics failsafe mode. The logs show that the nvidia.ko kernel module is missing during boot.

Flashing GDM Bug: The black screen flashing on X failure was traced to the newly rewritten gdm 2.28, which dropped support for triggering failsafe on X failure. The old trigger fired for many kinds of X failure, such as a typo in xorg.conf, mis-installed video drivers, and certain kinds of X crashes, and booted the user into a low graphics mode session that includes some basic tools for diagnosing, reporting, and working around the problem. Because the rewrite lacks this trigger, gdm just keeps trying to start X continuously instead of launching failsafe-x. The new gdm was introduced fairly late in our release cycle, so our testing only discovered the bug shortly before release, and we decided to handle it as a post-release update. The flashing was solved several days after the release by an upstart job that detects when gdm has failed several times and then boots into the failsafe session. The fix is ready for upload to Karmic as SRU bug #441638 and is currently in karmic-proposed for testing.
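The retry-then-failsafe behavior described above can be illustrated with a small shell sketch. This is not the actual upstart job from bug #441638; the function name start_with_failsafe and its arguments are hypothetical, and the sketch only shows the general idea of bounding the number of start attempts before handing over to a fallback.

```shell
#!/bin/sh
# Hypothetical sketch: retry a start command a bounded number of times,
# then hand over to a fallback instead of looping forever.
# Usage: start_with_failsafe MAX_ATTEMPTS FALLBACK_CMD START_CMD [ARGS...]
start_with_failsafe() (
    max_attempts=$1
    fallback_cmd=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # "$@" is the start command; success means no failover is needed.
        if "$@"; then
            exit 0
        fi
        echo "start attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    # Every attempt failed: run the fallback (for example a low graphics session).
    "$fallback_cmd"
)
```

With a stand-in fallback function that just prints a message, calling start_with_failsafe 3 fallback false reports three failed attempts on stderr and then runs the fallback once.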

Effect on Nvidia: The flashing behavior can kick in for a variety of reasons and is not driver-specific. However, because the broken failsafe symptoms are so strikingly unusual, many people with drastically different underlying bugs have assumed they have the same problem, and this has led to much confusion. Nvidia stands out because, as a proprietary binary driver, it must rebuild its kernel module whenever the kernel version changes (such as during upgrades), and this rebuild is not an error-proof process. In the past, if the rebuild failed for whatever reason, GDM would load the failsafe-x session, which then displayed a relevant error message to the user. With the broken failsafe in Karmic, users were not able to see this message and thus had no way to differentiate one issue from the next.

Troubleshooting Advice: For missing nvidia.ko bugs, what we generally need is the make.log file, which is found in a subdirectory under /var/lib/dkms/nvidia/. Our bug tools should be given the logic to attach this file automatically to apport bug reports, but for now the bug reporter needs to collect it manually.
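Since the exact subdirectory holding make.log varies, a reporter can search for it rather than guess. The following is a minimal sketch; the function name find_dkms_make_logs is made up for illustration, and the default directory is the one named above.

```shell
#!/bin/sh
# Sketch: list every make.log below a DKMS module directory.
# The default path is the one named in the text; pass another directory to override.
find_dkms_make_logs() (
    dkms_dir=${1:-/var/lib/dkms/nvidia}
    if [ ! -d "$dkms_dir" ]; then
        echo "no such directory: $dkms_dir" >&2
        exit 1
    fi
    find "$dkms_dir" -type f -name make.log
)
```

Running find_dkms_make_logs with no arguments prints the path of each make.log it finds, which can then be attached to the bug report by hand.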

Below we go into the technical details of some of these different bugs that can cause a missing nvidia.ko.

No nvidia.ko because GDM boots without waiting for DKMS to complete

Ironically, what is probably the most common cause of a missing nvidia.ko kernel module is the boot performance work in Karmic. In the past, when a new kernel was installed, DKMS would rebuild nvidia.ko during the next boot, and the build would have completed by the time X was ready to start. In Karmic we now start X much earlier in the boot sequence, which may not always give DKMS enough time to finish building nvidia.ko. When X discovers the module is not there, it either falls back to the -nv or -vesa video driver (if possible) or triggers the failsafe.

Some people have found that simply restarting the computer results in it coming back properly with the proprietary driver loaded; others find they need to run:

    dpkg-reconfigure nvidia-185-kernel-source

and some find they must fully reinstall the nvidia driver. (Many people reinstall the nvidia driver from the nvidia website at this point, which will also solve the problem but puts your system into a non-stock state that may make future support more challenging. It is probably sufficient to purge and reinstall the packaged nvidia driver.)
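Before reconfiguring or reinstalling anything, it can help to confirm that the module really is missing for the running kernel. The sketch below assumes the module is stored as an uncompressed .ko file somewhere under /lib/modules/<kernel version>/; the function name module_built is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a kernel module file exists for a given kernel.
# Matches uncompressed MODULE.ko files only.
# Usage: module_built MODULE [KERNEL_VERSION] [MODULES_ROOT]
module_built() (
    module=$1
    kernel=${2:-$(uname -r)}
    modules_root=${3:-/lib/modules}
    # Search the whole per-kernel tree so that no particular DKMS install
    # subdirectory has to be assumed.
    found=$(find "$modules_root/$kernel" -type f -name "$module.ko" 2>/dev/null)
    [ -n "$found" ]
)
```

If module_built nvidia fails, the module is absent for the running kernel and the dpkg-reconfigure step above is a reasonable next thing to try.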

Ultimately the real fix will be to serialize the driver update process so that nvidia.ko always gets built for the new kernel before X is started. Since this issue could affect any DKMS-using driver, it may be best to implement this as a new feature for DKMS in Lucid.

As a shorter-term solution which might be safe enough to roll out for Karmic, we could add an upstart job which delays gdm while a DKMS build is in progress.
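The waiting logic such a job would need might look like the sketch below. It is plain shell rather than an upstart job definition, the function names are hypothetical, and it assumes that a running DKMS build is visible as a process named "dkms", which has not been verified here.

```shell
#!/bin/sh
# Hypothetical sketch: block until a "build in progress" check stops
# succeeding, or give up after a timeout.
# Usage: wait_for_build_end TIMEOUT_SECONDS CHECK_CMD [ARGS...]
wait_for_build_end() (
    timeout=$1
    shift
    waited=0
    # "$@" should succeed for as long as a build is still running.
    while "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up after ${waited}s" >&2
            exit 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    exit 0
)

# Assumption: a running DKMS build shows up as a process named "dkms".
dkms_build_running() {
    pgrep -x dkms >/dev/null 2>&1
}
```

A caller would run wait_for_build_end 120 dkms_build_running before starting the display manager. The timeout matters: without it, a hung build would keep the login screen from ever appearing.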

No nvidia.ko module for -rt and -pae specialized kernels

Historically, the most common reason for a failed build of nvidia.ko was that the reporter was using one of the specialized kernels, such as the real-time kernel or the PAE kernel, which may not receive as much testing as the standard Ubuntu kernel.

We put effort into getting these issues resolved for Karmic, and most if not all of these bug reports are confirmed as solved, but this area is still only lightly tested and so will need continued vigilance.

No nvidia.ko module due to missing/broken build system pieces

In order to compile the nvidia.ko module it is necessary to have a working compiler, patch utility, and other build tools. There are certain (obscure) corner cases where these may be unavailable or broken. These are lower priority issues but still worth fixing. (Patches welcomed!)

One example of this that we need to examine closely for Lucid is upgrades directly from Hardy. (E.g. bug #467490)
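A quick way to rule out missing build tools is to check which of them are on the PATH. The sketch below is generic: the function name missing_build_tools is hypothetical, and the caller chooses the tool list (gcc and make in the usage note are assumptions; only the compiler and patch are named in the text).

```shell
#!/bin/sh
# Sketch: print the name of each listed tool that is not on PATH.
# Usage: missing_build_tools TOOL [TOOL...]
missing_build_tools() (
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
)
```

For example, missing_build_tools gcc make patch prints nothing when all three are installed and otherwise prints one missing tool per line.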

Bug Table: Missing nvidia.ko

| Bug | Description | Status in Karmic | Status in Lucid | Notes |
|-----|-------------|------------------|-----------------|-------|
| 464125 | nvidia.ko fails to build on -rt kernel prior to -0ubuntu9 | RESOLVED | RESOLVED | Fixed pre-release by patch in -nvidia 185.18.36-0ubuntu9 |
| 475494 | nvidia.ko fails to build on -rt kernel with -0ubuntu9 | | | Waiting for make.log |
| 467490 | Obsolete lrm-video still present gives "-Q is invalid option" | Low importance | High importance | Occurs when upgrading directly from Hardy to Karmic |
| 434154 | nvidia.ko fails to build if "patch" not installed | RESOLVED | RESOLVED | Fixed pre-release by change in -nvidia 185.18.36-0ubuntu8 |
| 474300 | nvidia.ko fails to build(?) | | | User having some trouble locating make.log; problem unclear |
| 438398 | DKMS build can fail, but package upgrade will still be marked successful | High importance | High importance | Race condition in upgrade handling with DKMS? |
| 474917 | nvidia.ko built against 2.6.28, not 2.6.31 | High importance | High importance | Simply rerunning dpkg-reconfigure solved it, so probably a DKMS race condition, maybe a dupe of 438398. |
| 454220 | nvidia.ko fails to build on -pae kernel | | | "Kernel headers for 2.6.31-14-generic-pae are not installed." |
| 464113 | nvidia.ko fails to build(?) | | | Waiting for make.log |
| 456240 | nvidia.ko fails to build due to failed CC sanity check | | | Oldish compiler/libc bug? may be already fixed |
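
For reports like 454220, where the build fails because kernel headers are not installed, a reporter can check for the headers directly. This sketch relies on the usual convention that /lib/modules/<kernel version>/build points at the headers tree; the function name kernel_headers_present is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed when build files (headers) for a kernel appear installed.
# Usage: kernel_headers_present [KERNEL_VERSION] [MODULES_ROOT]
kernel_headers_present() (
    kernel=${1:-$(uname -r)}
    modules_root=${2:-/lib/modules}
    # The "build" entry normally links to the headers tree; -d follows the
    # link, so a dangling link counts as "not installed".
    [ -d "$modules_root/$kernel/build" ]
)
```

If kernel_headers_present fails for the running kernel, installing the matching headers package is the likely fix before retrying the module build.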

X/Troubleshooting/VideoDriverDetection (last edited 2013-02-06 21:20:48 by c-174-55-144-102)