VideoDriverDetection

Revision 6 as of 2009-11-09 21:23:43

Missing kernel module on boot causes rapidly flashing black screen during boot

In Karmic it came to our attention that a number of people were experiencing a problem during upgrades where X fails to load and the screen blinks continuously instead of dropping into the low graphics failsafe mode. Further investigation of the logs shows that the nvidia.ko kernel module is missing during boot.

The flashing black screen on X failure was traced to the newly rewritten gdm 2.28, which dropped support for triggering failsafe on X failure. This trigger used to kick in for a variety of X failures, such as a typo in xorg.conf, mis-installation of video drivers, certain kinds of X crashes, and so on. It booted the user into a low graphics mode session that includes some basic tools for diagnosing, reporting, and working around the problem. Since the trigger was not included in the gdm rewrite, gdm would simply keep trying to start X continuously instead of starting failsafe-x.

Because the new gdm was introduced fairly late in our release cycle, our testing only discovered the bug shortly before release, so we decided to handle it as a post-release update. The flashing issue was solved several days after the release by introducing an upstart job which detects if gdm has failed several times, and then boots into the failsafe session. This fix is ready for upload to Karmic as SRU bug #441638 and is currently in karmic-proposed for testing.
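The detection idea behind that fix can be illustrated with a minimal sketch. This is not the actual upstart job: the function name `should_trigger_failsafe`, the threshold of 3 failures, and the 60 second window are all assumptions chosen for illustration, since the text above only says that gdm must have "failed several times".

```python
from typing import Sequence


def should_trigger_failsafe(failure_times: Sequence[float],
                            max_failures: int = 3,
                            window_seconds: float = 60.0) -> bool:
    """Decide whether to stop retrying X and fall back to a failsafe session.

    failure_times holds the timestamps (in seconds) of failed X starts.
    The thresholds are hypothetical; the real job's values are not given here.
    """
    if max_failures <= 0:
        raise ValueError("max_failures must be positive")
    # Look only at the most recent max_failures failures.
    recent = sorted(failure_times)[-max_failures:]
    if len(recent) < max_failures:
        return False
    # Trigger when those failures all happened within the time window.
    return recent[-1] - recent[0] <= window_seconds
```

With these assumed defaults, three failures a few seconds apart would trigger the fallback, while three failures spread over several minutes would not.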

The flashing behavior can kick in for a variety of reasons and is not driver-specific. However, since the broken failsafe symptoms are so strikingly unusual, many people with drastically different underlying bugs have assumed they have the same problem, and this has led to much confusion. Nvidia was notable in particular because, as a proprietary binary driver, its kernel module must be rebuilt whenever the kernel version changes (such as during upgrades), and this is not an error-proof process. In the past, if the rebuild failed for whatever reason, the failsafe-x session would have kicked in and displayed a relevant error message to the user; with the broken failsafe in Karmic, users were not able to see this and thus had no way to differentiate one issue from the next.

This section examines a class of bugs where boot either fails or falls back to a non-nvidia driver because the nvidia.ko module is missing. There are a number of root causes for why nvidia.ko can end up missing, but they tend to produce seemingly identical symptoms.
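Since all of these bugs share the same symptom, a first diagnostic step is to confirm whether nvidia.ko is actually present for the running kernel. The following is a rough sketch, assuming the standard `/lib/modules/<kernel release>` layout; the helper name `find_kernel_module` and its parameters are hypothetical, and it searches the whole tree rather than assuming where the module was installed.

```python
import os
from typing import List, Optional


def find_kernel_module(module_file: str = "nvidia.ko",
                       kernel_release: Optional[str] = None,
                       modules_root: str = "/lib/modules") -> List[str]:
    """Return the paths of module_file found for the given kernel release.

    Defaults to the running kernel (os.uname().release). An empty list
    means the module is missing for that kernel.
    """
    release = kernel_release or os.uname().release
    base = os.path.join(modules_root, release)
    matches = []
    # os.walk yields nothing if base does not exist, so a kernel with no
    # module directory simply reports no matches.
    for dirpath, _dirnames, filenames in os.walk(base):
        if module_file in filenames:
            matches.append(os.path.join(dirpath, module_file))
    return sorted(matches)


if __name__ == "__main__":
    found = find_kernel_module()
    print(found if found else "nvidia.ko not found for the running kernel")
```

An empty result only establishes that the module is missing; the build log is still needed to tell which of the root causes below applies.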

| Bug | Description | Status in Karmic | Status in Lucid | Notes |
|---|---|---|---|---|
| 464125 | nvidia.ko fails to build on -rt kernel prior to -0ubuntu9 | RESOLVED | RESOLVED | Fixed pre-release by patch in -nvidia 185.18.36-0ubuntu9 |
| 475494 | nvidia.ko fails to build on -rt kernel with -0ubuntu9 | | | Waiting for make.log |
| 467490 | Obsolete lrm-video still present gives "-Q is invalid option" | Low importance | High importance | Occurs when upgrading directly from Hardy to Karmic |
| 434154 | nvidia.ko fails to build if "patch" not installed | RESOLVED | RESOLVED | Fixed pre-release by change in -nvidia 185.18.36-0ubuntu8 |
| 474300 | nvidia.ko fails to build(?) | | | User having some trouble locating make.log; problem unclear |
| 438398 | DKMS build can fail, but package upgrade will still be marked successful | High importance | High importance | Race condition in upgrade handling with DKMS? |
| 474917 | nvidia.ko built against 2.6.28, not 2.6.31 | High importance | High importance | Simply rerunning dpkg-reconfigure solved it, so probably a DKMS race condition, maybe a dupe of 438398. |
| 454220 | nvidia.ko fails to build on -pae kernel | | | "Kernel headers for 2.6.31-14-generic-pae are not installed." |
| 464113 | nvidia.ko fails to build(?) | | | Waiting for make.log |
| 456240 | nvidia.ko fails to build due to failed CC sanity check | | | Oldish compiler/libc bug? may be already fixed |
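For cases like bug 474917, where a module exists but was built against the wrong kernel, one possible check is to compare the module's vermagic string with the running kernel release. The sketch below assumes the usual vermagic form, in which the first whitespace-separated token is the kernel release, and reads it with `modinfo -F vermagic`. The helper names are hypothetical and this is not part of any existing Ubuntu tooling.

```python
import os
import subprocess
from typing import Optional


def release_from_vermagic(vermagic: str) -> str:
    """Extract the kernel release, assumed to be the first token of vermagic."""
    parts = vermagic.split()
    if not parts:
        raise ValueError("empty vermagic string")
    return parts[0]


def module_matches_kernel(vermagic: str,
                          kernel_release: Optional[str] = None) -> bool:
    """True if the module was built for kernel_release (default: running kernel)."""
    running = kernel_release or os.uname().release
    return release_from_vermagic(vermagic) == running


def read_vermagic(module_path: str) -> str:
    """Ask modinfo for the vermagic field of a module file."""
    result = subprocess.run(["modinfo", "-F", "vermagic", module_path],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

A mismatch here suggests the module needs rebuilding for the current kernel, which matches the note that rerunning dpkg-reconfigure resolved that report.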