VideoDriverDetection
Missing kernel module on boot causes rapidly flashing black screen during boot
In Karmic it came to our attention that a number of people upgrading hit a problem where X fails to load and the screen blinks continuously instead of dropping into the low graphics failsafe mode. The logs show that the nvidia.ko kernel module is missing during boot.
The flashing black screen on X failure was traced to the newly rewritten gdm 2.28, which dropped support for triggering failsafe on X failure. In earlier releases this trigger kicked in for many kinds of X failure, such as a typo in xorg.conf, mis-installed video drivers, or certain kinds of X crashes, and booted the user into a low graphics mode session that includes some basic tools for diagnosing, reporting, and working around the problem. Because the trigger was not included in the gdm rewrite, gdm simply keeps trying to start X instead of launching failsafe-x.
Because the new gdm was introduced fairly late in our release cycle, our testing only discovered the bug shortly before release, so we decided to handle it as a post-release update. The flashing issue was solved several days after the release by introducing an upstart job that detects when gdm has failed several times and then boots into the failsafe session. The fix is filed for Karmic as SRU bug #441638 and is currently in karmic-proposed for testing.
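The "failed several times, then fall back" logic can be sketched roughly as follows. This is only an illustration of the idea, not the actual upstart job: the class name, the threshold of three failures, and the 60-second window are all hypothetical values chosen for the example.

```python
from collections import deque


class FailureTracker:
    """Counts recent display-manager start failures in a sliding time window.

    Hypothetical helper: the thresholds are illustrative, not the values
    used by the real upstart job.
    """

    def __init__(self, max_failures=3, window_seconds=60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()

    def record_failure(self, timestamp):
        """Record one failure; return True once failsafe should be started."""
        self._failures.append(timestamp)
        # Forget failures that are older than the window.
        while self._failures and timestamp - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

A caller would invoke `record_failure()` each time gdm exits abnormally and start the failsafe session as soon as it returns `True`. Using a time window rather than a plain counter means that isolated failures spread over a long uptime do not trigger the fallback.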
The flashing behavior can kick in for a variety of reasons and is not driver-specific. However, since the broken failsafe symptoms are so strikingly unusual, many people with drastically different underlying bugs have assumed they have the same problem, and this has led to much confusion. Nvidia was notable in particular because, as a proprietary binary driver, it must rebuild its kernel module whenever the kernel version changes (such as during upgrades), and needless to say this is not an error-proof process. In the past, if the rebuild failed for whatever reason, the failsafe-x session would have kicked in and displayed a relevant error message to the user; with the broken failsafe in Karmic, users were not able to see this message and thus had no way to differentiate one issue from the next.
This section examines a class of bugs where boot either fails or falls back to a non-nvidia driver because the nvidia.ko module is missing. There are a number of root causes for why nvidia.ko can end up missing, but they tend to have seemingly identical symptoms.
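Since all of these bugs share the symptom of a missing nvidia.ko, a first triage step is to confirm whether the module exists for the kernel being booted. The following is a minimal sketch, assuming modules are installed under `/lib/modules/<kernel release>/` (the usual location); the function name and its parameters are made up for this example.

```python
import os
import platform


def find_kernel_module(module_name="nvidia.ko", kernel_release=None,
                       modules_root="/lib/modules"):
    """Return sorted paths of module_name found for the given kernel release.

    Hypothetical helper. kernel_release defaults to the running kernel,
    as reported by platform.release() (the same string as `uname -r`).
    """
    if kernel_release is None:
        kernel_release = platform.release()
    base = os.path.join(modules_root, kernel_release)
    matches = []
    # os.walk yields nothing if base does not exist, so the result is [].
    for dirpath, _dirnames, filenames in os.walk(base):
        if module_name in filenames:
            matches.append(os.path.join(dirpath, module_name))
    return sorted(matches)
```

An empty list for the kernel that is about to be booted means the module was never built or installed for it, which is the situation the bugs below have in common.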
| Bug | Description | Status in Karmic | Status in Lucid | Notes |
|  | nvidia.ko fails to build on -rt kernel prior to -0ubuntu9 | RESOLVED | RESOLVED | Fixed pre-release by patch in -nvidia 185.18.36-0ubuntu9 |
|  | nvidia.ko fails to build on -rt kernel with -0ubuntu9 |  |  | Waiting for make.log |
|  | Obsolete lrm-video still present gives "-Q is invalid option" | Low importance | High importance | Occurs when upgrading directly from Hardy to Karmic |
|  | nvidia.ko fails to build if "patch" not installed | RESOLVED | RESOLVED | Fixed pre-release by change in -nvidia 185.18.36-0ubuntu8 |
|  | nvidia.ko fails to build(?) |  |  | User having some trouble locating make.log; problem unclear |
|  | DKMS build can fail, but package upgrade will still be marked successful | High importance | High importance | Race condition in upgrade handling with DKMS? |
|  | nvidia.ko built against 2.6.28, not 2.6.31 | High importance | High importance | Simply rerunning dpkg-reconfigure solved it, so probably a DKMS race condition, maybe a dupe of 438398. |
|  | nvidia.ko fails to build on -pae kernel |  |  | "Kernel headers for 2.6.31-14-generic-pae are not installed." |
|  | nvidia.ko fails to build(?) |  |  | Waiting for make.log |
|  | nvidia.ko fails to build due to failed CC sanity check |  |  | Oldish compiler/libc bug? May be already fixed |
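Two of the failure modes listed above (module built against the wrong kernel, and kernel headers not installed) can be told apart by looking at every installed kernel rather than only the running one. The sketch below is illustrative only: it assumes headers are reachable through the conventional `/lib/modules/<release>/build` link, and the function name and report layout are invented for this example.

```python
import os


def module_build_report(modules_root="/lib/modules", module_name="nvidia.ko"):
    """Map each installed kernel release to header and module presence.

    Hypothetical helper. "headers" is True when <release>/build resolves
    to a directory; "module" is True when module_name exists anywhere
    under <release>.
    """
    report = {}
    if not os.path.isdir(modules_root):
        return report
    for release in sorted(os.listdir(modules_root)):
        base = os.path.join(modules_root, release)
        if not os.path.isdir(base):
            continue
        # A dangling "build" symlink (headers removed) counts as missing.
        has_headers = os.path.isdir(os.path.join(base, "build"))
        has_module = any(module_name in files
                         for _dir, _subdirs, files in os.walk(base))
        report[release] = {"headers": has_headers, "module": has_module}
    return report
```

A release with `"headers": False` matches the "Kernel headers ... are not installed" case, while a module present only for an older release suggests the build ran against the wrong kernel. Neither result replaces reading make.log, which is still needed for the rows marked as waiting on it.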