VideoDriverDetection
Symptoms:
- The NVIDIA binary driver is installed but X boots without 3D, or fails to boot intermittently
- With no driver specified in /etc/X11/xorg.conf, booting X loads the vesa driver (as seen by many "VESA:" lines in your /var/log/Xorg.0.log instead of "INTEL:", "FGLRX:", etc.)
- With a driver specified in /etc/X11/xorg.conf, booting X ignores the setting and loads vesa instead
Non-Symptoms:
- /var/log/Xorg.0.log shows some other non-VESA driver being selected, but that driver isn't loading for some reason. In this case look in /var/log/gdm/*, /var/log/Xorg.0.log[.old], dmesg, or /var/log/syslog for error messages and go from there.
How It Works
Since Hardy, the X server auto-detects which driver to load for the video hardware present. It does this by polling the PCI bus for VGA devices and checking their PCI IDs against a registry of the PCI IDs each video driver reports as supported; if no match is found, -vesa is the fallback. Next, it loads the driver and initializes it. Finally, it probes the monitor(s) for DDC or EDID information to determine their resolution, DPI, and other capabilities, and brings them up.
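The lookup step can be pictured with a small shell sketch. This is only an illustration of the matching logic described above, not what the X server actually runs; the function name pick_driver and its arguments are made up for this example, and it assumes each .ids file holds one colon-less PCI ID per line.

```shell
#!/bin/sh
# Illustrative only: map a PCI ID to a driver name via the *.ids registry,
# falling back to vesa when no driver claims the ID.
# pick_driver is a hypothetical helper, not part of X.

pick_driver() {
    pci_id=$1                                    # e.g. 1002:71c7, as printed by lspci -nn
    ids_dir=${2:-/usr/share/xserver-xorg/pci}    # registry directory
    key=$(printf '%s' "$pci_id" | tr -d ':')     # registry entries drop the colon

    # -l prints only the name of the matching file; -i ignores hex digit case
    match=$(grep -il "$key" "$ids_dir"/*.ids 2>/dev/null | head -n 1)

    if [ -n "$match" ]; then
        basename "$match" .ids                   # radeon.ids -> radeon
    else
        echo vesa                                # no driver lists this ID
    fi
}
```

For example, `pick_driver 1002:71c7` would print the name of whichever registry file lists that ID, or vesa if none does.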
Problem: Video card not supported by driver
Determine the PCI ID of the chip:
lspci -nn | grep VGA
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV535 [Radeon X1650 Series] [1002:71c7] (rev 9e)
Here, the PCI ID is 1002:71c7.
Now look in the PCI ID registry to see which driver is used. (Drop the colon ':' when doing this.)
grep -i 100271c7 /usr/share/xserver-xorg/pci/*.ids
/usr/share/xserver-xorg/pci/radeon.ids:100271C7
This indicates that X would pick the -radeon driver for me. If you did not get output, then your chip is not currently supported by any driver. There could be several reasons:
1. If you have newer hardware (such as if you're an OEM and testing hardware that is not yet released) then it may simply be the case that support for your chip has not been added to the driver, or was added only recently and is not available in the released driver.
You can try appending your PCI ID to the relevant driver's .ids file under /usr/share/xserver-xorg/pci/. This will tell X to load the driver anyway. *Sometimes* this works; more often it does not.
You can also try a newer version of the video driver by installing from upstream git. See Freedesktop Git Usage for directions on using git.
2. On the other hand, if you have older hardware that used to work with a non-vesa driver, support for it may have been dropped from the video driver. Check that driver's mailing list for more information.
3. If you have unusual hardware, such as a video card from a minor vendor, then a third possibility is that it needs a driver not available in the standard distribution (or perhaps not available at all). In this case, you'll need to locate and install the necessary driver.
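For the workaround in reason 1 (appending a PCI ID to a driver's .ids file), a cautious sketch might look like the following. The helper name add_pci_id is hypothetical, and the uppercase, colon-less entry format is an assumption based on the radeon.ids entry shown earlier. It keeps a .bak copy so the change is easy to undo, and it needs root when run against the real registry.

```shell
#!/bin/sh
# Illustrative only: append a PCI ID to a driver's .ids file, keeping a backup.
# add_pci_id is a hypothetical helper name.

add_pci_id() {
    pci_id=$1       # e.g. 1002:71c7
    ids_file=$2     # e.g. /usr/share/xserver-xorg/pci/radeon.ids

    # Assumed entry format: no colon, uppercase hex (as in 100271C7)
    key=$(printf '%s' "$pci_id" | tr -d ':' | tr '[:lower:]' '[:upper:]')

    # Nothing to do if the ID is already listed
    if grep -qi "$key" "$ids_file" 2>/dev/null; then
        echo "$key already listed in $ids_file"
        return 0
    fi

    cp "$ids_file" "$ids_file.bak" || return 1   # keep a copy to restore from
    echo "$key" >> "$ids_file"
}
```

If X then fails to start with the forced driver, copying the .bak file back restores the original registry.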
Problem: No monitor detected
If no monitor is connected at the time X boots, it may not know what driver to load. Try attaching a monitor.
Problem: X is falling back to vesa
In some cases, X will fail to boot with the correct driver, and try again with a different driver, like vesa. Usually in this case you should have a /var/log/Xorg.0.log or /var/log/Xorg.0.log.old that explains why it failed to load the original driver.
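A quick way to pull the relevant lines out of the log is to filter for the error marker. The sketch below assumes the usual X log convention of prefixing error lines with (EE), optionally preceded by a bracketed timestamp on newer servers; the function name xorg_log_errors is made up for this example.

```shell
#!/bin/sh
# Illustrative only: print the error lines from an X server log.
# xorg_log_errors is a hypothetical helper name.

xorg_log_errors() {
    log=${1:-/var/log/Xorg.0.log}
    # Match lines that begin with (EE), allowing an optional "[ timestamp] "
    # prefix. Anchoring at the start of the line skips the marker legend in
    # the log header, which also mentions (EE).
    grep -E '^(\[[^]]*\] )?\(EE\)' "$log"
}
```

Run it once against /var/log/Xorg.0.log and once against /var/log/Xorg.0.log.old, since the failed attempt may be in either file.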
Problem: Multiple video cards are installed
Currently (as of Intrepid at least), X does not work properly if two or more video cards are installed. Often you'll see an error in the /var/log/Xorg.0.log regarding multiple devices being detected, and then at the end will be a "No screens found" error.
It is also conceivable that one card will boot properly, but the other one does not (due to various issues such as those listed above), and that X is just unluckily picking the broken one.
To force X to use one card or the other, first find its bus id:
lspci -nn | grep VGA
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV535 [Radeon X1650 Series] [1002:71c7] (rev 9e)
In my case, this is '01:00.0'. Now specify this in your xorg.conf:
Section "Device"
    Identifier "Configured Video Device"
    Driver "ati"
    BusId "PCI:1:0:0"
EndSection
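One detail is easy to miss when translating the bus ID: lspci prints the bus, device, and function numbers in hexadecimal, while the BusId entry in xorg.conf is written in decimal. For '01:00.0' the two look the same, but a card at '0a:00.0' would need "PCI:10:0:0". A small conversion sketch follows; the function name lspci_to_busid is made up here, and it assumes the short bus:device.function form without a PCI domain prefix.

```shell
#!/bin/sh
# Illustrative only: convert an lspci address (hex) to an xorg.conf BusId (decimal).
# lspci_to_busid is a hypothetical helper name.

lspci_to_busid() {
    addr=$1              # bus:device.function from lspci, e.g. 01:00.0
    bus=${addr%%:*}      # 01
    rest=${addr#*:}      # 00.0
    dev=${rest%%.*}      # 00
    fn=${rest#*.}        # 0
    # The 0x prefix makes printf read each field as hexadecimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

Calling `lspci_to_busid 01:00.0` gives the PCI:1:0:0 value used in the Device section above.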
Problem: Missing NVIDIA kernel module on boot causes rapidly flashing black screen during boot
By and large it sounds like the vast majority of users had smooth upgrade experiences. However, it came to our attention that some nvidia users experienced a problem during upgrades to Karmic where X fails to load and, rather than going into the low graphics failsafe mode, flashes a black screen continuously; this problem is now solved. Further investigation into why failsafe mode was being triggered to begin with showed that the nvidia.ko kernel module was missing during boot. This can happen in a variety of unusual situations, which are described in more detail below.
Flashing GDM Bug: The black screen flashing behavior on X failure was traced to the newly rewritten gdm 2.28, which dropped support for triggering failsafe on X failure. This trigger would kick in for a variety of different types of X failures, such as a typo in the xorg.conf, mis-installation of video drivers, certain kinds of X crashes, and so on. It would boot the user into a low graphics mode session that includes some basic tools for diagnosing, reporting, and working around the problem. Since this trigger feature was not included in the gdm rewrite, instead of triggering failsafe-x, gdm would just keep trying to start X continuously.
Because the new gdm was introduced fairly late in our release cycle, our testing only discovered the bug shortly before release, so we decided to handle it as a post-release update. This flashing issue was solved several days after the release by introducing an upstart job which detects if gdm has failed several times, and then boots into the failsafe session. This is ready for upload to Karmic as SRU bug #441638 and is currently in karmic-proposed for testing.
Effect on Nvidia: The flashing behavior can kick in for a variety of reasons and it is not driver-specific. However, since the broken failsafe symptoms are so strikingly unusual, many people with drastically different underlying bugs have assumed they have the same problem, and this has led to much confusion. Nvidia was notable in particular because, as a proprietary binary driver, it needs to rebuild its kernel module when the kernel version changes (such as during upgrades), and needless to say this is not an error-proof process. In the past, if the rebuild failed for whatever reason, GDM would load the failsafe-x session, which then displayed a relevant error message to the user; with the broken failsafe in Karmic, users were not able to see this and thus had no way to differentiate one issue from the next.
Troubleshooting Advice: Generally for missing nvidia.ko bugs, what we need is the make.log file, which is found in a subdirectory under /var/lib/dkms/nvidia/. Our bug tools should be given the logic to attach this file automatically to apport bug reports, but for now it needs to be collected manually by the bug reporter.
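Since the exact subdirectory depends on the driver and kernel versions, searching for the file is usually easier than guessing the path. A minimal sketch, with the function name find_dkms_make_logs made up for this example:

```shell
#!/bin/sh
# Illustrative only: list every make.log under the nvidia DKMS tree.
# find_dkms_make_logs is a hypothetical helper name.

find_dkms_make_logs() {
    dkms_dir=${1:-/var/lib/dkms/nvidia}
    # Errors (such as a missing directory) are silenced; no output means no log
    find "$dkms_dir" -type f -name make.log 2>/dev/null
}
```

Attach whichever file it prints to the bug report. If it prints nothing, DKMS may never have attempted the build, which is worth mentioning in the report as well.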
Below we go into the technical details of some of these different bugs that can cause a missing nvidia.ko.
A. No nvidia.ko because it failed to build but Jockey didn't notice
Normally, most users rely on Jockey, the hardware driver administration tool, to install -nvidia. In certain cases nvidia.ko may simply fail to build for the given combination of hardware, kernel, and driver. The expectation is that on failure Jockey ought to bail out and notify the user of the problem, but there are cases where this does not happen; see bug #451305. A fix for this Jockey issue is uploaded to karmic-proposed now and will be released after gaining sufficient testing.
B. No nvidia.ko because GDM boots without waiting for DKMS to complete
Ironically, perhaps the most common situation that can cause a missing nvidia.ko kernel module is likely due to boot performance improvements in Karmic. In the past, when a new kernel was installed, on the next boot DKMS would rebuild nvidia.ko during the boot process. By the time X was ready to start this building would have been completed. In Karmic we now start X much earlier in the boot sequence, and this may not always give DKMS enough time to finish making nvidia.ko. When X discovers it is not there, it either falls back to the -nv or -vesa video driver (if possible) or triggers the failsafe.
At least, this is our theory. Because of the way this works, either there is never a true error, or the log file with the error gets overwritten. Either way, there is no solid evidence that this issue has happened in the wild. Still, even if it is just theoretical, we should ensure it cannot happen. See bug #438398 for tracking this problem.
Some people have found that simply restarting the computer results in it coming back with -nvidia properly; others find they need to run:
dpkg-reconfigure nvidia-185-kernel-source
and some find they must fully reinstall the nvidia driver. (Many people reinstall the nvidia driver from the nvidia website at this point, which also will solve the problem but puts your system into a non-stock state which may make future support more challenging. It is probably sufficient to purge and reinstall nvidia.)
Ultimately the real fix will be to serialize the driver update process to ensure nvidia.ko always gets built for the new kernel before attempting to boot X. Since this issue is something which could affect any DKMS-using driver, it may be best to implement this as a new feature for DKMS in Lucid.
As a shorter-term solution which might be safe enough to roll out for Karmic, we could add an upstart job which causes gdm to be delayed if a DKMS build is in progress.
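When investigating this case by hand, it helps to confirm whether nvidia.ko actually exists for the kernel that is running, as opposed to an older one. The sketch below simply searches the module tree for the given kernel version. The function name module_built is made up for this example, and it does not assume any particular subdirectory for DKMS-built modules.

```shell
#!/bin/sh
# Illustrative only: succeed if <module>.ko exists for the given kernel version.
# module_built is a hypothetical helper name.

module_built() {
    module=$1                        # e.g. nvidia
    kernel=${2:-$(uname -r)}         # defaults to the running kernel
    modules_root=${3:-/lib/modules}
    # Search the whole tree for that kernel rather than assuming a subdirectory
    [ -n "$(find "$modules_root/$kernel" -type f -name "$module.ko" 2>/dev/null | head -n 1)" ]
}
```

For example, `module_built nvidia || echo "nvidia.ko missing for $(uname -r)"` reports the situation described in this section.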
C. No nvidia.ko module for -rt, -pae, -server specialized kernels
Historically, the most common reason for a failed build of nvidia.ko was because the reporter was using one of the specialized kernels such as the real-time kernel or the PAE kernel, which may not receive as much testing as the standard Ubuntu kernel.
We put attention into Karmic to get these issues resolved, and most if not all of these bug reports are confirmed as solved, but this area is still only lightly tested and so will need continued vigilance.
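One failure seen with these kernels is a build that stops because the matching kernel headers are not installed. A rough check is sketched below, assuming the common layout in which /lib/modules/<version>/build points at the header tree for that kernel; the function name headers_installed is made up for this example.

```shell
#!/bin/sh
# Illustrative only: succeed if kernel headers appear to be present for a kernel.
# headers_installed is a hypothetical helper name; the build/ layout is an assumption.

headers_installed() {
    kernel=${1:-$(uname -r)}         # defaults to the running kernel
    modules_root=${2:-/lib/modules}
    # build/ is normally a symlink to the header tree; a dangling link or a
    # missing Makefile suggests the headers package is not installed
    [ -e "$modules_root/$kernel/build/Makefile" ]
}
```

For example, `headers_installed 2.6.31-14-generic-pae || echo "headers missing"` checks a specific flavour rather than the running kernel.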
D. No nvidia.ko module due to missing/broken build system pieces
In order to compile the nvidia.ko module it is necessary to have a working compiler, patch utility, and other build tools. There are certain (obscure) corner cases where these may be unavailable or broken. These are lower priority issues but still worth fixing. (Patches welcomed!)
One example of this that we need to examine closely for Lucid is upgrades directly from Hardy. (E.g. bug #467490)
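A simple way to rule out missing build tools is to check that each one can be found on the PATH. The sketch below prints the name of every tool it cannot find. The function name missing_build_tools is made up for this example, and the list of tools to pass (for instance gcc, make, and patch) is our guess at the minimum rather than an authoritative list of what the nvidia build requires.

```shell
#!/bin/sh
# Illustrative only: print the name of each listed tool that is not on the PATH.
# missing_build_tools is a hypothetical helper name.

missing_build_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_build_tools gcc make patch` prints nothing when all three are available.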
Bug Table: Missing nvidia.ko
| Bug | Description | Status in Karmic | Status in Lucid | Notes |
|  | nvidia.ko fails to build on -rt kernel prior to -0ubuntu9 | RESOLVED | RESOLVED | Fixed pre-release by patch in -nvidia 185.18.36-0ubuntu9 |
|  | nvidia.ko fails to build on -rt kernel with -0ubuntu9 |  |  | Waiting for make.log |
|  | Obsolete lrm-video still present gives "-Q is invalid option" | Low importance | High importance | Occurs when upgrading directly from Hardy to Karmic |
|  | nvidia.ko fails to build if "patch" not installed | RESOLVED | RESOLVED | Fixed pre-release by change in -nvidia 185.18.36-0ubuntu8 |
|  | nvidia.ko fails to build(?) |  |  | User having some trouble locating make.log; problem unclear |
|  | DKMS build can fail, but package upgrade will still be marked successful | High importance | High importance | Race condition in upgrade handling with DKMS? |
|  | nvidia.ko built against 2.6.28, not 2.6.31 | High importance | High importance | Simply rerunning dpkg-reconfigure solved it, so probably a DKMS race condition, maybe a dupe of 438398. |
|  | nvidia.ko fails to build on -pae kernel |  |  | "Kernel headers for 2.6.31-14-generic-pae are not installed." |
|  | nvidia.ko fails to build(?) |  |  | Waiting for make.log |
|  | nvidia.ko fails to build due to failed CC sanity check | RESOLVED | RESOLVED | Oldish compiler/libc bug(?) |