Proposal for removing old kernels
As users follow development cycles and, to a lesser degree, upgrade between releases, they accumulate an ever-growing list of kernels (in the form of ABI updates). For example, someone who has been following dapper since its release would have around 6 kernels from that release alone, and upgrading to hardy would have added at least 3 more.
With each kernel taking up roughly 110 MB, that adds up to almost 1 GB of kernels and modules! This does not even count things like linux-restricted-modules and firmware.
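The estimate can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the text (about 6 dapper kernels, at least 3 hardy kernels, roughly 110 MB each); the function name is illustrative.

```python
def kernel_disk_usage_mb(kernel_count, mb_per_kernel=110):
    """Rough disk usage of installed kernels, in megabytes."""
    return kernel_count * mb_per_kernel

# Figures from the text: about 6 kernels from dapper plus at least 3 from hardy.
total_mb = kernel_disk_usage_mb(6 + 3)
print(total_mb)  # 990, i.e. just under 1 GB
```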
Why aren't kernels autoremoved?
Apt and, by extension, Synaptic and other package managers can autoremove packages that are considered no longer needed. Normally, these older kernels would be tagged and removed from the user's system by this mechanism. Problem solved? No! We cannot remove old kernels because we have no idea which kernels the user does or does not need. Nothing guarantees that the newest X kernels actually work, or that the user has even booted into them. So apt has an exception that keeps linux-image-* packages from being tagged for autoremoval.
This identifies our first problem: keeping a known good kernel around in case an upgrade fails. Right now we still fall short of this, even though we keep every kernel. For example, say someone installs 2.6.24-16-generic from the 8.04 final CD, and we then put out a security update that does not change the ABI. That update overwrites their existing kernel. If it fails to boot for some reason, the user is dead in the water, with no way to recover.
Now we're getting down to some core issues. Until we solve this, we cannot automatically remove any kernels.
The first part of the solution
Now we know what problem we need to solve, and why. Solving it is the first step toward the rest of the issue: we need a way to keep a known working kernel around so that, no matter what happens, we can boot into it.
Introducing last-good-boot. This is a new mechanism, established in grub, to detect a good boot, save away the resources used for that boot, and make it available to the user as a recovery option.
How it works
The real question is, when do we consider it a successful boot? For the purposes of this problem we consider reaching runlevel 2 successful. This may change, but for now, it's a good start.
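As a rough illustration of this success test, the sketch below parses the output of the `runlevel` command, which prints the previous and current runlevel (for example `N 2`). The function names, and the idea of calling them from an init script, are assumptions for illustration and not the actual implementation.

```python
import subprocess

def reached_runlevel(runlevel_output, target="2"):
    """Return True if `runlevel` output (e.g. "N 2") reports the target runlevel."""
    fields = runlevel_output.split()
    # Expected form: "<previous> <current>"; anything else counts as failure.
    return len(fields) == 2 and fields[1] == target

def current_boot_succeeded():
    """Hypothetical helper: ask the running system for its runlevel."""
    try:
        result = subprocess.run(["runlevel"], capture_output=True, text=True)
    except OSError:
        # The runlevel command is not available on this system.
        return False
    return reached_runlevel(result.stdout)
```

If the definition of a successful boot changes later, only `reached_runlevel` (or its `target` argument) would need to change.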
Once the machine has booted into runlevel 2, an init script will take notice and copy the existing resources away (out of package control, so they won't be overwritten or removed). Several things are saved:
These are saved to a special location:
- /boot/last-good-boot/version (the real uname -r of the saved kernel for reference)
- /lib/modules/last-good-boot/modules.dep (munged from original to change module locations)
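The "munging" of modules.dep could look roughly like the sketch below. It assumes that modules.dep lists modules by absolute path under /lib/modules/<uname -r>/, and that relocating them simply means rewriting that prefix to /lib/modules/last-good-boot/. The function name and the sample line are illustrative only.

```python
def munge_modules_dep(text, release, saved_name="last-good-boot"):
    """Rewrite module paths so they point at the saved copy, not the live tree."""
    old_prefix = "/lib/modules/%s/" % release
    new_prefix = "/lib/modules/%s/" % saved_name
    return text.replace(old_prefix, new_prefix)

# Made-up sample line in the "module: dependency" layout assumed above.
sample = (
    "/lib/modules/2.6.24-16-generic/kernel/fs/vfat/vfat.ko: "
    "/lib/modules/2.6.24-16-generic/kernel/fs/fat/fat.ko\n"
)
munged = munge_modules_dep(sample, "2.6.24-16-generic")
print(munged, end="")
```

Because only the version-specific prefix is replaced, the dependency relationships between modules stay exactly as they were in the original file.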
The update-grub script will always create a "Last successful boot (recovery mode)" entry, even if a last-good-boot doesn't exist yet, simply because one can appear later without update-grub being run again. It will always be the third entry in the grub menu (below the newest kernel and its recovery entry).
The grub menu.lst entry adds "last-good-boot" to the kernel command line for two reasons:
- So we won't save away last-good-boot on top of itself
- To signify to modprobe to use /lib/modules/last-good-boot/ for finding modules.dep
The change to modprobe will not affect it in initramfs (since modules in the initrd will remain in the uname -r location).
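The two uses of the command-line flag can be sketched as follows. This is a simplified model under stated assumptions: the flag appears as a standalone word on the kernel command line, the function names are hypothetical, and the initramfs case (where modules stay in the uname -r location) is not modelled.

```python
def booted_last_good(cmdline):
    """True if the kernel command line carries the last-good-boot flag."""
    return "last-good-boot" in cmdline.split()

def should_save_boot(cmdline):
    """Only save resources when we did not boot from the saved copy itself."""
    return not booted_last_good(cmdline)

def modules_dir(cmdline, release):
    """Directory in which modules.dep would be looked up after the initramfs."""
    if booted_last_good(cmdline):
        return "/lib/modules/last-good-boot"
    return "/lib/modules/" + release

def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

On a running system, `modules_dir(read_cmdline(), release)` would give the lookup directory for the current boot, with `release` taken from uname -r.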
Where to go after here
The next steps are still being decided. Here is a quick run down of what may happen:
- Kernel image package names will lose their version suffix. As a result, you will always have only one kernel installed (and one last-good-boot to back it up). Upgrades to the kernel will always overwrite the previous kernel.
- Apt will lose its exception for linux-image-* packages, so the old unused ones will start to be autoremoved.
- Could possibly stop update-initramfs from creating a .bak, since the reasons for it are no longer valid.
- Detect boot into last-good-boot and have a desktop notification so it's more visible to the user.
- Look into mechanism for grub to detect a boot failure and automatically choose last-good-boot next time.
How do I test this?
Last-good-boot is fully implemented in Intrepid/8.10 final; however, it has been disabled because it was not considered stable enough. The setting is a single line in the file /etc/default/kernel-helper-rc.
As a stop-gap for now, a friendly utility should be written that tells users they "could free disk space by removing unused kernels", explains enough for them to understand what they are doing, and guides them through removing the old ones on their own.
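One way such a utility might pick what to show the user is sketched below. It assumes versioned kernel packages are named linux-image-<uname -r>, skips metapackages whose suffix does not start with a digit, and never suggests the running kernel. The function name and the 110 MB per-kernel estimate (taken from the figure quoted earlier) are illustrative. The sketch only lists candidates; the decision to remove anything stays with the user.

```python
def removal_candidates(installed_packages, running_release, mb_per_kernel=110):
    """Return (candidate package names, rough MB that removing them could free)."""
    prefix = "linux-image-"
    candidates = []
    for name in installed_packages:
        if not name.startswith(prefix):
            continue
        release = name[len(prefix):]
        # Skip metapackages such as linux-image-generic (no version number).
        if not release[:1].isdigit():
            continue
        # Never offer the kernel that is currently running.
        if release == running_release:
            continue
        candidates.append(name)
    return candidates, len(candidates) * mb_per_kernel

packages = [
    "linux-image-2.6.24-16-generic",
    "linux-image-2.6.24-19-generic",
    "linux-image-generic",
    "bash",
]
names, freed_mb = removal_candidates(packages, "2.6.24-19-generic")
print(names, freed_mb)
```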