DealWithMaintainerScriptFailures

Deal with maintainer script failures in packages

Summary

Ubuntu should deal better with maintainer script (postinst, postrm, etc.) failures during the installation, upgrade, and removal of packages. In many situations a failure is not critical, yet it still breaks the installation or upgrade. This is especially important for complex scenarios such as upgrades.

Release Notes

Use Cases

  • Robert has a third-party package installed that causes a failure during the upgrade. The error is reported and the upgrade continues just fine.

Scope

Make libapt deal better with failures in maintainer scripts.

Research

The following problems were found during edgy->feisty upgrades (http://tinyurl.com/2c5npp) and dapper->edgy upgrades (http://tinyurl.com/3btaat).

Here is a collection of failures:

  • failures where invoke-rc.d fails during the upgrade: #107739 (prerm), #107820, #108347 (postinst), #75946 (prerm), #104921 (prerm)
  • hotkey-setup failure: #109679 (prerm)
  • segfault in maintainer script: #108091 (postinst)
  • emacs-package-install failing: #108384 (postinst)
  • grub postinst failure: #108576 (postinst)

Design

In general, the problem of recovering gracefully from failing maintainer scripts is unsolvable with the current paradigm, because a maintainer script can alter the system state in arbitrary ways. In practice, however, there are some heuristics we can apply to keep the impact of most of these failures small (although there are some failure cases where little can be done).

Failing maintainer scripts lead to two problems:

  1. apt stops after dpkg returns an error. This can lead to an unbootable system or a system where core components are in an inconsistent state (for example, the nvidia kernel modules are upgraded but the nvidia X driver is not, making X fail).
  2. apt will complain and not apply updates if a system has broken dependencies (this includes security updates) because one core assumption is that the system is always in a state without broken dependencies. It can deal just fine with packages that are not configured yet.

In addition to dealing with the failure situation, we should try to make sure that it happens less often (given the number of packages we have, it cannot be avoided entirely).

Better recovery is also important. A "plugin" for the FriendlyRecovery mode will be added that can bring a broken system back into a supported configuration.

Implementation

One approach to solving problem (1) is a "DPkg::StopOnErrors" apt option. If it is set to false, apt will keep running dpkg in --unpack and --configure mode even if packages fail during the install/upgrade. The disadvantage of this mode is that it can lead to error chains, because apt still tries to configure packages whose dependencies failed to configure. Additional logic should be added so that dpkg does not try to unpack/configure packages that are known to fail. This mode will be used on desktop systems.
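The "additional logic" for avoiding error chains can be sketched as a walk over the reverse dependency graph. This is only an illustration, not apt code: the function name packages_to_skip and the plain dictionary of dependencies are assumptions made for the example.

```python
from collections import deque


def packages_to_skip(depends, failed):
    """Return every package that directly or transitively depends on a failed one.

    depends: dict mapping a package name to the set of packages it depends on
             (hypothetical input format, not an apt data structure)
    failed:  set of packages whose maintainer script already failed
    """
    # Invert the graph: package -> packages that depend on it.
    rdepends = {}
    for pkg, deps in depends.items():
        for dep in deps:
            rdepends.setdefault(dep, set()).add(pkg)

    skip = set()
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for dependent in rdepends.get(current, ()):
            if dependent not in skip and dependent not in failed:
                skip.add(dependent)
                queue.append(dependent)
    return skip


depends = {
    "ubuntu-desktop": {"hotkey-setup", "xorg"},
    "hotkey-setup": set(),
    "xorg": set(),
    "firefox": {"xorg"},
}
print(sorted(packages_to_skip(depends, {"hotkey-setup"})))  # ['ubuntu-desktop']
```

Packages in the returned set would be left unconfigured instead of being handed to dpkg, so a single failure does not produce a chain of follow-up errors.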

On a server system we offer the user the choice to stop the upgrade on the first error so that they can recover manually. The assumption is that server admins will be more experienced with the low-level packaging tools. When anything but the "postinst" script fails, we can usually roll back to the currently installed package and mark it "keep". Even with broken packages we should ensure (if at all possible) that at the end of the upgrade at least the kernel, X and firefox are in a good state so that the user can get help online.
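One possible reading of this policy as a decision function is sketched below. The function name, the action labels and the precedence between "stop" and "roll back" are assumptions made for the example; the spec does not fix them.

```python
def recovery_action(failed_script, stop_on_errors):
    """Pick a recovery action for one failed maintainer script.

    failed_script:  "preinst", "prerm", "postinst" or "postrm"
    stop_on_errors: True for the server-style "stop on the first error" choice
    The returned labels are hypothetical names used only in this sketch.
    """
    if stop_on_errors:
        # Server choice: halt so the admin can recover manually.
        return "stop"
    if failed_script != "postinst":
        # The old version can usually be restored and held back.
        return "rollback-and-keep"
    # A failed postinst leaves the new version unpacked but unconfigured.
    return "leave-unconfigured"


print(recovery_action("prerm", stop_on_errors=False))     # rollback-and-keep
print(recovery_action("postinst", stop_on_errors=False))  # leave-unconfigured
print(recovery_action("prerm", stop_on_errors=True))      # stop
```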

In addition, there is some low-hanging fruit here to avoid failures in the first place. A lot of them are caused by invoke-rc.d failing. We can patch invoke-rc.d or add a policy-rc.d that checks for an environment variable like "RELEASE_UPGRADE_IN_PROCESS" and does not restart the daemon in that case (or restarts it but ignores failures). A daemon that fails to restart is not critical, as we have to reboot after the upgrade anyway.
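The policy check described above could look roughly like the following. It is a sketch under assumptions: it uses the policy-rc.d convention in which exit status 0 allows the requested action and 101 forbids it, it treats any non-empty value of the variable as "upgrade in progress", and the function name is made up for the example.

```python
import os

# Exit statuses from the policy-rc.d interface used by invoke-rc.d.
POLICY_ALLOWED = 0
POLICY_FORBIDDEN = 101


def policy_exit_code(environ):
    """Forbid init script actions while a release upgrade is running.

    environ: mapping of environment variables, for example os.environ
    """
    # Assumption: any non-empty value means an upgrade is in progress.
    if environ.get("RELEASE_UPGRADE_IN_PROCESS"):
        return POLICY_FORBIDDEN
    return POLICY_ALLOWED


print(policy_exit_code({"RELEASE_UPGRADE_IN_PROCESS": "1"}))  # 101
print(policy_exit_code({}))                                   # 0
```

An executable installed as the policy script would end with sys.exit(policy_exit_code(os.environ)). The alternative mentioned in the text, restarting the daemon but ignoring failures, would require a change to invoke-rc.d itself and is not covered by this sketch.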

The FriendlyRecovery plugin will be the release-upgrader in text frontend mode with some added logic. In addition to the regular features, it will ask which desktop environment is used if it cannot auto-detect it, and it will be able to work even when some packages are in a broken dependency state. Special care must be taken to ensure that X gets properly installed. For a home user a machine without X is broken (FUBAR) and they will reinstall. This becomes less of an issue with the BulletProofX spec.

For problem (2), the apt algorithms and the new aptitude resolver will be investigated to see whether the problem can be fixed. The scope of this fix can be as limited as allowing apt to still calculate upgrades that involve upgrading packages (regular security upgrades) and installing some new packages (kernel security upgrades that break the ABI).

Some Example cases

Fixed by ignoring invoke-rc.d errors: #107739, #107820, #108347

Segfault in hplip postinst (actually in update-python-modules): #108091. Bad memory? May well be ok to just try to run the maintainer script again. Leaving it unconfigured (and its rdepends) will do no harm.

The hotkey-setup breakage #109679 on ThinkPads where no thinkpad-keys daemon is running: http://librarian.launchpad.net/7469561/hotkey-setup_0.1-17ubuntu10.debdiff Leaving this package and the dependent ubuntu-desktop in an unconfigured state would not harm the system in any way. Can be easily fixed via an SRU.

Mailman breakage (#108978) when /var/lib/mailman/qfiles is not empty. The right approach here is to set mailman on "keep" and carry on with the rest of the upgrade.

dmsetup fails in "update-initramfs -u" (#109625). The reason why this fails is unclear.

Future work

It should be investigated whether a different order of installing the packages helps us, for example ordering by "main", "universe" first and then by "required", "essential", "optional", "extra". It is likely that we effectively have this ordering already, because the important packages are usually furthest down the dependency chain anyway.
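The proposed ordering amounts to a two-level sort key. The sketch below assumes each package is a small dictionary with "component" and "priority" fields; that record layout and the function names are invented for the example, and values not named in the text simply sort last.

```python
# Orderings as listed in the text above.
COMPONENT_ORDER = ["main", "universe"]
PRIORITY_ORDER = ["required", "essential", "optional", "extra"]


def rank(value, order):
    """Position of value in order; unknown values sort after all known ones."""
    return order.index(value) if value in order else len(order)


def install_order(packages):
    """Sort hypothetical package records by component first, then by priority."""
    return sorted(
        packages,
        key=lambda p: (
            rank(p["component"], COMPONENT_ORDER),
            rank(p["priority"], PRIORITY_ORDER),
        ),
    )


packages = [
    {"name": "some-game", "component": "universe", "priority": "extra"},
    {"name": "firefox", "component": "main", "priority": "optional"},
    {"name": "libc6", "component": "main", "priority": "required"},
]
print([p["name"] for p in install_order(packages)])
# ['libc6', 'firefox', 'some-game']
```

A real implementation would still have to respect dependency order, so this key could only break ties among packages that are otherwise free to be installed in any order.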

BOF discussion

One idea discussed was to "lie" to apt and mark a package that failed to configure as installed. This would satisfy apt's assumption that the system has no broken dependencies. However, this inconsistency would lead to confusion and should not be implemented.

A dpkg option "--ignore-maintainer-script-failures" was discussed. This is equivalent to deleting the maintainer script and installing/removing the package again so that the broken script gets ignored. It is not a desirable course of action.

Another breakage is with packages generated by checkinstall. The plan would be to change checkinstall so that packages created by it identify themselves in some way, via a change in the description or dropping a file in /usr/share/doc. Then the update-manager could check to see if any such packages have been installed and warn the user/remove those packages as needed.
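The update-manager check could be as simple as scanning package descriptions for an agreed marker. The sketch below is hypothetical throughout: the marker string, the function name and the plain dictionary of descriptions are assumptions, since the spec leaves the identification mechanism open.

```python
# Hypothetical marker; the spec only says such packages should
# "identify themselves in some way".
CHECKINSTALL_MARKER = "Package created with checkinstall"


def find_checkinstall_packages(descriptions, marker=CHECKINSTALL_MARKER):
    """Return the sorted names of packages whose description contains the marker.

    descriptions: dict mapping package name to its description text
    """
    return sorted(name for name, text in descriptions.items() if marker in text)


installed = {
    "firefox": "lightweight web browser based on Mozilla",
    "my-local-build": "Package created with checkinstall 1.6.1",
}
print(find_checkinstall_packages(installed))  # ['my-local-build']
```

The other option named in the text, a marker file under /usr/share/doc, would replace the substring test with a check for that file's existence.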

It may be worth defining a concept of "trust metrics" for the various repositories. This trust metric should be a sliding scale, decided by Ubuntu developers. That way, users of trusted third-party developers who have considered upgrades when building their packages would not be warned.

Test Plan

Upgrade known failure cases and see if the changes described here fix the behaviour.

DealWithMaintainerScriptFailures (last edited 2008-08-06 16:30:09 by localhost)