HardyInitramfsErrorHandling

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

There is not enough error handling when building new initrds. The order of operations is flawed, which leaves the system in a potentially unbootable state.

Additionally, at boot-time, there is no useful information about dealing with a system where the root filesystem cannot be found.

Release Note

Full-disk (and related) situations are now correctly handled by update-initramfs and update-manager. Systems should not be left in an unbootable state. Additionally, more information is available at boot-time when the root filesystem is missing.

Rationale

There is no reason update-initramfs should leave a system unbootable. Additionally, at runtime, there needs to be a better way to customize the reporting of a failed boot.

Use Cases

  • Amy has installed a new kernel. Her /boot partition is filled with prior development kernels, and update-initramfs runs out of disk space. It reports the error, but does not break the prior initrds, leaving the system bootable.
  • Jeff has part of a RAID fail. On boot-up, he is greeted with a detailed report of what has failed and of his options for repairing the situation.

Assumptions

  • Fixing update-initramfs is simple.
  • More information at boot-time is useful.

Design

update-initramfs must expect to run out of disk space: it should build the new initrd under a temporary name and move it into place atomically, keeping a hard link to the previous initrd as a backup. When installing new kernels, update-manager needs to estimate how much space the new initrd will require in /boot and complain when there will not be enough.
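
As a rough illustration of the space check, the following shell sketch compares the space available on a filesystem with an estimated requirement. The function name boot_has_space and its interface are assumptions made for this example; the specification does not say how update-manager should arrive at its estimate.

```sh
#!/bin/sh
# Hypothetical helper (not part of update-manager or update-initramfs):
# succeed if the filesystem holding DIR has at least NEEDED_KB kilobytes free.
boot_has_space() {
    dir=$1
    needed_kb=$2
    # "df -P" prints one line per filesystem in the POSIX format;
    # with -k, field 4 of the second line is the available space in 1K blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}
```

A caller might use it as `boot_has_space /boot 20480 || echo "not enough space in /boot" >&2`, where 20480 is an arbitrary placeholder rather than a measured initrd size.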

A basic error handler for mountroot panics needs to be defined in scripts/functions. Scripts in scripts/init-top/ or scripts/${BOOT}-premount/ should be able to define mountroot failure handlers, which will be called in prereq order. Handlers should be informational only -- they should not attempt to actually do anything beyond querying the state of the failed-to-boot system. Booting with no-panic-handler should skip all handlers and go directly to the regular panic prompt.
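
One possible shape for the handler registry is sketched below. Only add_mountroot_failure_handler and the no-panic-handler boot option come from this specification; the HANDLER_LIST file, the run_mountroot_failure_handlers function, and the choice to register handlers as script paths are assumptions. A file is used because boot scripts may run as separate processes, so a shell variable would not survive between them. Handlers run in registration order, which is assumed here to follow the prereq order in which the scripts themselves run.

```sh
#!/bin/sh
# Sketch only. HANDLER_LIST and run_mountroot_failure_handlers are
# hypothetical names; the spec names only add_mountroot_failure_handler.
HANDLER_LIST=${HANDLER_LIST:-/tmp/mountroot-failure-handlers}

# Register the path of a shell script to run when mountroot fails.
add_mountroot_failure_handler() {
    echo "$1" >> "$HANDLER_LIST"
}

# Run every registered handler in registration order. $1 is the kernel
# command line, for example "$(cat /proc/cmdline)".
run_mountroot_failure_handlers() {
    # Booting with "no-panic-handler" skips all handlers.
    case " $1 " in
        *" no-panic-handler "*) return 0 ;;
    esac
    [ -r "$HANDLER_LIST" ] || return 0
    while read -r handler; do
        # A failing handler must not stop the remaining ones.
        # stdin is redirected so a handler cannot consume the list.
        [ -r "$handler" ] && sh "$handler" </dev/null || true
    done < "$HANDLER_LIST"
}
```

In this sketch panic would call run_mountroot_failure_handlers after usplash is disabled and before dropping to the regular panic prompt.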

Implementation

Code Changes

  • update-initramfs needs to change the initrd build logic:
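    •       # keep a hard link to the current initrd as a backup
            [ ! -e "$outfile" ] || ln -f "$outfile" "$outfile.bak"
            # build under a temporary name; stop if the build fails
            mkinitramfs -o "$outfile.new" "$version" || exit 1
            # rename into place (atomic within one filesystem)
            mv -f "$outfile.new" "$outfile"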
  • update-manager needs additional logic to detect the expected size of an initrd
  • update-manager should respect a "Minimum Disk Space" control field, which can be added to the kernel package
  • scripts/functions needs a new function, add_mountroot_failure_handler, for registering mountroot failure handlers

  • The panic function in scripts/functions needs to know when mountroot is running so that, after usplash is disabled, it calls the registered failure handlers (or skips them when "no-panic-handler" is given).

  • scripts/local needs to move its existing error messages into its own panic handler
  • mdadm needs to add scripts/local-premount/mdadm to define a sane panic handler
  • lvm2 needs to add scripts/local-premount/lvm2 to define a sane panic handler
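
The update-initramfs build logic listed above could be wrapped with error handling roughly as follows. This is a sketch, not the actual update-initramfs code: the function name rebuild_initrd and the overridable MKINITRAMFS variable are hypothetical, and the backup link assumes /boot is on a filesystem that supports hard links.

```sh
#!/bin/sh
# Sketch of a defensive initrd rebuild. MKINITRAMFS can be overridden so the
# logic can be exercised without the real tool (hypothetical convenience).
MKINITRAMFS=${MKINITRAMFS:-mkinitramfs}

rebuild_initrd() {
    outfile=$1
    version=$2

    # Keep a hard link to the current initrd, if there is one, as a backup.
    if [ -e "$outfile" ]; then
        ln -f "$outfile" "$outfile.bak" || return 1
    fi

    # Build under a temporary name so a failed or partial build (for
    # example on a full /boot) never overwrites the working initrd.
    if ! "$MKINITRAMFS" -o "$outfile.new" "$version"; then
        rm -f "$outfile.new"
        echo "failed to build $outfile.new; keeping the existing initrd" >&2
        return 1
    fi

    # A rename within one filesystem replaces the old file atomically.
    mv -f "$outfile.new" "$outfile"
}
```

If the build fails, the previous initrd is left untouched under its original name, which is the behaviour the first use case asks for.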

Ideas of what to add to the mdadm/lvm2 panic handler: https://lists.ubuntu.com/archives/ubuntu-devel/2007-September/024221.html

Test/Demo Plan

  • Attempt to update an initrd when /boot is full -- system remains bootable.
  • Attempt to install a kernel update with update-manager when /boot is full -- it should fail gracefully.
  • Boot with a failed raid disk -- the mdadm hook should report detailed information and instructions.
  • Boot with a partial LVM -- the lvm2 hook should report detailed information and instructions.

Outstanding Issues

  • None known

BoF agenda and discussion

Handling build-time initramfs failures

  • Want to make sure there is enough free space in the filesystem before installing the package, as initramfs failures are sometimes caused by too little room in /boot to generate the initramfs
    • Add a special case to update-manager to guess the size needed
    • Handle with a control-file extension specifying minimum free space
    • A more general solution would be to download the package and examine it to determine its free-space requirements.
  • Proper error handling during initramfs creation in postinst for the kernel (update-initramfs)
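    •       # keep a hard link to the current initrd as a backup
            [ ! -e "$outfile" ] || ln -f "$outfile" "$outfile.bak"
            # build under a temporary name; stop if the build fails
            mkinitramfs -o "$outfile.new" "$version" || exit 1
            # rename into place (atomic within one filesystem)
            mv -f "$outfile.new" "$outfile"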

Handling run-time initramfs failures

  • Missing or failed root devices are not handled very well
    • LTSP
    • degraded RAID devices
  • investigate current busybox fallback method used in initramfs
  • add a new function hook (like "mountroot") for the scripts/${BOOT} scripts to call when they fail
  • details on the degraded RAID logic: https://lists.ubuntu.com/archives/ubuntu-devel/2007-September/024221.html
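
To illustrate what an informational handler for degraded RAID might report, the sketch below lists the md arrays that /proc/mdstat shows with a missing member (an underscore in the status field, as in "[U_]"). The function name report_degraded_arrays is hypothetical, and this is not the mdadm package's actual handler. It only reads state and changes nothing, in keeping with the rule that handlers are informational.

```sh
#!/bin/sh
# Hypothetical informational handler: print one line per degraded md array.
# The optional argument names an alternative file, which is useful for testing.
report_degraded_arrays() {
    mdstat=${1:-/proc/mdstat}
    [ -r "$mdstat" ] || return 0
    awk '
        # Array header line, e.g. "md0 : active raid1 sda1[0]"
        /^md[^ ]* :/ { array = $1 }
        # Status line ending in a member map with a missing device, e.g. "[U_]"
        /\[[U_]*_[U_]*\]/ && array != "" {
            print array " is degraded: " $NF
            array = ""
        }
    ' "$mdstat"
}
```

A real handler would follow this with instructions for the user, along the lines discussed in the mailing-list thread linked above.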


CategorySpec

HardyInitramfsErrorHandling (last edited 2008-08-06 16:15:09 by localhost)