GrubDiskMapSanity

Summary

Find some way to make grub know which BIOS disk numbers (0x80, 0x81, etc.) correspond to which Linux whole-disk block devices (/dev/hda, /dev/sda, etc.), or make this information not be required in order to boot.

Problem description

Currently many people's systems cannot boot reliably, or break randomly during updates, because of a combination of these factors:

  • Linux IDE disk devices were renamed between edgy and feisty (from hd* to sd*)
  • The order in which devices are detected by Linux, and hence the device names assigned by Linux, is not predictable (and this has got a lot worse in practice recently)
  • The BIOS often sees a different set of disks to Linux, due to some devices not being supported properly in one or both environments.
  • Not confirmed, but I believe: update-grub runs grub under Linux, where it does not use INT 13h as it does when really booting, and this is what makes it detect the disks in the wrong order

This means that the use of Linux device names in grub for the boot disk and various other things breaks.

Solutions

There are, I think, four possible solutions to this (1 to 4); 5 and 6 are later suggestions:

  1. Make Linux assign predictable names to devices which are critical to system operation. In my (iwj's) personal opinion this is the correct answer, but it is far too far away from kernel upstream's philosophy.
  2. Propagate disk map information to userland: provide grub with some way to find the mapping between Linux disks and BIOS disks, and make it use it appropriately.
  3. Have the user specify explicitly which mount points correspond to which BIOS drives in their grub configuration. Mount points correspond stably to actual drives because nowadays we use the contents to tell which disk is which. It is far from clear where this information would come from in the installer.
  4. Have grub attempt to scan all BIOS drives searching for the second stage, third stage, kernel and initrd. This is almost certainly unworkable.
  5. Use grub fallback: http://www.gnu.org/software/grub/manual/html_node/Booting-fallback-systems.html --Paul Dufresne
  6. Make the grub code that checks the mapping really use the same INT 13h code as it would if run on bare hardware, maybe by using the same technique that vbetool uses to call the BIOS: lrmi http://packages.ubuntu.com/gutsy/libs/libx86-1 --Paul Dufresne

Use Cases

Users with multiple disk controllers would like their systems to boot.

Scope

grub2 would be nice of course but I don't think it will solve this problem.

Design

The kernel team will enable the Linux kernel module which fetches this information from the BIOS EDD and makes it available via sysfs. This ought to be sufficient for grub, provided that (a) it doesn't make lots of machines break and (b) EDD is implemented in a big enough proportion of BIOSes.
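
As a rough illustration of how userland could use the EDD data, the sketch below reads the mbr_signature attribute that the kernel's edd module exposes under /sys/firmware/edd/int13_devXX/ and pairs it with the 4-byte disk signature at offset 440 of each Linux disk's first sector. The function names, the uniqueness rule and the idea of emitting device.map lines are assumptions made for this example, not something the spec prescribes.

```python
import os
import re
import struct
from collections import Counter


def read_edd_signatures(edd_root="/sys/firmware/edd"):
    """Return {bios_drive_number: mbr_signature} as reported by the BIOS via EDD."""
    signatures = {}
    if not os.path.isdir(edd_root):
        return signatures  # edd module not loaded, or no EDD data available
    for name in sorted(os.listdir(edd_root)):
        match = re.fullmatch(r"int13_dev([0-9a-fA-F]{2})", name)
        if not match:
            continue
        try:
            with open(os.path.join(edd_root, name, "mbr_signature")) as f:
                text = f.read().strip()
            signatures[int(match.group(1), 16)] = int(text, 16)
        except (OSError, ValueError):
            continue  # attribute missing or unreadable for this drive
    return signatures


def read_disk_signature(device_path):
    """Return the 32-bit little-endian value at offset 440 of the first sector, or None."""
    with open(device_path, "rb") as f:
        sector = f.read(512)
    if len(sector) < 444:
        return None
    return struct.unpack_from("<I", sector, 440)[0]


def map_bios_to_linux(edd_signatures, disk_signatures):
    """Pair BIOS drive numbers with Linux devices whose signature matches uniquely."""
    bios_counts = Counter(edd_signatures.values())
    linux_counts = Counter(disk_signatures.values())
    by_signature = {sig: dev for dev, sig in disk_signatures.items()}
    mapping = {}
    for bios_number, sig in sorted(edd_signatures.items()):
        # Skip signatures that are absent or shared by several disks (ambiguous).
        if bios_counts[sig] == 1 and linux_counts.get(sig, 0) == 1:
            mapping[bios_number] = by_signature[sig]
    return mapping


def format_device_map(mapping):
    """Render the mapping in the '(hdN) /dev/xxx' form of grub's device.map."""
    return "".join(
        "(hd%d)\t%s\n" % (bios_number - 0x80, device)
        for bios_number, device in sorted(mapping.items())
    )
```

On a real system this would be called with something like {dev: read_disk_signature(dev) for dev in ("/dev/sda", "/dev/sdb")}, which needs root to read the raw devices. Disks that share a signature, or have none, are left out of the mapping on purpose, since guessing is exactly what causes the breakage described above.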

Failing that, we will see about putting a checksum or magic identifier in the boot sector (MBR or /boot partition).
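
The checksum variant could look roughly like the sketch below: record a checksum of the boot sector at install time, then look for the disk whose first sector still produces it. SHA-256, the 512-byte sector size and the function names are assumptions for illustration only; the spec does not say which checksum would be used, and at boot time grub itself would have to do the equivalent through the BIOS rather than in Python.

```python
import hashlib


def boot_sector_checksum(path, sector_size=512):
    """Return the SHA-256 hex digest of the first sector of a disk or image."""
    with open(path, "rb") as f:
        sector = f.read(sector_size)
    return hashlib.sha256(sector).hexdigest()


def find_disks_by_checksum(candidates, expected_hex):
    """Return every candidate whose boot sector matches the recorded checksum."""
    matches = []
    for path in candidates:
        try:
            if boot_sector_checksum(path) == expected_hex:
                matches.append(path)
        except OSError:
            continue  # unreadable or missing disk: not a match
    return matches
```

Exactly one match identifies the boot disk; zero or several matches mean the checksum alone cannot settle the question, and the caller would need another way to decide.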

Failing that we will change the grub configuration to make more use of device paths so that at the very least if you don't move your boot disk it will still work the next time.
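
One possible reading of "device paths" is the by-path symlinks that udev creates under /dev/disk/by-path, which name a disk by the controller and port it is attached to instead of by detection order. That reading is an assumption, as is the helper name below; the sketch only shows how a kernel name such as /dev/sda could be turned into such a path when writing the grub configuration.

```python
import os


def persistent_names(device, by_path_dir="/dev/disk/by-path"):
    """Return the symlinks in by_path_dir that resolve to the given device node."""
    target = os.path.realpath(device)
    names = []
    if not os.path.isdir(by_path_dir):
        return names  # udev has not populated the directory
    for name in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, name)
        if os.path.realpath(link) == target:
            names.append(link)
    return names
```

A path of this kind stays valid as long as the disk remains on the same port, which matches the "if you don't move your boot disk" caveat above.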

Outstanding Issues

This spec is essentially "what if". The first attempt - EDD - is underway.

Release Note

It is not possible at this time to write a coherent release note as we don't yet know which approach is to be chosen.

Test plan

  • Set up a system with three SATA hard disks and at least three SATA ports as follows:
  • By plugging disks into the SATA ports and experimenting (e.g. comparing sizes and/or serial numbers), determine the most common device detection order found by Linux, and which port the BIOS will boot from. Let us call these ports BIOS, B and C (where B and C are two ports other than BIOS, in the normal Linux detection order).
  • Connect disks to BIOS, B, C.
  • Install Ubuntu from the d-i CD. Choose to erase disk C, but ask to install grub in the MBR of BIOS.
  • Check that the system boots both before and after update-grub.
  • Disconnect the unused disk on port B.
  • Check that the system still boots before and after update-grub.

Alternatively, if the test system exhibits unpredictable device discovery order under Linux, simply install a working setup using multiple disks, then check that it boots every time and that update-grub always works and leaves a booting system.


CategorySpec

GrubDiskMapSanity (last edited 2008-08-06 16:21:49 by localhost)