BootDegradedRaid

Differences between revisions 9 and 10
Revision 9 as of 2008-05-30 20:45:09
Size: 2552
Editor: 70-2-70-49
Comment: added design items
Revision 10 as of 2008-05-30 20:51:18
Size: 2814
Editor: 70-2-70-49
Comment: added bootloader design points

Summary

This specification defines a methodology for enhancing Ubuntu's boot procedures to configure and support booting a system dependent on a degraded RAID1 device.

Rationale

Ubuntu's installer currently supports installation to software RAID1 targets for /boot and /. However, once one of the mirrored disks fails and mdadm marks the array degraded, the system can no longer be rebooted unattended.

Booting Ubuntu with a failed device in a RAID1 array forces the system into a recovery console.

In some cases, this is the desired behavior, as a local system administrator would want to back up critical data, cleanly halt the system, and replace the faulty hardware immediately.

In other cases, this behavior is highly undesirable, particularly when the system administrator is remotely located and would prefer that a system with redundant disks tolerate a failed device even on reboot.
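
To illustrate the degraded state described above, the following sketch scans text in the format of /proc/mdstat and reports arrays that have fewer active members than expected. The sample text and the function name degraded_arrays are assumptions made for this example only; they are not part of this specification or of mdadm.

```python
import re

# Hypothetical sample in the format of /proc/mdstat: md0 is healthy,
# md1 has lost one of its two mirrors ("[2/1] [U_]").
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      9767424 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample text.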

Use Cases

  • Software RAID is often a less expensive alternative to Hardware RAID, and is always available on any Ubuntu system with multiple disk devices
    • This is particularly useful on low-end and small-form-factor servers without built-in Hardware RAID, such as blades and 1-U rack mount systems
  • RAID1 (mirroring) is currently a convenient mechanism for providing runtime failover of hard disks in Ubuntu
  • Owners of remotely administered systems who have taken the initiative to use redundant disks in a RAID1 configuration expect the system to boot even after a RAID degradation event

Scope

The scope of this specification is limited to solving this problem in Ubuntu's software RAID support and default bootloader during the Intrepid Ibex development cycle.

Design / Work Items

  • Bootloader

    • grub-install needs either to detect the situation automatically (open question: is /boot residing on an md device the right test?) or to be configurable to install GRUB to multiple devices, so that every member disk is bootable
    • the documentation should probably also note that manual BIOS changes may be required for boot-disk failover to work properly
  • MD Error Handling

  • Root Filesystem Wait
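
The bootloader item above could be approached along the following lines. This is only a sketch under stated assumptions: it reads text in the format of /proc/mdstat, assumes member partitions are named like sda1 so that stripping the trailing digits yields the whole disk, and only builds the grub-install command lines without running them. The helper names member_disks and grub_install_commands are hypothetical and do not exist in grub-install or mdadm.

```python
import re

# Hypothetical sample in the format of /proc/mdstat for a two-disk mirror.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""


def member_disks(mdstat_text, array):
    """Return the whole-disk devices backing the given md array, sorted."""
    disks = []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not header or header.group(1) != array:
            continue
        # Members look like "sda1[0]"; a failed member carries "(F)".
        for name, failed in re.findall(r"(\w+)\[\d+\](\(F\))?", header.group(2)):
            if failed:
                continue  # no point installing a bootloader on a failed disk
            # Assumption: "sda1" -> "sda". Other naming schemes need other rules.
            disk = "/dev/" + re.sub(r"\d+$", "", name)
            if disk not in disks:
                disks.append(disk)
    return sorted(disks)


def grub_install_commands(disks):
    """Build one grub-install invocation per disk, without executing it."""
    return [["grub-install", disk] for disk in disks]


if __name__ == "__main__":
    for command in grub_install_commands(member_disks(SAMPLE_MDSTAT, "md0")):
        print(" ".join(command))
```

Building the commands instead of executing them keeps the sketch safe to run anywhere; an actual change would more likely live in grub-install itself or in the packaging scripts that call it.
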

Implementation

Outstanding Issues

BoF agenda and discussion

References

BootDegradedRaid (last edited 2010-04-21 10:02:37 by 188-194-18-172-dynip)