BootDegradedRaid

Differences between revisions 3 and 10 (spanning 7 versions)
Revision 3 as of 2008-05-30 19:47:22
Size: 586
Editor: 70-2-70-49
Comment: added summary section
Revision 10 as of 2008-05-30 20:51:18
Size: 2814
Editor: 70-2-70-49
Comment: added bootloader design points

 * https://blueprints.launchpad.net/ubuntu/+spec/boot-degraded-raid
 * https://wiki.ubuntu.com/UDS-Intrepid/Report/Server#head-75c995bdf63bb5afe0f08461aba9200b6c95814f
 * https://lists.ubuntu.com/archives/ubuntu-devel/2007-September/024221.html
 * http://www.outflux.net/blog/archives/2006/04/23/grub-yaird-mdadm-and-missing-drives/
 * wiki:Bug:120375
 * wiki:Bug:125471

Summary

This specification defines a methodology for enhancing Ubuntu's boot procedures to configure and support booting a system dependent on a degraded RAID1 device.

Rationale

Ubuntu's installer currently supports installation to software RAID1 targets for /boot and /. When one of the mirrored disks fails and mdadm marks the RAID degraded, it becomes impossible to reboot the system in an unattended manner.

Booting Ubuntu with a failed device in a RAID1 will force the system into a recovery console.

In some cases, this is the desired behavior, as a local system administrator would want to back up critical data, cleanly halt the system, and replace the faulty hardware immediately.

In other cases, this behavior is highly undesirable, particularly when the system administrator is remote and would prefer that a system with redundant disks tolerate a failed device even on reboot.

Use Cases

  • Software RAID is often a less expensive alternative to Hardware RAID, and is always available on any Ubuntu system with multiple disk devices
    • This is particularly useful on low-end and small-form-factor servers without built-in Hardware RAID, such as blades and 1-U rack mount systems
  • RAID1 (mirroring) is currently a convenient mechanism for providing runtime failover of hard disks in Ubuntu
  • Owners of remotely administered systems who have taken the initiative to use redundant disks in a RAID1 configuration expect to be able to boot even after a RAID degradation event

Scope

The scope of this specification is to solve this problem within Ubuntu's software RAID support and default bootloader during the Intrepid Ibex development cycle.

Design / Work Items

  • Bootloader

    • grub-install needs to either detect the RAID case (perhaps by checking whether /boot is on an md device) or be configured to install GRUB to multiple devices, so that every disk in the mirror is bootable
    • We should also document that manual BIOS changes may be required for disk boot failover to occur properly
  • MD Error Handling

  • Root Filesystem Wait
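
The Bootloader item above could be prototyped roughly as follows. This is only a sketch under stated assumptions, not the planned implementation: it parses text in the format of /proc/mdstat, reports which arrays are degraded, and prints one grub-install command per disk backing the array that holds /boot. The sample text, the function names (parse_mdstat, whole_disk, grub_install_commands), the choice of md0 as the /boot array, and the sdX-style device naming are all hypothetical, and the parser handles only the simple "active raid1" header form.

```python
import re

# Hypothetical sample in the format of /proc/mdstat; a real system would read the file itself.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      979840 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      155316864 blocks [2/1] [U_]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"members": [...], "degraded": bool}} from mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Simplification: only headers of the form "mdN : <state> <level> <members>".
        header = re.match(r"^(md\d+) : (\w+) (\S+) (.*)$", line)
        if header:
            current = header.group(1)
            # Member entries look like "sda1[0]"; failed ones carry a "(F)" suffix.
            members = [
                name
                for name, flag in re.findall(r"(\w+)\[\d+\](\(\w\))?", header.group(4))
                if flag != "(F)"
            ]
            arrays[current] = {"members": sorted(members), "degraded": False}
            continue
        status = re.search(r"\[(\d+)/(\d+)\]", line)
        if status and current:
            # "[2/1]" means two devices expected, only one active.
            arrays[current]["degraded"] = int(status.group(2)) < int(status.group(1))
    return arrays


def whole_disk(partition):
    """Strip the trailing partition number: 'sda1' -> '/dev/sda' (assumes sdX-style names)."""
    return "/dev/" + partition.rstrip("0123456789")


def grub_install_commands(arrays, boot_array):
    """Build one grub-install command line per disk backing the array that holds /boot."""
    disks = sorted({whole_disk(member) for member in arrays[boot_array]["members"]})
    return ["grub-install " + disk for disk in disks]


if __name__ == "__main__":
    arrays = parse_mdstat(SAMPLE_MDSTAT)
    for name, info in arrays.items():
        print(name, "degraded" if info["degraded"] else "healthy", info["members"])
    # Assumes /boot lives on md0; the commands are printed, not executed.
    for command in grub_install_commands(arrays, "md0"):
        print(command)
```

Printing the commands instead of running them keeps the sketch safe to try on any machine. Even with GRUB installed on every mirror member, the BIOS boot order may still need manual adjustment, as noted in the list above.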

Implementation

Outstanding Issues

BoF agenda and discussion

References

BootDegradedRaid (last edited 2010-04-21 10:02:37 by 188-194-18-172-dynip)