BootDegradedRaid

 * '''Created''': 2008-05-30
 * '''Packages affected''': mdadm, grub, initramfs, udev, lvm2

== Summary ==

This specification defines a methodology for enhancing Ubuntu's boot procedures to configure and support booting a system dependent on a degraded RAID1 device.

== Rationale ==

Ubuntu's installer currently supports installation to software RAID1 targets for /boot and /. When one of the mirrored disks fails, and mdadm marks the RAID ''degraded'', it becomes impossible to reboot the system in an unattended manner.

Booting Ubuntu with a failed device in a RAID1 will force the system into a recovery console.

In some cases, this is the desired behavior, as a local system administrator may want to back up critical data, cleanly halt the system, and replace the faulty hardware immediately.

In other cases, this behavior is highly undesirable, particularly when the system administrator is remotely located and would prefer that a system with redundant disks tolerate a failed device even on reboot.

== Use Cases ==

 * Software RAID is often a less expensive alternative to Hardware RAID, and is always available on any Ubuntu system with multiple disk devices
  * This is particularly useful on low-end and small-form-factor servers without built-in Hardware RAID, such as blades and 1U rack-mount systems
 * RAID1 (mirroring) is currently a convenient mechanism for providing runtime failover of hard disks in Ubuntu
 * Owners of remotely administered systems who have taken the initiative to use redundant disks in a RAID1 configuration expect those systems to boot even after a RAID degradation event

== Scope ==

The scope of this specification is to solve this problem within Ubuntu's software RAID support and default bootloader within the Intrepid Ibex development cycle.

== Design / Work Items ==

 * '''Bootloader'''
  * grub-install needs to detect that /boot resides on an md device (or be explicitly configured to do so) and install GRUB to each underlying disk, rendering every member disk bootable (a GRUB sketch follows this list)
  * it should probably also be documented that manual BIOS changes may be required for disk boot failover to occur properly
 * '''MD Error Handling'''
  * emit more verbose messages from the md error hook
  * teach the md error handler how to bring up the array in degraded mode (an mdadm sketch follows this list)
 * '''Root Filesystem Wait'''
  * reduce the rootfs wait timeout to 30 seconds (DONE; a menu.lst example follows this list)
  * provide an option to abort the rootfs wait; this seems non-trivial, but would be quite handy
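
A minimal sketch of the bootloader item, assuming a two-disk RAID1 whose members are /dev/sda and /dev/sdb; the device names, and the idea of deriving them from the array backing /boot, are illustrative only, not the final implementation:

{{{
#!/bin/sh
# Sketch: install GRUB to every member disk of the RAID1 holding /boot,
# so that the machine can still boot from a surviving disk.
# /dev/sda and /dev/sdb are example device names; a real implementation
# would derive the member list from the md device (e.g. via mdadm --detail).
set -e
for disk in /dev/sda /dev/sdb; do
    grub-install "$disk"
done
}}}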
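
A minimal sketch of the MD error-handling item, assuming an initramfs hook that may start the array degraded when normal assembly has not completed; the device name /dev/md0 and the /proc/mdstat check are assumptions for illustration:

{{{
#!/bin/sh
# Sketch: if the root array is not active after normal assembly,
# ask mdadm to run it anyway so boot can continue with a missing member.
MD_DEV=/dev/md0   # example device name

if ! grep -q "^md0 : active" /proc/mdstat; then
    echo "Attempting to start $MD_DEV in degraded mode" >&2
    mdadm --run "$MD_DEV" || echo "Could not start $MD_DEV" >&2
fi
}}}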
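
For the root-filesystem wait, one way to illustrate the timeout is the rootdelay= kernel parameter honoured by initramfs-tools; the kernel version, root device, and GRUB (legacy) stanza below are examples only:

{{{
# Example menu.lst entry capping the initramfs wait for the root device
# at 30 seconds; adjust kernel version and devices to match the system.
title  Ubuntu, kernel 2.6.24-19-server (RAID1 root)
root   (hd0,0)
kernel /boot/vmlinuz-2.6.24-19-server root=/dev/md0 ro rootdelay=30
initrd /boot/initrd.img-2.6.24-19-server
}}}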

== Implementation ==

''Will be documented here as implementation occurs''

== Outstanding Issues ==

== References ==

 * https://blueprints.launchpad.net/ubuntu/+spec/boot-degraded-raid
 * https://wiki.ubuntu.com/UDS-Intrepid/Report/Server#head-75c995bdf63bb5afe0f08461aba9200b6c95814f
 * https://lists.ubuntu.com/archives/ubuntu-devel/2007-September/024221.html
 * http://www.outflux.net/blog/archives/2006/04/23/grub-yaird-mdadm-and-missing-drives/
 * http://osdir.com/ml/linux.redhat.anaconda.devel/2005-05/threads.html
 * wiki:Bug:120375
 * wiki:Bug:125471
