BootDegradedRaid
Launchpad Entry: https://blueprints.edge.launchpad.net/ubuntu/+spec/boot-degraded-raid
Created: 2008-05-30
Contributors: DustinKirkland
Packages affected: mdadm, grub, initramfs, udev, lvm2
Summary
This specification defines a methodology for enhancing Ubuntu's boot procedures to configure and support booting a system dependent on a degraded RAID1 device.
Rationale
Ubuntu's installer currently supports installation to software RAID1 targets for /boot and /. When one of the mirrored disks fails, and mdadm marks the RAID degraded, it becomes impossible to reboot the system in an unattended manner.
Booting Ubuntu with a failed device in a RAID1 will force the system into a recovery console.
In some cases, this is the desired behavior: a local system administrator may want to back up critical data, cleanly halt the system, and replace the faulty hardware immediately.
In other cases, this behavior is highly undesirable, particularly when the system administrator is remotely located and would prefer that a system with redundant disks tolerate a failed device even on reboot.
Use Cases
- Kim uses software RAID since it is less expensive than hardware RAID, and since it is always available on any Ubuntu system with multiple disk devices. She particularly likes it on low-end and small-form-factor servers without built-in hardware RAID, such as her arrays of blades and 1U rack-mount systems. She uses RAID1 (mirroring) as a convenient mechanism for providing runtime failover of hard disks in Ubuntu. Kim remotely administers her systems, where she has taken great care to use redundant disks in a RAID1 configuration. She absolutely wants to be able to tolerate a RAID degradation event and continue to boot such systems in an unattended manner until she is able to replace the faulty hardware. After a primary drive failure and a reboot, the system automatically boots from the secondary drive and brings up the RAID in degraded mode in a fully functional (though unprotected) configuration, since she specified in her system configuration to boot even if the RAID is degraded.
- Steph also uses software RAID1 on her /boot and / filesystems. Steph is always physically present at the console when she reboots systems, and she always has spare disks on hand. Steph never wants to boot into a system with a degraded MD array. Steph configures her system to trap booting to a degraded array and instead deliver the system to a recovery console, thus more conservatively protecting her data at the expense of leaving her system unbootable in an unattended manner.
Scope
The scope of this specification is to solve this problem within Ubuntu's software RAID support and default bootloader during the Intrepid Ibex development cycle.
Design / Work Items
Bootloader
- grub-install needs to detect that /boot is on an md device (or be configured explicitly) and install GRUB to multiple devices, thus rendering multiple disks bootable (see the sketch after this list)
- should probably also document that manual BIOS changes may be required for disk boot failover to occur properly
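A minimal sketch of the multi-device install, assuming a two-disk RAID1 whose members are /dev/sda and /dev/sdb (the device names are examples, not part of the spec):
{{{
# install the boot loader on both members of the mirror so that either disk
# can boot the machine if the other one fails (device names are examples)
grub-install /dev/sda
grub-install /dev/sdb
}}}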
MD Error Handling
- mdadm support: the version in 8.04 has the new --incremental assembly option
- more verbose error hook messages
- teach the md error handler how to bring up an md device in degraded mode (see the sketch below)
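As an illustration of the last point, a degraded start could boil down to something like the following, assuming the array has already been partially assembled (the array name is an example):
{{{
# force a partially assembled array to start even though members are missing;
# the error handler would only do this when degraded boot has been allowed
mdadm --run /dev/md0
}}}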
Root Filesystem Wait
- reduce rootfs wait timeout to 30 seconds (DONE)
- option to abort the rootfs wait; seems non-trivial, but would be quite handy (see the sketch below)
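One possible shape for an abortable wait, assuming an initramfs shell whose read supports -t and -n, and assuming $ROOT holds the root device; this is a sketch, not the existing script:
{{{
# hypothetical abortable rootfs wait: give up early if the operator presses a key
elapsed=0
while [ ! -b "$ROOT" ] && [ "$elapsed" -lt 30 ]; do
    echo "Waiting for root device $ROOT (press any key to stop waiting)..."
    if read -t 1 -n 1 key; then
        break                      # operator aborted the wait
    fi
    elapsed=$((elapsed + 1))
done
}}}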
Implementation
Will be documented here as implementation occurs.
Here are the notes I (ceg) made for this; use them to your advantage.
Points that will need some adjustment:
- The mdadm --incremental option does not create device nodes (it may rely on the hotplug system to create them). Thus udev rules are needed to create those nodes, or mdadm's node-creation functionality needs to be re-established. Furthermore, --incremental defaults to expecting the new device naming scheme that includes partitions.
- (As a workaround I made the necessary device nodes static by "cp -a"ing them to /lib/udev/devices/; see the sketch below.)
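A minimal sketch of that workaround, assuming the classic /dev/md0 and /dev/md1 nodes (adjust the names to the actual arrays):
{{{
# copy the md device nodes into udev's static-device directory so that they
# exist even before udev/mdadm would create them (node names are examples)
cp -a /dev/md0 /dev/md1 /lib/udev/devices/
}}}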
- /etc/udev/rules.d/85-mdadm.rules:
  - The mdadm call needs to be changed to "mdadm --incremental /dev/%k".
  - The command "watershed" is not installed by default, at least not in Xubuntu 8.04. Is this a built-in of udev?
  - Rules for removal events are needed (Bug #244803); see the sketch below.
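An untested sketch of what such rules might look like, written as a shell snippet that installs a hypothetical rules file. The match keys, the omission of the watershed wrapper, and the --incremental --fail removal handling are assumptions, not the rules Ubuntu actually ships:
{{{
# hypothetical sketch: feed hotplugged RAID members to "mdadm --incremental"
# and detach them again on removal (assumes an mdadm release with -If support)
cat > /etc/udev/rules.d/85-mdadm.rules <<'EOF'
# new RAID member appears: add it to (and possibly start) its array
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
  RUN+="/sbin/mdadm --incremental /dev/%k"
# member disappears: mark it failed and remove it from its array
SUBSYSTEM=="block", ACTION=="remove", ENV{ID_FS_TYPE}=="linux_raid_member", \
  RUN+="/sbin/mdadm --incremental --fail %k"
EOF
}}}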
- "mdadm --incremental" will save state in a map file under /var/run/mdadm/map, but during early boot this directory does not yet exist and the state is saved in /var/run/mdadm.map.new (Change man page, it says /var/run/mdadm.map) The early map needs to be moved to the run-time path or something. (Beware of undefined state between mkdir and mv operation for hotplug events?)
- The --incremental mode will start only complete arrays. Degraded RAIDs that are necessary during boot need some extra attention:
  - Do not just run any array that may be partially available (mdadm -As), but only those that are needed at that particular point in time. For example, it is not a good idea to start the /home array in degraded mode when we just need the rootfs to boot first, and external disks don't have to be known that early during boot. In the initramfs:
    if the rootfs is on an md device:
        while the timeout has not been reached:
            if the root md device has come up: continue booting
            wait a second
        if the timeout has been reached:
            mdadm --incremental --run <root md device>    (but see Bug #244808)
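A minimal shell sketch of that loop as it might appear in an initramfs script; the array name, the 30-second timeout, and the readiness test are example assumptions, not the actual Ubuntu scripts:
{{{
# hypothetical initramfs fragment: wait briefly for the root array to assemble
# on its own, then try to start it even though members are missing
ROOT_MD=md0        # example array name
elapsed=0
while [ "$elapsed" -lt 30 ]; do
    # crude readiness test: the array shows up as active in /proc/mdstat
    grep -q "^$ROOT_MD : active" /proc/mdstat && break
    sleep 1
    elapsed=$((elapsed + 1))
done
if ! grep -q "^$ROOT_MD : active" /proc/mdstat; then
    # force a (possibly degraded) start; see Bug #244808 and the
    # --stop / --incremental --run workaround in the next item
    mdadm --incremental --run "/dev/$ROOT_MD"
fi
}}}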
  - (A workaround for Bug #244808 with a single internal member partition is "mdadm --stop /dev/md0" followed by "mdadm --incremental --run /dev/sdaX".)
  - Using --incremental instead of --assemble will start the array in auto-read-only mode. This allows any RAID members that come up later to join smoothly, without resyncing, as long as nothing has been written to the array.
  - The same approach works for any other device that needs to be, or is configured to be, started even when degraded: check whether the md device has come up in the meantime (or keep checking for a while), and start in degraded mode only those arrays that have been configured for this (all/whitelist/blacklist). Good old /etc/init.d/mdadm-raid could be the place to run the desired arrays in degraded mode like this, with a "start-on-boot-even-if-degraded-devices" list possibly configurable in /etc/default/mdadm; see the sketch after the next item.
  - Without this selective approach, some partly plugged-in array members (removable disks) might get started unintentionally. (It is more concise when the hotplug event "array plugged in" always means that all (active) members are plugged in; Bug #244810.)
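A sketch of what such a configurable degraded start could look like; the BOOT_DEGRADED_DEVICES variable is purely illustrative, not an existing /etc/default/mdadm option:
{{{
# hypothetical /etc/default/mdadm setting (illustrative name):
#   BOOT_DEGRADED_DEVICES="md0 md2"

# hypothetical fragment for /etc/init.d/mdadm-raid: start the whitelisted
# arrays in degraded mode if they have not come up on their own by now
[ -r /etc/default/mdadm ] && . /etc/default/mdadm
for md in $BOOT_DEGRADED_DEVICES; do
    if ! grep -q "^$md : active" /proc/mdstat; then
        mdadm --run "/dev/$md"    # force-start the array with missing members
    fi
done
}}}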
- Desktop integration:
  - Show unstarted, lonely array member partitions as icons with a right-click "start degraded" option.
  - Rules to stop md arrays when their filesystem is being unmounted, so that members removed after unmounting the filesystem won't get marked faulty.
  - A right-click "remove array member" option, to remove a (mirror) member from a running array.
  - A RAID status monitoring frontend (GUI) for /proc/mdstat etc.; a sketch of the underlying check follows below.
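A minimal sketch of the kind of check such a monitor could poll, assuming it parses /proc/mdstat directly:
{{{
# in /proc/mdstat a status like "[2/1] [U_]" means a mirror member is missing;
# print the affected arrays together with their status lines
grep -B 1 '\[.*_.*\]' /proc/mdstat && echo "degraded array(s) detected"
}}}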
Outstanding Issues
References
- https://blueprints.launchpad.net/ubuntu/+spec/boot-degraded-raid
- https://wiki.ubuntu.com/UDS-Intrepid/Report/Server#head-75c995bdf63bb5afe0f08461aba9200b6c95814f
- https://lists.ubuntu.com/archives/ubuntu-devel/2007-September/024221.html
- http://www.outflux.net/blog/archives/2006/04/23/grub-yaird-mdadm-and-missing-drives/
- http://osdir.com/ml/linux.redhat.anaconda.devel/2005-05/threads.html
- Bug #120375
- Bug #125471
- Compare with HotplugRaid for a common use and test case.