ReliableRaid

Revision 6 as of 2009-12-02 16:52:41


RAIDs (redundant arrays of independent disks) allow systems to keep running even if some parts fail. With hardware RAID you just plug in more than one disk side by side. A buzzer or e-mail watchdog notifies you when a disk fails, and you plug in a new (spare) one to restore redundancy.


Unfortunately, Ubuntu's md (software) RAID setup suffers from several pieces of incompleteness.

The mdadm array assembly has been moved to the hotplug system (udev rules).
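In practice this means a udev rule hands each newly appearing RAID member device to mdadm --incremental, which collects members and runs the array once it is complete. A minimal sketch of such a rule (the file name and matching details are illustrative, not necessarily the exact rule Ubuntu ships):

```
# /etc/udev/rules.d/85-mdadm.rules  (illustrative sketch)
# Pass any block device carrying a Linux RAID superblock to mdadm;
# mdadm gathers the members and starts the array once all have arrived.
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid_member", \
    RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```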

However:

  • The proper mdadm --incremental option does not work in the initramfs (it does not create device nodes). 251663

  • The proper command to start only selected hotpluggable RAIDs degraded (i.e. the rootfs, after a timeout in the initramfs) is not available. 251646

  • The corresponding command "mdadm --incremental --scan --run", meant to start all remaining hotpluggable RAIDs degraded (something to execute manually), does not start anything. 244808

  • Using the legacy method to start an array degraded breaks later --incremental (re)additions.
  • mdadm still reads a static /etc/mdadm/mdadm.conf file containing UUIDs, and refuses to assemble other arrays hotplugged into the system. (It does not simply assemble devices with matching superblocks and run arrays (only) once they are complete.) This breaks any actual hotplugging of (complete) md arrays into another machine. 252345

  • The initramfs boot process is not a state machine capable of assembling the base system from devices that appear in any order, and of starting the necessary RAIDs degraded if they are still incomplete after some time.
    • 251164 boot impossible due to missing initramfs failure hook integration

    • 136252 mdadm, initramfs missing ARRAY lines

    • 247153 encrypted root initialisation races/fails on hotplug devices (does not wait)

    • 488317 installed system fails to boot with degraded raid holding cryptdisk

  • For regular (non-rootfs) arrays there is no init script at all that starts/runs them degraded. 259145 non-root raids fail to run degraded on boot

  • The Ubuntu server manual claims: "If the array has become degraded, due to the chance of data corruption, by default Ubuntu Server Edition will boot to initramfs after thirty seconds. Once the initramfs has booted there is a fifteen second prompt giving you the option to go ahead and boot the system, or attempt manual recovery." However, if a drive failed while the system was powered up, it will always reboot degraded afterwards *without* stopping the boot process for something (adding a new drive to the array) that is designed to be done on a live system anyway (quite a good default, actually). Reason: 244810 inconsistency with the --no-degraded option.
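To illustrate the static-configuration problem above: a generated /etc/mdadm/mdadm.conf pins arrays to fixed UUIDs, roughly like this (the UUIDs below are invented placeholders). An array hotplugged in from another machine, whose UUID is not listed, will not be assembled:

```
# /etc/mdadm/mdadm.conf  (illustrative; UUIDs are made-up placeholders)
DEVICE partitions
ARRAY /dev/md0 UUID=0123ab45:6789cd01:2345ef67:89ab0123
ARRAY /dev/md1 UUID=89ab0123:2345ef67:6789cd01:0123ab45
```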
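As for the missing init script for non-root arrays: conceptually, it would only need to ask mdadm, late in the boot sequence, to run whatever known arrays are still incomplete. A sketch of such a (non-existent) script follows; note that the very command it would rely on is the one reported broken in 244808:

```
#!/bin/sh
# Illustrative sketch only -- no such init script ships with Ubuntu.
# Late in boot, after complete arrays have been assembled via the
# udev hotplug rules, start the remaining incomplete arrays degraded.
mdadm --incremental --scan --run
```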