UdevLvmMdadmEvmsAgain

Revision 9 as of 2007-05-15 17:47:52

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

In Feisty we changed the way LVM, software RAID (mdadm), EVMS, and similar tools interact with udev, to try to improve the situation. This was not a success, and the resulting arrangements are racy.

We now propose an alternative approach which we believe will reduce (and hopefully eliminate) races.

Principle

For userspace-provided devices such as dm-* and md-* (hereafter, synthetic devices) the relevant userspace tools will be responsible for device node creation and also for triggering consequential scanning.

The intent is that synthetic devices will only be processed automatically if they have been published explicitly by the tool which creates them.

Design

  • udev will be made to ignore kernel-generated events for devmapper and md devices.
  • the udev rendezvous functionality will be removed from libdevmapper; instead, libdevmapper will once again create device nodes itself (and we will set the default permissions to mode 0600, owner root, group root).
  • udev will be modified to support userspace-generated events (aka synthetic events; see below)
  • the userspace tools (lvm, mdadm, and evms, but not dmsetup) will be modified to generate a synthetic udev event for each "top level" device once that device has been initialised and is ready for I/O, and another event when the device is about to be removed, just before it becomes unavailable for I/O.
  • udev will respond to the synthetic event by overwriting the device node generated by libdevmapper (or whatever tool created it) with one of its own that has the correct ownership and permissions, using rename(2) to install it; it will also do normal vol_id scanning of the resulting device.
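The node replacement step in the last bullet can be sketched as follows. This is a minimal illustration, not udev's actual code: the function names install_node_atomically and block_node_creator, and the temporary-name scheme, are assumptions made for the example. The point is only that the new node is created under a temporary name in the same directory and then moved into place with rename(2), so the final name never disappears.

```python
import os
import stat


def install_node_atomically(final_path, create_node, mode):
    """Create a node under a temporary name, then rename(2) it over final_path.

    rename(2) replaces the destination atomically, so a process opening
    final_path sees either the old node or the new one, never a missing name.
    """
    directory = os.path.dirname(final_path) or "."
    # The temporary name lives in the same directory (and so on the same
    # filesystem); rename(2) cannot move a file across filesystems.
    tmp_path = os.path.join(
        directory, ".tmp-%s-%d" % (os.path.basename(final_path), os.getpid())
    )
    create_node(tmp_path)
    try:
        os.chmod(tmp_path, mode)          # set permissions before publishing
        os.rename(tmp_path, final_path)   # atomic replacement of the old node
    except OSError:
        os.unlink(tmp_path)               # do not leave the temporary behind
        raise


def block_node_creator(major, minor):
    """Return a callable that creates a block device node (requires root)."""
    def create(path):
        os.mknod(path, stat.S_IFBLK | 0o600, os.makedev(major, minor))
    return create
```

A caller running as root might use install_node_atomically("/dev/mapper/example", block_node_creator(major, minor), 0o660), where the path and device numbers are placeholders. Setting ownership with os.chown would go next to the os.chmod call, before the rename.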

Details of synthetic device publication conditions

Publication of a synthetic device means that that device should be processed further by automatic scanning tools. Currently publication only affects udev rules (see Problem unaddressed, below).

No synthetic device publication will be processed unless atoi(getenv("UDEV_SYNTHPUB_BLOCK")) is nonzero. Synthetic device removal will always be recorded as withdrawal of any matching synthetic device (and will be idempotent: it is a no-op if there is no such published synthetic device).
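The gating rule above can be sketched like this. It is an illustrative model only: the class PublishedDevices and the function publication_enabled are hypothetical names, and the regular expression merely approximates what atoi accepts (optional leading whitespace, optional sign, then digits). Publication is refused unless the variable parses to a nonzero integer, while withdrawal is always accepted and does nothing if the device was never published.

```python
import os
import re


def publication_enabled(environ=None):
    """Approximate the test atoi(getenv("UDEV_SYNTHPUB_BLOCK")) != 0."""
    env = os.environ if environ is None else environ
    raw = env.get("UDEV_SYNTHPUB_BLOCK")
    if raw is None:
        return False  # unset variable: publication is not processed
    # Like atoi: skip leading whitespace, take an optional sign and digits,
    # and ignore any trailing characters.
    match = re.match(r"\s*([+-]?\d+)", raw)
    return match is not None and int(match.group(1)) != 0


class PublishedDevices:
    """Hypothetical record of currently published synthetic devices."""

    def __init__(self):
        self._published = set()

    def publish(self, devname, environ=None):
        """Record a publication; return False if the gate refuses it."""
        if not publication_enabled(environ):
            return False
        self._published.add(devname)
        return True

    def withdraw(self, devname):
        """Always accepted; a no-op if devname was never published."""
        self._published.discard(devname)

    def is_published(self, devname):
        return devname in self._published
```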

lvm will publish all non-snapshot logical volumes, when activation of that LV is complete, and withdraw them before deactivating them.

mdadm will publish all MD devices, when each device has been properly started and is ready for I/O. It will withdraw each device before stopping it.

dmsetup will not publish any device itself. User tools (scripts) which use dmsetup will need to be modified if automatic processing (mounting etc.) is desired. A utility will be provided to make publication and withdrawal easy (see below).

cryptsetup will publish a device when it successfully sets the keys and makes the encrypting dm-* device available for I/O, and withdraw it before deactivating it.

evms will publish "logical volumes" (i.e. storage objects which contain mountable filesystems) when they are activated, and withdraw them before they are deactivated.

The udev rules which invoke the block device identification and activation tools (i.e. the ones which run when physical block devices are detected and which will now also run on synthetic device creation) will set UDEV_SYNTHPUB_BLOCK.

Rationale

This design keeps the internal block devices and intermediate manipulations used by lvm and other tools internal to those tools. It notifies the rest of the system of the availability of a new block device if and only if that device was automatically discovered and should be automatically used.

Udev design

A new entrypoint in libvolume_id and a corresponding command-line tool (in the volumeid package) will be made available to allow programs to publish and withdraw synthetic devices. This library entrypoint will be responsible for checking UDEV_SYNTHPUB_BLOCK. Neither it nor the utility will fail if udev is not running, so that they can be used unconditionally.

This new function will work by sending udev a control message instructing it to create or remove the device in its device database. udevd will be made to send appropriate events in response, and will also record in its database that the device is synthetic.

udevtrigger will be made to notify udevd as well as the kernel; udevd will then regenerate the creation events for the synthetic devices in its database.
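The intended udevd behaviour can be modelled with a small in-memory sketch. This is a toy model under stated assumptions, not udevd's real data structures: the class SyntheticDeviceDb, the action names "publish" and "withdraw", and the event tuples are all invented for illustration. It shows control messages updating the database and producing events, and a retrigger replaying creation events for every device still recorded as synthetic.

```python
class SyntheticDeviceDb:
    """Toy model of udevd's record of published synthetic devices."""

    def __init__(self):
        self._devices = {}  # devname -> properties, in publication order
        self.events = []    # events udevd would send, in order

    def handle_control(self, action, devname):
        """Apply one control message and emit the corresponding event."""
        if action == "publish":
            self._devices[devname] = {"synthetic": True}
            self.events.append(("add", devname))
        elif action == "withdraw":
            # Withdrawing an unknown device is a no-op and emits nothing.
            if self._devices.pop(devname, None) is not None:
                self.events.append(("remove", devname))
        else:
            raise ValueError("unknown control action: %r" % (action,))

    def retrigger(self):
        """Replay creation events, as a udevtrigger notification would."""
        for devname in self._devices:
            self.events.append(("add", devname))
```

For example, publishing dm-3 and md0, withdrawing dm-3, and then calling retrigger() would replay an "add" event for md0 only, since dm-3 is no longer in the database.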

Problem unaddressed

The lvm and evms userspace tools, and mdadm in many modes, scan all block devices looking for physical volumes to consume and construct into logical volumes (or appropriate other terminology). This means that synthetic block devices not intended for publication may be opened and even processed.

To prevent this from causing trouble during normal boot-time device detection, we will use a single lock wrapping all of the boot-time lvm, evms, mdadm, and cryptsetup scans. This will ensure that at any one time only one such scanning process can be running, and it is those very scanning processes which do the activation. This addresses the races, although it does not prevent unwanted scanning of unpublished devices; we believe the latter to be largely harmless (since it happens quite a bit in pre-udev systems anyway).
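One way such a single lock could be realised is an exclusive flock(2) on a well-known file, taken around each scan. The sketch below is an assumption about mechanism, not something this specification mandates: the helper name boot_scan_lock and the lock file location are hypothetical.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def boot_scan_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of one scan."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Blocks until no other process (or open file description)
        # holds the lock, so scans run strictly one at a time.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock, even if the scan failed.
        os.close(fd)
```

Each boot-time scan would then be wrapped as "with boot_scan_lock(path): run the scan", with every scanning tool agreeing on the same path. The lock is advisory, so it only serialises processes that take it.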

If some other process on the system causes multiple simultaneous or near-simultaneous synthetic device creations, there may still be a race. The correct fix would be to fix all of the device-scanning systems not to scan synthetic block devices which have not been properly published. This would be straightforward to implement with a suitable entrypoint to libvolume_id.

Alternative approach - Publication list stored in kernel

The use of udevctrl to generate synthetic events for the published synthetic devices is not the only possible way to record and process this information.

As an alternative, it would be possible to allow user programs to create uevent structures via a suitable kernel interface. This would result in real kernel udev events and would avoid modifying udev. Instead, the userland utility for publishing synthetic devices would publish them by talking to this new kernel interface (and would provide mechanisms for listing and deleting entries as well, so that stale kernel data can be removed).

The rest of the design remains largely unchanged.


CategorySpec