UdevLvmMdadmEvmsAgain

Differences between revisions 5 and 6
Revision 5 as of 2007-05-15 16:15:24
Size: 4181
Editor: chiark
Comment: specify the librarylet to use
Revision 6 as of 2007-05-15 17:04:28
Size: 5966
Editor: chiark
Comment: more discussion

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

In feisty we changed the way LVM, software RAID (mdadm), EVMS, et al. interact with udev, to try to improve the situation. This was not a success and the resulting arrangements are racy.

We now propose an alternative approach which we believe will reduce (and hopefully eliminate) races.

Principle

For userspace-provided devices such as dm-* and md-* (hereafter, synthetic devices) the relevant userspace tools will be responsible for device node creation and also for triggering consequential scanning.

The intent is that synthetic devices will only be processed automatically if they have been published explicitly by the tool which creates them.

Design

  • udev will be made to ignore kernel-generated events for devmapper and md devices.
  • the udev rendezvous functionality will be removed from libdevmapper; instead, libdevmapper will once again create device nodes itself (and we will set the default permissions to 600 root.root)
  • udev will be modified to support userspace-generated events (aka synthetic events; see below)
  • the userspace tools (lvm, mdadm, and evms, but not dmsetup) will be modified to generate synthetic udev events for each "top level" device when that device has been initialised and is ready for I/O.
  • udev will respond to the synthetic event by overwriting the device node generated by libdevmapper (or whatever) with one of its own with the correct ownership and permissions, using rename(2) to install it; it will also do normal vol_id scanning of the resulting device.
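The rename(2) step in the last item can be sketched as follows. This is only an illustration, not udev's actual code: install_node is a hypothetical helper, and it assumes the replacement node has already been created under a temporary name in the same directory (udev would do that with mknod(2), which needs root, so the sketch takes an existing staged path instead). Because rename(2) replaces the target atomically, other processes see either the old node or the new one, never a missing path.

```c
#include <assert.h>
#include <stdio.h>      /* rename */
#include <sys/stat.h>   /* chmod */
#include <sys/types.h>  /* mode_t, uid_t, gid_t */
#include <unistd.h>     /* chown, unlink */

/* Hypothetical helper: give the staged node at tmp_path its final
 * ownership and permissions, then atomically install it over
 * final_path. Returns 0 on success, -1 on failure (the staged node
 * is removed on failure and final_path is left untouched). */
static int install_node(const char *tmp_path, const char *final_path,
                        mode_t mode, uid_t owner, gid_t group)
{
    assert(tmp_path != NULL && final_path != NULL);

    /* Fix up ownership and mode before the node becomes visible
     * under its final name. */
    if (chown(tmp_path, owner, group) != 0 || chmod(tmp_path, mode) != 0) {
        unlink(tmp_path);
        return -1;
    }

    /* rename(2) atomically replaces any existing final_path. */
    if (rename(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Both paths must be on the same filesystem for rename(2) to work, which is why the staged node is assumed to live alongside its final name.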

Details of synthetic device publication conditions

Publication of a synthetic device means that that device should be processed further by automatic scanning tools. Currently publication only affects udev rules (see Problem unaddressed, below).

No synthetic device will be published unless atoi(getenv("UDEV_SYNTHPUB_BLOCK")) is nonzero. Synthetic device removal will always be recorded as withdrawal of any matching synthetic device (and will be idempotent if there is no such published synthetic device).
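A minimal sketch of this gating rule, assuming nothing about the eventual libvolume_id interface: the names synthpub_enabled, synth_action and synth_should_forward are hypothetical. Note that getenv() returns NULL when the variable is unset and passing NULL to atoi() is undefined behaviour, so a real implementation would need to treat the unset case explicitly, as done here.

```c
#include <assert.h>
#include <stdlib.h>   /* getenv, atoi */

/* Hypothetical helper: nonzero if the named variable is set to a
 * nonzero integer. An unset variable counts as "disabled"; atoi()
 * must not be called on the NULL that getenv() returns in that case. */
static int synthpub_enabled(const char *varname)
{
    assert(varname != NULL);
    const char *value = getenv(varname);
    if (value == NULL)
        return 0;
    return atoi(value) != 0;
}

/* Hypothetical request types for the publish/withdraw entrypoint. */
enum synth_action { SYNTH_PUBLISH, SYNTH_WITHDRAW };

/* Returns 1 if the request should be passed on to udevd, 0 if it
 * should be silently dropped. */
static int synth_should_forward(enum synth_action action)
{
    if (action == SYNTH_WITHDRAW)
        return 1;   /* withdrawal is always recorded */
    return synthpub_enabled("UDEV_SYNTHPUB_BLOCK");
}
```

With the variable unset or set to 0, a publish request is dropped while a withdraw request still goes through, matching the asymmetry described above.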

lvm will publish all non-snapshot logical volumes, when activation of that LV is complete, and withdraw them before deactivating them.

mdadm will publish all MD devices, when each device has been properly started and is ready for I/O. It will withdraw each device before stopping it.

dmsetup will not publish any device itself. User tools (scripts) which use dmsetup will need to be modified if automatic processing (mounting etc.) is desired. A utility will be provided to make publication and withdrawal easy (see below).

cryptsetup will publish a device when it successfully sets the keys and makes the encrypting dm-* device available for IO, and withdraw it before deactivating it.

evms will publish "logical volumes" (ie storage objects which contain mountable filesystems) when they are activated, and withdraw them before they are deactivated.

The udev rules which invoke the block device identification and activation tools (ie, the ones which run when physical block devices are detected and which also now run on the synthetic device creation) will set UDEV_SYNTHPUB_BLOCK.

Rationale

This design keeps internal block devices and messings-about used by lvm and other tools, internal to those tools. It notifies the rest of the system of the availability of a new block device iff that device was automatically discovered and should be automatically used.

Udev design

A new entrypoint in libvolume_id and a corresponding command-line tool (in the volumeid package) will be made available to allow programs to publish and withdraw synthetic devices. This librarylet entrypoint will be responsible for checking UDEV_SYNTHPUB_BLOCK. Neither it nor the utility will fail if udev is not running, so both can be used unconditionally.

This new function will work by sending udev a control message instructing it to create or remove the device in its device database. udevd will be made to send appropriate events in response, and will also record in its database that the device is synthetic.

udevtrigger will be made to notify udevd as well as the kernel; udevd will then regenerate the creation events for the synthetic devices in its database.

Problem unaddressed

The lvm and evms userspace tools, and mdadm in many modes, scan all block devices looking for physical volumes to consume and construct into logical volumes (or appropriate other terminology). This means that synthetic block devices not intended for publication may be opened and even processed.

To prevent this fact from causing trouble during normal boot-time device detection, we will retain the single lock wrapping up all of the boot-time lvm, evms, mdadm, and cryptsetup scans. This will ensure that at any one time only one such scanning process can be running - and it is those very scanning processes which are doing the activation. This addresses problems with races although it does not prevent unwanted scanning of unpublished devices; we believe the latter to be largely harmless (since it happens quite a bit in pre-udev systems anyway).
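The specification does not say how the single lock is implemented. As one possible illustration only, the sketch below serialises scans with an exclusive flock(2) on a lock file; run_scan_locked, demo_scan and the lock path are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <fcntl.h>      /* open */
#include <sys/file.h>   /* flock */
#include <unistd.h>     /* close */

/* Hypothetical lock file location, used only by this demonstration. */
static const char *demo_lock_path = "/tmp/synth-scan-demo.lock";

/* Runs scan() while holding an exclusive lock on lock_path, so that
 * only one scanner runs at a time. Returns scan()'s result, or -1 if
 * the lock could not be taken (a scan that itself returns -1 is not
 * distinguished from that case in this sketch). */
static int run_scan_locked(const char *lock_path, int (*scan)(void))
{
    assert(lock_path != NULL && scan != NULL);

    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Blocks until any other holder releases the lock. */
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }

    int result = scan();

    flock(fd, LOCK_UN);
    close(fd);
    return result;
}

/* Demonstration scan body: tries to take the same lock through a
 * second open of the file without blocking. Returns 1 if it was
 * excluded (someone holds the lock), 0 if the lock was free, and
 * -1 if the lock file could not be opened. */
static int demo_scan(void)
{
    int fd = open(demo_lock_path, O_RDWR);
    if (fd < 0)
        return -1;
    int excluded = (flock(fd, LOCK_EX | LOCK_NB) != 0);
    close(fd);   /* closing also drops the lock if we did obtain it */
    return excluded;
}
```

Calling run_scan_locked(demo_lock_path, demo_scan) shows the exclusion at work: the inner attempt is refused while the outer lock is held. As the paragraph above notes, a lock of this kind only serialises the scanners; it does nothing to stop a scanner from opening an unpublished device.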

If some other process on the system causes multiple simultaneous or near-simultaneous synthetic device creations, there may still be a race. The correct fix would be to fix all of the device-scanning systems not to scan synthetic block devices which have not been properly published. This would be straightforward to implement with a suitable entrypoint to libvolume_id.


CategorySpec

UdevLvmMdadmEvmsAgain (last edited 2008-08-06 16:31:00 by localhost)