UdevLvm

Differences between revisions 7 and 8
Revision 7 as of 2006-12-07 18:20:04
Size: 5437
Editor: chiark
Comment: no handling of removal
Revision 8 as of 2006-12-08 11:35:36
Size: 4822
Editor: chiark
Comment: answer to the race question

Please check the status of this specification in Launchpad before editing it. If it is Approved, contact the Assignee or another knowledgeable person before making changes.

Summary

This specification details how to make udev and LVM play nicely together, in particular ensuring that udev events are issued for LVM volumes and that UUIDs are correctly exported and do not conflict.

Rationale

LVM is used by system administrators to collect block devices together into Volume Groups and then split them into Logical Volumes which can be readily resized and adjusted within the group without needing complicated work. In order to support event-based mounting of these filesystems, we need reliable events from the block subsystem and no race conditions.

Use cases

  • Fabio uses a combination of LVM and RAID for his root filesystem; he would like this to continue to be supported.

Scope

The scope of this specification is limited to the interaction between udev and LVM2; other specifications address similar concerns with device-mapper (on which LVM is based), RAID, etc.

Design

vgchange -ay is the command that scans all block devices on the system (with CD-ROM and similar devices filtered out), combines them into volume groups, and activates their logical volumes. If the necessary components of a group cannot be found, that group is not activated.

This can be run directly from a udev rule whenever an LVM block device is added to the system. vol_id can be used to determine whether a block device is an LVM Physical Volume or not.

As LVM builds Logical Volumes using device-mapper, a further block event will be issued when the logical volume has been activated.

There is a race between vgchange creating the /dev/VGNAME/LVNAME symlinks and udev receiving the device-mapper block event for the device that those should point to. For this reason, vgchange will be instructed not to create these symlinks and instead vgmknodes will be called from a udev rule for device-mapper block devices.

When mirroring is involved, it's possible for a logical volume to be mounted even though it's not yet complete. We will offer the option of forcing a partial volume group mount after a timeout if it has not yet been activated.

As with device-mapper, the Volume Group and Logical Volume names, and thus the device path, constitute a unique identifier; there is no need for UUID or LABEL support for these block devices. We will continue to ignore them.

Implementation

  • Patch vgchange to accept an option to inhibit device symlink creation.

  • Add a udev rule to call vgchange when block devices are added:
    • SUBSYSTEM=="block", RUN+="watershed /sbin/vgchange -ay --no-symlinks"
  • Add a udev rule to call vgmknodes when device-mapper block devices are added:
    • SUBSYSTEM=="block", KERNEL=="dm-[0-9]*", RUN+="watershed /sbin/vgmknodes"
  • Modify the mountroot script to give the option of attempting vgchange -P after a timeout.

The watershed command used in the rules above is a tool to ensure that vgchange is run as many times as necessary to process the incoming events. It works by locking a known filename, clearing a state file, and then running the command. If an invocation cannot take the lock, it writes to the state file and exits. When the command finishes, the invocation holding the lock checks the state file; if the file exists, it loops and runs the command again.

This means that if two events arrive hours apart, the command is run twice. If one hundred arrive in rapid succession, it will be run at least twice, but usually not 100 times.
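
The locking scheme described above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the actual watershed source: the function name, the lock and state file paths, and the return value are all assumptions made for the example. It follows the description literally, so it also has a small window between the final state-file check and the release of the lock in which a request could be missed; the real tool may handle that differently.

```python
import fcntl
import os
import subprocess


def watershed(command, lock_path, state_path):
    """Run `command` under a lock, rerunning it while reruns are requested.

    Hypothetical sketch. Returns the number of times the command was run
    by this invocation (0 if another invocation already held the lock).
    """
    lock_fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Try to take the lock without blocking.
            fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Someone else is running the command: write to the state
            # file to request a rerun, then exit.
            with open(state_path, "w"):
                pass
            return 0

        runs = 0
        while True:
            # Clear the state file before each run.
            try:
                os.unlink(state_path)
            except FileNotFoundError:
                pass
            subprocess.call(command)
            runs += 1
            # If nobody asked for a rerun while the command ran, finish.
            if not os.path.exists(state_path):
                return runs
    finally:
        # Closing the descriptor also releases the flock.
        os.close(lock_fd)
```

With this sketch, a call such as watershed(["/sbin/vgchange", "-ay"], lock_path, state_path) made while another call holds the lock returns immediately, and the lock holder performs one extra run on behalf of all such callers. That is why a burst of one hundred events collapses into a handful of runs.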

Notes from the actual implementation

The "race between vgchange creating the /dev/VGNAME/LVNAME symlinks and udev receiving the device-mapper block event" is that the udev rules for processing the dm device for the LV might run before the symlinks are created.

In fact, this is addressed as follows: the new rule that runs vgchange in response to new block devices will apply both to the PV device and to the LV dm device (which is set up by the vgchange run triggered by the PV). In both cases, vgchange will take out the VG lock. This means that the second vgchange (which does nothing very much) cannot exit until the first has finished (i.e., the symlink is present). Putting the vgchange udev rule earlier in the sequence than other rules is sufficient to avoid the race.
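
The ordering argument above can be illustrated with a small simulation. This is a toy model, not LVM code: a threading.Lock stands in for the VG lock, a threading.Event stands in for the device-mapper block event, and the function and variable names are invented for the example. It shows only that anything run after the second vgchange returns will see the symlink, even when symlink creation is delayed.

```python
import threading
import time


def simulate(symlink_delay=0.05):
    """Return True if the symlink exists once the second vgchange returns."""
    vg_lock = threading.Lock()      # stands in for the VG lock
    dm_event = threading.Event()    # stands in for the dm block event
    state = {"symlink": False}
    observed = []

    def first_vgchange():
        # Triggered by the PV event: activates the LV, then makes the symlink.
        with vg_lock:
            dm_event.set()              # activation causes the dm event
            time.sleep(symlink_delay)   # symlink creation is delayed
            state["symlink"] = True

    def second_vgchange_then_later_rules():
        # Triggered by the dm event: takes the same lock, does nothing much.
        dm_event.wait()
        with vg_lock:
            pass
        # Rules ordered after the vgchange rule run only from here on.
        observed.append(state["symlink"])

    second = threading.Thread(target=second_vgchange_then_later_rules)
    first = threading.Thread(target=first_vgchange)
    second.start()
    first.start()
    first.join()
    second.join()
    return observed[0]
```

Because the event is only raised while the first thread already holds the lock, the second thread always blocks until the symlink flag has been set, whatever value symlink_delay takes.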

Limitation: The system, as designed above, will not cope at all with device removal events.


CategorySpec

UdevLvm (last edited 2008-08-06 16:35:51 by localhost)