Launchpad Entry: etc-under-revision-control
Created: 2008-11-19 by ThierryCarrez
Packages affected: bzr, etckeeper
Keeping /etc under a revision control system makes it possible to trace changes made to configuration files. This should be easy and non-intrusive enough to be a good candidate for integration in default Server installs, as part of our "Best Practices integration" theme for Ubuntu Server.
Ubuntu Server ships by default with a new traceability feature, built on etckeeper and bzr, integrated through a specific bzr plugin. This transparently keeps a history of configuration changes made to your system, whether those changes are made by package updates or manually.
Applying a revision control paradigm to configuration files has long been a known best practice for Linux sysadmins, but everyone uses different mechanisms to achieve that result. It is often seen as an intrusive and cumbersome process that might not be worth the hassle of setting it up and sticking to it.
One of the themes we are pushing in Ubuntu Server is integration of industry best practices: make powerful concepts and tools easy enough to use that we can include them by default and make them part of what "Ubuntu Server" is.
We can solve the perceived drawbacks by integrating a revision control system that hooks into the system to provide a comprehensive view of the packages installed and the configuration changes applied during the life of the server. It should maintain a usable history even if the sysadmin doesn't commit configuration changes. If successful, it can be made a default feature of Ubuntu Server, which you could even ignore until you need it.
- Alan just did a package update and wants to compare the configuration changes before and after the installation, including permissions and ownership changes
- Betty wants to extract a complete history of the packages installed and the configuration changes from all her Ubuntu servers
- Chuck needs to know what the two other admins, Daniel and Emmet, might have been doing on that box that would explain the performance drop he has noticed since last week
- Fiona doesn't know about revision control. She wants to be able to ignore it completely and doesn't want her box to be significantly slower or cruftier because this was made a default as part of a "Best Practice integration" campaign
- Gunther accidentally changed permissions on a directory and wants to revert those files to the last committed revision to clean up the mess
- Provide init/commit/diff/log/revert/purge commands (commit being an add+commit, since we always want to add the missing files)
- Versioning of file contents, permissions and ownership: it is necessary to keep track of all those changes since they are all significant for configuration files. This information should be as integrated as possible in the tool output/action.
- Comprehensive and usable without user interaction: although the admin is free to commit specific changes every time he changes configuration files, the system should provide usable traceability even if he doesn't. That means autocommitting before and after package installation, and autocommitting once a day to get day-by-day uncommitted changes.
- By-user traceability: whenever possible, the revisions should show the userid of the user who committed the change (daily autocommits being a notable exception).
- Non-intrusive: the system should be as unnoticeable as possible. That means, for example, keeping the VCS metafiles in a specific directory, or reducing the messages printed on pre- and post-package-install autocommits.
- Reduced footprint: try to reduce the footprint on the system by making the VCS database as small as possible and choose the options that make the VCS as fast as possible.
etckeeper, written by Joey Hess and already in universe, provides most of the needed features. It is written in shell script and is easily extensible. It features:
- Compatibility with multiple VCS
- Autocommit hooks before and after using Apt
- Maintenance of a /etc/.etckeeper file describing specific ownership/permissions
- Static list of file patterns to ignore
- bzr "etckeeper" plugin to handle pre-commit actions (regeneration of the permissions database)
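The idea behind the permissions database can be illustrated with a minimal Python sketch. It is not etckeeper's implementation (etckeeper is a shell script and the real /etc/.etckeeper format is not reproduced here); the function name `snapshot_metadata` and the returned dictionary layout are assumptions made for illustration only.

```python
import os
import stat
from pathlib import Path


def snapshot_metadata(root, skip_dirs=(".bzr",)):
    """Return {relative_path: (mode, uid, gid)} for every entry below root.

    Hypothetical helper: the dictionary layout is an assumption for this
    sketch, not the format etckeeper stores in /etc/.etckeeper.
    """
    root = Path(root)
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            info = path.lstat()  # lstat: describe symlinks, do not follow them
            rel = path.relative_to(root).as_posix()
            snapshot[rel] = (stat.S_IMODE(info.st_mode), info.st_uid, info.st_gid)
    return snapshot
```

Regenerating such a snapshot in a pre-commit step is what lets a VCS that does not version ownership still carry that information alongside the file contents.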
Unfortunately, at this point etckeeper doesn't integrate the ownership/permissions change in the output and result of the VCS tool: differences in ownership/permissions are shown as a diff in the .etckeeper file, and revert operations must be followed by the appropriate manual ownership/permissions restoration.
It can therefore be a little tricky to use for less experienced users, who are the target of this blueprint. To solve this issue, we can leverage bzr extensibility to provide an improved bzr "etckeeper" plugin that will:
- Integrate ownership/permissions differences at file level in diff output
- Ensure proper ownership/permissions restoration in case of reversion
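As a rough illustration of file-level reporting, the following Python sketch compares two metadata snapshots and produces one line per change. The snapshot layout (a dictionary mapping a relative path to a `(mode, uid, gid)` tuple), the function name and the output wording are all assumptions for this example; the actual plugin output format is not defined by this specification.

```python
def describe_metadata_changes(old, new):
    """List ownership/permission changes for paths present in both snapshots.

    Both arguments are assumed to map relative paths to (mode, uid, gid).
    """
    lines = []
    for path in sorted(old.keys() & new.keys()):
        old_mode, old_uid, old_gid = old[path]
        new_mode, new_uid, new_gid = new[path]
        if old_mode != new_mode:
            lines.append(f"{path}: mode {old_mode:04o} -> {new_mode:04o}")
        if (old_uid, old_gid) != (new_uid, new_gid):
            lines.append(
                f"{path}: owner {old_uid}:{old_gid} -> {new_uid}:{new_gid}"
            )
    return lines
```

Lines like these could be appended to the regular content diff of each file, so the admin does not have to read a separate diff of the permissions database.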
We also need to ensure install-time initialization of etckeeper: running "apt-get install etckeeper" should give us a working system (rather than requiring post configuration actions).
etckeeper core improvements
The following new features and bugfixes should be added to etckeeper:
- By-user traceability (LP:#321406)
- Optional daily autocommit cronjob, to catch uncommitted work automatically, possibly on by default (LP:#321409)
- Automatically initialize etckeeper if a single VCS is present and no repository is present (LP:#297920)
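One possible way to approach by-user traceability is sketched below in Python. It assumes the admin gained root through sudo, which exports the invoking account as `SUDO_USER`; the function name `committing_user` and the fallback order are assumptions for illustration, not a description of how etckeeper resolves the committer.

```python
import os
import pwd


def committing_user(environ=None):
    """Best-effort guess of the human user behind a root commit.

    Order (an assumption of this sketch): SUDO_USER, then LOGNAME,
    then the account name of the current uid.
    """
    env = os.environ if environ is None else environ
    return (
        env.get("SUDO_USER")
        or env.get("LOGNAME")
        or pwd.getpwuid(os.getuid()).pw_name
    )
```

A daily autocommit run from cron has no invoking user, which is why the requirements list it as a notable exception.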
Other interesting new features could be:
- "etckeeper purge" for repository clean-up
- ignore.d directory so that packages can document their own /etc weirdness (rather than a static ignore list in etckeeper)
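A minimal Python sketch of how an ignore.d directory could be merged into a single ignore list follows. The fragment format (one pattern per line, `#` for comments) and the function name are assumptions; the specification does not define either.

```python
from pathlib import Path


def collect_ignore_patterns(ignore_dir):
    """Merge all fragments in ignore_dir into one ordered, de-duplicated list.

    Assumed fragment format: one pattern per line, blank lines and lines
    starting with '#' are skipped. Fragments are read in sorted name order.
    """
    patterns = []
    seen = set()
    for fragment in sorted(Path(ignore_dir).iterdir()):
        if not fragment.is_file():
            continue
        for raw in fragment.read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line not in seen:
                seen.add(line)
                patterns.append(line)
    return patterns
```

With this layout each package would only ship its own fragment, and the VCS ignore file could be regenerated from the directory whenever a package is installed or removed.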
bzr etckeeper plugin improvements
The plugin should support the following new features:
- Display ownership/permissions changes in diff output (LP:#322327)
- Restore ownership/permissions on tree transforms (LP:#322339)
Small changes to bzr core could help a lot in implementing this. In particular, bzr is currently missing the right plugin hooks to make the bzr-etckeeper plugin painless.
Example: add a hook for new files created during a tree transform so that we can plug the right ownership/permissions.
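What such a hook would have to do once files have been recreated can be sketched in Python as follows. This is not bzr plugin code and uses no bzr API; the function name, the snapshot layout (relative path mapped to `(mode, uid, gid)`) and the `change_owner` switch are assumptions for illustration.

```python
import os


def restore_metadata(root, snapshot, change_owner=True):
    """Re-apply recorded mode and ownership to files under root.

    snapshot is assumed to map relative paths to (mode, uid, gid).
    Returns the list of relative paths that were touched.
    """
    restored = []
    for rel, (mode, uid, gid) in sorted(snapshot.items()):
        path = os.path.join(root, rel)
        # Skip missing entries and symlinks (chmod would follow the link).
        if not os.path.lexists(path) or os.path.islink(path):
            continue
        if change_owner:
            os.chown(path, uid, gid)  # needs root for foreign uids
        # chmod after chown, since chown may clear setuid/setgid bits.
        os.chmod(path, mode)
        restored.append(rel)
    return restored
```

Calling something like this automatically after a revert is what would spare the admin the manual restoration step described earlier.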
bzr currently pulls (through "recommends") things that we probably don't want on a server: bzr recommends bzrtools, which recommends graphviz, which depends on a number of X libraries.
To make this a suitable server target, we might need to clean up some of these dependencies.
This was filed as LP:#321852.
The target for the Jaunty cycle is to provide this feature as an option that can be enabled by running a single install command.
For the next development cycle, enabling this feature by default for all server installs would be considered. This leaves enough time to iron out the remaining issues before the next LTS.
BoF agenda and discussion
An alternative to implementing permissions/ownership in bzr is to implement a framework for storing generic metadata in the bzr repository, with hooks to set/get the metadata. After discussion with the bzr team, the advantages over the separate repository + wrapper approach (option 2) are not obvious. By allowing direct bzr access to metadata you run the risk of the files and their metadata getting out of sync. -- ThierryCarrez
Storing pristine and local states on DVCS branches
A crucial feature for a DVCS-based /etc is to store local changes in a separate branch from the packaged pristine state. This lets the admin do a VCS diff operation to figure out how their current state differs from the default. Furthermore, when a package upgrade changes the default config, the local changes can be merged in using a VCS merge operation.
This would be a huge improvement over the current situation where dpkg basically forces you to choose between the new pristine config or the old locally-modified config, and merge any differences manually with no information about which differences are which (a 2-way merge instead of a 3-way merge). -- AndersKaseorg
Interesting feature, which IIUC depends on dpkg to be able to commit original configuration files to a "pristine" branch. It is on the etckeeper roadmap. -- ThierryCarrez
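The difference between a 2-way and a 3-way merge can be made concrete with a small Python sketch. It works on key/value settings rather than on text lines, which is a simplification: a real VCS merge operates on file contents. The function name and the dictionary representation are assumptions for illustration.

```python
def three_way_merge(base, local, upstream):
    """Merge settings using the old pristine state (base) as reference.

    base:     pristine config shipped by the previous package version
    local:    the admin's current config
    upstream: pristine config shipped by the new package version
    Returns (merged, conflicts); on conflict the local value is kept.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(upstream)):
        b, l, u = base.get(key), local.get(key), upstream.get(key)
        if l == u:        # both sides agree
            value = l
        elif l == b:      # only the package changed it
            value = u
        elif u == b:      # only the admin changed it
            value = l
        else:             # both changed it differently
            conflicts.append(key)
            value = l
        if value is not None:  # None means the setting was removed
            merged[key] = value
    return merged, conflicts
```

Without `base`, the two middle cases cannot be told apart from a conflict, which is exactly the situation dpkg leaves the admin in today.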
Other design options
Option 1: Using etckeeper
etckeeper provides the following commands (with bzr):
- etckeeper init: runs bzr nick + init, or restores permissions from /etc/.etckeeper. Builds a static .bzrignore file.
- etckeeper pre-commit: warns about special files, rebuilds /etc/.etckeeper
- etckeeper commit: runs bzr add + commit. A bzr plugin is injected to automatically call pre-commit before commit (though that doesn't seem to work out of the box currently)
- etckeeper pre-install: Pre-Install-Pkgs apt hook, caches the list of packages installed before apt runs, autocommits uncommitted changes
- etckeeper post-install: Post-Invoke apt hook, commits changes with a message that shows packages installed/removed since pre-install
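The pre-install/post-install pairing can be illustrated with a short Python sketch that turns the cached package list and the current one into a commit message. etckeeper itself does this in shell; the function name, the `+`/`-` prefixes and the message wording below are assumptions for illustration.

```python
def package_change_message(before, after):
    """Build a commit message from package lists taken before and after apt.

    before/after are iterables of package names (the list cached by the
    pre-install hook and the list seen by the post-install hook).
    """
    before, after = set(before), set(after)
    installed = sorted(after - before)
    removed = sorted(before - after)
    lines = ["committing changes in /etc after apt run", ""]
    lines += [f"+{name}" for name in installed]
    lines += [f"-{name}" for name in removed]
    return "\n".join(lines)
```

Because the pre-install hook also commits any uncommitted work first, the post-install commit contains only the changes caused by the package operation itself.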
It would need to be improved/completed in the following areas:
- By-user traceability (see Debian bug #498739)
- Make sure the bzr plugin pre-commit thing works out-of-the-box
- Make it less noisy to keep a low profile
Implementation would provide a metapackage that depends on etckeeper and provides the following extra features:
- Pick a VCS
- Run etckeeper init / commit at install time to be ready for use automatically
- Add a cron job for daily autocommits to get a usable automatic history of changes
- etckeeper only provides a few commands (init, commit); for the rest (diff/log/revert) you have to use bzr directly. Knowing when to use etckeeper and when to use bzr directly can be tricky for the less-experienced user
- No integrated diff output: Ownership/permissions changes are only tracked by the diff in the /etc/.etckeeper file, which is less intuitive than a properties change shown at the file level in the diff
- Will break if you use "bzr revert" to restore files without manually pushing back the corresponding permissions
Option 2: A bzr wrapper ("Captain's Log")
We might prefer to provide a simpler tool that is less likely to break your system and doesn't require prior VCS knowledge. This new Python tool (which I call "Captain's Log") wraps bzrlib and tightly integrates an ownership/permissions database into the output. By exposing a limited subset of VCS functionality, it makes sure you don't enter disruptive VCS commands. It would provide the following features:
- Maintenance of a specific ownership/permissions datastore
- Autocommit hooks before and after using Apt, with versioning of a "package-installed" list
- Expose and wrap specific bzr commands to integrate permission/ownership changes (and packages installed/removed) in the result: captainslog commit, captainslog diff, captainslog log (aka "captainslog"), captainslog revert...
- A daily autocommit cronjob to catch any uncommitted work automatically
- Prevent calling "bzr" directly on the repository to avoid accidental damage
The wrapper approach allows more advanced features like:
- A dynamic ignore.d directory where packages with weird /etc files can document their weirdness
- Lower privilege model where captainslog would drop root if/when possible
- Expose a pass-through command that would proxy the direct bzr command with appropriate warnings
Future extensions could go beyond the "easy traceability" goal to reach more branching/sharing features: this would be done by proxying more bzr commands.
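The "limited subset" idea can be sketched in Python as a simple allowlist. The proposal describes wrapping bzrlib; this sketch only builds a command line for the `bzr` executable to keep the example free of library calls, and the names `ALLOWED_COMMANDS` and `build_bzr_command` are hypothetical.

```python
# Subcommands named in the proposal; everything else is refused.
ALLOWED_COMMANDS = {"commit", "diff", "log", "revert"}


def build_bzr_command(argv, repository="/etc"):
    """Translate wrapper arguments into (command, working_directory).

    An empty argument list defaults to "log", matching the proposal's
    note that a bare "captainslog" shows the log.
    """
    if not argv:
        argv = ["log"]
    command, *rest = argv
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return ["bzr", command, *rest], repository
```

The returned pair could be handed to `subprocess.run(cmd, cwd=cwd)`; extending the tool toward branching or sharing features would then amount to adding entries to the allowlist.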
- Need to write and maintain a new tool, in the spirit of python-vm-builder or ufw
Option 3: Use an enhanced bzr natively
To avoid the risky direct bzr commands on the repository, the bzr wrappers in Option 2 (and 2.1) prevent direct usage of bzr on the repository, but that restricts what you can do with the tool. So another option is to implement permissions/ownership directly in bzr, to make sure all options support it in the same way they already support the "x" bit. This feature would make bzr the clear VCS choice for putting anything other than sources under revision control.
Implementation would then just provide:
- Autocommit hooks before and after using Apt
- optional daily autocommit cronjob to catch any uncommitted work automatically
- Lots of work in bzr core, as this won't fit into the plugin API
- Losing the simplicity of a specific interface (need to learn the corresponding VCS commands)
- Less flexibility to implement features like the ignore.d directory, lower privilege security model or "packages-installed" list
See bug 67589 for an existing bzr feature request about this.
Option 4: A wrapper (or enhanced etckeeper) around an enhanced bzr
This option takes the best of options 2 [.1] and 3. It provides a specific interface for the less-experienced user while still allowing direct usage of a permissions-aware bzr.