Launchpad Entry: https://launchpad.net/distros/ubuntu/+spec/replacement-init
Packages affected: sysvinit, initscripts
Replace the init daemon from the sysvinit package with a modern event-based system that is better able to guarantee a robust boot process and deal with the events from the modern kernel and removable hardware.
The move to the 2.6 kernel and all the "hotplug" goodness that it provides has left us with several problems in dapper. Because the kernel can support hardware coming and going, and due to the increase in removable hardware, it's no longer possible to guarantee that particular devices are available at a particular point in the boot process.
My usual example is that dapper cannot mount USB disks listed in /etc/fstab, because it is not guaranteed that the block device exists at the point in the boot process where the mount happens.
Another example is that of a network-mounted /usr: the network device needs to be detected, firmware loaded if necessary, any security layer on the connection negotiated and an IP address arranged before the NFS mount can occur. There are work-arounds for this, such as the one in dapper that sleeps in the boot process until /usr is mounted, but they are hacky and an elegant solution is desired.
There are many other reasons to replace the init system, described in the use cases below. The specified design is intended to be able to fulfil the most important ones for edgy and be extended to support the rest during future release cycles.
Why NIH our own?
Before writing this specification, a comprehensive review of the existing replacement init systems was performed, and each one was tested to discover whether it was able to solve our problems. By far the majority were not fit; in fact, only four passed the most basic test of being maintained by their author and suitable for production use.
The four candidates were [http://opensolaris.org/os/community/smf/ Solaris SMF], [http://developer.apple.com/macosx/launchd.html Apple's launchd], the LSB initserv/chkconfig tools and [http://www.initng.org/ initNG].
The first two of these suffer from inescapable licence problems, which is unfortunate as both have features that are somewhat appealing. Choosing either of them would make it impossible for whatever system we use to be adopted as a Linux-wide standard.
The LSB standard tools sadly do not come anywhere near the use cases we have, and certainly do not solve the problems we have experienced. They are tools for automatically choosing the order of the boot sequence and possibly introducing the ability to run multiple scripts simultaneously at a given point. They do not even begin to tackle the problem of running scripts due to events occurring externally to the init system, such as hardware insertion.
Finally, there's initNG, which sadly also does not tackle the problems we have been facing; again, it is a system for ordering a pre-determined boot sequence rather than one able to handle a boot sequence that is determined as we go. The code base was also evaluated for its suitability for modification to suit our purposes, and it was decided that the cost of doing so would be greater than the cost of beginning from scratch, and that modifying it would be more likely to introduce bugs.
Arguably, any new init system should start with a copy and paste of the sysvinit code anyway as that has already solved the interesting cases of being spawned by the kernel and spawning new processes from there. So that's the base we've chosen, and our own functionality will grow from there.
- Fabian is a power user who wishes to use a USB disk for part of his filesystem. This currently fails frequently because the USB disk sometimes takes longer to initialise than the boot process takes to reach the point where it mounts the filesystem. He would rather the boot process were robust, with the disk mounted as soon as it has been initialised.
- Corey is the administrator of a number of servers, and has problems with certain daemons that frequently crash. He would prefer the daemons to be automatically restarted if this happens, to avoid loss of service.
- Orli owns an iPod and uses a popular piece of software to download podcasts onto it. He currently has to start the software when he plugs his iPod in, and remember to stop it afterwards. He would rather the system started and stopped the software automatically based on the presence of his iPod.
- Ethan is a software developer. He has a script that he wishes to run hourly, provided that the script is not still running from before. He would rather the task scheduler could take care of that for him, than have to reinvent a lock around the task.
- Ryan is a database administrator. He wishes the database to be automatically backed up whenever the server is shut down, whether for an upgrade or a system reboot. There is currently no way for him to set a task to be run when a service is stopped.
- Justin is an ordinary user with a low-end system. He would rather services and hardware handlers were started only when needed, rather than on every system regardless of need.
- David is a system administrator. He needs to be able to tell which services failed to start on boot, examine why, and see which services are currently running.
- Thomas is a system administrator. He frequently gets frustrated that there is no consistency to how tasks are added to the system. A script to perform a task at shutdown must be written and activated completely differently from one performed at a scheduled time.
- Englebert is a security consultant. He has discovered several problems caused by the processes that run task scripts not providing a consistent environment, such as file descriptors being left open.
- Hugo is an ordinary user who frequently has to reboot his computer. He would prefer that shutting down and booting up took as little time as possible.
- Sayid is an experienced UNIX user. He does not wish to relearn what he already knows, and would rather continue using the tools he is used to, learning the newer ones only when necessary.
- Matthieu is a distribution developer who maintains several packages that provide services or perform tasks. He does not want to have to update his packages until he is ready to take advantage of new features or abilities; his existing scripts should continue to work unmodified in their original locations.
While this specification proposes a new init system, it is not expected that any other services need to be modified immediately; backwards compatibility should be ensured. This limits the affected parts of the distribution to just a replacement for sysvinit and, if there is time, initscripts.
Also, while the eventual design includes the potential for replacing cron, at, inetd, etc. with the single daemon, this is not a goal for the edgy release.
This limitation of scope should make the goal attainable in the necessary time frame.
As the primary focus of this specification is dealing with modern hardware and its "coming and going" nature, neither of the two traditional designs of init system is appropriate. The linear execution model fails because it becomes necessary to sleep during the process and wait for hardware to become available. The dependency-based model fails because services cause their dependencies to be started, rather than being started because their dependencies have been.
This design is best described as an event-based init system: services and tasks are started and stopped because an event they were listening for has occurred. Services waiting for /usr to be mounted are started once that event has occurred, and are stopped when /usr needs to be unmounted again. The event that causes /usr to be mounted would be the appearance of the necessary block device or, if /usr is not on a separate partition, the root filesystem being mounted read-write (itself another event).
In order to allow maximum flexibility, the init daemon does not restrict the set of events that can be triggered; external processes are permitted to trigger events that the daemon was not previously aware of. Simple events are therefore just strings that describe them, e.g. "startup"; events with a value are also permitted, so that "default-route" can be "up" or "down" depending on whether or not a default route is set. A service or task can then indicate that it should be run while default-route is up, causing it to be stopped automatically before the network goes away.
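To make the event-based model concrete, here is a minimal C sketch, assuming each job names exactly one event that starts it and one event that stops it. The `Job` type, `handle_event`, and the job and event names used with it (`cupsd`, `usr-mounted`, `usr-unmounting`) are hypothetical; the state changes stand in for the fork/exec and signalling a real daemon would do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical job states: waiting for the start event, or running. */
typedef enum { JOB_WAITING, JOB_RUNNING } JobState;

typedef struct {
    const char *name;
    const char *start_on; /* event that starts the job */
    const char *stop_on;  /* event that stops the job */
    JobState state;
} Job;

/* Deliver one event to every job and return how many changed state.
 * A real daemon would fork/exec on start and signal the process on stop;
 * this sketch only records the new state. */
int handle_event(Job *jobs, size_t njobs, const char *event)
{
    assert(jobs != NULL && event != NULL);
    int changed = 0;
    for (size_t i = 0; i < njobs; i++) {
        Job *job = &jobs[i];
        if (job->state == JOB_WAITING && strcmp(job->start_on, event) == 0) {
            job->state = JOB_RUNNING;
            changed++;
        } else if (job->state == JOB_RUNNING && strcmp(job->stop_on, event) == 0) {
            job->state = JOB_WAITING;
            changed++;
        }
    }
    return changed;
}
```

Every job waiting on an event is moved to running in the same pass, with no order decided in advance; the boot sequence is simply whatever the events make it.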
Have you thought at all about what event values will be needed, e.g. whether any services are likely to need more structured event values (even just a list), and what syntax "while" specifications will use for dealing with them? -- ColinWatson
There is very little here about how services specify the events they're waiting for. This seems reasonably complex: you have "on event", "while event", "on event if not arbitrary-condition" (see /usr example above), and so on. Are you going to reuse an existing language or invent a new mini-language? If a mini-language, how will it deal with complex questions like "is any /usr partition listed in /etc/fstab"? Where will services declare all this? etc. -- ColinWatson
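The interaction between valued events and "while" conditions could look like the following C sketch. It assumes the daemon remembers the last value seen for each event and re-evaluates every job whenever a value changes; `EventTable`, `WhileJob`, `set_event`, `update_jobs` and the `ntpdate` job are hypothetical, and the sketch does not settle the syntax questions raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 16

/* Last value seen for one event, e.g. name "default-route", value "up". */
typedef struct {
    char name[32];
    char value[16];
} EventValue;

typedef struct {
    EventValue entries[MAX_EVENTS];
    size_t count;
} EventTable;

/* A job that should run only while `event` has `value`. */
typedef struct {
    const char *name;
    const char *event;
    const char *value;
    bool running;
} WhileJob;

/* Record a value, accepting events the daemon has never seen before. */
void set_event(EventTable *t, const char *name, const char *value)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            snprintf(t->entries[i].value, sizeof t->entries[i].value, "%s", value);
            return;
        }
    }
    assert(t->count < MAX_EVENTS);
    EventValue *e = &t->entries[t->count++];
    snprintf(e->name, sizeof e->name, "%s", name);
    snprintf(e->value, sizeof e->value, "%s", value);
}

/* Current value of an event, or NULL if it has never been emitted. */
const char *get_event(const EventTable *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].name, name) == 0)
            return t->entries[i].value;
    return NULL;
}

/* Start or stop each job so that it runs exactly while its condition holds.
 * Returns the number of jobs that changed state. */
int update_jobs(const EventTable *t, WhileJob *jobs, size_t njobs)
{
    int changed = 0;
    for (size_t i = 0; i < njobs; i++) {
        const char *current = get_event(t, jobs[i].event);
        bool should_run = current != NULL && strcmp(current, jobs[i].value) == 0;
        if (should_run != jobs[i].running) {
            jobs[i].running = should_run;
            changed++;
        }
    }
    return changed;
}
```

Provided "default-route" changes to "down" before the route is actually removed, a job declared as running while default-route is up is stopped first.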
The set of services and tasks is also not restricted by the init daemon; they may be registered by external processes, including by non-root users (in which case they'll be started as the registering user). This allows for future compatibility with other init systems: a small utility could parse their configuration files and register the equivalent services and events with the daemon, with the same semantics.
This seems like it might potentially allow unprivileged users to play havoc with the system by registering clashing services; also presumably if unprivileged users are allowed to register services then they can also generate events which would allow them to cause root-owned services to start and stop. Perhaps some kind of service/event namespacing or at least discussion of the security implications and what will be done to mitigate them would be appropriate. It may be appropriate to defer providing facilities to non-root users until a full security analysis of the code has been done; vulnerabilities or denial of service attacks on init would not be fun. -- ColinWatson
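Whoever registers a job, the process that runs it should start in a predictable state, which also bears on Englebert's use case. This is a minimal POSIX sketch rather than a design decision: it assumes jobs are executed with fork() and execve(), and the helper names `spawn_as_user` and `wait_job`, the fixed PATH, the umask value and the descriptor cap are illustrative assumptions. It does not address the namespacing and privilege questions raised above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <grp.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` as uid/gid in a fixed, minimal environment.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_as_user(const char *path, char *const argv[], uid_t uid, gid_t gid)
{
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid != 0)
        return pid;

    /* Child: give the job a known environment instead of inheriting init's. */
    static char *const envp[] = { "PATH=/usr/sbin:/usr/bin:/sbin:/bin", NULL };

    sigset_t none;
    sigemptyset(&none);
    sigprocmask(SIG_SETMASK, &none, NULL); /* don't inherit blocked signals */

    /* Close everything except stdin/stdout/stderr. The loop is capped for the
     * sketch, so descriptors above the cap are not closed here. */
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0 || maxfd > 65536)
        maxfd = 65536;
    for (long fd = 3; fd < maxfd; fd++)
        close((int)fd);

    umask(022);
    if (chdir("/") < 0)
        _exit(127);

    /* Drop privileges: supplementary groups, then gid, then uid (uid last). */
    if (geteuid() == 0 && setgroups(0, NULL) < 0)
        _exit(127);
    if (setgid(gid) < 0 || setuid(uid) < 0)
        _exit(127);

    execve(path, argv, envp);
    _exit(127); /* only reached if execve() failed */
}

/* Reap a job and return its exit status, or -1 if it did not exit normally. */
int wait_job(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```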
All services waiting on an event are normally started at the same time. This may not always be desirable, so services are also permitted to depend on others having previously started; if that has not yet happened, they are held until the dependencies are running or any event causing the service to stop again occurs. Services can indicate that they wish to be started if anything depending on them is waiting for them (just another event), thus providing a form of dependency-init functionality as well.
Again, I'd like a bit more detail on how this is to be configured. -- ColinWatson
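The "held until the dependencies are running" rule can be sketched in C as follows, assuming dependencies are stored as indexes into a fixed service table. The `Service` type, `request_start`, `request_stop` and the service names are hypothetical, and stopping a running service does not cascade to the services that depend on it in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

typedef enum { SVC_STOPPED, SVC_HELD, SVC_RUNNING } SvcState;

typedef struct {
    const char *name;
    size_t deps[MAX_DEPS]; /* indexes of services this one depends on */
    size_t ndeps;
    SvcState state;
} Service;

static bool deps_running(const Service *svcs, size_t n, const Service *s)
{
    for (size_t i = 0; i < s->ndeps; i++) {
        assert(s->deps[i] < n);
        if (svcs[s->deps[i]].state != SVC_RUNNING)
            return false;
    }
    return true;
}

/* Called when an event asks service `idx` to start. */
void request_start(Service *svcs, size_t n, size_t idx)
{
    assert(idx < n);
    if (!deps_running(svcs, n, &svcs[idx])) {
        svcs[idx].state = SVC_HELD; /* wait for the dependencies */
        return;
    }
    svcs[idx].state = SVC_RUNNING;

    /* A newly running service may release held ones, which may release
     * others in turn, so repeat until nothing changes. */
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (svcs[i].state == SVC_HELD && deps_running(svcs, n, &svcs[i])) {
                svcs[i].state = SVC_RUNNING;
                progress = true;
            }
        }
    }
}

/* A stop event cancels a hold, or stops the service outright. */
void request_stop(Service *svcs, size_t n, size_t idx)
{
    assert(idx < n);
    svcs[idx].state = SVC_STOPPED;
}
```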
The init daemon's job therefore is simply to hold a list of waiting and running services and adjust their state depending on the events that are received. Full-duplex communication with the rest of userspace is maintained so that both events and services can be queried for their state, registered and triggered manually.
Presumably by this you mean that userspace can query init for the state of events and services, rather than init querying the rest of userspace. Seems obvious on the third reading but not so much on the first. Also, do you intend to extend the System V telinit interface to provide query/register/trigger facilities, or start afresh with a new interface? -- ColinWatson
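Neither the transport nor the message format for this communication has been decided. As one hypothetical possibility, a line-based text protocol on the control channel could be parsed as below; the verbs `emit` and `status`, the `Command` layout and `parse_command` are assumptions made for illustration only, not an extension of the existing telinit interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { CMD_INVALID, CMD_EMIT, CMD_STATUS } CmdType;

typedef struct {
    CmdType type;
    char arg[64];   /* event or service name */
    char value[32]; /* optional event value; empty for simple events */
} Command;

/* Parse one hypothetical control line: "emit EVENT [VALUE]" or "status SERVICE". */
Command parse_command(const char *line)
{
    Command cmd = { CMD_INVALID, "", "" };
    char verb[16] = "";
    assert(line != NULL);

    /* sscanf returns the number of fields converted, or EOF on empty input. */
    int n = sscanf(line, "%15s %63s %31s", verb, cmd.arg, cmd.value);

    if (strcmp(verb, "emit") == 0 && n >= 2)
        cmd.type = CMD_EMIT;   /* value stays empty for a simple event */
    else if (strcmp(verb, "status") == 0 && n == 2)
        cmd.type = CMD_STATUS;
    return cmd;
}
```

A text format like this would keep the daemon easy to drive from shell scripts, at the cost of having to define quoting rules if event values ever need spaces.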
Obviously this is a potentially invasive change to the system that needs to be undertaken carefully so that no regressions occur; therefore the following implementation plan will be followed:
1. Development of the new init binary's core functionality, and testing locally and by other interested parties.
2. Development of core companion tools such as shutdown.
3. Replacement of the sysvinit binary package with the new package, configured to run /etc/init.d/rc at appropriate times so that no existing init scripts need be modified.
   This point must be reached before FeatureFreeze with no regressions, or the change will be reverted and deferred to edgy+1.
4. Replacement of the initscripts binary package and the scripts therein with new scripts that take advantage of the new system. The existing init scripts from other packages will still be run by keeping /etc/init.d/rc.
Further plans will wait until edgy+1, and any spare time will be spent on testing and bug fixes rather than attempting to implement additional things that may not be as mature.
There are certain bugs in dapper that are waiting for implementation of this specification in order to be resolved (e.g. your USB disks in /etc/fstab example). In the event that this change is reverted and deferred to edgy+1, can anything (presumably hacky) be done about any of these bugs? If so, it would be worth mentioning that here by way of a Plan B. -- ColinWatson
The core init daemon and companion tools are to be written in C and be as safe as is humanly possible. It is suggested that the code be reviewed by multiple people such as MartinPitt to ensure security and the general advantage of new eyes on the code.
Data preservation and migration
No other packages need to be modified because the existing /etc/init.d/rc script will be retained; the new daemon will be configured to call this with the appropriate arguments at startup, shutdown and reboot. Run levels will be maintained through compatibility configuration such that init 3 would issue an event causing /etc/init.d/rc 3 to be run.
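The run level compatibility can be sketched as a translation step: init 3 becomes a run level event with the value "3", whose handler runs the existing rc script with the same argument. This C sketch assumes the event is named `runlevel` (a hypothetical name) and accepts only the traditional run levels 0-6 and S; other telinit arguments are not handled. The handler would pass the filled-in vector to execv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event emitted by "init N" for compatibility. */
typedef struct {
    const char *name; /* always "runlevel" in this sketch */
    char value[2];    /* the requested run level, e.g. "3" */
} RunlevelEvent;

/* Accept the traditional run levels 0-6 and S (or s). */
bool parse_runlevel(const char *arg, RunlevelEvent *ev)
{
    assert(arg != NULL && ev != NULL);
    /* Check the length first: strchr() would also match the terminating NUL. */
    if (strlen(arg) != 1 || strchr("0123456Ss", arg[0]) == NULL)
        return false;
    ev->name = "runlevel";
    ev->value[0] = (arg[0] == 's') ? 'S' : arg[0];
    ev->value[1] = '\0';
    return true;
}

/* Build the argument vector for the unmodified /etc/init.d/rc script. */
void rc_argv(const RunlevelEvent *ev, char *argv[3])
{
    argv[0] = "/etc/init.d/rc";
    argv[1] = (char *)ev->value;
    argv[2] = NULL;
}
```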
Packages for which there is an advantage to using the features of the new system may be modified, though that is not part of this specification.