MultiarchSpec

Revision 19 as of 2009-06-08 04:23:16

Summary

This should provide an overview of the issue/functionality/change proposed here. Focus here on what will actually be DONE, summarising that so that other people don't have to read the whole spec. See also CategorySpec for examples.

Release Note

This section should include a paragraph describing the end-user impact of this change. It is meant to be included in the release notes of the first release in which it is implemented. (Not all of these will actually be included in the release notes, at the release manager's discretion; but writing them is a useful exercise.)

It is mandatory.

Rationale

The current handling of 32-bit software on the amd64 architecture is unwieldy in the extreme. A handful of libraries are packaged as "biarch" packages, building -386 variants using gcc -m32; most are not, and as a result are only available if they're included in ia32-libs, a monstrous source package that has to be updated to stay in sync with each change to one of the component libraries. It now contains so many libraries that its "source" package (consisting primarily of copies of the i386 binary packages) weighs in as a 550MB tarball. Furthermore, many of these libraries have to be patched to specially handle running in a 64-bit environment because they load various plugins from a system path that is already occupied by the 32-bit library, resulting in added complexity in the code due to special-casing.

In all cases, the libraries (and some executables) have to be repackaged as amd64 packages, because dpkg and apt do not support sensible handling of cross-installation of i386 packages. This consumes archive space and developer time on an ongoing basis.

User stories

  • Phil wants to run VMware server on his 64-bit Ubuntu install, but it is only available as a 32-bit package. He enables use of the 32-bit repositories on his system in the Software Sources configuration, then selects the vmware i386 package from the Add/Remove menu. The dependencies on the i386 library packages are automatically resolved, including libpam-modules, and the packages are installed for him.

Assumptions

  • The multiarch directory scheme required in order to make library packages co-installable will be a target for FHS/LSB standardization in the future; but even if it does not become a standard, multiarch is preferable to the status quo, which also uses non-standard library directories for amd64 and is much more complex at the packaging level.
  • Since Debian Policy prohibits declaring shared libraries as Essential: yes, it is assumed that any undeclared dependencies on Essential packages on the part of multiarch packages are satisfied by the binaries from the native architecture, and there is no need to multi-arch-ify any of the Essential packages except where other packages declare versioned dependencies on them.

  • Dependencies, Build-Dependencies, and Recommends within an existing architecture foo will continue to remain closed over the set of packages declaring either Architecture: foo or Architecture: all.

  • The existence of flag days for self-contained sets of packages related to individual language interpreters is considered an acceptable constraint, because language interpreters are not significant targets for multiarch.

Design

Filesystem layout

In order to seamlessly accommodate more than one ELF ABI on a system, the library for each (soname,ABI) tuple must have a unique path on the filesystem. The FHS attempts to address this for the amd64 case by requiring that /usr/lib be reserved for 32-bit libraries, with 64-bit libraries located in /usr/lib64. This design has a number of shortcomings:

  • This is not forwards-compatible with any future ABI changes, which would require further design work and further modification of packages to handle the addition of new paths. (Indeed, this is already a concern for the MIPS architecture, which has three distinct ABIs in use in parallel.)
  • The amd64 architecture must be special-cased in the library packaging, as the only architecture that uses /usr/lib64 instead of /usr/lib. (Two pre-existing 64-bit Linux ports, Alpha and IA-64, also used /usr/lib for their 64-bit libs.) This is unnecessary added complexity.
  • It does not address emulation use cases, such as qemu, which could integrate much better and more efficiently with the system if the packages for a qemu arch were co-installable.

The current design used by Debian and Ubuntu also fails on a key point where the FHS layout does not:

  • The path for x86 and x86-64 libraries varies depending on whether the system is natively 32-bit or 64-bit, so translating paths at install time is insufficient for the general case because some libraries need to embed plugin paths in the binaries themselves.

Multiarch seeks to address all of these issues, at the expense of a one-time transition, by migrating libraries to subdirectories that include the architecture triplet as part of the path:

   /lib/i486-linux-gnu
   /lib/x86_64-linux-gnu
   /usr/lib/i486-linux-gnu
   /usr/lib/x86_64-linux-gnu

Further rationale for this layout can be found at http://err.no/debian/amd64-multiarch-3.
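As a rough illustration of the layout above, the following Python sketch derives the per-architecture library directories from a triplet and shows that each (soname,ABI) tuple maps to a distinct path. The function names and the example soname are hypothetical and are not part of this spec.

```python
from pathlib import PurePosixPath

# Base directories that gain a per-triplet subdirectory in the layout above.
LIBRARY_BASES = ("/lib", "/usr/lib")


def multiarch_library_dirs(triplet):
    """Return the library directories for one architecture triplet."""
    return [str(PurePosixPath(base) / triplet) for base in LIBRARY_BASES]


def library_path(soname, triplet, base="/usr/lib"):
    """Return the path for one (soname, ABI) tuple (hypothetical helper)."""
    return str(PurePosixPath(base) / triplet / soname)


if __name__ == "__main__":
    for triplet in ("i486-linux-gnu", "x86_64-linux-gnu"):
        print(multiarch_library_dirs(triplet))
        # The same soname lands on a different path for each ABI.
        print(library_path("libexample.so.1", triplet))
```

Because the triplet is part of the path, adding a further ABI only adds a directory; no existing path changes meaning.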

Binary package control fields

Although a simple change to a configuration option (dpkg --force-architecture) is sufficient to permit installation of .debs built for another architecture, more information is needed if we want package managers to intelligently resolve the dependencies for these packages. Some dependencies on other packages, such as on ELF libraries, can only be satisfied by a package of the same architecture ("x->x dependency"); others, such as dependencies on an interpreter used by a maintainer script, can be satisfied by a package of any architecture as long as it's functional ("x->any dependency").

This spec introduces a new binary package field, to be set on any package that is intended to be co-installable with copies of itself from other architectures.

Multi-Arch: same

If a package has this field set, the package manager must treat this package as not satisfying the dependency of any package of a different arch.

The same field with a separate value indicates a package should be allowed to satisfy the dependencies of a package of a different arch.

Multi-Arch: foreign

If the field is not set, then the default behavior is the same as today: dependency resolution is restricted to packages with the same architecture (with the exception of Architecture: all packages), and a package of the same name but a different architecture is assumed to not be co-installable.

If a package is declared Multi-Arch: foreign, preference should be given to a package for the native architecture if available; if it is not available, the package manager may automatically install any available package, regardless of architecture, or it may choose to make this an option controlled by user configuration.
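The three cases described above (Multi-Arch: same, Multi-Arch: foreign, and no field) can be sketched as a single predicate. This is a minimal Python model for illustration only; the Package class and the satisfies_dependency function are hypothetical names, and version constraints and the native-architecture preference are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Package:
    name: str
    architecture: str                 # e.g. "amd64", "i386", or "all"
    multi_arch: Optional[str] = None  # None (field unset), "same", or "foreign"


def satisfies_dependency(candidate, dependent_arch):
    """Can `candidate` satisfy a dependency of a package built for `dependent_arch`?"""
    if candidate.multi_arch == "foreign":
        # Usable by packages of any architecture.
        return True
    if candidate.multi_arch == "same":
        # Co-installable, but only satisfies dependencies within its own arch.
        return candidate.architecture == dependent_arch
    # Field unset: today's behavior, same architecture or Architecture: all.
    return candidate.architecture in (dependent_arch, "all")


if __name__ == "__main__":
    print(satisfies_dependency(Package("libfoo1", "amd64", "same"), "i386"))
    print(satisfies_dependency(Package("sed", "amd64", "foreign"), "i386"))
```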

Setting the Multi-Arch field on a package which is Architecture: all is considered an error. dpkg-deb must refuse to generate a .deb with this combination of values. Behavior when trying to install such a package is undefined.

Because the handling of packages without a Multi-Arch field remains the same, it is possible to introduce multiarch support incrementally to packages, starting at the bottom of the dependency tree and working upwards, with no flag days or requirements for shlibs bumps affecting non-multiarch packages:

  • A package for a foreign architecture is only installable if all of its (recursive) dependencies are either marked as multiarch or do not have corresponding packages installed for the native architecture. An incomplete multiarch conversion for a given dependency tree is equivalent to the status quo.
  • Attempting to install a foreign arch package with a pre-multiarch package manager will fail (unless --force-architecture is specified); but dependencies within existing architectures will remain closed over the set of packages for that architecture (including Architecture: all packages), so multiarch does not impose any requirement to upgrade the package manager first before upgrading between releases. Multiarch packages will only be pulled in upon manually configuring the package manager to use them, following an upgrade, and therefore will not be needed to satisfy dependencies during a dist-upgrade.

  • Packages for the native architecture which have the Multi-Arch field set will always have their dependencies satisfiable by other packages within that architecture; therefore installing a package for the native architecture with a package manager which is not multiarch-aware will continue to give correct results, even when that package declares itself to be multiarch.
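The first condition in the list above can be sketched as a walk over the recursive dependencies of a foreign-architecture package, collecting those that are neither marked as multiarch nor absent from the native installation. This is an illustrative Python sketch under simplifying assumptions: dependencies are plain package names, and the function and data-structure names are hypothetical.

```python
def blocking_dependencies(root, depends, multi_arch, installed_native):
    """Return the recursive dependencies of `root` that block a foreign install.

    depends:          dict mapping a package name to a list of dependency names
    multi_arch:       dict mapping a package name to "same"/"foreign" (absent if unset)
    installed_native: set of package names installed for the native architecture
    """
    blocked = set()
    seen = set()
    stack = list(depends.get(root, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        # A dependency blocks installation only if it is not multiarch AND
        # the native-architecture copy already occupies its place.
        if multi_arch.get(dep) is None and dep in installed_native:
            blocked.add(dep)
        stack.extend(depends.get(dep, []))
    return sorted(blocked)


if __name__ == "__main__":
    depends = {"vmware": ["libpam-modules", "libx11-6"], "libpam-modules": ["libc6"]}
    multi_arch = {"libc6": "same", "libx11-6": "same"}
    installed_native = {"libc6", "libpam-modules"}
    print(blocking_dependencies("vmware", depends, multi_arch, installed_native))
```

An empty result corresponds to a dependency tree whose conversion is complete enough for the foreign package to be installable; a non-empty result is equivalent to the status quo.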

Extended semantics of per-architecture package relationships

Whether a dependency needs to be satisfied within a single arch or can be satisfied by a package of a foreign architecture is a property of the dependency relationship, not of the package fulfilling that dependency. In general dependencies on a given package will all be of one type or the other, so we optimize for the common case by allowing the depended-on package to declare this via its use (or not) of the Multi-Arch: foreign value.

However, there will be cases where a single package has reverse-dependencies of both sorts. An example of this is the python package: there are packages which depend on it to use it as an interpreter (a foreign-arch relationship), and there are packages which depend on it because they provide loadable, ELF DSO language extensions for the language (a same-arch relationship). A single field on the depended-on python package is insufficient to express both relationships.

To address this case, an extension to package relationship fields is introduced:

Depends: python [same]

This value in the binary stanza of a debian/control file declares that a package's dependency on python must be satisfied by a package of the same architecture, even if the python package declares itself as Multi-Arch: foreign. This reuses the existing bracket notation ([]) for architecture-specific dependencies, combined with a reserved "architecture" value of same. Unlike existing uses of this bracket notation, which are used by dpkg-gencontrol to filter the field by architecture when generating the binary package control file, this special value will be passed through to the binary package control file for interpretation by dpkg at package install time.

Only the special case of [same] will be output by dpkg-gencontrol and understood by dpkg. No other bracketed values, including values matching known architectures, are addressed by this implementation, and behavior when encountering such values in a binary package's control file is undefined. (It is recommended that dpkg and apt treat such package relationships as unsatisfiable.) This means that declaring relationships on packages of a specific foreign architecture is currently unsupported.
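To make the intended interpretation of the [same] qualifier concrete, here is a small Python sketch of how a package manager might parse and evaluate such a relationship. It is not dpkg's implementation: the regular expression, the function names, and the omission of version constraints and alternatives are all simplifying assumptions made for this example.

```python
import re

# Package name, optionally followed by a single bracketed qualifier.
_RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.-]*)\s*(?:\[(?P<qualifier>[^\]]*)\])?\s*$"
)


def parse_relation(text):
    """Split 'python [same]' into ('python', 'same'); the qualifier may be None."""
    match = _RELATION.match(text)
    if match is None:
        raise ValueError("unsupported relationship syntax: %r" % text)
    qualifier = match.group("qualifier")
    if qualifier is not None:
        qualifier = qualifier.strip()
    return match.group("name"), qualifier


def relation_satisfied(text, dependent_arch, candidate_arch, candidate_multi_arch):
    """Evaluate one relationship against one candidate package."""
    _name, qualifier = parse_relation(text)
    if qualifier is None:
        # No qualifier: the candidate's Multi-Arch field decides.
        if candidate_multi_arch == "foreign":
            return True
        if candidate_multi_arch == "same":
            return candidate_arch == dependent_arch
        return candidate_arch in (dependent_arch, "all")
    if qualifier == "same":
        # Overrides Multi-Arch: foreign on the candidate.
        return candidate_arch == dependent_arch
    # Any other bracketed value: treated as unsatisfiable, as recommended above.
    return False


if __name__ == "__main__":
    print(parse_relation("python [same]"))
    print(relation_satisfied("python [same]", "i386", "amd64", "foreign"))
```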

It is worth noting that existing package management tools will be unable to interpret and satisfy package relationships of this format, even when the desired package is available. Consequently, it is recommended to defer use of such package relationships in the archive for a full release cycle following the package management implementation. Furthermore, since there is no way to express that a package declaring Multi-Arch: foreign does not satisfy a given dependency without this extended syntax, it is also recommended that language interpreters that support dynamically loadable language extensions not declare themselves as Multi-Arch: foreign for a full release cycle to prevent accidental installation of incompatible package combinations. This restriction is not expected to be a significant impediment to the deployment of multiarch packages, because most packages which are targets of interest for multiarch don't depend on interpreters outside of the Essential set.

Since partial upgrades of systems where a language interpreter package foolang is declared as Multi-Arch: foreign but its language extensions have not yet been updated to declare Depends: foolang [same] will incorrectly permit language extension packages to be installed with incompatible interpreters, this approach will require a flag day for each interpreter package which will be declared Multi-Arch: foreign. It is recommended to address this by first transitioning all extension packages for a given foolang to use Depends: foolang [same], and afterwards transition the interpreter package to Multi-Arch: foreign. Because most such language interpreters are very portable, and are therefore not major targets for cross-installation, it is considered acceptable to complete such a transition within the span of a single release cycle in spite of the small risk of incorrect partial upgrades that results.

For the cases of common language interpreters in the base system (perl, python), it is expected that the available package helpers will facilitate conversion of these dependencies, which are already generated at package build time.

Handling of architecture-independent files in multiarch packages

Today, there are a number of reasons for which library packages ship files that should be shared across ABIs (i.e., "architecture-independent" files):

  • config files
  • documentation files (/usr/share/doc/<package>/{copyright,changelog} )

  • data files

Debian/Ubuntu policy already states that files whose names do not change with each soname change should not be included in the shared library package; so in general it is already wrong to ship config files and data files in a shared library package, though the practical impact of this varies. (For instance, the soname of glibc is not expected to change any time in the future, so the libc6 package currently unapologetically ships helper binaries, config files, and man pages in the shared library package.) However, /usr/share/doc/<package> is expected to be provided by every package installed on the system, so a general solution is needed for multiarch packages that must be co-installable while shipping architecture-independent files.

Implementing this involves an implicit Replaces: ${self}:any (<< ${binary:Version}) in all multiarch packages. In addition, multiarch packages are required to be kept in lockstep; i.e., an implicit Breaks: ${self}:any (!= ${binary:Version}). If more than one architecture of a package is present on the system, this will prevent either package from being configured unless they are all at the same version.
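The lockstep requirement amounts to a simple check over the installed set: every architecture instance of a given package must carry the same version. The Python sketch below illustrates that check only; the function name and the data model (a mapping from (name, architecture) to version string) are assumptions made for this example.

```python
def lockstep_violations(installed):
    """Find packages whose installed architectures are not all at one version.

    installed: dict mapping (package name, architecture) to a version string.
    Returns a dict mapping each offending package name to {architecture: version}.
    """
    versions_by_name = {}
    for (name, arch), version in installed.items():
        versions_by_name.setdefault(name, {})[arch] = version
    return {
        name: archs
        for name, archs in versions_by_name.items()
        if len(set(archs.values())) > 1
    }


if __name__ == "__main__":
    installed = {
        ("libfoo1", "amd64"): "1.0-1",
        ("libfoo1", "i386"): "1.0-2",
        ("libbar2", "amd64"): "2.0-1",
        ("libbar2", "i386"): "2.0-1",
    }
    # Packages reported here could not be configured until their versions match.
    print(lockstep_violations(installed))
```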

In addition, dpkg will implement an internal checksum database for files it installs, and reference counting for files shared by multiarch packages. Multiarch packages with differing versions of any file will also be treated as declaring reciprocal Breaks.
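One way to picture the checksum database and reference counting is the following Python sketch. It is not dpkg's design: the class name, the use of SHA-256, and raising an exception to stand in for the reciprocal Breaks are all assumptions made for illustration.

```python
import hashlib


class SharedFileDB:
    """Toy model of checksummed, reference-counted shared files."""

    def __init__(self):
        # path -> (hex digest of the installed content, set of owning packages)
        self._entries = {}

    def install(self, owner, path, content):
        """Record that `owner` (e.g. 'libfoo1:i386') ships `path` with `content` (bytes)."""
        digest = hashlib.sha256(content).hexdigest()
        entry = self._entries.get(path)
        if entry is None:
            self._entries[path] = (digest, {owner})
            return
        known_digest, owners = entry
        if digest != known_digest:
            # Differing file contents: modeled here as a hard conflict.
            raise ValueError("%s differs from the copy owned by %s" % (path, sorted(owners)))
        owners.add(owner)

    def remove(self, owner, path):
        """Drop one reference; return True when the file may be deleted from disk."""
        _digest, owners = self._entries[path]
        owners.discard(owner)
        if owners:
            return False
        del self._entries[path]
        return True

    def refcount(self, path):
        entry = self._entries.get(path)
        return len(entry[1]) if entry else 0
```

In this model a shared file such as /usr/share/doc/<package>/copyright stays on disk until the last architecture instance of the package is removed.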

The future of bi-arch packages

A number of existing packages in the archive have an Architecture: field declaring one architecture, while containing code compiled for another architecture. This is most commonly the case for those architectures which have natural 32-bit/64-bit complements; e.g., 32-bit x86 code packaged as Architecture: amd64, or 64-bit x86 code packaged as Architecture: i386.

The vast majority of such packages will be obsoleted by multiarch. However, there are a very small number of cases, such as bootloaders, where these cross-architecture packages have legitimate reverse-dependencies; so in order to maintain dependency closure within the architecture, it is necessary to continue cross-building these packages.

This means, in particular, that the gcc-multilib package must continue to remain available on such architectures, along with biarch versions of any libraries it depends on. The set of libraries is limited to those built from gcc source, plus libc6-dev-i386 (or equivalent). The gcc-multilib package is also relevant to third-party developers who wish to be able to compile code using gcc -m32; while gcc -m32 could also be implemented using multiarch packages, the existing multilib solution already addresses this use case without the need for additional design.

Where packages that contain cross-built code of this nature do not have reverse dependencies in the archive, consideration should be given to dropping the package for the architecture in question and supporting its installation as a multiarch package instead.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this

Code Changes

Support for the Multi-Arch field should be added to dpkg ASAP so that it will be propagated to the Packages files without the need for a time-consuming transition from XB-Multi-Arch to Multi-Arch. (Cf. the Vcs-* fields, for which there are still a number of packages declaring X-Vcs-* in the archive.)

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

The following issues have been identified as out-of-scope for the present specification and may be considered targets for future spec work. Each of these is dependent on having a base implementation of multiarch, but does not impact the design or implementation of multiarch in the package manager and is therefore not considered a blocker for the present spec.

Co-installable -dev packages

It would be useful for developers working in a cross-build environment to be able to use library packages for the target architecture on their build systems without modification, and to be able to install these packages without removing the corresponding native packages (e.g.: libc6-dev, which is a dependency of gcc itself). This requires an architecture-qualified filesystem layout specification for architecture-dependent headers, and support for said layout in the toolchain, but otherwise imposes no requirements on the package manager beyond those identified in the present spec.

Co-installable packages for executables

Co-installation of executables would potentially make it possible to reuse a single disk image on systems of multiple architectures with no modification. This could be implemented on top of multiarch using architecture-qualified paths for executables, but would require an additional mechanism (such as kernel support, or boot-time symlinking) to implement PATH handling.

Autodetection of supported ABIs

This spec does not define a mechanism for indicating which foreign ABIs are supported on a given system. For the initial implementation, it is assumed that users will manually designate the architectures that they wish to opt in to. In the future, this may be extended to permit packages to indicate that they provide an implementation of a given ABI (e.g., Package: qemu-sparc, Implements: sparc) to support autoconfiguration of the package manager, but this is not required for the first implementation.

Outstanding issues with this spec

The following issues still need to be addressed before this spec can be considered complete.

  • Conflicts: do we need to support Conflicts: $package [same], to allow packages to limit a conflict to packages of the same arch? (This assumes that, by default, Conflicts will apply in a cross-architecture manner.)

BoF agenda and discussion

A number of other designs were considered at various points, during UDS and elsewhere. They are documented here for reference, including the rationale for discarding them (if known).

Allow official packages to have cross-architecture deps

In some cases, it would reduce package duplication in the archive if packages were allowed to explicitly depend on packages from another architecture. This option was rejected in favor of maintaining a small number of biarch packages in the archive, because:

  • The number of affected packages is small.
  • There would be a significant implementation cost on the archive side to replicate the existing support for archive integrity checks (britney and/or dak, soyuz and/or checkrdepends, and many other tools would need to become multiarch-aware).
  • Trivial implementations of this would involve significant bloat to the Packages files, and non-trivial implementations would likely add significant cost to the already-expensive Packages file generation.

Permit packages without Multi-Arch set to satisfy foreign dependencies

An earlier draft suggested that packages without the Multi-Arch field set should be interpreted as satisfying dependencies in a cross-architecture manner. This was unsatisfactory because it would require a flag day to ensure that, when a package's dependency did need to be satisfied by a package of the same architecture, it was not incorrectly satisfied by an older, pre-multiarch version of the package.

Require separate -common packages for all multiarch packages

Other options for handling of architecture-independent data associated with library packages have been proposed in the past, which require minimal changes to dpkg in order to deal with file conflicts. One of these options which has gained some traction in the past is to require each multiarch lib package to have a separate Architecture: all "-common" package to contain the shared files. This would require no support at the dpkg level for coping with file conflicts since there would be none, but it has significant disadvantages:

  • It requires large and potentially fragile per-package changes to implement.
  • It increases the size of the Packages file for all users.

Since dpkg must be modified in any case in order to support the Multi-Arch field, it is considered preferable to also special-case the handling of file conflicts in dpkg instead of requiring intrusive per-package changes.

