MultiarchCross

Revision 19 as of 2012-03-29 02:32:50

Summary

Extensions to the MultiarchSpec necessary for automated cross compilation and toolchain builds.

Motivation

Many people want cross-compiling to work easily on Debian systems. This requires the availability of cross-toolchains, and the ability to install cross-dependencies before building a package. The emdebian project provides precompiled binaries for several architectures, for some of which it is not possible or sensible to compile natively, but it is not currently possible to build and distribute these tools within Debian itself. Building these packages in the archive, with autobuilders, requires cross-architecture build dependencies, and for same-arch only build dependencies to be specifiable.

For cross-building packages in general the library dependencies of the package need to be satisfied for the DEB_HOST_ARCH architecture and the tool dependencies of the package need to be satisfied for the DEB_BUILD_ARCH architecture.

Current Status

The binutils package can be told to build a cross variant by passing a special environment variable. As there are no dependencies on target packages, this works for any GNU triplet supported by upstream, and can be automated easily.

The gcc package has a mechanism in place that rewrites various files in debian/ so that a set of cross compiler packages is built. The rewritten control file then declares build dependencies on several packages whose names end in a dash, the target Debian triplet and "-cross". These must be generated with the "dpkg-cross" tool from target packages. Automated bootstrap of new architectures is not possible this way, as a C library package compiled for the target is necessary to start building the packaged compiler.

Individual packages can be cross-compiled by passing a "-a" option to dpkg-buildpackage, which presets the environment variables accordingly. The package is responsible for honouring these variables, which for autotools-using packages can be achieved by passing --build and --host parameters to the configure script. The package's build dependencies need to be split into build and host dependencies and the easiest way to do this is to re-use the multiarch specification with some minor extensions.
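
As a rough illustration of what honouring these variables can look like, the sketch below derives configure arguments from the environment. It assumes DEB_BUILD_GNU_TYPE and DEB_HOST_GNU_TYPE (as reported by dpkg-architecture) are set; the function name configure_args and the example triplets are hypothetical, and omitting --host for native builds is a design choice of this sketch rather than a requirement of the spec.

```python
import os


def configure_args(env=None):
    """Return --build/--host arguments for an autotools configure script.

    Assumes DEB_BUILD_GNU_TYPE and DEB_HOST_GNU_TYPE are present in the
    environment; the function name is illustrative only.
    """
    env = os.environ if env is None else env
    build = env["DEB_BUILD_GNU_TYPE"]
    host = env["DEB_HOST_GNU_TYPE"]
    args = ["--build=" + build]
    # Only add --host when cross-compiling, so that native builds keep
    # the configure script's default behaviour.
    if host != build:
        args.append("--host=" + host)
    return args


# Example values chosen for illustration, not taken from a real build.
example_env = {
    "DEB_BUILD_GNU_TYPE": "x86_64-linux-gnu",
    "DEB_HOST_GNU_TYPE": "arm-linux-gnueabi",
}
print(configure_args(example_env))
```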

Prior to multiarch, only packages for the machine architecture can be installed, so dpkg-cross is used to convert library and -dev packages to be of arch all, and move library and header files to a co-installable path under /usr/<triplet>/. Multiarch removes the need to create these -cross packages by having standard packages install to /usr/lib/<triplet> and /usr/include/<triplet>, such that they are co-installable.

Utilizing Multiarch

The layout for libraries used by Multiarch is similar to that used by dpkg-cross in converted packages, and both have the goal of having libraries for different architectures co-installable, so it makes sense to integrate both, and drop dpkg-cross in the long run.

Multiarch already goes most of the way by specifying new paths where libraries are to be found; while the MultiarchSpec lists library -dev packages as unresolved, the transition plan is already pretty specific on making binutils and gcc look into multiarch directories, which is only needed for -dev packages.

A patch exists (217902) to allow pkg-config to support the --host option which is already supported by the autoconf and m4 macros. This means that pkg-config .pc files will not need to be converted.

What is required in package dependencies is for the depending source package to distinguish build-dependencies which are satisfiable by any architecture ('tools') from build-dependencies which can only be satisfied by packages of the same architecture ('libraries' generally). This is very similar to the Multi-Arch: field options 'foreign' and 'same', respectively, but the relationship is defined by the depending package, not the depended-on package, because the depended-on package might be used in both modes. This is recognised in the MultiarchSpec with the Multi-Arch: option 'allowed' and the Depends: package:any syntax.

Despite the relationship being 'from the wrong end', in practice it is almost always right to use the Multi-Arch field to decide if the build or host version (or both) of a package should be installed. By only marking the exceptions to this rule in a package's build-dependencies we minimise the package metadata changes needed (most packages will need no change in this regard).

We propose that in build dependencies, the special qualifier ":native" can be appended to names of packages marked "Multi-Arch: same" or "Multi-Arch: allowed", to signify that this build dependency should be installed for DEB_BUILD_ARCH rather than DEB_HOST_ARCH. When not cross-compiling, it can and should be treated as if it were not present. The ":any" qualifier would have the same meaning as for regular dependencies, i.e. it would allow the dependency to be met by any package with an architecture that can be executed on the host, regardless of whether that is the same architecture as the current DEB_BUILD_ARCH. A table is given below.

So the way this is expected to work is that a cross-building tool will normally rely on the Multi-Arch field in order to decide if something is a 'tool' (install for DEB_BUILD_ARCH) or a 'library' (install for DEB_HOST_ARCH), but the Build-Depends of a package can be qualified to specify exceptions to the library default (using package:native, see the table below under 'phase 2'). This is needed for the case where a 'library' package needs to be installed for DEB_BUILD_ARCH, rather than the usual case of DEB_HOST_ARCH. This occurs where a package builds a tool to be run during the build and that tool depends on an external library (which of course needs to be the DEB_BUILD_ARCH version). There are a small number of such packages already in the archive. Note that ':native' is not yet implemented, and indeed the spec (for 'Depends') says to explicitly fail on any dependency qualifier which is not recognised.
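
The qualifier handling described above could be sketched roughly as follows. This is an illustration only, not how dpkg or apt implement it: the function names are hypothetical, and it handles bare package names only (no version constraints or architecture restrictions).

```python
KNOWN_QUALIFIERS = {"any", "native"}


def split_qualifier(dep):
    """Split 'foo:native' into ('foo', 'native'); 'foo' gives ('foo', None).

    Unrecognised qualifiers raise ValueError, mirroring the rule that
    they must fail rather than be silently ignored.
    """
    name, sep, qualifier = dep.strip().partition(":")
    if not sep:
        return name, None
    if qualifier not in KNOWN_QUALIFIERS:
        raise ValueError("unrecognised qualifier: " + qualifier)
    return name, qualifier


def effective_dep(dep, build_arch, host_arch):
    """Treat ':native' as absent when not cross-compiling."""
    name, qualifier = split_qualifier(dep)
    if qualifier == "native" and build_arch == host_arch:
        return name
    return dep.strip()


# Hypothetical package name, used only to exercise the functions.
print(effective_dep("libfoo-dev:native", "amd64", "amd64"))
print(effective_dep("libfoo-dev:native", "amd64", "armel"))
```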

Header files

The current Multiarch spec only covers library files. To be useful for cross-building, include files must also be put into arch-specific locations so that cross-builds can find them. This is quite easy to do. In practice most header files are actually architecture independent and thus can be left in /usr/include. Only arch-dependent headers need to move to /usr/include/<triplet>.
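
A minimal sketch of the resulting header lookup, assuming the arch-specific directory is searched before the shared one (the ordering and the function names are assumptions of this example, not something the spec mandates):

```python
import os


def header_search_path(triplet, prefix="/usr"):
    """Return include directories in lookup order for one architecture."""
    return [
        prefix + "/include/" + triplet,  # arch-dependent headers first
        prefix + "/include",             # shared arch-independent headers
    ]


def find_header(name, triplet, exists=os.path.exists):
    """Return the first matching header path, or None if not found.

    'exists' can be replaced to test the lookup without touching the
    real filesystem.
    """
    for directory in header_search_path(triplet):
        candidate = directory + "/" + name
        if exists(candidate):
            return candidate
    return None


print(header_search_path("arm-linux-gnueabi"))
```

Note that a header present only in /usr/include is still found here even when it belongs to a package of a different architecture.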

This means only packages which have arch-dependent headers need to change. There will be issues during the transition, when headers will be found (because the native package is installed) even though the cross-dependency is not installed. Finding the wrong headers happens a lot already, so this is probably survivable, but if cross-toolchains look in /usr/include they will currently also find the architecture-dependent files of the build architecture for packages which have not yet been multiarched.

An alternative is to move all headers into /usr/include/<triplet>. This is simple to change in each package (but does need changing everywhere). This avoids issues of finding the 'wrong' headers when some packages are multiarch and some aren't. Once the transition is complete we can revisit this question and start moving arch-independent headers back into /usr/include. Header files are relatively small so the duplication is not important (and is also how things work with dpkg-cross prior to multiarch).

Executables in -dev packages

In order to be able to install the host arch version of a -dev package under multiarch, the package needs to be marked M-A: same (or allowed). However, many -dev packages contain files in /usr/bin which vary with architecture. Some are ELF binaries which definitely differ between arches. Others are scripts which may or may not vary with arch. Files which are identical across arches are not a problem, as dpkg will just deal with them.

Many of these binaries are foo-config utilities to do the same job as pkg-config in terms of providing path and flag information on how to build. Actually using pkg-config for this job would remove the need for a package-specific binary, and this may be appropriate in many cases, but needs to be done upstream, not just in Debian. In more complex cases it may be appropriate to split the -dev package into -dev and -dev-bin packages, where the -dev-bin package is M-A: foreign, and the -dev package M-A: same, but this should be avoided unless really necessary.

MultiarchCrossExecutables contains an analysis of -dev packages to determine the size of this issue.

Cross-toolchain packages

Cross compiler generation is currently being changed so it can be automated in the current environment, by utilizing the "binutils-source", "gcc-source", "glibc-source"/"uclibc-source", "kernel-source" and "gdb-source" binary packages together with a framework package containing build scripts and several small helper packages that can be fed to autobuilders. Later on, when Multiarch enabled apt is able to resolve dependencies on packages of a specified architecture (and this is supported in the archive?), toolchain builds can be switched from a full bootstrap to building individual packages separately; this is not part of phase 1.

Terminology and semantics of package relationships

Cross-build and cross-arch terminology is always confusing. We need package maintainers to be able to get this right, when annotating dependencies, without becoming deep experts. What names to use for the modifiers?

':native' has been suggested for saying 'I need the build arch version of this library, not the target (host) arch'. However, using ':native' is potentially confusing: does it mean 'runs on the build host', or does it mean 'is the same as the target arch'? People use it both ways (xdeb uses the latter sense; this spec and the multi-arch spec use the former). The former sense is more usual, as used in this spec and bug #558095, but perhaps it's best avoided.

':build' is perhaps clearer in that it clearly indicates the build architecture version is wanted.

Reusing the ':any' qualifier from multiarch (and adding ':both') is perhaps less confusing. On the other hand, maybe we shouldn't be using the same strings as the Multi-Arch field, because that field describes the relationship in the opposite direction. Good choices are not obvious, but it's important to be as clear as possible.

Transition

It is suggested in the multiarch spec that the extra dependency annotation will not be implemented until a whole cycle after the base Multi-Arch: field features, because tools will not understand the syntax. This would prevent the feature being used now by cross-dependency satisfaction tools like xdeb and xapt.

Can we in fact do this now without causing major problems? dpkg-checkbuilddeps needs to change. Other packages that parse dependencies and build-dependencies directly and thus need to grok this are: xdeb, xapt, pbuilder, sbuild, apt, aptitude. There are no doubt others. Simply ignoring these dependency annotations allows packages to be uploaded.

Phase 1

  • apt is changed to handle ":native" and ":any" in build dependencies. #558103 done.

Phase 2 (before wheezy releases)

  • dpkg is taught to handle ":native" and ":any" in build dependencies. #558095

  • the Debian archive starts accepting packages with qualified build dependencies in unstable. #558104

  • emdebian tools and dpkg are taught to use multiarch field and handle "native"-qualified build dependencies
  • emdebian converts "Build-Depends-Tools" based /target build dependency distinction to new format, submits patches.

At this point, it is possible to cross-compile all up-to-date Debian packages, but toolchains still need some manual attention, which is acceptable.

Build Dependencies are resolved according to this table:

Depended-on package | Build-Depends: foo | Build-Depends: foo:any         | Build-Depends: foo:native
no Multi-Arch field | DEB_HOST_ARCH      | as foo                         | DEB_BUILD_ARCH
Multi-Arch: same    | DEB_HOST_ARCH      | disallowed                     | DEB_BUILD_ARCH
Multi-Arch: foreign | DEB_BUILD_ARCH     | disallowed                     | disallowed
Multi-Arch: allowed | DEB_HOST_ARCH      | any, preferably DEB_BUILD_ARCH | DEB_BUILD_ARCH

Using this table, build dependencies on not-yet converted libraries cannot be handled; this is equivalent to the status quo. Accepting ":native" references to those packages allows for reverse dependencies to be updated ahead of time; this can be switched to "disallowed" after the transition of library packages is complete.
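
For tool authors, the resolution table above can be expressed as a simple lookup. The following is a sketch only: the names RESOLUTION and resolve are hypothetical, and "as foo" in the ':any' column is encoded as the same result as the unqualified dependency.

```python
BUILD = "DEB_BUILD_ARCH"
HOST = "DEB_HOST_ARCH"
ANY_PREFER_BUILD = "any, preferably DEB_BUILD_ARCH"
DISALLOWED = "disallowed"

# Outer key: Multi-Arch field of the depended-on package (None = no field).
# Inner key: qualifier on the build dependency (None = unqualified).
RESOLUTION = {
    None:      {None: HOST,  "any": HOST,             "native": BUILD},
    "same":    {None: HOST,  "any": DISALLOWED,       "native": BUILD},
    "foreign": {None: BUILD, "any": DISALLOWED,       "native": DISALLOWED},
    "allowed": {None: HOST,  "any": ANY_PREFER_BUILD, "native": BUILD},
}


def resolve(multi_arch, qualifier=None):
    """Return which architecture should satisfy a build dependency.

    Raises ValueError for combinations the table marks as disallowed.
    """
    result = RESOLUTION[multi_arch][qualifier]
    if result == DISALLOWED:
        raise ValueError(
            "qualifier %r not allowed for Multi-Arch: %s" % (qualifier, multi_arch)
        )
    return result


print(resolve("foreign"))           # a 'tool' dependency
print(resolve("same"))              # a 'library' dependency
print(resolve("same", "native"))    # a library needed by a build-time tool
```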

Phase 3

  • apt gains support for installing target build dependencies
  • apt and dpkg learn about full architecture qualification

This will allow firmware files and boot blocks to be generated on any host, eliminating the need for architecture-independent files that can be built only on a single host.

Unresolved Issues

Arch Qualification vs Toolchain Builds

Toolchain builds require going back and forth between several source trees, most notably gcc and libc:

Step | Source   | Prerequisite | Action
0    | linux    | none         | build linux-libc-dev headers
1    | binutils | none         | build binutils
2    | gcc      | none         | build "freestanding" gcc
3    | gcc      | 1,2          | build static libgcc
4    | libc     | 0,1,2,3      | build intermediate libc, statically linked against libgcc
5    | gcc      | 1,2,4        | build shared libgcc, linked against intermediate shared libc
6    | libc     | 0,1,2,5      | build final libc
7    | gcc      | 1,2,5,6      | build final "hosted" gcc
8    | gcc      | 1,5,6,7      | build language support libraries

For this, architecture qualification of build dependencies is not really useful as the users are caught in the middle of the dependency loop. There are some ideas how to handle this in the autobuilders, but these are outside of the scope of this document.
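
To make the loop concrete, the step table can be encoded and checked mechanically. This sketch (all names hypothetical) confirms that the steps are correctly ordered individually, but that collapsing them to source packages leaves gcc and libc depending on each other, which is why package-level build dependencies cannot express the bootstrap.

```python
# Step number -> (source package, prerequisite steps), from the table above.
STEPS = {
    0: ("linux", []),
    1: ("binutils", []),
    2: ("gcc", []),
    3: ("gcc", [1, 2]),
    4: ("libc", [0, 1, 2, 3]),
    5: ("gcc", [1, 2, 4]),
    6: ("libc", [0, 1, 2, 5]),
    7: ("gcc", [1, 2, 5, 6]),
    8: ("gcc", [1, 5, 6, 7]),
}


def steps_are_ordered(steps):
    """True if every step only depends on earlier-numbered steps."""
    return all(p < n for n, (_, prereqs) in steps.items() for p in prereqs)


def source_dependencies(steps):
    """Collapse step dependencies to (source, needed source) pairs."""
    return {
        (source, steps[p][0])
        for source, prereqs in steps.values()
        for p in prereqs
        if steps[p][0] != source
    }


def mutual_dependencies(steps):
    """Return sorted pairs of sources that depend on each other."""
    deps = source_dependencies(steps)
    return sorted({tuple(sorted(pair)) for pair in deps if pair[::-1] in deps})


print(steps_are_ordered(STEPS))
print(mutual_dependencies(STEPS))
```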