MultiarchCross

Revision 12 as of 2011-01-18 18:45:20


Summary

Extensions to the MultiarchSpec necessary for automated cross compilation and toolchain builds.

Motivation

The emdebian project aims to provide precompiled binaries for several embedded architectures, for some of which it is not possible or sensible to compile natively. For these, we need to build cross compilers and subsequently compile individual packages using them. Building these packages in the archive, with autobuilders, requires cross-architecture build dependencies, and a way to specify build dependencies that must be of the same architecture.

Current Status

The binutils package can be told to build a cross variant by passing a special environment variable. As there are no dependencies on target packages, this works for any GNU triplet supported by upstream, and can be automated easily.

The gcc package has a mechanism in place that rewrites various files in debian/ so that a set of cross compiler packages is built. The rewritten control file then declares build dependencies on several packages whose names end in a dash, the target Debian triplet and "-cross". These must be generated with the "dpkg-cross" tool from target packages. Automated bootstrap of new architectures is not possible this way, as a C library package compiled for the target is necessary to start building the packaged compiler.

Individual packages can be cross-compiled by passing a "-a" option to dpkg-buildpackage, which presets the environment variables accordingly. The package is responsible for honouring these variables, which most of the time can be achieved by passing --build and --host parameters to the configure script. The package's build dependencies need to be split into host and target dependencies manually, which precludes automation. An extended control file syntax that uses a "Build-Depends-Tools" field to list build dependencies that should not be translated is used in emdebian. This could be used in Debian proper but re-using the multiarch specification with some minor additions seems neater.
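As a minimal sketch of how a package build might derive the configure parameters from the preset environment, the following assumes that the variables in question are DEB_BUILD_GNU_TYPE and DEB_HOST_GNU_TYPE as exported by dpkg-architecture. The helper name configure_cross_args is hypothetical and only illustrates the idea.

```python
import os


def configure_cross_args(env=None):
    """Return --build/--host arguments for a configure script.

    Assumption: the GNU triplets are provided in DEB_BUILD_GNU_TYPE and
    DEB_HOST_GNU_TYPE. If either is missing, no arguments are returned.
    """
    env = os.environ if env is None else env
    build = env.get("DEB_BUILD_GNU_TYPE")
    host = env.get("DEB_HOST_GNU_TYPE")
    if not build or not host:
        return []
    if build == host:
        # Native build: only the build type needs to be stated.
        return ["--build=" + build]
    # Cross build: state both so that configure selects the cross toolchain.
    return ["--build=" + build, "--host=" + host]
```

A debian/rules file would normally do the equivalent in make syntax; the point is only that the distinction between build and host is derived mechanically from the environment, not hard-coded in the package.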

Utilizing Multiarch

The layout for libraries used by Multiarch is similar to that used by dpkg-cross in converted packages, and both have the goal of having libraries for different architectures co-installable, so it makes sense to integrate both, and drop dpkg-cross in the long run.

Multiarch already goes most of the way by specifying new paths where libraries are to be found; while the MultiarchSpec lists library -dev packages as unresolved, the transition plan is already pretty specific on making binutils and gcc look into multiarch directories, which is only needed for -dev packages.

Some extra work will be required to handle pkg-config .pc files (by teaching pkg-config about the new directories) and architecture-dependent headers (which are installed below /usr/lib by several packages). These can be dealt with by patching individual packages, but this is probably best done by having arch-specific pkg-config-<arch> commands.

As it is for the (current) Multiarch approach, only libraries and library development files are interesting for the cross-compilation case. dpkg-cross strips all other files from the generated packages, so there is no change required there either.

What is required in package dependencies is for the depending source package to distinguish build-dependencies which are satisfiable by any architecture ('tools') from build-dependencies which can only be satisfied by packages of the same architecture ('libraries' generally). This is very similar to the Multi-Arch: field options 'foreign' and 'same', respectively, but the relationship is defined by the depending package, not the depended-on package, because the depended-on package might be used in both modes. This is recognised in the MultiArch spec with the Multi-Arch: option 'allowed' and the Depends: package:any syntax. However, it is suggested that this extra dependency annotation will not be implemented until a whole cycle after the base Multi-Arch: field features, because tools will not understand the syntax. This would prevent the feature being used now by cross-dependency satisfaction tools like xdeb and xapt.

Can we in fact do this now without causing major problems? dpkg-checkbuilddeps needs to change. Other packages that parse dependencies directly and thus need to grok this are: xdeb, xapt, dpkg-cross, pbuilder, sbuild, apt, aptitude. There are no doubt others. Simply ignoring these dependency annotations allows packages to be uploaded.

We would like to take up the opportunity to introduce a host/target distinction in build dependencies in Debian at the same time, even if Debian and Ubuntu themselves would not use that information. This would be cheap to do at this stage, as the dependency handling code needs to be adapted anyway to handle architecture-qualified dependencies; we propose that in build dependencies, the special qualifier ":native" can be appended to names of packages marked "Multi-Arch: same" or "Multi-Arch: allowed", to signify that this build dependency should be installed for DEB_BUILD_ARCH rather than DEB_HOST_ARCH. When not cross-compiling, it can and should be treated as if it were not present. The ":any" qualifier would have the same meaning as for regular dependencies, i.e. it would allow the dependency to be met by any package with an architecture that can be executed on the host, regardless of whether that is the same architecture as the current DEB_BUILD_ARCH.

So the way this is expected to work is that a cross-building tool will normally rely on the Multi-Arch field in order to decide if something is a 'tool' (install for DEB_BUILD_ARCH) or a 'library' (install for DEB_HOST_ARCH), but the Build-Depends of a package can be qualified to specify exceptions to those defaults (using package:any and package:native, see the table below under 'phase 2'). ':any' is already in the Multiarch spec for Depends, so adding it for Build-Depends is a small step. ':native' is not implemented, and indeed the spec says to explicitly fail on any dependency qualifier that is not recognised. The ':native' qualifier is needed for the case where a 'library' package needs to be installed for DEB_BUILD_ARCH, rather than the usual case of DEB_HOST_ARCH. This would occur where a package builds a tool to be run during the build and that tool depends on an external library (which of course needs to be DEB_BUILD_ARCH). Do we have such packages?
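To make the qualifier handling concrete, here is a minimal sketch of how a tool might split a single build dependency name from its qualifier. It assumes a bare package name without version constraints or architecture restrictions; the names split_qualifier and KNOWN_QUALIFIERS are hypothetical and not taken from dpkg or apt.

```python
# Qualifiers discussed in this spec for build dependencies.
KNOWN_QUALIFIERS = {"any", "native"}


def split_qualifier(dep):
    """Split 'foo:native' into ('foo', 'native').

    An unqualified name yields (name, None). An unrecognised qualifier
    is rejected, following the rule that unknown qualifiers must fail.
    """
    name, sep, qualifier = dep.strip().partition(":")
    if not sep:
        return name, None
    if qualifier not in KNOWN_QUALIFIERS:
        raise ValueError("unrecognised dependency qualifier: " + qualifier)
    return name, qualifier
```

A tool that does not cross-compile could call the same function and simply discard a 'native' qualifier, which matches the intent that ':native' is treated as absent in native builds.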

Cross compiler generation is currently being changed so it can be automated in the current environment, by utilizing the "binutils-source", "gcc-source", "glibc-source"/"uclibc-source" and "gdb-source" binary packages together with a framework package containing build scripts and several small helper packages that can be fed to autobuilders. Later on, when Multiarch enabled apt is able to resolve dependencies on packages of a specified architecture, toolchain builds can be switched from a full bootstrap to building individual packages separately; this is not part of phase 1.

Terminology

Cross-build and cross-arch terminology is always confusing. We need package maintainers to be able to get this right, when annotating dependencies, without becoming deep experts. Using ':native' is potentially confusing: does it mean 'runs on the build host', or does it mean 'is the same as the target arch'? People use it both ways (xdeb uses the latter sense; this spec and the multi-arch spec use the former). The former sense is more usual, as used in this spec and bug #558095, but perhaps it's best avoided. The ':same' and ':any' used in multiarch are perhaps less confusing (and add :both). But on the other hand maybe we shouldn't be using the same strings as the Multi-Arch field, because that describes the relationship in the opposite direction? Good choices are not obvious, but it's important to be as clear as possible.

Transition

Phase 1 (before Debian squeeze release)

  • dpkg is taught to handle ":native" in build dependencies. #558095

  • apt is changed to handle ":native" in build dependencies. #558103 done.

This should happen before the release, so qualification can start afterwards without introducing compatibility problems.

Phase 2 (after Debian squeeze release)

  • the Debian archive starts accepting packages with qualified build dependencies in unstable. #558104

  • emdebian tools and dpkg are taught to handle "native"-qualified build dependencies
  • emdebian converts "Build-Depends-Tools" based host/target build dependency distinction to new format, submits patches.

At this point, it is possible to cross-compile all up-to-date Debian packages, but toolchains still need some manual attention, which is acceptable.

Build Dependencies are resolved according to this table:

Multi-Arch field of foo | Build-Depends: foo | Build-Depends: foo:any         | Build-Depends: foo:native
no Multi-Arch field     | DEB_BUILD_ARCH     | disallowed                     | DEB_BUILD_ARCH
Multi-Arch: same        | DEB_HOST_ARCH      | disallowed                     | DEB_BUILD_ARCH
Multi-Arch: foreign     | DEB_BUILD_ARCH     | disallowed                     | disallowed
Multi-Arch: allowed     | DEB_HOST_ARCH      | any, preferably DEB_BUILD_ARCH | DEB_BUILD_ARCH

Using this table, build dependencies on not-yet converted libraries cannot be handled; this is equivalent to the status quo. Accepting ":native" references to those packages allows for reverse dependencies to be updated ahead of time; this can be switched to "disallowed" after the transition of library packages is complete.
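The resolution table can be expressed as a simple lookup. The sketch below is only an illustration of the table as written in this spec; the function name resolve_arch and the encoding of the cells are hypothetical. For the "any, preferably DEB_BUILD_ARCH" cell it returns the build architecture as the preferred choice, although any executable architecture would satisfy the dependency.

```python
# Keys: (Multi-Arch field of the dependency, qualifier in Build-Depends).
# Values: "build", "host", "any", or None for a disallowed combination.
RESOLUTION = {
    (None, None): "build",
    (None, "any"): None,
    (None, "native"): "build",
    ("same", None): "host",
    ("same", "any"): None,
    ("same", "native"): "build",
    ("foreign", None): "build",
    ("foreign", "any"): None,
    ("foreign", "native"): None,
    ("allowed", None): "host",
    ("allowed", "any"): "any",
    ("allowed", "native"): "build",
}


def resolve_arch(multi_arch, qualifier, build_arch, host_arch):
    """Return the architecture a build dependency should be installed for."""
    if (multi_arch, qualifier) not in RESOLUTION:
        raise ValueError("unknown Multi-Arch value or qualifier")
    result = RESOLUTION[(multi_arch, qualifier)]
    if result is None:
        raise ValueError("disallowed combination")
    if result == "host":
        return host_arch
    # "build", and "any" with its stated preference for DEB_BUILD_ARCH.
    return build_arch
```

For example, when cross-building on amd64 for armel, an unqualified dependency on a "Multi-Arch: same" library resolves to armel, while the same dependency written with ":native" resolves to amd64.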

Phase 3 (hopefully before squeeze+1 release)

  • apt gains support for installing target build dependencies
  • apt and dpkg learn about full architecture qualification

This will allow firmware files and boot blocks to be generated on any host, eliminating the need for architecture-independent files that can be built only on a single host.

Unresolved Issues

Arch Qualification vs Toolchain Builds

Toolchain builds require going back and forth between several source trees, most notably gcc and libc:

Step | Source   | Prerequisite | Action
0    | linux    | none         | build linux-libc-dev headers
1    | binutils | none         | build binutils
2    | gcc      | none         | build "freestanding" gcc
3    | gcc      | 1,2          | build static libgcc
4    | libc     | 0,1,2,3      | build intermediate libc, statically linked against libgcc
5    | gcc      | 1,2,4        | build shared libgcc, linked against intermediate shared libc
6    | libc     | 0,1,2,5      | build final libc
7    | gcc      | 1,2,5,6      | build final "hosted" gcc
8    | gcc      | 1,5,6,7      | build language support libraries

For this, architecture qualification of build dependencies is not really useful as the users are caught in the middle of the dependency loop. There are some ideas how to handle this in the autobuilders, but these are outside of the scope of this document.
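The difference between ordering the steps and ordering the source packages can be checked mechanically. The sketch below encodes the step table and uses Python's graphlib (Python 3.9 or later) to show that the individual steps can be ordered, while the same dependencies collapsed to source packages form a loop between gcc and libc. The names STEPS, step_order and source_level_has_cycle are hypothetical and exist only for this illustration.

```python
from graphlib import CycleError, TopologicalSorter

# step number -> (source package, prerequisite steps), as in the table.
STEPS = {
    0: ("linux", []),
    1: ("binutils", []),
    2: ("gcc", []),
    3: ("gcc", [1, 2]),
    4: ("libc", [0, 1, 2, 3]),
    5: ("gcc", [1, 2, 4]),
    6: ("libc", [0, 1, 2, 5]),
    7: ("gcc", [1, 2, 5, 6]),
    8: ("gcc", [1, 5, 6, 7]),
}


def step_order():
    """Return one valid order in which the individual steps can run."""
    graph = {step: prereqs for step, (_, prereqs) in STEPS.items()}
    return list(TopologicalSorter(graph).static_order())


def source_level_has_cycle():
    """Collapse steps to their source packages and test for a loop."""
    graph = {}
    for source, prereqs in STEPS.values():
        deps = graph.setdefault(source, set())
        for prereq in prereqs:
            other = STEPS[prereq][0]
            if other != source:
                deps.add(other)
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False
```

Under these assumptions step_order() succeeds while source_level_has_cycle() reports a loop, which is why dependencies expressed only between whole packages cannot describe the bootstrap.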