The general idea is to exclude locale-specific data (e.g., /usr/share/locale) from packages at build time and instead stash it in a centralized database. This data can then be used to create a single package per locale that contains all of the relevant localisation data.

Data extraction

The current consensus is to modify the source packages to generate debs without translation data, as opposed to stripping the data from the debs after the build. This will guarantee reproducible results and avoid any problems with apt and dpkg, which may choke on debs that have the same version number but a different size.

We will create an additional debhelper script (dh_extracttranslations or something similar) which we call in debian/rules. This will require significant effort, since we need to modify every supported source package, but it will be the most robust solution. To keep that effort manageable, we should ensure that packages which differ only by this change are merged and uploaded automatically.

We should aim to not require the presence of this helper script; if debian/rules checks for its presence before calling it, we might even convince Debian to adopt these changes. Whether the extraction is done or not then purely depends on the debhelper version on t

This debhelper script should remove all translation-related files (files in /usr/share/locale/ for a start) and put them into a pool, which is then fed into Rosetta.
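
The extraction step can be sketched as follows. This is only an illustration, not the actual debhelper script: the function name extract_translations, the pool layout (one subdirectory per package) and the restriction to /usr/share/locale are assumptions made for the example.

```python
import shutil
from pathlib import Path

# Assumption: only usr/share/locale is handled, which the text names as a start.
LOCALE_PREFIX = Path("usr/share/locale")


def extract_translations(build_root, pool, package):
    """Move all files below usr/share/locale from build_root into pool/package/.

    Returns the moved paths, relative to build_root, in sorted order.
    """
    build_root, pool = Path(build_root), Path(pool)
    source = build_root / LOCALE_PREFIX
    moved = []
    if not source.is_dir():
        return moved  # the package ships no translation data
    # sorted() collects all paths first, so moving files does not disturb the walk.
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(build_root)
        target = pool / package / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(rel)
    # Remove what is left of the locale tree so the deb contains no locale data.
    shutil.rmtree(source)
    return moved
```

Keeping the original relative path inside the pool means the data can later be put back in the same place, or remapped to another hierarchy, without guessing where it came from.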

Shipping the translation data

Any solution has to meet the following requirements:

  1. language packs should be compatible with older Ubuntu and Debian packages (i.e. must not conflict filewise)
  2. the impact on buildds and mirrors must not be unbearable
  3. users should not be forced to download too much redundant data
  4. packages must not break if translations are out of sync
  5. dpkg should not be reinvented; we want to use its capabilities to manage the translation data


There are two major approaches to this:

  • The push principle: the buildds regularly build updated language packs from Rosetta data and upload them into our normal archive. If we have one translation deb per language, this will violate requirements 2 and 3; if we have one translation deb per package, the number of packages explodes (however, this could be alleviated by providing one archive category per language, i.e. main-en, main-de etc.).

  • The pull principle: we only build premade translation packages at a release and ship them; afterwards a user can download new translations directly from Rosetta (only for the packages and languages they actually want, and only new translations) and build a new translation deb on the fly (which should then be one deb per language for the whole system). This work should be done by a simple frontend (update-translations, maybe with a panel button). The generated deb would then be installed normally; dpkg will take care of the housekeeping (Requirement 5). This will require more work on the client computer, but avoids download redundancy and mirror killing (Requirements 2 and 3). Users can always get up-to-date translations without waiting for fixed update intervals.
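
The selection step of the pull principle (only installed packages, only wanted languages, only new translations) can be sketched as below. The data shapes are assumptions for the example: the text does not say how Rosetta would expose its data, so the index is modelled as a plain dictionary mapping (package, language) to an integer revision, and select_updates is a hypothetical name.

```python
def select_updates(available, installed_packages, wanted_languages, local_revisions):
    """Return the (package, language) pairs whose translations should be fetched.

    available and local_revisions map (package, language) to an integer
    revision; this representation is an assumption made for the sketch.
    """
    updates = []
    for (package, language), revision in sorted(available.items()):
        # Skip translations for packages or languages the user does not have.
        if package not in installed_packages or language not in wanted_languages:
            continue
        # Fetch only if the server has something newer than the local copy.
        if revision > local_revisions.get((package, language), -1):
            updates.append((package, language))
    return updates
```

A frontend such as update-translations would download the selected entries and then build one deb per language from them.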

We should introduce a second gettext hierarchy in /usr/share/translations/ and make gettext choose the newer version of a file. This will ensure compatibility with older Ubuntu packages and with Debian packages. It requires changing not only the gettext package itself, but also all packages which statically link gettext. However, packages which are not yet converted to this system will not break; they will just not use the new translations (Requirement 1).

Similarly, packages will not break if translations are out of sync; such translations are simply ignored (Requirement 4).
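
The lookup across the two hierarchies could work roughly as in the sketch below. It is not gettext's actual implementation: pick_catalog is a hypothetical helper, and "newer" is assumed to mean the later file modification time, which the text does not specify.

```python
from pathlib import Path


def pick_catalog(domain, locale, roots):
    """Return the newest <root>/<locale>/LC_MESSAGES/<domain>.mo, or None.

    Assumption: "newer" is decided by file modification time.
    """
    candidates = [
        Path(root) / locale / "LC_MESSAGES" / (domain + ".mo") for root in roots
    ]
    existing = [c for c in candidates if c.is_file()]
    if not existing:
        return None  # no catalog anywhere; the program falls back to untranslated text
    return max(existing, key=lambda c: c.stat().st_mtime)
```

Called with roots such as /usr/share/locale and /usr/share/translations, a package that only ships the old hierarchy keeps working, and a missing catalog yields None rather than an error.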

MataroSessionsWorkshops/LanguagePacks (last edited 2008-08-06 16:37:18 by localhost)