LaunchpadTranslationsUnderTheHood

Revision 6 as of 2010-01-25 09:30:44

UbuntuDeveloperWeek session

by AdiRoiban and HenningEggers on Tuesday, 20 Jan 2010, 17h

Intended audience

  • Developers who want to contribute to Launchpad Translations but are not yet familiar with the internal structure of the application.
  • Maintainers of translations in Launchpad and translators who want a better understanding of how and why Launchpad Translations does what it does.

Required knowledge

  • GNU gettext system for internationalization of software
  • Python coding
  • A general understanding of how a web application works
  • Knowledge of Zope is not required but is a bonus

Goals of the session

Session attendees have a good understanding of

  • how translation data is stored in LP translation (db schema)
  • how the import and approval process works (translationimportqueue)
  • how permissions and translation groups work (translation groups)
  • how review and suggestion handling works (POFile:+translate)

It is not the goal of this session to introduce the attendees to Launchpad development in general. That will be covered in a different session by Karl Fogel.

The session text will be used as developer documentation on the Launchpad development wiki, so this is a chance for us to gather input from the community.

Session text

Today we want to help you understand the inner workings of the Launchpad Translations application (Rosetta) and take you for a walk through the source code. We hope that this will enable you to scratch your own itches with Rosetta and to contribute to its source.

Gettext basics

You need to understand how gettext is used to internationalize computer software. You should be familiar with the gettext documentation, but we will give you a short run-through of the parts that are important for Rosetta.

PO files

Gettext stores translations in so-called portable object files, abbreviated as PO files. They contain entries consisting of a msgid and a msgstr, the former containing the English original string, the latter containing the translation of that string. Entries may be preceded by special comments that convey information about the string being translated, such as the source file in which it was found. Here is an example:

#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"

Gettext allows the msgid to be anything that identifies the string in the source code, not necessarily the English original string. Using the full English original string as the msgid, though, has proven to be the most convenient way to work on translations and is the only form that is fully supported by Rosetta.
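To make the entry structure concrete, here is a minimal Python sketch that reads simple PO text into msgid/msgstr pairs. It is not Rosetta's importer: the names parse_po and unquote are made up for this illustration, plural forms and msgctxt are ignored, and unquoting relies on the assumption that Python string-literal escapes are close enough to the C-style escapes of PO files for simple strings.

```python
import ast

SAMPLE_PO = '''\
#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"
'''


def unquote(token):
    """Turn a quoted PO string such as '"Merci"' into its text.

    Assumption: Python string-literal escapes are close enough to the
    C-style escapes of PO files for simple strings.
    """
    return ast.literal_eval(token.strip())


def parse_po(text):
    """Parse simple PO text into a list of entry dicts.

    Handles '#:' reference comments, msgid, msgstr and quoted
    continuation lines. Plurals and msgctxt are not supported.
    """
    entries = []
    current = None      # entry currently being filled
    field = None        # which field continuation lines extend
    references = []     # '#:' comments seen before the next msgid
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#:"):
            references.extend(line[2:].split())
        elif line.startswith("msgid "):
            current = {
                "references": references,
                "msgid": unquote(line[len("msgid "):]),
                "msgstr": "",
            }
            entries.append(current)
            references = []
            field = "msgid"
        elif line.startswith("msgstr ") and current is not None:
            current["msgstr"] = unquote(line[len("msgstr "):])
            field = "msgstr"
        elif line.startswith('"') and current is not None:
            # A bare quoted line continues the previous msgid or msgstr.
            current[field] += unquote(line)
    return entries
```

Calling parse_po(SAMPLE_PO) yields one entry whose msgid is "Thank you", whose msgstr is "Merci" and whose references list holds "src/coding.c:123".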

The first msgid in a PO file is empty and its msgstr contains meta information about the file. The minimum information here is the MIME Content-type of the file but usually a lot of other information is included, too.

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-26 12:28+0000\n"
"PO-Revision-Date: 2009-01-26 12:28+0000\n"
"Last-Translator: Foo Bar <foo.bar@canonical.com>\n"
"Language-Team: French <fr@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

The standard naming convention for PO files is to use the language code, so in this case fr.po.
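The header entry described above is a list of "Name: value" lines packed into one msgstr. The following Python sketch shows one way to split it into fields and pick out the charset. The function names parse_header and charset_of are hypothetical and chosen for this example only; this is not how Rosetta itself reads headers.

```python
HEADER_MSGSTR = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2009-01-26 12:28+0000\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)


def parse_header(msgstr):
    """Split the msgstr of the empty msgid into a dict of header fields."""
    fields = {}
    for line in msgstr.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def charset_of(fields):
    """Return the charset named in Content-Type, or None if it is missing."""
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";")[1:]:
        key, _, value = part.partition("=")
        if key.strip().lower() == "charset":
            return value.strip()
    return None
```

With the sample above, charset_of(parse_header(HEADER_MSGSTR)) gives "UTF-8", which is the encoding a reader would then use to decode the rest of the file.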

Translation templates

When translatable strings are extracted from source code using xgettext or intltool, they are put into a file which is commonly referred to as the translation template. Its format is identical to that of a PO file, but none of the msgstr lines contain translations. These files are intended to be used to create new PO files, so they also contain the header information but with most fields left empty or set to generic values.

Since a PO template is not really a separate file format, it gets little mention in the gettext documentation. Also, because its content can be generated from source at any time (for example during a build), most projects don't include it in their repository. Only PO files contain valuable information for a project, the translations themselves, and are therefore included in the source code repository.

The standard naming convention for PO templates is to use the translation domain with the extension .pot, for example myproject.pot.
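As an illustration of what a template looks like, the Python sketch below renders one from a list of English strings: a header with generic values followed by entries whose msgstr is empty. It only mimics the shape of the output; real templates come from xgettext or intltool, and the helper names (escape, render_template, template_filename) are invented for this example.

```python
def escape(text):
    """Escape a string for use inside PO double quotes (common cases only)."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )


def render_template(msgids):
    """Render template text: a generic header plus untranslated entries."""
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Project-Id-Version: PACKAGE VERSION\\n"',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for msgid in msgids:
        lines.append('msgid "%s"' % escape(msgid))
        lines.append('msgstr ""')  # templates carry no translations
        lines.append("")
    return "\n".join(lines)


def template_filename(domain):
    """Apply the naming convention: translation domain plus '.pot'."""
    return domain + ".pot"
```

For example, render_template(["Thank you"]) could be written to the file named by template_filename("myproject").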

Gettext workflow

To start a translation into a new language for a project, the following steps are necessary.

  1. Either the project maintainer or the translator creates a template from source code.
  2. The translator fills out the template with the translations for each msgid.

  3. The translator saves the file in the source tree as ll.po (see above), usually in a directory called po.

  4. The translator or somebody with commit rights commits the file to the repository.
  5. Whenever the package is built, the translations are processed so that they are available at run-time (out of scope here).
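The file handling in step 3 can be sketched in Python as follows. This is only a rough stand-in for what a translator does by hand (the gettext suite also offers msginit for this, which additionally fills in header fields); the function name start_translation and its parameters are assumptions made for this example.

```python
import shutil
from pathlib import Path


def start_translation(template_path, language_code, po_dir):
    """Copy a template to <po_dir>/<language_code>.po and return the new path.

    Refuses to overwrite an existing PO file so that translations
    already made are never lost.
    """
    po_dir = Path(po_dir)
    po_dir.mkdir(parents=True, exist_ok=True)
    target = po_dir / (language_code + ".po")
    if target.exists():
        raise FileExistsError("%s already exists" % target)
    shutil.copyfile(template_path, target)
    return target
```

A call such as start_translation("po/myproject.pot", "fr", "po") would create po/fr.po, ready to be filled in and committed.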

To change translations, the steps are simpler.

  1. The translator checks out the PO file from the repository.
  2. The translator changes whatever translations they find necessary.
  3. The translator or somebody with commit rights commits the file to the repository.
  4. ... (see above)

Whenever the source code changes, a special command from the gettext suite is used to merge any new English strings into all PO files (i.e. all existing translations) and to remove those that were removed from the source. Translators can then check out the files and translate the new strings.
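The core of that merge can be sketched in a few lines of Python. In the gettext suite this job is done by msgmerge, which does considerably more (fuzzy matching of changed strings, keeping removed entries as obsolete comments); the function below, with the made-up name merge_template, only shows the basic rule of keeping translations for strings that still exist and adding empty ones for new strings.

```python
def merge_template(template_msgids, translations):
    """Return translations updated to match the current template.

    template_msgids: msgids in template order.
    translations: dict mapping msgid to msgstr from the existing PO file.
    Strings no longer in the template are dropped; new strings get an
    empty msgstr. The input dict is left unchanged.
    """
    merged = {}
    for msgid in template_msgids:
        merged[msgid] = translations.get(msgid, "")
    return merged
```

Merging the template ["Thank you", "Welcome"] into {"Thank you": "Merci", "Goodbye": "Au revoir"} keeps "Merci", adds an untranslated "Welcome" and drops "Goodbye".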

Launchpad workflow

When using Launchpad to translate a project, the steps are slightly different because the PO files are kept in Launchpad for the translators to work with. From Launchpad they are mirrored into the source tree to be used at build time.

  1. The project maintainer uploads the PO template file to Launchpad.
  2. Translators go to Launchpad to translate the English strings that now appear in the web interface.
  3. The project maintainer downloads all PO files whenever they want to, usually to prepare a release of the software.

Nowadays the upload and download can happen automatically from and to Bazaar branches in Launchpad so that the maintainer always has a mirror of the latest translations in the branch, while changes to the PO template are automatically propagated to Launchpad. The next step will be automatic generation of PO templates from the source code in a Bazaar branch.

Mapping gettext in the Launchpad database

  1. How this has been mapped to the db schema. (jtv's great schema diagram goes here.)

Hands-on development

Since it is not expected that many of the attendees have a ready-to-use LP development setup, this will have to be a prepared example of fixing a small bug in LP translations that gets presented in some way. Suggestions?

  • Bug suggestions
  • Idea 1
    • do a bzr export and provide those sources for download
    • create a pdf presentation describing fixing a bug, step by step
    • use Lernid to integrate the slide with the IRC session
  • Idea 2
    • pick a simple bug that will only touch a view and its template
    • open the files in etherpad http://etherpad.com/

    • export the LP session from a local system (port forwarding... etc)