UbuntuDeveloperWeek session

by AdiRoiban and HenningEggers on Wednesday, 27 Jan 2010, 17.00 UTC

Introduction

Henning

Today we want to help you understand the inner workings of the Launchpad Translations application (Rosetta) and take you on a walk through the source code. We hope that this will enable you to scratch your own itches with Rosetta and to contribute to its source.

Intended audience

Required knowledge

Goals of the session

Session attendees have a good overview of

It is not the goal of this session to introduce the attendees to Launchpad development in general. That will be covered in a different session by Karl Fogel.

The session text will be used as developer documentation on the Launchpad development wiki, so this is a chance for us to gather input from the community.

This session does not use slides for Lernid but will have some references to source code on Launchpad that should still pop up in Lernid. The source code for Launchpad is found here: http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/files

Gettext basics

Adi

You need to understand how gettext is used to internationalize computer software. You should be familiar with the gettext documentation, but we will give you a short run-through of those parts that are important for Rosetta.

PO files

Gettext stores translations in so-called portable object files, abbreviated as PO files. They contain pairs of msgid and msgstr, the former containing the original English string, the latter containing the translation of that string. Each pair may be preceded by special comments that convey information about the string being translated, such as the source file in which it was found. Here is an example:

#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"

Gettext states that the msgid could be anything that identifies the string in the source code and not necessarily the original English string. Using the full original English string as the msgid, though, has proven to be the most convenient way to work on translations and is the only form that is fully supported by Rosetta.
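To make the msgid/msgstr structure concrete, here is a minimal Python sketch that reads entries like the one above. The function names parse_po, _unquote and _unescape are made up for this illustration and are not part of gettext or Launchpad. The sketch handles only "#:" comments, msgid, msgstr and continuation lines. It ignores plural forms, message contexts and other comment types.

```python
import re

# Matches one double-quoted string as it appears in a PO file.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Only the most common escape sequences are handled in this sketch.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unescape(raw):
    """Replace backslash escapes such as \\n with the characters they stand for."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), raw)


def _unquote(line):
    """Return the unescaped contents of the first quoted string on the line."""
    match = _QUOTED.search(line)
    return _unescape(match.group(1)) if match else ""


def parse_po(text):
    """Return a list of dicts with the keys 'references', 'msgid' and 'msgstr'."""
    entries = []
    current = None      # entry that is currently being filled
    field = None        # field that a continuation line would extend
    references = []     # "#:" references collected for the next msgid
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('"') and current is not None and field is not None:
            # Continuation line: append to the field started above it.
            current[field] += _unquote(line)
        elif line.startswith("#:"):
            references.extend(line[2:].split())
            field = None
        elif line.startswith("msgid "):
            current = {"references": references,
                       "msgid": _unquote(line),
                       "msgstr": ""}
            entries.append(current)
            references = []
            field = "msgid"
        elif line.startswith("msgstr ") and current is not None:
            current["msgstr"] = _unquote(line)
            field = "msgstr"
        else:
            # Blank lines and anything this sketch does not understand.
            field = None
    return entries


if __name__ == "__main__":
    sample = '#: src/coding.c:123\nmsgid "Thank you"\nmsgstr "Merci"\n'
    for entry in parse_po(sample):
        print(entry)
```

Keeping the references with each entry mirrors how the comment belongs to the msgid that follows it.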

The first msgid in a PO file is empty and its msgstr contains meta information about the file. The minimum information here is the MIME Content-type of the file but usually a lot of other information is included, too.

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-26 12:28+0000\n"
"PO-Revision-Date: 2009-01-26 12:28+0000\n"
"Last-Translator: Foo Bar <foo.bar@canonical.com>\n"
"Language-Team: French <fr@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
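The header msgstr is a sequence of "Name: value" lines, so it can be split into fields quite easily. The following Python sketch shows one way to do that and to pull the charset out of the Content-Type field. The function names parse_po_header and get_charset are invented for this example, and returning None when no charset is present is an assumption of the sketch, not gettext behaviour.

```python
def parse_po_header(header_msgstr):
    """Split the msgstr of the empty msgid into a dict of header fields."""
    fields = {}
    for line in header_msgstr.splitlines():
        if ":" not in line:
            continue  # not a "Name: value" line
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def get_charset(fields):
    """Return the charset named in Content-Type, or None if there is none."""
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";"):
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            return value
    return None


if __name__ == "__main__":
    header = (
        "Project-Id-Version: PACKAGE VERSION\n"
        "POT-Creation-Date: 2009-01-26 12:28+0000\n"
        "Content-Type: text/plain; charset=UTF-8\n"
    )
    fields = parse_po_header(header)
    print(fields["POT-Creation-Date"])
    print(get_charset(fields))
```

The charset matters most in practice because it tells a reader how to decode the rest of the file.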

The standard naming convention for PO files is to use the language code, so in this case fr.po.

Translation templates

When translatable strings are extracted from source code using xgettext or intltool, they are put into a file which is commonly referred to as the translation template. Its format is identical to that of a PO file, but none of the msgstr lines contain any translations. These files are intended to be used to create new PO files, so they also contain the header information but with most fields left with empty or generic values.

Since a PO template is not really a separate file format, it is barely mentioned in the gettext documentation. Also, because its content can be generated from source at any time (for example during a build), most projects don't include it in their repository. Only PO files contain valuable information for a project, the translations themselves, and are therefore included in the source code repository.

The standard naming convention for PO templates is to use the translation domain with the extension .pot, for example myproject.pot.

Gettext workflow

To start a translation into a new language for a project, the following steps are necessary.

  1. Either the project maintainer or the translator creates a template from source code.
  2. The translator fills out the template with the translations for each msgid.
  3. The translator saves the file in the source tree as ll.po (see above), usually in a directory called po.
  4. The translator or somebody with commit rights commits the file to the repository.
  5. Whenever the package is built, the translations are processed so that they are available at run-time (out of scope here).
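The file handling in steps 2 and 3 can be sketched in a few lines of Python. The function start_translation is a made-up helper for this illustration. It simply copies the template to po/ll.po and leaves the header untouched, whereas gettext's own msginit tool also fills in header fields for the new language.

```python
import shutil
from pathlib import Path


def start_translation(template_path, language_code, po_dir="po"):
    """Copy a PO template to <po_dir>/<language_code>.po and return the new path.

    Refuses to overwrite an existing PO file so that translations
    that were already made are not lost.
    """
    template_path = Path(template_path)
    po_dir = Path(po_dir)
    po_dir.mkdir(parents=True, exist_ok=True)
    po_path = po_dir / f"{language_code}.po"
    if po_path.exists():
        raise FileExistsError(f"{po_path} already exists")
    shutil.copyfile(template_path, po_path)
    return po_path
```

A call such as start_translation("po/myproject.pot", "fr") would create po/fr.po, ready to be filled in by the translator.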

To change translations, the steps are simpler.

  1. The translator checks out the PO file from the repository.
  2. The translator changes whatever translations they find necessary.
  3. The translator or somebody with commit rights commits the file to the repository.
  4. ... (see above)

Whenever the source code changes, a special command from the gettext suite (msgmerge) is used to merge any new English strings into all PO files (i.e. all existing translations) and to remove those that were removed from the source. Translators can then check out the files and translate the new strings.
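The core of this merge can be illustrated with a small Python sketch. The function merge_template is hypothetical and much simpler than the real gettext tooling: it keeps translations for strings that are still in the template, adds empty entries for new strings and drops everything else. The real msgmerge command additionally does fuzzy matching and keeps dropped entries as obsolete comments.

```python
def merge_template(template_msgids, translations):
    """Return a new msgid -> msgstr mapping that follows the template.

    template_msgids: msgids in the order they appear in the new template.
    translations:    existing msgid -> msgstr mapping from a PO file.
    """
    merged = {}
    for msgid in template_msgids:
        # Keep an existing translation, otherwise start with an empty one.
        merged[msgid] = translations.get(msgid, "")
    # msgids that are no longer in the template are simply not copied over.
    return merged


if __name__ == "__main__":
    old_french = {"Thank you": "Merci", "Hello": "Bonjour"}
    new_template = ["Thank you", "Goodbye"]
    print(merge_template(new_template, old_french))
```

Running the merge once per PO file gives every language the same set of msgids as the template.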

Launchpad workflow

When using Launchpad to translate a project, the steps are slightly different because the PO files are kept in Launchpad for the translators to work with. From Launchpad they are mirrored into the source tree to be used at build time.

  1. The project maintainer uploads the PO template file to Launchpad.
  2. Translators go to Launchpad to translate the English strings that now appear in the web interface.
  3. The project maintainer downloads all PO files whenever they want to, usually to prepare a release of the software.

Nowadays the upload and download should happen automatically from and to Bazaar branches in Launchpad so that the maintainer always has a mirror of the latest translations in the branch, while changes to the PO template are automatically propagated to Launchpad. The next step will be automatic generation of PO templates from the source code in a Bazaar branch.

Mapping gettext in the Launchpad database

Henning

The gettext structure of PO templates and PO files has been mapped into the Launchpad database with the following goals in mind:

Message sharing has been introduced over the last year or so and is a huge benefit for the users but makes the database schema and its handling a lot more complex. Because of the vast amount of data some of it is still being migrated to conform to message sharing.

Database schema of Launchpad Translations

You can see the main tables used for Launchpad Translations in the diagram. PO templates are mapped into the database using four tables.

The other four tables are used to store the actual translations and are therefore a mapping of PO files.

To export a PO file from Launchpad, all tables in the diagram have to be queried to find out which TranslationMessage entries belong in that file and to extract the actual strings that will be stored in the file.
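As a rough illustration of that export step, the following Python sketch joins a toy in-memory model into PO text. The classes POTMsgSet and TranslationMessage borrow names that appear in the Launchpad code, but their fields and the helpers quote and export_po are hypothetical simplifications. They are not the real schema or API, and the header entry is left out.

```python
from dataclasses import dataclass


@dataclass
class POTMsgSet:
    """Toy stand-in for one message of a template (fields are assumptions)."""
    id: int
    msgid: str


@dataclass
class TranslationMessage:
    """Toy stand-in for one translation of a message (fields are assumptions)."""
    potmsgset_id: int
    language: str
    msgstr: str


def quote(text):
    """Return text as a double-quoted PO string with basic escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def export_po(potmsgsets, messages, language):
    """Build PO text for one language from the toy tables."""
    # "Join" step: pick the translations that belong to this language.
    by_potmsgset = {
        m.potmsgset_id: m.msgstr for m in messages if m.language == language
    }
    chunks = []
    for potmsgset in potmsgsets:
        # Untranslated messages are exported with an empty msgstr.
        msgstr = by_potmsgset.get(potmsgset.id, "")
        chunks.append(
            f"msgid {quote(potmsgset.msgid)}\nmsgstr {quote(msgstr)}\n"
        )
    # Entries are separated by a blank line.
    return "\n".join(chunks)
```

The real export has to walk more tables than this, but the shape is the same: select the messages of the template, look up the matching translation for the language and write out the pair.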

Code structure

Adi

The source code for the Rosetta application is found in the Launchpad source tree at lib/lp/translations. The layout follows that of all Launchpad applications.

Model

View

Utilities

Tour of the code

Henning

Now we'll look at a few representative places in the code to see how things work together.

Implementation notes

Here are a few things to keep in mind to avoid confusion when reading the source code. They show that the code and the terms it uses have evolved over time.

IPOTemplate

We take a look at one of the key interfaces, IPOTemplate, which is found here: http://bazaar.launchpad.net/%7Elaunchpad-pqm/launchpad/devel/annotate/head%3A/lib/lp/translations/interfaces/potemplate.py#L121

You see a number of attributes defined, most of which relate to a database column but not all. Please note these three attributes: productseries, distroseries, sourcepackagename. An IPOTemplate is always related to either a productseries (e.g. "Evolution trunk") or a combination of distroseries and sourcepackagename (e.g. "Ubuntu lucid, evolution"). http://bazaar.launchpad.net/%7Elaunchpad-pqm/launchpad/devel/annotate/head%3A/lib/lp/translations/interfaces/potemplate.py#L175
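The either/or rule for these three attributes can be written down as a small check. The following Python sketch is only an illustration: the class TemplateContext is hypothetical and is not how Launchpad enforces the rule, and plain strings stand in for the real series and package objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemplateContext:
    """Hypothetical holder for the three attributes that place a template."""
    productseries: Optional[str] = None
    distroseries: Optional[str] = None
    sourcepackagename: Optional[str] = None

    def __post_init__(self):
        in_product = self.productseries is not None
        in_distro = (self.distroseries is not None
                     and self.sourcepackagename is not None)
        # distroseries and sourcepackagename only make sense as a pair.
        partial_distro = ((self.distroseries is None)
                          != (self.sourcepackagename is None))
        # Exactly one of the two contexts must be given.
        if partial_distro or in_product == in_distro:
            raise ValueError(
                "A template needs either a productseries or both "
                "a distroseries and a sourcepackagename.")


if __name__ == "__main__":
    upstream = TemplateContext(productseries="Evolution trunk")
    ubuntu = TemplateContext(distroseries="Ubuntu lucid",
                             sourcepackagename="evolution")
    print(upstream)
    print(ubuntu)
```

Giving all three values, none of them, or only half of the distribution pair raises ValueError.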

Further down you'll find methods to retrieve the IPOTMsgSet objects that this template contains. http://bazaar.launchpad.net/%7Elaunchpad-pqm/launchpad/devel/annotate/head%3A/lib/lp/translations/interfaces/potemplate.py#L335

Finally, there are methods to access the IPOFile objects that hold translations for this template. http://bazaar.launchpad.net/%7Elaunchpad-pqm/launchpad/devel/annotate/head%3A/lib/lp/translations/interfaces/potemplate.py#L397

In the same file is IPOTemplateSet which gives access to IPOTemplateSubset objects. Note how getSubset takes the three filtering parameters for the subset, as mentioned earlier. http://bazaar.launchpad.net/%7Elaunchpad-pqm/launchpad/devel/annotate/head%3A/lib/lp/translations/interfaces/potemplate.py#L620

Translation groups and permission

For quality assurance, Rosetta has translation groups and translation permissions.

A translation group and permission is attached to each Project, Product and Distribution object.

Distributions and projects can only have one translation group and permission. Products can have their own translation group and permission, but they also inherit them from the project containing the product.

You can read more about translation groups and translation permission implementation and usage on the Launchpad Help wiki.
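The inheritance of translation groups can be pictured with a short Python sketch. Everything in it is hypothetical: the classes Project and Product, their fields and the function effective_translation_groups are invented for this illustration, and it assumes that "inherit" means the project's group applies in addition to the product's own group. How the permissions of the two levels are combined is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    """Toy project with at most one translation group (name only)."""
    name: str
    translationgroup: Optional[str] = None


@dataclass
class Product:
    """Toy product that may belong to a project."""
    name: str
    translationgroup: Optional[str] = None
    project: Optional[Project] = None


def effective_translation_groups(product: Product) -> List[str]:
    """Return the product's own group followed by the inherited one, if any."""
    groups = []
    if product.translationgroup is not None:
        groups.append(product.translationgroup)
    project = product.project
    if project is not None and project.translationgroup is not None:
        # Avoid listing the same group twice.
        if project.translationgroup not in groups:
            groups.append(project.translationgroup)
    return groups
```

A product without its own group and without a project ends up with an empty list in this sketch.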

Hands on session

First you need to have Launchpad running on your computer.

Getting and running Launchpad on your computer is pretty straightforward and well documented on the Launchpad Development Wiki.

You can also use a VirtualBox hard disk image for a fully functional Launchpad instance. Extract all 7zip files and you will have the HDD image. Create a VirtualBox machine based on this HDD image. User: developer, password: d3v3l0p3r

To get started with Launchpad development, take a look at the "trivial" bugs from Rosetta. They are a good opportunity to discover Rosetta and the Launchpad development process.

UbuntuDeveloperWeek/Sessions/LaunchpadTranslationsUnderTheHood (last edited 2010-01-27 17:06:58 by h194-54-129-79)