LaunchpadTranslationsUnderTheHood

UbuntuDeveloperWeek session

by AdiRoiban and HenningEggers on Wednesday, 27 Jan 2010, 17.00 UTC

Intended audience

  • Developers who want to contribute to Launchpad Translations but are not yet familiar with the internal structure of the application.
  • Interested maintainers of translations in Launchpad and translators that want to have a better understanding of how and why Launchpad Translations does what it does.

Required knowledge

  • GNU gettext system for internationalization of software
  • Python coding
  • A general understanding of how a web application works
  • Knowledge of Zope is not required but is a bonus

Goals of the session

Session attendees have a good understanding of

  • how translation data is stored in LP translation (db schema)
  • how the import and approval process works (translationimportqueue)
  • how permissions and translation groups work (translation groups)
  • how review and suggestion handling works (POFile:+translate)

It is not the goal of this session to introduce the attendees to Launchpad development in general. That will be covered in a different session by Karl Fogel.

The session text will be used as developer documentation on the Launchpad development wiki, so this is a chance for us to gather input from the community.

Session text

Today we want to help you understand the inner workings of the Launchpad Translations application (Rosetta) and take you for a walk through the source code. We hope that this will enable you to scratch your own itches with Rosetta and to contribute to its source.

Gettext basics

You need to understand how gettext is used to internationalize computer software. You should be familiar with the gettext documentation, but we will give you a short run-through of those parts that are important for Rosetta.

PO files

Gettext stores translations in so-called portable object files, abbreviated as PO files. They contain pairs of msgid and msgstr, the former containing the original English string, the latter containing the translation of that string. Each pair may be preceded by special comments that convey information about the string being translated, such as the source file in which it was found. Here is an example:

#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"

Gettext states that msgid could be anything that identifies the string in the source code and not necessarily the original English string. Using the full original English string as the msgid, though, has proven to be the most convenient way to work on translations and is the only form that is fully supported by Rosetta.
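
To make the msgid/msgstr structure concrete, here is a minimal Python sketch that reads entries like the one above. It is an illustration only and not how Rosetta imports files: the function names are made up for this example, and the parser only handles single-line msgid and msgstr values and "#:" source reference comments.

```python
def unquote(token):
    # Strip the surrounding double quotes of a PO string token.
    # Simplification: only the escapes for newline and double quote are handled.
    token = token.strip()
    if len(token) < 2 or not (token.startswith('"') and token.endswith('"')):
        raise ValueError("not a quoted PO string: %r" % token)
    return token[1:-1].replace('\\n', '\n').replace('\\"', '"')


def parse_po_entries(text):
    # Hypothetical helper: returns one dict per msgid/msgstr pair.
    # Multi-line strings and plural forms are deliberately not supported.
    entries = []
    current = {"references": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#:"):
            # "#:" comments list the source locations of the string.
            current["references"].extend(line[2:].split())
        elif line.startswith("msgid "):
            current["msgid"] = unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current["msgstr"] = unquote(line[len("msgstr "):])
            entries.append(current)
            current = {"references": []}
    return entries


sample = '#: src/coding.c:123\nmsgid "Thank you"\nmsgstr "Merci"\n'
for entry in parse_po_entries(sample):
    print(entry["references"], entry["msgid"], "->", entry["msgstr"])
```

A real parser also has to deal with continuation lines, plural forms and the other comment types, which is why this sketch should not be used on arbitrary PO files.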

The first msgid in a PO file is empty and its msgstr contains meta information about the file. The minimum information here is the MIME Content-type of the file but usually a lot of other information is included, too.

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-26 12:28+0000\n"
"PO-Revision-Date: 2009-01-26 12:28+0000\n"
"Last-Translator: Foo Bar <foo.bar@canonical.com>\n"
"Language-Team: French <fr@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
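
The header msgstr is a sequence of "Name: value" lines, so its fields can be read with a few lines of Python. The following is a sketch under that assumption; the function names are hypothetical and it expects the header as one already unescaped string.

```python
def parse_po_header(header_msgstr):
    # Split the header msgstr into a dict of field name -> value.
    fields = {}
    for line in header_msgstr.split("\n"):
        name, separator, value = line.partition(":")
        if not separator:
            continue  # skip empty or malformed lines
        fields[name.strip()] = value.strip()
    return fields


def header_charset(fields):
    # Extract the charset parameter from the Content-Type field, if any.
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";"):
        key, separator, value = part.strip().partition("=")
        if separator and key.lower() == "charset":
            return value
    return None


header = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2009-01-26 12:28+0000\n"
    "Language-Team: French <fr@li.org>\n"
    "Content-Type: text/plain; charset=UTF-8\n"
)
fields = parse_po_header(header)
print(fields["Language-Team"], header_charset(fields))
```

Only the first colon on each line separates the name from the value, so values that contain colons themselves, such as the creation date, stay intact.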

The standard naming convention for PO files is to use the language code, so in this case fr.po.

Translation templates

When translatable strings are extracted from source code using xgettext or intltool, they are put into a file which is commonly referred to as the translation template. Its format is identical to that of a PO file, but none of the msgstr lines contain any translations. These files are intended to be used to create new PO files, so they also contain the header information but with most fields left with empty or generic values.

Since a PO template is not really a separate file format it does not find much mention in the gettext documentation. Also, because its content can be generated from source any time (like during a build), most projects don't include it in their repository. Only PO files contain valuable information for a project, the translations themselves, and are therefore included in the source code repository.

The standard naming convention for PO templates is to use the translation domain with the extension .pot, for example myproject.pot.

Gettext workflow

To start a translation into a new language for a project, the following steps are necessary.

  1. Either the project maintainer or the translator creates a template from source code.
  2. The translator fills out the template with the translations for each msgid.
  3. The translator saves the file in the source tree as ll.po, where ll is the language code (see above), usually in a directory called po.
  4. The translator or somebody with commit rights commits the file to the repository.
  5. Whenever the package is built, the translations are processed so that they are available at run-time (out of scope here).

To change translations, the steps are simpler.

  1. The translator checks out the PO file from the repository.
  2. The translator changes whatever translations they find necessary.
  3. The translator or somebody with commit rights commits the file to the repository.
  4. ... (see above)

Whenever the source code changes, a special command from the gettext suite is used to merge any new English strings into all PO files (i.e. all existing translations) and to remove those that were removed from the source. Translators can then check out the files and translate the new strings.
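
The effect of that merge step can be sketched in Python with plain dictionaries. This is a simplified model and not the gettext implementation: the real tool (msgmerge) also does fuzzy matching and keeps obsolete entries as comments, and the function name below is invented for the example.

```python
def merge_template(template_msgids, translations):
    # template_msgids: msgids found in the freshly generated template.
    # translations: dict of msgid -> msgstr from the existing PO file.
    # New strings get an empty msgstr; existing translations are kept.
    merged = {msgid: translations.get(msgid, "") for msgid in template_msgids}
    # Strings that are no longer in the template drop out of the file.
    obsolete = {
        msgid: msgstr
        for msgid, msgstr in translations.items()
        if msgid not in merged
    }
    return merged, obsolete


merged, obsolete = merge_template(
    ["Thank you", "Goodbye"],
    {"Thank you": "Merci", "Hello": "Bonjour"},
)
print(merged)
print(obsolete)
```

In this example "Goodbye" is new and shows up untranslated, while "Hello" has been removed from the source and no longer appears in the merged file.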

Launchpad workflow

When using Launchpad to translate a project, the steps are slightly different because the PO files are kept in Launchpad for the translators to work with. From Launchpad they are mirrored into the source tree to be used at build time.

  1. The project maintainer uploads the PO template file to Launchpad.
  2. Translators go to Launchpad to translate the English strings that now appear in the web interface.
  3. The project maintainer downloads all PO files whenever they want to, usually to prepare a release of the software.

Nowadays the upload and download should happen automatically from and to Bazaar branches in Launchpad so that the maintainer always has a mirror of the latest translations in the branch, while changes to the PO template are automatically propagated to Launchpad. The next step will be automatic generation of PO templates from the source code in a Bazaar branch.

Mapping gettext in the Launchpad database

The gettext structure of PO templates and PO files has been mapped into the Launchpad database with the following goals in mind:

  • String sharing: Each string of text is only stored once in the database.
  • Message sharing: If identical English strings appear in different series of the same project or distribution, these should only be stored once for the project and share their translations across all series.

Message sharing has been introduced over the last year or so and is a huge benefit for the users but makes the database schema and its handling a lot more complex. Because of the vast amount of data some of it is still being migrated to conform to message sharing.

Database schema of Launchpad Translations

You can see the main tables used for Launchpad Translations in the diagram. PO templates are mapped into the database using four tables.

  • POMsgID is a look-up table for all the English strings that are being translated.

  • POTMsgSet holds all the data related to an original English string as found in a PO template, one database entry per msgid entry in the file. It refers to the actual English strings only by their IDs in POMsgID. It represents one "paragraph", that is one entry, from a PO template (msgid, msgid_singular and msgid_plural).

  • POTemplate holds the meta data related to a PO template as it has been imported, most notably the original path name, the translation domain, the original header and a flag if this template is active or not.

  • TranslationTemplateItem is a linking table, needed because of the n:m relationship between POTMsgSet and POTemplate which message sharing introduces. Not only does a PO template file contain multiple msgid entries, the same msgid may also appear in multiple PO template files if the same template is used across different series of a project.

The other four tables are used to store the actual translations and are therefore a mapping of PO files.

  • POTranslation is a simple look-up table and holds the actual translated strings.

  • TranslationMessage holds all the information about a translation to a specific language, like when it was done and by whom, if it was translated in Launchpad or imported from elsewhere, if it is currently used or just a suggestion, etc. For each POTMsgSet there may be multiple entries in this table, even for the same language, because any translation ever made is stored in the database, even if only the latest is actually used. The actual translation strings are referred to by their id in POTranslation.

  • Language is the set of all languages known in Launchpad. This table is not specific to Launchpad Translations as it is used in other parts of Launchpad, too.

  • POFile represents the set of translations into a certain language for a POTemplate. If it was created by importing a PO file, it also holds some information about that file. It is not linked directly to any translation but this relationship can be derived through the Language table.

To export a PO file from Launchpad, all tables in the diagram have to be queried to find out which TranslationMessage entries belong in that file and to extract the actual strings that will be stored in the file.
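
The following Python sketch shows how such an export query could walk through the eight tables. It is heavily simplified and makes several assumptions: only the table names come from the description above, all column names are hypothetical, plural forms are left out, and SQLite is used only so that the example is self-contained.

```python
import sqlite3

# Table names follow the text; every column name is an assumption.
SCHEMA = """
CREATE TABLE POMsgID (id INTEGER PRIMARY KEY, msgid TEXT UNIQUE NOT NULL);
CREATE TABLE POTMsgSet (id INTEGER PRIMARY KEY,
    msgid_singular INTEGER NOT NULL REFERENCES POMsgID(id));
CREATE TABLE POTemplate (id INTEGER PRIMARY KEY,
    translation_domain TEXT NOT NULL);
CREATE TABLE TranslationTemplateItem (
    potemplate INTEGER NOT NULL REFERENCES POTemplate(id),
    potmsgset INTEGER NOT NULL REFERENCES POTMsgSet(id),
    sequence INTEGER NOT NULL);
CREATE TABLE Language (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL);
CREATE TABLE POFile (id INTEGER PRIMARY KEY,
    potemplate INTEGER NOT NULL REFERENCES POTemplate(id),
    language INTEGER NOT NULL REFERENCES Language(id));
CREATE TABLE POTranslation (id INTEGER PRIMARY KEY,
    translation TEXT UNIQUE NOT NULL);
CREATE TABLE TranslationMessage (id INTEGER PRIMARY KEY,
    potmsgset INTEGER NOT NULL REFERENCES POTMsgSet(id),
    language INTEGER NOT NULL REFERENCES Language(id),
    msgstr INTEGER NOT NULL REFERENCES POTranslation(id),
    is_current INTEGER NOT NULL);
"""

EXPORT_QUERY = """
SELECT POMsgID.msgid, POTranslation.translation
FROM POFile
JOIN POTemplate ON POTemplate.id = POFile.potemplate
JOIN Language ON Language.id = POFile.language
JOIN TranslationTemplateItem
    ON TranslationTemplateItem.potemplate = POTemplate.id
JOIN POTMsgSet ON POTMsgSet.id = TranslationTemplateItem.potmsgset
JOIN POMsgID ON POMsgID.id = POTMsgSet.msgid_singular
JOIN TranslationMessage
    ON TranslationMessage.potmsgset = POTMsgSet.id
    AND TranslationMessage.language = Language.id
    AND TranslationMessage.is_current = 1
JOIN POTranslation ON POTranslation.id = TranslationMessage.msgstr
WHERE POTemplate.id = ? AND Language.code = ?
ORDER BY TranslationTemplateItem.sequence
"""


def build_demo_db():
    # Two templates (think: two series of one project) share one POTMsgSet.
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO POMsgID VALUES (1, 'Thank you')")
    db.execute("INSERT INTO POTMsgSet VALUES (1, 1)")
    db.executemany("INSERT INTO POTemplate VALUES (?, ?)",
                   [(1, "myproject"), (2, "myproject")])
    db.executemany("INSERT INTO TranslationTemplateItem VALUES (?, ?, ?)",
                   [(1, 1, 1), (2, 1, 1)])
    db.execute("INSERT INTO Language VALUES (1, 'fr')")
    db.executemany("INSERT INTO POFile VALUES (?, ?, ?)",
                   [(1, 1, 1), (2, 2, 1)])
    db.executemany("INSERT INTO POTranslation VALUES (?, ?)",
                   [(1, "Merci beaucoup"), (2, "Merci")])
    # An older, superseded translation and the one currently in use.
    db.executemany("INSERT INTO TranslationMessage VALUES (?, ?, ?, ?, ?)",
                   [(1, 1, 1, 1, 0), (2, 1, 1, 2, 1)])
    return db


def export_po(db, potemplate_id, language_code):
    # Return (msgid, msgstr) pairs for one template and language.
    return db.execute(EXPORT_QUERY, (potemplate_id, language_code)).fetchall()


db = build_demo_db()
print(export_po(db, 1, "fr"))
print(export_po(db, 2, "fr"))
```

Both templates export the same current translation because they are linked to the same POTMsgSet through TranslationTemplateItem, which is the effect message sharing is meant to achieve. The superseded translation stays in the database but is filtered out by the is_current flag assumed here.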

Code structure

The source code for the Rosetta application is found in the Launchpad source tree at lib/lp/translations. The layout follows that of all Launchpad applications.

View

  • browser classes dealing with presentation logic and user interaction.

  • templates contains Zope Page Templates used by the objects from browser.

  • help documentation and help pages integrated with Rosetta.

  • emailtemplates Templates for various emails issued by Rosetta.

  • browser/tests unit tests for code from browser.

  • stories functional testing for code from browser. Written as doctests.

  • windmill high level tests using Windmill Python API.

  • lib/canonical/launchpad/javascript/translations/ YUI 3 javascript code.

Model

  • model objects mapping to relational database

  • doc functional tests for code from model. Written as doctests.

  • tests unit tests for code from model.

Utilities

  • interfaces contains the Zope interface definitions and schema for the objects used by the application. You will find interfaces for each of the database tables described earlier. For example, potemplate.py contains IPOTemplate.

  • scripts various helper scripts used in cron jobs or for other utility and integration jobs.

  • scripts/tests tests for code from scripts.

Implementation notes

Launchpad projects presented in the UI are mapped to IProduct, while project groups from the UI are mapped to IProject.

When dealing with gettext translations you are handling PO files and PO templates. The main objects in Rosetta are POFile and POTemplate. These objects can be retrieved from the database using POFileSubset and POTemplateSubset.

Translation groups and permission

For quality assurance, Rosetta has translation groups and translation permissions.

A translation group and permission is attached to each Project, Product and Distribution object.

Distributions and projects can only have one translation group and permission. Products can have their own translation group and permission, but they also inherit them from the project containing the product.
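
The inheritance rule for products can be illustrated with a small Python sketch. The classes and the function below are hypothetical stand-ins and not the Launchpad model classes, and the way the two groups are combined is an assumption made for this example. Translation permissions are not modelled at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    # Stand-in for a project (a "project group" in the UI).
    name: str
    translation_group: Optional[str] = None


@dataclass
class Product:
    # Stand-in for a product (a "project" in the UI).
    name: str
    translation_group: Optional[str] = None
    project: Optional[Project] = None


def effective_translation_groups(product: Product) -> List[str]:
    # Assumption: the product's own group comes first, followed by the
    # group inherited from the containing project, without duplicates.
    groups = []
    if product.translation_group is not None:
        groups.append(product.translation_group)
    parent = product.project
    if parent is not None and parent.translation_group is not None:
        if parent.translation_group not in groups:
            groups.append(parent.translation_group)
    return groups


parent = Project("example-project", translation_group="project-translators")
product = Product("example-product", "product-translators", parent)
print(effective_translation_groups(product))
```

A product without a group of its own and without a containing project ends up with an empty list in this model.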

You can read more about the implementation and usage of translation groups and translation permissions on the Launchpad Help wiki.

Hands-on development

Getting and running Launchpad on your computer is pretty straightforward and well documented on the Launchpad Development Wiki.

You can also use a VirtualBox hard disk image for a fully functional Launchpad instance. user: developer, password: d3v3l0p3r

Discussions

Since it is not expected that many of the attendees have a ready-to-use LP development setup, this will have to be a prepared example of fixing a small bug in LP translations that gets presented in some way. Suggestions?

  • Bug suggestions
  • Idea 1
    • do a bzr export and provide those sources for download
    • create a pdf presentation describing fixing a bug, step by step
    • use Lernid to integrate the slide with the IRC session
  • Idea 2
    • pick a simple bug that touches only a view and its template
    • open the files in etherpad http://etherpad.com/

    • export the LP session from a local system (port forwarding... etc)

UbuntuDeveloperWeek/Sessions/LaunchpadTranslationsUnderTheHood (last edited 2010-01-27 17:06:58 by h194-54-129-79)