LaunchpadTranslationsUnderTheHood

UbuntuDeveloperWeek session

by AdiRoiban and HenningEggers on Tuesday, 20 Jan 2010, 17h

Intended audience

  • Developers who want to contribute to Launchpad Translations but are not yet familiar with the internal structure of the application.
  • Interested maintainers of translations in Launchpad and translators who want a better understanding of how and why Launchpad Translations does what it does.

Required knowledge

  • GNU gettext system for internationalization of software
  • Python coding
  • A general understanding of how a web application works
  • Knowledge of Zope is not required but is a bonus

Goals of the session

Session attendees have a good understanding of

  • how translation data is stored in LP translation (db schema)
  • how the import and approval process works (translationimportqueue)
  • how permissions and translation groups work (translation groups)
  • how review and suggestion handling works (POFile:+translate)

It is not the goal of this session to introduce the attendees to Launchpad development in general. That will be covered in a different session by Karl Fogel.

The session text will be used as developer documentation on the Launchpad development wiki, so this is a chance for us to gather input from the community.

Session text

Today we want to help you understand the inner workings of the Launchpad Translations application (Rosetta) and take you for a walk through the source code. We hope that this will enable you to scratch your own itches with Rosetta and to contribute to its source.

Gettext basics

You need to understand how gettext is used to internationalize computer software. You should be familiar with the gettext documentation, but we will give you a short run-through of the parts that are important for Rosetta.

PO files

Gettext stores translations in so-called portable object files, abbreviated as PO files. They contain pairs of msgid and msgstr, the former containing the English original string, the latter containing the translation of that string. Each pair may be preceded by special comments that convey information about the string being translated, such as the source file in which it was found. Here is an example:

#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"

Gettext states that the msgid can be anything that identifies the string in the source code, not necessarily the English original string. Using the full English original string as the msgid, though, has proven to be the most convenient way to work on translations and is the only form that is fully supported by Rosetta.
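
To make the msgid/msgstr pairing concrete, here is a minimal Python sketch that reads entries like the example above. It is not how Rosetta imports files; parse_po is a hypothetical helper written for this illustration, and it assumes every msgid and msgstr fits on a single line (no escapes, plural forms or message contexts).

```python
import re

# Matches a one-line keyword entry such as: msgid "Thank you"
_LINE = re.compile(r'^(msgid|msgstr)\s+"(.*)"\s*$')


def parse_po(text):
    """Return a list of (msgid, msgstr, comments) tuples.

    Simplified on purpose: only single-line msgid/msgstr values are
    recognized; escapes, plurals and contexts are ignored.
    """
    entries = []
    comments = []
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # Special comments precede the entry they describe.
            comments.append(line)
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        keyword, value = match.groups()
        if keyword == "msgid":
            msgid = value
        elif msgid is not None:
            # A msgstr closes the entry opened by the last msgid.
            entries.append((msgid, value, comments))
            comments = []
            msgid = None
    return entries


SAMPLE = '''\
#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"
'''

entries = parse_po(SAMPLE)
```

Each tuple keeps the English original, its translation and the comments that preceded it, which is the same grouping the example entry shows.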

The first msgid in a PO file is empty and its msgstr contains meta information about the file. The minimum information here is the MIME Content-type of the file but usually a lot of other information is included, too.

msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2009-01-26 12:28+0000\n"
"PO-Revision-Date: 2009-01-26 12:28+0000\n"
"Last-Translator: Foo Bar <foo.bar@canonical.com>\n"
"Language-Team: French <fr@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

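Since the header msgstr is a series of "Name: value" lines, its fields can be pulled apart with a few lines of Python. The following is only a sketch under that assumption; parse_header and charset_of are hypothetical names chosen for this illustration and are not taken from the Launchpad source.

```python
def parse_header(msgstr):
    """Split the header msgstr into a dict of field name -> value."""
    fields = {}
    for line in msgstr.split("\n"):
        if not line.strip():
            continue
        # Split on the first colon only; values may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def charset_of(fields, default=None):
    """Return the charset parameter of the Content-Type field, if any."""
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";"):
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset":
            return value
    return default


# The decoded header msgstr, shortened from the example above.
HEADER = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2009-01-26 12:28+0000\n"
    "Language-Team: French <fr@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)

fields = parse_header(HEADER)
charset = charset_of(fields)
```

The charset is the part of the header a reader needs first, because it determines how the rest of the file has to be decoded.
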
Translation templates

When translatable strings are extracted from source code using xgettext or intltool, they are put into a file which is commonly referred to as the translation template. Its format is identical to that of a PO file, but none of the msgstr lines contain a translation. These files are intended to be used to create new PO files, so they also contain the header information, but with most fields left empty or set to generic values.

Since a PO template is not really a separate file format it does not find much mention in the gettext documentation. Also, because its content can be generated from source any time (like during a build), most projects don't include it in their repository. Only PO files contain valuable information for a project, the translations themselves, and are therefore included in the source code repository.
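
The relationship between a template and a PO file can be sketched in Python as follows. This is an illustration only, not the gettext tooling or Rosetta's code: fill_template is a hypothetical helper, the second entry ("Goodbye") is made up for the example, and the sketch assumes single-line msgid and msgstr values without escapes.

```python
def fill_template(template_text, translations):
    """Return PO text with empty msgstr lines filled from `translations`.

    The header entry (empty msgid) is left untouched, and msgids
    without a known translation keep their empty msgstr.
    """
    output = []
    current_msgid = None
    for line in template_text.splitlines():
        if line.startswith('msgid "'):
            # Strip the keyword, the opening quote and the closing quote.
            current_msgid = line[len('msgid "'):-1]
        elif line == 'msgstr ""' and current_msgid:
            translation = translations.get(current_msgid, "")
            line = 'msgstr "%s"' % translation
        output.append(line)
    return "\n".join(output) + "\n"


# A template has the same layout as a PO file, with empty msgstr lines.
TEMPLATE = '''\
#: src/coding.c:123
msgid "Thank you"
msgstr ""

#: src/coding.c:130
msgid "Goodbye"
msgstr ""
'''

po_text = fill_template(TEMPLATE, {"Thank you": "Merci"})
```

The result differs from the template only in its msgstr lines, which is why the template can be regenerated from source at any time while the PO file is the part worth keeping in the repository.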

Gettext workflow

To start a translation into a new language for a project, a new

  1. What are PO files, PO Templates, message ids and message strings? (short review of the gettext system)

  2. How this has been mapped to the db schema. (jtv's great schema diagram goes here.)
  3. ...

Hands-on development

Since it is not expected that many of the attendees have a ready-to-use LP development setup, this will have to be a prepared example of fixing a small bug in LP translations that gets presented in some way. Suggestions?

  • Bug suggestions
  • Idea 1
    • do a bzr export and provide those sources for download
    • create a pdf presentation describing fixing a bug, step by step
    • use Lernid to integrate the slide with the IRC session
  • Idea 2
    • pick a simple bug that will touch only a view and its template
    • open the files in etherpad http://etherpad.com/

    • export the LP session from a local system (port forwarding... etc)

UbuntuDeveloperWeek/Sessions/LaunchpadTranslationsUnderTheHood (last edited 2010-01-27 17:06:58 by h194-54-129-79)