LPTranslate
Dev Week -- Launchpad Translations under the hood -- adiroiban and henninge -- Tue, Jan 26
UTC
1 [17:02] <dholbach> next up are henninge and adiroiban, the dynamic duo that'll explain "Launchpad Translations under the hood" to you!
2 [17:03] <henninge> Hi everybody, thanks for coming!
3 [17:03] <henninge> Today we want to help you understand the inner workings of the Launchpad Translations application (Rosetta) and take you for a walk through the source code. We hope that this will enable you to scratch your own itches you have about Rosetta and to contribute to its source.
4 [17:05] <henninge> So the intended audience is
5 [17:05] <henninge> * Developers who want to contribute to Launchpad Translations but are not yet familiar with the internal structure of the application.
6 [17:05] <henninge> * Interested maintainers of translations in Launchpad and translators that want to have a better understanding of how and why Launchpad Translations does what it does.
7 [17:05] <henninge> We hope that by the end of the session you have a good overview of
8 [17:05] <henninge> # how translation data is stored in LP translation (db schema)
9 [17:05] <henninge> # how the source code is organized
10 [17:05] <henninge> # what to expect when diving into the source code
11 [17:05] <henninge> # where to start when trying to hack on Launchpad Translations.
12 [17:06] <henninge> It is not the goal of this session to introduce you to Launchpad development in general. That will be covered in a different session by Karl Fogel.
13 [17:06] <henninge> We encourage you to visit that on Friday.
14 [17:06] <henninge> Thursday, sorry
15 [17:07] <henninge> Anyway, we are happy to take questions. We will keep the session open towards the end to see what questions come up and what your interests are.
16 [17:08] <henninge> We will be taking questions in intervals, so cjohnston will nicely hold them until we ask for them. Thanks.
17 [17:08] <henninge> Go, adiroiban ;-)
18 [17:08] <adiroiban> Hi
19 [17:08] <adiroiban> I will continue with Gettext Basics
20 [17:08] <adiroiban> You need to understand how gettext is used to internationalize computer software. You should be familiar with the gettext documentation, but we will give you a short run-through of those parts that are important for Rosetta.
21 [17:09] <adiroiban> = PO files =
22 [17:09] <adiroiban> Gettext stores translations in so-called portable object files, abbreviated as PO files. They contain data sets of msgid and msgstr, the former containing the English original string, the latter containing the translation of that string. They may be preceded by special comments that convey information about the string that is being translated, like in which source file it was found. Here is an example:
23 [17:09] <adiroiban> #: src/coding.c:123
24 [17:09] <adiroiban> msgid "Thank you"
25 [17:09] <adiroiban> msgstr "Merci"
26 [17:10] <adiroiban> Gettext states that msgid could be anything to identify the string in the source code and not necessarily the English original string. Using the full English original string as the msgid, though, has proven to be the most convenient way to work on translations and is _the only form_ that is fully supported by Rosetta.
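The entry format shown above can be read with a few lines of Python. This is a minimal sketch and not Rosetta's importer: the helper name parse_po_entries is made up for illustration, and it assumes single-line, double-quoted msgid and msgstr values without escape sequences, plural forms or contexts.

```python
import re

# Matches lines such as: msgid "Thank you"
_FIELD = re.compile(r'^(msgid|msgstr)\s+"(.*)"$')


def parse_po_entries(text):
    """Return a list of dicts with 'references', 'msgid' and 'msgstr' keys.

    Hypothetical helper: handles only single-line strings without escapes.
    """
    entries = []
    current = {"references": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#:"):
            # Source reference comment, e.g. "#: src/coding.c:123"
            current["references"].extend(line[2:].split())
            continue
        match = _FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        current[key] = value
        if key == "msgstr":
            # In this simplified format, msgstr closes the entry.
            entries.append(current)
            current = {"references": []}
    return entries


sample = '''#: src/coding.c:123
msgid "Thank you"
msgstr "Merci"
'''
entries = parse_po_entries(sample)
print(entries[0]["msgid"], "->", entries[0]["msgstr"])
```

Each parsed entry keeps the source reference next to the original string and its translation, which is the same grouping a translator sees in the file.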
27 [17:10] <adiroiban> The first msgid in a PO file is empty and its msgstr contains meta information about the file. The minimum information here is the MIME Content-type of the file but usually a lot of other information is included, too.
28 [17:11] <adiroiban> Here it is
29 [17:11] <adiroiban> msgid ""
30 [17:11] <adiroiban> msgstr ""
31 [17:11] <adiroiban> "Project-Id-Version: PACKAGE VERSION\n"
32 [17:11] <adiroiban> "Report-Msgid-Bugs-To: \n"
33 [17:11] <adiroiban> "POT-Creation-Date: 2009-01-26 12:28+0000\n"
34 [17:11] <adiroiban> "PO-Revision-Date: 2009-01-26 12:28+0000\n"
35 [17:11] <adiroiban> "Last-Translator: Foo Bar <foo.bar@canonical.com>\n"
36 [17:11] <adiroiban> "Language-Team: French <fr@li.org>\n"
37 [17:11] <adiroiban> "MIME-Version: 1.0\n"
38 [17:11] <adiroiban> "Content-Type: text/plain; charset=UTF-8\n"
39 [17:11] <adiroiban> "Content-Transfer-Encoding: 8bit\n"
40 [17:11] <adiroiban> The standard naming convention for PO files is to use the language code, so in this case fr.po.
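The header is just the msgstr of the empty msgid, so it can be split into fields once the string has been unescaped. A minimal sketch, assuming the header has already been joined into one string with real newlines; parse_po_header and header_charset are hypothetical helper names, not part of gettext or Launchpad.

```python
def parse_po_header(header_msgstr):
    """Split the header msgstr into a dict of field name -> value."""
    fields = {}
    for line in header_msgstr.split("\n"):
        if ":" not in line:
            continue
        # Split on the first colon only; values such as dates contain colons.
        name, value = line.split(":", 1)
        fields[name.strip()] = value.strip()
    return fields


def header_charset(fields):
    """Return the charset named in Content-Type, or None if absent."""
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";"):
        part = part.strip()
        if part.lower().startswith("charset="):
            return part[len("charset="):]
    return None


header = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2009-01-26 12:28+0000\n"
    "Language-Team: French <fr@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)
fields = parse_po_header(header)
print(fields["Language-Team"], header_charset(fields))
```

The charset is the one field a reader of the file cannot do without, because it determines how every other string in the file is decoded.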
41 [17:11] <adiroiban> = Translation templates =
42 [17:12] <henninge> Questions so far?
43 [17:12] <cjohnston> henninge / adiroiban you aren't using slides are you?
44 [17:13] <adiroiban> no
45 [17:13] <cjohnston> < Navaneeth> QUESTION: Is gettext portable to other OS?
46 [17:13] <adiroiban> no slides
47 [17:13] <adiroiban> sorry
48 [17:13] <adiroiban> the content is on the wiki
49 [17:14] <henninge> Navaneeth: The PO format is machine independent.
50 [17:15] <adiroiban> Gettext is available on many operating systems
51 [17:15] <adiroiban> I don't think you need to worry for gettext portability
52 [17:15] <adiroiban> I will continue with PO Templates
53 [17:15] <adiroiban> When translatable strings are extracted from source code using xgettext or intltool, they are put into a file which is commonly referred to as the translation template. Its format is identical to that of a PO file but none of the msgstr lines contain any translations. These files are intended to be used for creating new PO files, so they also contain the header information but with most fields left with empty or generic values.
54 [17:16] <adiroiban> Since a PO template is not really a separate file format it does not find much mention in the gettext documentation. Also, because its content can be generated from source any time (like during a build), most projects don't include it in their repository. Only PO files contain valuable information for a project, the translations themselves, and are therefore included in the source code repository.
55 [17:17] <adiroiban> The standard naming convention for PO templates is to use the translation domain with the extension .pot, for example myproject.pot.
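The relationship between a PO file and its template can be illustrated in a few lines. This is only a sketch: real templates are generated from source code by xgettext or intltool, and the function names and the dict-based entry representation here are assumptions made for the example.

```python
def make_template(entries):
    """Return copies of the entries with every msgstr emptied."""
    return [dict(entry, msgstr="") for entry in entries]


def format_entries(entries):
    """Render entries in the simple one-line msgid/msgstr form."""
    lines = []
    for entry in entries:
        lines.append('msgid "%s"' % entry["msgid"])
        lines.append('msgstr "%s"' % entry["msgstr"])
        lines.append("")
    return "\n".join(lines)


def template_filename(translation_domain):
    """Templates are named after the translation domain."""
    return translation_domain + ".pot"


def po_filename(language_code):
    """PO files are named after the language code."""
    return language_code + ".po"


french = [{"msgid": "Thank you", "msgstr": "Merci"}]
template = make_template(french)
print(template_filename("myproject"))
print(format_entries(template))
```

The original entries are left untouched, so the same list can still be written out as fr.po.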
56 [17:18] <adiroiban> How translations are done in the simple gettext context:
57 [17:18] <adiroiban> 1) To start a translation into a new language for a project, the following steps are necessary:
58 [17:19] <adiroiban> * Either the project maintainer or the translator creates a template from source code.
59 [17:19] <adiroiban> * The translator fills out the template with the translations for each msgid.
60 [17:19] <adiroiban> * The translator saves the file in the source tree as languagecode.po (see above), usually in a directory called po.
61 [17:19] <adiroiban> * The translator or somebody with commit rights commits the file to the repository.
62 [17:19] <adiroiban> * Whenever the package is built, the translations are processed so that they are available at run-time (out of scope here).
63 [17:20] <adiroiban> 2) To change translations, the steps are simpler:
64 [17:20] <adiroiban> Sorry. Launchpad Workflow :)
65 [17:20] <adiroiban> When using Launchpad to translate a project, the steps are slightly different because the PO files are kept in Launchpad for the translators to work with. From Launchpad they are mirrored into the source tree to be used at build time.
66 [17:21] <adiroiban> * The project maintainer uploads the PO template file to Launchpad.
67 [17:21] <adiroiban> * Translators go to Launchpad to translate the English strings that now appear in the web interface.
68 [17:21] <adiroiban> * The project maintainer downloads all PO files whenever they want to, usually to prepare a release of the software.
69 [17:21] <adiroiban> * Nowadays the upload and download should happen automatically from and to Bazaar branches in Launchpad so that the maintainer always has a mirror of the latest translations in the branch, while changes to the PO template are automatically propagated to Launchpad. The next step will be automatic generation of PO templates from the source code in a Bazaar branch.
70 [17:21] <henninge> Questions?
71 [17:22] <henninge> cjohnston: Questions? ;)
72 [17:22] <cjohnston> I don't see any
73 [17:22] <henninge> cool
74 [17:22] <cjohnston> one just came
75 [17:22] <cjohnston> < Navaneeth> QUESTION: PO files need to be deployed along with the application?
76 [17:23] <henninge> cjohnston: actually, they are converted to a binary format first "MO".
77 [17:23] <adiroiban> You will deploy the MO files (machine object)
78 [17:23] <henninge> Navaneeth: ^
79 [17:23] <henninge> ;)
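To make the PO to MO step concrete, here is a sketch that writes a tiny MO catalog in memory and loads it with Python's standard gettext module. In practice the conversion is done by the msgfmt tool; build_mo is a hypothetical helper that only covers the simplest case (no plural forms, no contexts, no hash table) and assumes UTF-8 strings.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize {msgid: msgstr} into little-endian GNU MO bytes."""
    # The empty msgid carries the header, just as in a PO file.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    catalog.update(messages)
    keys = sorted(catalog, key=lambda k: k.encode("utf-8"))

    ids = b""
    strs = b""
    spans = []
    for key in keys:
        raw_id = key.encode("utf-8")
        raw_str = catalog[key].encode("utf-8")
        spans.append((len(raw_id), len(ids), len(raw_str), len(strs)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"

    count = len(keys)
    id_table = 7 * 4                  # index tables follow the 7-word header
    str_table = id_table + count * 8  # each index entry is (length, offset)
    id_data = str_table + count * 8
    str_data = id_data + len(ids)

    # magic, version, count, both table offsets, empty hash table
    out = struct.pack("<7I", 0x950412DE, 0, count, id_table, str_table, 0, 0)
    for id_len, id_off, _, _ in spans:
        out += struct.pack("<2I", id_len, id_off + id_data)
    for _, _, str_len, str_off in spans:
        out += struct.pack("<2I", str_len, str_off + str_data)
    return out + ids + strs


mo_bytes = build_mo({"Thank you": "Merci"})
translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(translations.gettext("Thank you"))
```

A string that is missing from the catalog is returned unchanged by gettext, which is why untranslated programs fall back to English.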
80 [17:24] <henninge> Next, we'll go into how this structure is mapped into the Launchpad database.
81 [17:24] <henninge> jtv drew up this amazing diagram:
82 [17:24] <henninge> https://wiki.ubuntu.com/UbuntuDeveloperWeek/Sessions/LaunchpadTranslationsUnderTheHood?action=AttachFile&do=get&target=TranslationsSchema.png
83 [17:25] <cjohnston> < Omar871> QUESTION: What's a PO file?
84 [17:25] <henninge> Omar871: the actual translations are stored in PO files. It contains English strings and their translation.
85 [17:25] <henninge> s
86 [17:26] <henninge> That's what translators work with when translating.
87 [17:26] <henninge> Back to database schema.
88 [17:26] <henninge> You can see the main tables used for Launchpad Translations in the diagram. PO templates are mapped into the database using four tables.
89 [17:27] <henninge> POMsgID is a look-up table for all the English strings that are being translated.
90 [17:27] <henninge> So it gives each string a numeric ID.
91 [17:27] <henninge> POTMsgSet holds all the data related to an original English string as found in a PO template, one database entry per msgid entry in the file. It refers to the actual English strings only by their IDs in POMsgID. This represents one paragraph/entry from a PO template (msgid, msgid_plural, context, comments).
92 [17:28] <henninge> POTemplate holds the meta data related to a PO template as it has been imported, most notably the original path name, the translation domain, the original header and a flag if this template is active or not.
93 [17:28] <henninge> TranslationTemplateItem is a linking table because of the n:m relationship between POTMsgSet and POTemplate which message sharing introduces. Not only does a PO template file contain multiple msgid entries, the same msgid may also appear in multiple PO template files, if the same template is used across different series of a project.
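The four template tables can be pictured as plain Python records. This is a sketch of the relationships only: the class names follow the table names above, but every field name is an assumption made for illustration and does not claim to match the actual Launchpad schema or its Storm model classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class POMsgID:
    id: int
    msgid: str                      # the English string itself


@dataclass(frozen=True)
class POTMsgSet:
    id: int
    msgid_singular: int             # refers to POMsgID.id
    msgid_plural: Optional[int] = None
    context: Optional[str] = None


@dataclass(frozen=True)
class POTemplate:
    id: int
    path: str
    translation_domain: str
    header: str
    active: bool = True


@dataclass(frozen=True)
class TranslationTemplateItem:
    potemplate: int                 # refers to POTemplate.id
    potmsgset: int                  # refers to POTMsgSet.id


def templates_containing(potmsgset_id, links):
    """Return the ids of all templates that share the given message set."""
    return sorted(l.potemplate for l in links if l.potmsgset == potmsgset_id)


thank_you = POMsgID(1, "Thank you")
msgset = POTMsgSet(100, msgid_singular=thank_you.id)
trunk = POTemplate(10, "po/myproject.pot", "myproject", header="")
stable = POTemplate(11, "po/myproject.pot", "myproject", header="")
links = [
    TranslationTemplateItem(trunk.id, msgset.id),
    TranslationTemplateItem(stable.id, msgset.id),
]
print(templates_containing(msgset.id, links))
```

One POTMsgSet linked to the templates of two series is the message-sharing case: the English string is stored once and both templates point at it.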
94 [17:28] <henninge> Questions?
95 [17:28] <henninge> cjohnston: ^
96 [17:29] <cjohnston> < n3rd> QUESTION: Can we not give auto suggestions/transitions like Rosy with the DB and PO ?
97 [17:29] <henninge> n3rd: I don't know Rosy but we do have "external suggestions" in Rosetta.
98 [17:30] <henninge> That is one of the key features of it, actually.
99 [17:30] <henninge> If a string has already been translated in a different package or project, it will be suggested to the translator.
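The idea behind external suggestions can be sketched as a lookup keyed by the English string. This is not how Rosetta implements it (there the lookup is a database query); the function names, the project names and the nested-dict input format are all assumptions for the example.

```python
from collections import defaultdict


def build_suggestion_index(catalogs):
    """Index translations by msgid.

    catalogs maps a project name to {msgid: translation} for one language.
    """
    index = defaultdict(set)
    for project, messages in catalogs.items():
        for msgid, translation in messages.items():
            if translation:
                index[msgid].add((project, translation))
    return index


def external_suggestions(index, msgid, current_project):
    """Return translations of msgid that come from other projects."""
    found = {
        translation
        for project, translation in index.get(msgid, ())
        if project != current_project
    }
    return sorted(found)


french = {
    "project-a": {"Thank you": "Merci"},
    "project-b": {"Thank you": "Merci beaucoup"},
    "project-c": {"Thank you": ""},
}
index = build_suggestion_index(french)
print(external_suggestions(index, "Thank you", "project-c"))
```

The result is only a list of candidates; nothing is applied to project-c until somebody accepts one of them.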
100 [17:31] <henninge> next question
101 [17:31] <cjohnston> < bullgard45> QUESTION: How do you take care of that the same English phrase needs to be translated differently depending on the program at hand?
102 [17:31] <adiroiban> Just add a new translation
103 [17:31] <henninge> bullgard45: that's why they are just presented as "suggestions". It still takes a reviewer to actually accept it as a translation to use.
104 [17:32] <adiroiban> bullgard45: a program will use a certain PO file
105 [17:33] <adiroiban> and you can have the same string in 2 PO files
106 [17:33] <adiroiban> and for each PO file to have different translations
107 [17:33] <henninge> in database terms:
108 [17:34] <henninge> You can have multiple TranslationMessage entries related to one POTMsgSet entry. I will come to that in a minute.
109 [17:34] <henninge> next question?
110 [17:34] <cjohnston> < n3rd> QUESTION:Would it not result in overhead when there are many revisions for the same sentence, and finally refined to perfect translation?
111 [17:35] <henninge> n3rd: it does ;-)
112 [17:35] <henninge> n3rd: we have a lot of entries in the POTranslation table.
113 [17:36] <henninge> but since we consider all those refining steps as contributions we keep them for reference's sake.
114 [17:36] <henninge> next questions?
115 [17:36] <cjohnston> < n3rd> QUESTION : So the alternative is ?
116 [17:37] <henninge> n3rd: none I am aware of atm...
117 [17:37] <henninge> We have been cleaning out translations of discontinued Ubuntu series.
118 [17:38] <henninge> Also, for most cases there are not that many iterations, especially if the translations are imported from upstream.
119 [17:38] <henninge> more questions?
120 [17:38] <cjohnston> < Emilien> Question: in a project that is openly translated, if I modify an already translated string, does it overwrite the old translation, or is it stored as a new entry?
121 [17:38] <henninge> Emilien: new entry.
122 [17:39] <henninge> A new TranslationMessage and POTranslation entry is created.
123 [17:39] <henninge> Let me add my explanations for the remaining four tables that all deal with this:
124 [17:39] <henninge> POTranslation is a simple look-up table and holds the actual translated strings
125 [17:39] <henninge> TranslationMessage holds all the information about a translation to a specific language, like when it was done and by whom, if it was translated in Launchpad or imported from elsewhere, if it is currently used or just a suggestion, etc. For each POTMsgSet there may be multiple entries in this table, even for the same language, because any translation ever made is stored in the database, even if only the latest is actually used.
126 [17:39] <henninge> The actual translation strings are referred to by their id in POTranslation.
127 [17:40] <henninge> Language is the set of all languages known in Launchpad. This table is not specific to Launchpad Translations as it is used in other parts of Launchpad, too.
128 [17:40] <henninge> POFile represents the set of translations into a certain language for a POTemplate. If it was created by importing a PO file, it also holds some information about that file. It is not linked directly to any translation but this relationship can be derived through the Language table.
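The "new entry, never overwrite" behaviour can be sketched with an in-memory store. The class and field names below are assumptions for illustration (a plain language code stands in for a row in the Language table) and are not the actual Launchpad model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TranslationMessage:
    potmsgset: int          # id of the POTMsgSet being translated
    language: str           # language code standing in for a Language row
    potranslation: int      # id of the translated string in POTranslation
    submitter: str
    is_current: bool = False
    date_created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class TranslationStore:
    """Hypothetical stand-in for the POTranslation/TranslationMessage tables."""

    def __init__(self):
        self.potranslation = {}   # id -> translated string
        self._ids = {}            # translated string -> id
        self.messages = []

    def _intern(self, text):
        # Look-up table: each distinct string is stored once.
        if text not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[text] = new_id
            self.potranslation[new_id] = text
        return self._ids[text]

    def history(self, potmsgset, language):
        return [m for m in self.messages
                if m.potmsgset == potmsgset and m.language == language]

    def submit(self, potmsgset, language, text, submitter, accept=False):
        message = TranslationMessage(
            potmsgset, language, self._intern(text), submitter)
        if accept:
            # Older messages are kept; they just stop being the current one.
            for other in self.history(potmsgset, language):
                other.is_current = False
            message.is_current = True
        self.messages.append(message)
        return message

    def current(self, potmsgset, language):
        for message in self.history(potmsgset, language):
            if message.is_current:
                return self.potranslation[message.potranslation]
        return None


store = TranslationStore()
store.submit(1, "fr", "Merci", "alice", accept=True)
store.submit(1, "fr", "Merci bien", "bob")              # stays a suggestion
store.submit(1, "fr", "Merci beaucoup", "carol", accept=True)
print(store.current(1, "fr"), len(store.history(1, "fr")))
```

Every submission adds a row, which is the overhead discussed earlier, but it also means each contribution stays attributable to its submitter.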
129 [17:40] <henninge> Questions?
130 [17:40] <cjohnston> < bullgard45> QUESTION: I have seen a lot of po files in the WWW. How are they related to Launchpad?
131 [17:41] <henninge> bullgard45: Can you be more specific about WWW, please?
132 [17:42] <henninge> as in: I did not understand the question ...
133 [17:42] <henninge> another question ?
134 [17:42] <cjohnston> That's it for nwo
135 [17:42] <cjohnston> now
136 [17:42] <henninge> adiroiban, take us into the code, please.
137 [17:43] <adiroiban> http://bazaar.launchpad.net/~launchpad-pqm/launchpad/db-devel/files/head:/lib/lp/translations/
138 [17:43] <adiroiban> OK. we should see the Launchpad Translation folder structure
139 [17:44] <adiroiban> I will not describe them in alphabetic order
140 [17:44] <adiroiban> == Model ==
141 [17:44] <adiroiban> * _interfaces_ contains the Zope interface definitions and schema for the objects used by the application. You will find interfaces for each of the database tables described earlier. For example, potemplate.py contains IPOTemplate.
142 [17:44] <adiroiban> * _doc_ contains functional tests for code from model. Written as doctests.
143 [17:44] <adiroiban> * _model_ contains objects mapping to the relational database using Storm.
144 [17:44] <adiroiban> * _tests_ contains unit tests for code from model.
145 [17:45] <adiroiban> Next we have the folder containing the view layer
146 [17:45] <adiroiban> == View ==
147 [17:45] <adiroiban> * _browser_ contains classes dealing with presentation logic and user interaction.
148 [17:45] <adiroiban> * _browser/tests_ contains unit tests for code from browser.
149 [17:45] <adiroiban> * _emailtemplates_ contains templates for various emails issued by Rosetta.
150 [17:46] <adiroiban> * _help_ contains documentation and help pages integrated with Rosetta.
151 [17:46] <adiroiban> * _stories_ contains functional tests for code from browser. Written as doctests.
152 [17:46] <adiroiban> * _templates_ contains Zope Page Templates used by the objects from browser.
153 [17:46] <adiroiban> * _windmill_ contains tests for javascript code using Windmill Python API.
154 [17:46] <adiroiban> * _lib/canonical/launchpad/javascript/translations/_ contains YUI 3 javascript code.
155 [17:46] <adiroiban> and finally some utilities
156 [17:46] <adiroiban> == Utilities ==
157 [17:46] <adiroiban> * _scripts_ contains various helping scripts used in cronjobs or doing other utility and integration jobs.
158 [17:46] <adiroiban> * _scripts/tests_ contains tests for code from scripts.
159 [17:46] <adiroiban> * _utilities_ contains utility classes used in model and browser code, mostly data conversion related.
160 [17:47] <adiroiban> Hope you have noticed that almost all code has the corresponding tests
161 [17:48] <henninge> Questions?
162 [17:48] <cjohnston> 12:43 < bullgard45> hennige: I have seen many .po files having English text of GNOME programs and their German translations. I could make good use of some of them. I do not have a particular URL at hand just now but I can provide one in a moment.
164 [17:49] <henninge> cjohnston: you can find those files officially on the Damned Lies website of GNOME.
165 [17:49] <henninge> http://l10n.gnome.org/
166 [17:50] <henninge> http://l10n.gnome.org/teams/de
167 [17:50] <henninge> They are all imported into Launchpad to be included in the Ubuntu language packs.
168 [17:50] <henninge> We only edit them in Launchpad Translations if we need to differ from standard Gnome.
169 [17:51] <henninge> More questions?
170 [17:51] <cjohnston> < Emilien> QUESTION: (but maybe keep it for later, as it's not relevant right now): When there is only one series, and one translation template, why couldn't we get the template page as the default
171 [17:51] <cjohnston> translations page. Example: with project BaShare, you have 3 translation pages: root https://translations.launchpad.net/bashare , series
172 [17:51] <cjohnston> https://translations.launchpad.net/bashare/trunk/+translations and template https://translations.launchpad.net/bashare/trunk/+pots/bashare . The last page is IMHO the most useful, as it contains links to untranslated, need review or changed strings. This could be the default page when there is
174 [17:51] <cjohnston> only one series and one template... What is your opinion about that (and sorry for long question)
175 [17:52] <henninge> Emilien: that sounds like a good idea as long as it is clear to the viewer what is going on.
176 [17:53] <henninge> Emilien: You can describe that in a bug, discuss it with us and even go about and fix it ... ;-)
177 [17:53] <henninge> More questions?
178 [17:53] <cjohnston> < bullgard45> QUESTION: hennige: An example for a po file that I just have found in the WWW, is http://www.mail-archive.com/xfce4-commits@xfce.org/msg08020.html
179 [17:54] <henninge> That is actually a diff of a PO file ... ;)
180 [17:55] <henninge> This should be imported into Launchpad if they'd want to use it for translations. https://edge.launchpad.net/xfce
181 [17:55] <henninge> But AFAIR they decided against using Launchpad for translations.
182 [17:56] <henninge> Another question?
183 [17:56] <cjohnston> < Emilien> Question: I like how all the parts have their own tests. Are they automatically executed in some way?
184 [17:56] <henninge> Ah, the great Launchpad test suite, Emilien!
185 [17:57] <henninge> Run "make check" to run the whole test suite. Takes a few hours.
186 [17:57] <henninge> Each branch that is submitted to the LP tree needs to have had this run.
187 [17:58] <henninge> Then we have the buildbot that continuously runs the test suite on the trunk with every new commit.
188 [17:58] <henninge> One more question?
189 [17:58] <cjohnston> < bullgard45> QUESTION: I understood that you considered the translations provided by Launchpad as proposals or a help for a human translator. This somewhat contradicts the attitude of a German
190 [17:58] <cjohnston> translator team to admit or restrict would-be human helpers. (But I cannot provide the document that proves such a censorship.) I do understand that a high standard of quality
191 [17:58] <cjohnston> translations must be maintained.
192 [17:59] <henninge> bullgard45: I think what you are referring to is this:
193 [18:00] <henninge> Anybody can submit suggestions in Launchpad but only those on the translation team are reviewers and can actually accept them to be used.
194 [18:00] <henninge> This requires a good deal of organization to keep the translations consistent.
195 [18:01] <henninge> But actually, bullgard45, I don't see a question in there ... ;-)
196 [18:01] <henninge> But, we have run out of time.
197 [18:01] <cjohnston> Thanks henninge and adiroiban !
198 [18:01] <henninge> Thank you all for coming. Come and talk to us on #launchpad any time. Thanks.
MeetingLogs/devweek1001/LPTranslate (last edited 2010-01-29 10:06:25 by i59F765F3)