Translations

Revision 1 as of 2010-07-13 19:01:31

Dev Week -- «I Don't Know Anything About Translations» -- dpm -- Tue, Jul 13th, 2010

(01:01:39 PM) dpm: hi everyone
(01:01:57 PM) dpm: and thanks shadeslayer for a great session
(01:02:31 PM) dpm: so, welcome everyone to this session on translations
(01:02:46 PM) dpm: In the next hour we'll be learning about some basic concepts concerning natural language support, how translations work in Ubuntu at the technical level and how they work for other projects hosted in Launchpad.
(01:03:17 PM) dpm: This is a very broad subject and there are lots of resources to learn from on the net. My intention in this session is just to give you an overview of the basic concepts and concentrate on the main technologies and tools for making Ubuntu translatable.
(01:03:58 PM) dpm: I'll leave some additional time for questions, but feel free to ask your questions in between as well.
(01:04:08 PM) dpm: So without further ado, let's get the ball rolling.
(01:04:17 PM) dpm: == Why Translations ==
(01:04:31 PM) dpm: While this might seem obvious to some people, I'd like to start by highlighting once more the importance of translations, or natural language support, to be more precise.
(01:04:50 PM) dpm: One of the principles that unite the Ubuntu community is providing an Operating System for human beings.
(01:05:07 PM) dpm: Some of these human beings might understand and speak English, which is the original language in which the OS is developed.
(01:05:30 PM) dpm: However, there is still a large number of users who need Ubuntu to be available in their own language to be able to use it at all.
(01:05:53 PM) dpm: If you are an English speaker, you can think about it the other way round to get an idea:
(01:06:07 PM) dpm: imagine your operating system were developed in a language you don't know - let's take Japanese.
(01:06:25 PM) dpm: Would you be able to choose the right menus in a foreign language, or even understand the messages the OS is showing you?
(01:06:46 PM) dpm: If you provide internationalization support to your applications, more people will be able to translate them and to actually use them
(01:07:04 PM) dpm: Setting up an application for internationalization is easier than you might think.
(01:07:22 PM) dpm: It is generally a one-off process and it's best done from the moment you start creating your application.
(01:07:46 PM) dpm: The rest is simply maintenance - exposing translatable strings to translators and fetching translations.
(01:08:02 PM) dpm: What prompted me to run such a session was precisely the many times I've heard the session's title from developers:
(01:08:18 PM) dpm:     «I don't know anything about translations»
(01:08:31 PM) dpm: So let's try to cast some light on that and hopefully change the statement so that next time someone brings up the subject we hear something more along the lines of
(01:08:42 PM) dpm:     «Translations are awesome»
(01:08:51 PM) dpm: Yeah, that'll do :)
(01:09:08 PM) dpm: == Basic concepts ==
(01:09:24 PM) dpm: Let's continue with some basic concepts
(01:09:41 PM) dpm: I'll quickly run through them, so I won't go into details, but please, feel free to interrupt if you've got any questions.
(01:10:07 PM) dpm: * Internationalization (i18n): is basically the process of making your application multilingual. This is something you as a developer will be doing while hacking at your app. It is mostly a one-off process, and in most cases it simply involves initializing the technologies used for this purpose.
(01:10:40 PM) dpm: * Localization (l10n): that's what translators will be doing, which is adapting internationalized software to a specific region and language. Most of the work here goes into actually translating the applications
(01:11:17 PM) dpm: * Gettext: that's the underlying framework to make your applications translatable. It provides the foundations and it is the most widely used technology to enable translations of Open Source projects. In addition, it defines a standard file format for translators to do their work and for the application to load translations, as well as providing tools to work with these.
(01:11:50 PM) dpm: Related to gettext, we've also got:
(01:12:18 PM) dpm: * PO files: these are text files with a defined format basically consisting of message pairs - the first one the original string in English and the next one the translation. E.g:
(01:12:41 PM) dpm: msgid "Holy cow, is that a truck coming towards me?"
(01:12:41 PM) dpm: msgstr "Blimey, is that a lorry coming towards me?"
(01:13:20 PM) dpm: they are often simply referred to as "translations", and are what translators work with, either with a text editor, a dedicated PO file editor, or with an online interface such as Launchpad Translations. They are named after language codes (e.g. en_GB.po, ca.po, hu.po) and are kept in the code as the source files to generate MO files.
(01:14:09 PM) dpm: * MO files: binary files created at build time from PO files and installed in a particular system location (e.g. /usr/share/locale). These are where the applications will actually load translations from.
(01:14:42 PM) dpm: * POT files: also called templates, have got the same format as PO files, but the messages containing the translations are empty. Developers provide the templates with the latest messages from the applications, on which the PO files will be based. There is generally one template (POT file) and many translations (PO files), and the template usually carries the name of the application (mycoolapp.pot)
(01:15:31 PM) dpm: so assuming you've got all your translations-related files under a 'po' directory, it would look like:
(01:15:40 PM) dpm: po/mycoolapp.pot
(01:15:47 PM) dpm: po/ca.po
(01:15:52 PM) dpm: po/es.po
(01:15:59 PM) dpm: po/it.po
(01:16:01 PM) dpm: ...
(01:16:43 PM) dpm: so you can see how from a single POT file translators (or Launchpad) create the PO files for their particular language
(01:17:09 PM) dpm: oh, and in a POT file the message pairs will look like this:
(01:17:13 PM) dpm: msgid "Holy cow, is that a truck coming towards me?"
(01:17:13 PM) dpm: msgstr ""
(01:17:59 PM) dpm: You can see a real one here to get an idea: http://l10n.gnome.org/POT/evolution.master/evolution.master.pot
(01:18:44 PM) dpm: And a particular translation: http://l10n.gnome.org/POT/evolution.master/evolution.master.ca.po
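To make the relationship between a template and a translation more concrete, here is a minimal sketch in Python that reads msgid/msgstr pairs like the ones shown above. The function name parse_po is hypothetical, and the sketch assumes every message fits on one line, with no escapes, comments, plural forms or header entry, all of which real PO files contain. For real work a library such as python-polib is the more realistic choice.

```python
def parse_po(text):
    """Parse single-line msgid/msgstr pairs from PO-formatted text.

    Simplified on purpose: no multi-line strings, escapes or plurals.
    """
    entries = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('msgid "') and line.endswith('"'):
            msgid = line[len('msgid "'):-1]
        elif line.startswith('msgstr "') and line.endswith('"') and msgid is not None:
            entries[msgid] = line[len('msgstr "'):-1]
            msgid = None
    return entries


# A template (POT) entry has an empty msgstr ...
template = '''
msgid "Holy cow, is that a truck coming towards me?"
msgstr ""
'''

# ... and a translation (PO) fills it in.
translation = '''
msgid "Holy cow, is that a truck coming towards me?"
msgstr "Blimey, is that a lorry coming towards me?"
'''

pot = parse_po(template)
po = parse_po(translation)

# Messages that still need a translation are those with an empty msgstr.
untranslated = [msgid for msgid, msgstr in pot.items() if not msgstr]
```

Here pot and po share the same msgid keys; only the msgstr values differ, which is why one template can serve as the base for many language files.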
(01:19:09 PM) dpm: * Translation domain: this is a name which will be used to build a unique URI from which to fetch the translations. E.g. /usr/share/locale/<langcode>/LC_MESSAGES/<domain>.mo. It will be set in the code or as a build system variable and will generally be the name of the application in lowercase. The POT template will also generally be named after the domain.
(01:20:08 PM) ClassBot: umang asked: gettext is a gnu software. Can I use it in a PyQt application, say?
(01:20:31 PM) dpm: yes, you'll be able to use it
(01:20:43 PM) dpm: but it might be tricky to set up
(01:21:08 PM) dpm: since all the makefile rules related to gettext are geared towards autotools
(01:21:15 PM) dpm: but it is definitely possible
(01:21:51 PM) ClassBot: devcando85 asked: What do .PO and .MO stand for?
(01:22:05 PM) dpm: PO stands for Portable Object
(01:22:14 PM) dpm: MO ... err...
(01:22:23 PM) dpm: I'd have to look it up :)
(01:22:35 PM) dpm: Message Object perhaps
(01:22:49 PM) ClassBot: csigusz asked: If I didn't translate one message, then what will appear in the application?
(01:23:18 PM) dpm: the application will show the original English message if there is no translation
(01:23:25 PM) dpm: that will always be the fallback
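The fallback rule can be pictured with a plain dictionary. This is only an illustration of the behaviour described above, not gettext's actual implementation; the names catalog and lookup are made up for the example.

```python
def lookup(catalog, msgid):
    """Return the translation for msgid, or msgid itself if there is none."""
    translated = catalog.get(msgid, "")
    # A missing entry and an empty translation are treated the same way:
    # the original English message is shown.
    return translated if translated else msgid


# Hypothetical catalog: one message translated, one left empty.
catalog = {
    "Holy cow, is that a truck coming towards me?": "Blimey, is that a lorry coming towards me?",
    "Quit": "",
}

shown_translated = lookup(catalog, "Holy cow, is that a truck coming towards me?")
shown_empty = lookup(catalog, "Quit")
shown_missing = lookup(catalog, "Open file")
```

So a partially translated application is still usable: it simply mixes translated and English messages.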
(01:23:47 PM) ClassBot: zyga asked: will there be a section specific to working with web applications? such as django-based web applications? Often distributing and installing such applications is done differently and gettext with its strict rules as to where to find translations is annoying to work with
(01:24:16 PM) dpm: I haven't planned this for this session, but that'd be a great idea for another (full one)
(01:24:42 PM) dpm: some web apps use either gettext or their own implementation (full or partial) of the gettext api
(01:25:01 PM) dpm: so many of the concepts (po files, mo files, domain, etc) still apply
(01:25:45 PM) dpm: ok, let's continue
(01:25:54 PM) dpm: I'll try to answer the rest of the questions later on
(01:26:57 PM) dpm: ah, Rhonda tells me that MO stands for Machine Object. There you go, thanks :)
(01:27:38 PM) dpm: Let's go on with a final couple of basic concepts/tools:
(01:27:44 PM) dpm: * Intltool: it's a tool that provides a higher level interface to gettext and allows it to handle file formats otherwise not supported (.desktop files, .policy files, etc.)
(01:28:07 PM) dpm: * Launchpad Translations: collaborative online translation tool for Open Source projects, part of Launchpad and available at https://translations.launchpad.net. It allows translating Operating Systems such as Ubuntu, as well as single projects. For translators, it hides the technical complexity associated with file formats and tools, and allows them to easily translate applications online without prior technical knowledge. For developers, it provides code hosting integration, which greatly facilitates the development workflow
(01:28:28 PM) dpm: There are more technologies associated with other i18n aspects - font rendering, to mention an important one - but we'll not be looking at them today.
(01:29:19 PM) dpm: From those concepts, technologies and tools, the main ones to retain for this session are gettext and Launchpad Translations
(01:29:30 PM) dpm: Another important concept is the translation workflow. Traditionally, this has been as follows:
(01:29:47 PM) dpm: 1. Some time before release (e.g. 2 weeks), the developer announces a string freeze and a release date, and produces a POT template with all translatable messages. This allows translators to start doing their work with stable messages
(01:30:01 PM) dpm: 2. Translators do the actual translations and send them back to the project (either committing them, sending them per e-mail or simply saving them in Launchpad)
(01:30:15 PM) dpm: 3. Before release, the developer makes sure the latest translations (the PO files) are in the source tree and releases the tarball
(01:30:29 PM) dpm: Launchpad makes some of those steps less rigid and easier both for translators and developers - the online interface and automatic translation commits ensure that translations get to the project automatically and nearly immediately. Automatic template generation allows the templates to be always up to date. More on that later on.
(01:30:39 PM) dpm: == Ubuntu Translations ==
(01:30:50 PM) dpm: Ubuntu is translated in Launchpad at https://translations.launchpad.net/ubuntu by the Ubuntu translators, who work in per-language translation teams that constitute the heart of the Ubuntu Translations community.
(01:31:06 PM) dpm: You can see all teams here:
(01:31:07 PM) dpm:   https://translations.launchpad.net/+groups/ubuntu-translators
(01:31:25 PM) dpm: Each team has their own communication method, and they coordinate globally through the ubuntu-translators mailing list.
(01:31:43 PM) dpm: So if as a developer you need to announce anything to translators, or ask a question, the mailing list at https://lists.ubuntu.com/mailman/listinfo/ubuntu-translators is the place to go to.
(01:31:59 PM) dpm: All Ubuntu applications -and Ubuntu-specific documentation- can thus be translated from a central location and with an easy to use online interface that greatly lowers the barrier to contribution.
(01:32:24 PM) dpm: Let's get a bit more technical and talk about the workflow of Ubuntu translations
(01:32:33 PM) dpm: A couple of important points first:
(01:32:44 PM) dpm: * Ubuntu is translated in Launchpad at https://translations.launchpad.net/ubuntu
(01:32:55 PM) dpm: * This only applies to Ubuntu packages in the main and restricted repositories
(01:33:06 PM) dpm: * Translations are shipped independently from the applications in dedicated packages called language packs. There is a set of language packs for each language.
(01:33:21 PM) dpm: * Language packs allow separation between application and translations and shipping separate updates without the need to release new package versions.
(01:33:47 PM) dpm: Ok, let's have a look at the Ubuntu translations lifecycle:
(01:33:54 PM) dpm: It all starts with an upstream project being packaged and uploaded to the archive
(01:34:15 PM) dpm: If that package is either in main or restricted, it will be translatable in Ubuntu and will go through this whole process
(01:34:28 PM) dpm: Upon upload, the package will be built and its translations (the PO files from the source package plus the POT template) will be extracted and put into a tarball
(01:34:45 PM) dpm: The pkgbinarymangler package takes care of doing this
(01:34:59 PM) dpm: This tarball will then be imported into Launchpad, entering the translations import queue for some sanity checking before approval. It is important at this point that the tarball contains a POT template, otherwise it will not be imported.
(01:35:43 PM) dpm: here's what the imports queue looks like
(01:35:46 PM) dpm:   https://translations.launchpad.net/ubuntu/lucid/+imports?field.filter_status=NEEDS_REVIEW&field.filter_extension=pot&batch=90
(01:35:53 PM) dpm: (for Lucid)
(01:36:09 PM) dpm: After approval, both the template and the translations will be imported and exposed in Launchpad, making them available from a URL such as:
(01:36:21 PM) dpm: https://translations.launchpad.net/ubuntu/<distrocodename>/+source/<sourcepackage>/+pots/<templatename>
(01:36:37 PM) dpm: Here is, for example, what it looks like for the evolution source package:
(01:36:49 PM) dpm:   https://translations.launchpad.net/ubuntu/lucid/+source/evolution/+pots/evolution
(01:37:13 PM) dpm: From this point onwards, after translations have been exposed, translators can do their work.
(01:37:28 PM) dpm: While they are doing this, and on a periodical basis, translations are exported from Launchpad in a big tarball containing all languages and fed to a program called langpack-o-matic
(01:37:44 PM) dpm: Langpack-o-matic takes the translations exported as sources and creates the language packs, one set for each language. These are the packages which contain the translations in binary form and will ultimately be shipped to users, finally closing the translation loop.
(01:38:02 PM) dpm: So that was it. Basically, for an application to be translatable in Ubuntu:
(01:38:13 PM) dpm: * It must have internationalization support
(01:38:21 PM) dpm: * It must be either in main or restricted
(01:38:33 PM) dpm: * Its package must create a POT template during build (here's how: https://wiki.ubuntu.com/UbuntuDevelopment/Internationalisation/Packaging#TranslationTemplates)
(01:38:57 PM) dpm: If you want to learn more about this, you'll find more info here as well:
(01:39:11 PM) dpm:   https://wiki.ubuntu.com/Translations/TranslationLifecycle
(01:39:11 PM) dpm:   https://wiki.ubuntu.com/Translations/Upstream
(01:39:11 PM) dpm:   https://wiki.ubuntu.com/UbuntuDevelopment/Internationalisation/Packaging
(01:39:11 PM) dpm:   https://wiki.ubuntu.com/MaverickReleaseSchedule
(01:39:48 PM) dpm: == Translation of Projects ==
(01:40:07 PM) dpm: So we've seen how an Operating System such as Ubuntu can be translated in Launchpad
(01:40:14 PM) dpm: But what about individual projects? How can they be internationalized and localized?
(01:40:29 PM) dpm: There are many programming languages, build systems and possible configurations, so let's try to see a general overview on the steps for adding i18n support to an app and getting it translated.
(01:40:50 PM) dpm: * Gettext initialization - the code will have to add a call to the gettext initialization function and set the translation domain. This generally means adding a few lines of code to the main function of the program. Here's a simple example in Python:
(01:41:05 PM) dpm:   import gettext
(01:41:06 PM) dpm:   # install() binds the domain and adds _() to builtins, so no separate alias is needed
(01:41:06 PM) dpm:   gettext.install('myappdomain', '/usr/share/locale')
(01:41:24 PM) dpm: This is a very basic setup. Depending on your build system -if you are using one-, you might have to modify some other files as well
(01:41:49 PM) dpm: * Marking translatable strings - you'll then need to mark strings to be translated. This will be as simple as enclosing the strings with _(), which is simply a wrapper for the gettext function
(01:42:07 PM) dpm: * Create a 'po' folder to contain translations (po files) and a template (pot file)
(01:42:22 PM) dpm: (remember the layout I was mentioning earlier on)
(01:42:46 PM) dpm: Roughly, up to here the package will have internationalization support. Let's now see how we can make it translatable for translators to do their work
(01:43:06 PM) dpm: * Updating the .pot template - the translatable strings will need to be extracted from the code and put into the POT template to be given to translators. There are several ways to do this:
(01:43:26 PM) dpm: a) you can use the gettext tools directly (calling the xgettext program)
(01:43:44 PM) dpm: b) you can invoke intltool directly -if you are using it- with 'intltool-update -p -g mycoolapp'
(01:43:57 PM) dpm: c) using a make rule to do this for you: with autotools you can use 'make $(DOMAIN).pot-update' or 'make dist'; with python-distutils-extra you can use ./setup.py -n build_i18n
(01:44:16 PM) dpm: I'd recommend the last option, as having a build system will greatly simplify maintenance
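To give an idea of what template extraction involves, here is a small Python sketch that finds string literals passed to _() and prints them as empty message pairs. It is an illustration only, with hypothetical function names; xgettext and intltool do far more (other programming languages, translator comments, plural forms, source references, the POT header), and they are what you would use in practice.

```python
import ast


def extract_messages(source):
    """Return string literals passed to _() in Python source, in source order."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "_"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append((node.lineno, node.col_offset, node.args[0].value))
    messages = []
    for _line, _col, text in sorted(found):
        if text not in messages:  # each message appears once in a template
            messages.append(text)
    return messages


def to_pot(messages):
    """Render messages as template entries with empty translations."""
    entries = []
    for msgid in messages:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        entries.append('msgid "%s"\nmsgstr ""\n' % escaped)
    return "\n".join(entries)


# Hypothetical application source: only strings wrapped in _() are picked up.
SOURCE = '''
print(_("Open file"))
print(_("Quit"))
print(_("Open file"))
print("not marked, so not extracted")
'''

pot_text = to_pot(extract_messages(SOURCE))
```

The point to take away is that only marked strings end up in the template, so a string that is not wrapped in _() never reaches translators.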
(01:44:36 PM) dpm: If you are using intltool in a standard layout, you can even let Launchpad do the work for you and build the templates automatically
(01:44:54 PM) dpm: check out this awesome feature here: http://blog.launchpad.net/translations/automatic-template-generation
(01:45:27 PM) dpm: The best integration and workflow is achieved when your project's code is hosted in Launchpad and using bzr, as either committing a new template or letting Launchpad generate it for you will automatically expose it to translators
(01:45:55 PM) dpm: in a location such as https://translations.launchpad.net/<project>/<series>/+pots/<templatename>
(01:46:27 PM) dpm: see the Getting Things GNOME translations for a real example:
(01:46:32 PM) dpm: https://translations.launchpad.net/gtg/trunk/+pots/gtg
(01:47:52 PM) dpm: Setting up a project for translations in Launchpad involves enabling translations, activating the template you (or Launchpad) have created and optionally enabling the bzr integration features
(01:48:09 PM) dpm: These are fairly easy steps
(01:48:45 PM) dpm: so I'll just direct you to https://help.launchpad.net/Translations/YourProject/BestPractices and leave the last few mins for questions
(01:49:03 PM) ClassBot: arjunaraoc asked: what is the minimum translation required to include the language in boot options for Ubuntu?
(01:49:53 PM) dpm: I believe there is not a minimum for the bootloader package. The minimum is the translation coverage of the debian-installer package
(01:50:35 PM) dpm: I'd recommend you check https://wiki.ubuntu.com/Translations/KnowledgeBase/DebianInstaller or ask on the ubuntu-translators mailing list
(01:50:56 PM) dpm: Moomoc: Is documentation in Ubuntu always translated with gettext? Isn't this a bit arduous?
(01:51:35 PM) dpm: actually, translating with gettext isn't arduous, but rather more comfortable for translators. The tricky part of translating documentation
(01:51:59 PM) dpm: is converting from the documentation format to the gettext format, which is the one translators are used to
(01:52:16 PM) dpm: fortunately, there are several tools to make this easier:
(01:52:28 PM) dpm: xml2po and po4a are two good ones
(01:52:43 PM) ClassBot: inquata asked: Are there plans to support http://open-tran.eu/ by providing Ubuntu strings?
(01:53:16 PM) dpm: There aren't right now, but if you've got an idea on how this could be implemented, a blueprint would be most welcome
(01:53:40 PM) dpm: Remember that Launchpad is Open Source: https://dev.launchpad.net/
(01:53:56 PM) dpm: and any contributions are really welcome
(01:54:12 PM) ClassBot: umang asked: I seem to have missed something about gettext. Are the .po /.mo files accessed at runtime depending on the user's language or are they integrated into separate builds of the same program? If I've understood correctly it's the former.
(01:54:42 PM) dpm: .po files are source files, so they aren't used at run time
(01:54:58 PM) dpm: the .mo files are generated at build time from the .po files
(01:55:20 PM) dpm: then installed in the system (generally at /usr/share/locale ...)
(01:55:43 PM) dpm: and applications using gettext pick them up at runtime to load the translations from them
(01:56:14 PM) dpm: Rhonda also tells me: One important thing to note about MO: Even though it's byte encoded and can be big-endian and little-endian, gettext is sane enough to be able to use _both_ no matter what system it runs on. So the MO format actually is still architecture independent even though the data isn't really.
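The build-time and run-time halves can be sketched end to end in Python. Normally the msgfmt tool from gettext compiles PO files into MO files; the hypothetical write_mo function below is a hand-rolled stand-in that writes a minimal little-endian MO file (no hash table, no plural forms) so that the example is self-contained. The domain 'mycoolapp' and the temporary locale directory are placeholders for /usr/share/locale and a real application name.

```python
import gettext
import os
import struct
import tempfile


def write_mo(messages, path):
    """Write a {msgid: msgstr} dict as a minimal little-endian MO file."""
    # The empty msgid carries the catalog header, which declares the charset.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    catalog.update(messages)
    entries = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items())
    n = len(entries)
    header_size = 7 * 4
    ids_start = header_size + 2 * n * 8  # two index tables of (length, offset) pairs
    ids_blob = b"".join(k + b"\0" for k, _v in entries)
    strs_blob = b"".join(v + b"\0" for _k, v in entries)
    id_index, str_index = [], []
    id_off, str_off = ids_start, ids_start + len(ids_blob)
    for k, v in entries:
        id_index += [len(k), id_off]
        str_index += [len(v), str_off]
        id_off += len(k) + 1
        str_off += len(v) + 1
    # magic, version, count, msgid table offset, msgstr table offset, hash size, hash offset
    header = struct.pack("<7I", 0x950412DE, 0, n, header_size, header_size + n * 8, 0, 0)
    index = struct.pack("<%dI" % (4 * n), *(id_index + str_index))
    with open(path, "wb") as f:
        f.write(header + index + ids_blob + strs_blob)


# "Build time": compile and install into <localedir>/<lang>/LC_MESSAGES/<domain>.mo
localedir = tempfile.mkdtemp()
mo_dir = os.path.join(localedir, "en_GB", "LC_MESSAGES")
os.makedirs(mo_dir)
write_mo(
    {"Holy cow, is that a truck coming towards me?": "Blimey, is that a lorry coming towards me?"},
    os.path.join(mo_dir, "mycoolapp.mo"),
)

# "Run time": the application loads the catalog for the user's language.
catalog = gettext.translation("mycoolapp", localedir=localedir, languages=["en_GB"])
translated = catalog.gettext("Holy cow, is that a truck coming towards me?")
untranslated = catalog.gettext("Quit")  # not in the catalog, so English is returned
```

The PO file never appears at run time; only the MO file under the locale directory is read, and the same binary serves every user of that language.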
(01:56:49 PM) dpm: We've got time for one or two questions still, anyone?
(01:58:32 PM) ClassBot: arjunaraoc asked: debian-installer has not been set up in Launchpad so far for Telugu. Is it better to do translation outside?
(01:59:05 PM) dpm: it is in Launchpad, but yes, I'd recommend doing translations upstream in Debian for that particular one
(01:59:37 PM) dpm: it is a complex package and does not use a conventional layout
(02:00:06 PM) dpm: A final note Rhonda mentioned to me as well: The package python-polib helps with respect to using gettext catalogues in python
(02:00:17 PM) dpm: ok, so that was it!
(02:00:31 PM) dpm: Thanks a lot for listening and for the interesting questions