TranslationProcess

Status

Introduction

Translation of open source software has traditionally been a collection of community efforts. One of Canonical's focuses is to offer a complete translation of the Ubuntu desktop as a service. A pilot project, translating Ubuntu into the South African language Xhosa, was carried out from January to March 2005. This document discusses the lessons learned from that pilot project, together with requirements and recommendations for future translation projects.

Rationale

In order to offer professional translation services, processes need to be agreed upon and enforced to ensure that translations can be completed on time for the Ubuntu release for which they are intended. The open source landscape is extremely dynamic, which results in constant fluctuation of the translation work to be done for a particular release. The purpose of this exercise is to determine an implementable process that can provide:

  • reliable estimates of the resources required for translation work (for planning and budgeting purposes)
  • a reliable and up-to-date source of Ubuntu translation templates
  • enforceable freeze dates for a more stable and predictable translation environment
  • additional tools to aid translation (e.g. profiling and validation)
  • smooth integration with Rosetta for offline translating

Scope and Use Cases

This document, and the process changes it recommends, are intended for internal Canonical use, to aid in the planning and execution of future translation projects. However, improvements in Canonical's processes will also benefit community Ubuntu translation efforts.

Implementation Plan

Each of the points mentioned in the Rationale is discussed separately in this section.

Reliable Translation Work Estimates

For the purposes of planning and budgeting, it is necessary to have a method for obtaining reliable estimates of the amount of work needed to complete any given translation. Translation estimates are typically given as a number of strings, where a string can contain any number of words. Professional translation companies, however, typically base their quotes and planning on the number of words.

Currently, getting an accurate estimate of the number of strings is tricky. Although it is theoretically possible to generate a list of the packages that officially constitute the desktop, the package versions that will be used in the release may change over time, and so may the individual strings in each package (strings are added, removed, or changed). In the pilot project, this was dealt with by adding an error margin of about 20% to the estimate.

Estimating the number of words, rather than strings, adds the complication that not every token in a string should be counted (newline characters and variable names, for example, are not translated). It was suggested that we should work on an automated way of generating a reasonable estimate of the number of words to be translated, taking into account heuristics for ignoring non-translatable words, including:

  • escaped characters (newlines, tabs, etc.)
  • variables (which appear differently in Gnome, OOo, Mozilla)
  • XML code
  • pathnames (words beginning with a slash, i.e. corresponding to the regular expression /\/\S+/)
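
The heuristics listed above could be combined along the following lines. This is only a minimal sketch under stated assumptions: the function name and the regular expressions are illustrative and are not taken from any existing Ubuntu or Rosetta tool, and only printf-style variables are handled. The variable syntaxes used by other toolkits would need their own patterns.

```python
import re

# Illustrative patterns only (assumptions, not an existing tool's rules).
XML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")    # XML code such as <b> or </b>
ESCAPE = re.compile(r"\\[nrt]")                # escapes written literally as \n, \r, \t
PRINTF_VAR = re.compile(r"%(?:\(\w+\))?[-+#0]*\d*(?:\.\d+)?[a-zA-Z]")  # %s, %d, %(name)s
PATHNAME = re.compile(r"/\S+")                 # words beginning with a slash


def count_translatable_words(msgid):
    """Return a rough count of the words in msgid that need translating."""
    text = XML_TAG.sub(" ", msgid)
    text = ESCAPE.sub(" ", text)
    text = PRINTF_VAR.sub(" ", text)
    count = 0
    for word in text.split():
        if PATHNAME.match(word):
            continue  # skip pathnames
        if any(ch.isalpha() for ch in word):
            count += 1  # count only tokens that contain at least one letter
    return count


def estimate_words(msgids):
    """Sum the per-string estimates over a collection of msgid strings."""
    return sum(count_translatable_words(m) for m in msgids)
```

Applied to the msgid strings of a translation template, `estimate_words` gives a word count that excludes the token classes listed above. It remains a heuristic, so an error margin would still be needed.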

MartinPitt has agreed to develop scripts that regularly produce translation statistics and publish them on http://people.ubuntu.com/~pitti/translation-statistics/.

Official Source for Ubuntu Translations

During the pilot project, Rosetta was still in development, so there was no official repository for Ubuntu translations. POT files had to be obtained from upstream sources, which resulted in mismatched translations. In addition, some Ubuntu-specific strings were not translated.

For future translation projects, Rosetta should be considered as the official repository of Ubuntu translation templates. This will ensure that templates:

  • are up-to-date with respect to the Ubuntu sources
  • include Ubuntu-specific strings (by regenerating POT files during builds, see LanguagePackRoadmap)

In addition, Rosetta should include non-gettext packages, such as OpenOffice.org and the Mozilla products. This future functionality is planned for Rosetta (see LanguagePackRoadmap and OpenOfficeLocalisation).

Enforceable Freeze Dates

Successful translation requires that freeze dates are planned and enforced. Experience from the Gnome Translation Project suggests that three milestones are critical to a translation effort: the upstream version freeze, the string notification period, and the string freeze. The Ubuntu release schedule currently caters for an upstream version freeze date. However, for the Hoary release, this freeze was not properly enforced.

In ReleaseCycle it was agreed to:

  • Introduce an Ubuntu string freeze five weeks before the final release.
  • Introduce a string freeze on the English help documents three weeks before the final release.
  • Introduce an Ubuntu string change announcement period three weeks before the string freeze.
  • Perform automatic string change detection for string freeze violation checking and automated string change notification.
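
As an illustration of how the dates agreed above relate to one another, the following sketch derives them from a final release date. The function name and dictionary keys are hypothetical, and it assumes that the announcement period starts three weeks before the string freeze.

```python
from datetime import date, timedelta


def freeze_schedule(final_release):
    """Derive the translation milestones from the final release date."""
    string_freeze = final_release - timedelta(weeks=5)
    return {
        # Assumption: the announcement period begins three weeks before the freeze.
        "announcement_period_start": string_freeze - timedelta(weeks=3),
        "string_freeze": string_freeze,
        "doc_string_freeze": final_release - timedelta(weeks=3),
        "final_release": final_release,
    }


# Example with an arbitrary release date chosen for illustration.
for name, day in freeze_schedule(date(2005, 10, 13)).items():
    print(f"{name}: {day.isoformat()}")
```

Deriving every milestone from the single release date keeps the schedule consistent if the release date moves.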

Additional Translation Tools

The following additional tools were suggested:

  1. An automated system that compares the two most recent sets of translatable strings of packages and notifies the uploader about any changes. It is then up to the uploader to decide whether to revert the change or notify the translation and documentation teams (assigned to MartinPitt).

  2. Profiling of applications to determine the translation priority of strings, i.e.
    • Strings that constitute the UI
    • Strings that are rarely seen (a result of rare error cases, etc.)
    • Anything in-between
    Due to the time involved in planning and development, further discussion of this tool is postponed until another Canonical translation project is undertaken.
  3. An Ubuntu glossary. The pilot project made use of the Gnome glossary, which was found to be fairly useful and reasonably sized. A glossary for Ubuntu is a planned part of Rosetta development. This will help nontechnical translators to translate technical terms consistently.
  4. Adding more validation checks to strings in Rosetta (such as tests available in the translate toolkit):
    • double-spacing
    • inconsistent punctuation (periods, commas, colons, brackets, quotes, etc.)
    • inconsistent spacing (such as newlines and tabs)
    • acronyms
    • accelerators
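
The string change detection described in item 1 could work roughly as follows. This is a sketch only, not the tool assigned to MartinPitt: the function names are hypothetical, and it assumes the translatable strings of each package upload are available as a set of msgid strings. A reworded string shows up as one removal plus one addition.

```python
def diff_strings(previous, current):
    """Compare the two most recent sets of translatable strings of a package."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def format_notification(package, changes):
    """Build the message for the uploader, or return None if nothing changed."""
    if not changes["added"] and not changes["removed"]:
        return None
    lines = [f"Translatable strings changed in {package}:"]
    lines += [f"  + {s}" for s in changes["added"]]
    lines += [f"  - {s}" for s in changes["removed"]]
    return "\n".join(lines)
```

The uploader receiving such a message would then decide whether to revert the change or to notify the translation and documentation teams.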

Rosetta currently supports hard error checking (TranslationValidation spec) -- this includes certain syntax and variable checks. The Rosetta team has agreed to look at supporting additional checks as warnings.
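
To illustrate what such warning-level checks might look like, here is a minimal sketch covering the five checks listed above. It is not Rosetta's implementation, nor the translate toolkit's API. The function name is hypothetical, the underscore accelerator marker is an assumption that fits GTK-style strings only, and the punctuation rule would need per-language exceptions.

```python
import re

ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


def _end_punctuation(text):
    """Return the trailing punctuation character of text, or '' if none."""
    last = text.rstrip()[-1:]
    return last if last in ".,:;!?" else ""


def check_translation(source, translation, accelerator="_"):
    """Return a list of warning names; an empty list means no warnings."""
    warnings = []
    if "  " in translation and "  " not in source:
        warnings.append("double-spacing")
    if _end_punctuation(source) != _end_punctuation(translation):
        warnings.append("punctuation")
    if any(source.count(c) != translation.count(c) for c in "\n\t"):
        warnings.append("spacing")
    if set(ACRONYM.findall(source)) - set(ACRONYM.findall(translation)):
        warnings.append("acronyms")
    if source.count(accelerator) != translation.count(accelerator):
        warnings.append("accelerators")
    return warnings
```

Because these are heuristics that can produce false positives, reporting them as warnings rather than hard errors leaves the final decision with the translator.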

Offline Translating

Offline translation is especially important in countries where bandwidth is limited and in situations where translators are not always connected to the Internet. The pilot project showed that there are sufficient tools to support offline translation (on both Linux and Windows platforms). The main outstanding issue is integrating the resulting translations with Rosetta.

  • PO files should be exportable and importable (currently supported)
  • POT files should be exportable (not currently supported)

  • It was suggested that a soft-locking mechanism (similar to wiki-functionality) be added to Rosetta to allow translators to check out a PO file for a number of hours or days at a time.

The details of this implementation will be discussed by the Rosetta team (see RosettaOneDotZero).
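
Purely as an illustration of the soft-locking idea, and without anticipating the Rosetta team's design, an advisory lock with automatic expiry could be modelled as below. The class and method names are hypothetical, and the lock is "soft" in the sense that it only records who has checked out a file and until when.

```python
from datetime import datetime, timedelta


class SoftLockRegistry:
    """Advisory check-out records for PO files; locks expire on their own."""

    def __init__(self):
        self._locks = {}  # PO file name -> (translator, expiry time)

    def holder(self, po_file, now):
        """Return the translator holding an unexpired lock, or None."""
        entry = self._locks.get(po_file)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def check_out(self, po_file, translator, duration, now):
        """Record a check-out; return False if someone else holds the file."""
        current = self.holder(po_file, now)
        if current is not None and current != translator:
            return False
        self._locks[po_file] = (translator, now + duration)
        return True


registry = SoftLockRegistry()
start = datetime(2005, 4, 1, 9, 0)
registry.check_out("gedit.po", "alice", timedelta(hours=8), start)
```

Expiry means a translator who goes offline and never returns does not block the file indefinitely, which matches the "hours or days at a time" wording above.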

Outstanding Issues

Ideas for the future

  • Develop a custom application for offline translating, which takes care of integration of translations with Rosetta.
  • Discuss how to avoid redundancy introduced by identical strings which appear in different applications.

UbuntuDownUnder/BOFs/TranslationProcess (last edited 2008-08-06 16:21:17 by localhost)