Created: 2005-04-26T00:56:15Z by JaneW
- Contributors: JaneW
- Malone Bug:
Translation of open source software has traditionally been a collection of community efforts. One of Canonical's focuses is to offer a complete translation of the Ubuntu desktop as a service. A pilot project, translating Ubuntu into the South African language Xhosa, was carried out from January to March 2005. This document discusses the lessons learned from this pilot project, as well as requirements and recommendations for future translation projects.
In order to offer professional translation services, processes need to be agreed upon and enforced to ensure that translations can be completed in time for the release for which they are intended. The open source landscape is extremely dynamic, so the amount of translation work required for a particular release fluctuates constantly. The purpose of this exercise is to determine a practical process that can provide:
- reliable translation work estimates (for planning and budgeting purposes)
- a reliable and up-to-date source of Ubuntu translation templates
- enforceable freeze dates for a more stable and predictable translation environment
- additional tools to aid translation (e.g. profiling and validation)
- smooth integration with Rosetta for offline translating
Scope and Use Cases
The outcomes of this process will be useful for translation projects of future Ubuntu releases, both commercial and community-based.
Each of the points mentioned in the Rationale is discussed separately in this section.
Reliable Translation Work Estimates
For planning and budgeting, a reliable estimate of the amount of work to be done is needed. Translation estimates are typically given as a number of strings, where a string can contain any number of words. Professional translation companies, however, usually base their quotes and planning on the number of words.
Getting an accurate string count is currently tricky. Although it is theoretically possible to generate a list of the packages that officially make up the desktop, the package versions used in the release may change over time, and so may the individual strings in each package (strings are added, removed, or changed). In the pilot project, this was dealt with by adding an error margin of about 20%.
Estimating the number of words rather than strings adds the complication that not all words should be counted (for example, newline characters and variable names). It was suggested that we develop an automated way of producing a reasonable estimate of the number of words to be translated, using heuristics to ignore non-translatable tokens, including:
- escaped characters (newlines, tabs, etc.)
- variables (appear differently in Gnome, OOo, Mozilla)
- XML code
- path names (everything from a slash up to the next whitespace: /\/\S+/)
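As a rough illustration of these heuristics, here is a minimal Python sketch. The regular expressions are assumptions chosen for the example (printf-style, shell-style and entity-style variables, XML tags, escape sequences, and the path name pattern above), not an official list. The function names are hypothetical, and real Gnome, OOo and Mozilla templates would likely each need their own variable patterns.

```python
import re

# Illustrative heuristics (assumptions, not an official list) for tokens that
# should not count as translatable words.
ESCAPES = re.compile(r'\\[ntr"\\]')               # literal escapes such as \n and \t
XML_TAGS = re.compile(r"<[^>]+>")                 # markup such as <b> or </b>
PATHNAMES = re.compile(r"/\S+")                   # a slash up to the next whitespace
VARIABLES = re.compile(
    r"%(?:\d+\$)?(?:\(\w+\))?[-+#0]*\d*(?:\.\d+)?[sdiouxXeEfgGc]"  # printf style: %s, %1$d
    r"|\$\{?\w+\}?"                                             # shell style: $HOME, ${HOME}
    r"|&\w+;"                                                   # entities: &brandShortName;
)


def translatable_words(msgid: str) -> int:
    """Estimate the number of translatable words in one source string."""
    text = ESCAPES.sub(" ", msgid)
    text = XML_TAGS.sub(" ", text)   # strip tags first so "</b>" is not read as a path
    text = PATHNAMES.sub(" ", text)
    text = VARIABLES.sub(" ", text)
    # Count only tokens containing at least one letter, so leftover punctuation
    # (e.g. the ":" after "%s") is not counted as a word.
    return sum(1 for token in text.split() if any(ch.isalpha() for ch in token))


def estimate_words(msgids) -> int:
    """Sum the per-string estimates over all msgids of a template."""
    return sum(translatable_words(m) for m in msgids)


print(estimate_words(["Could not open %s: file not found\\n",
                      "Loading <b>/usr/share/foo.xml</b> for $USER"]))  # prints 8
```

The path name rule is deliberately crude: it also drops the "/or" of "and/or". That is tolerable for an estimate, but not for an exact count.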
MartinPitt will develop scripts that regularly produce and publish translation statistics at http://people.ubuntu.com/~pitti/translation-statistics/.
Official Source for Ubuntu Translations
During the pilot project, Rosetta was still in development, so there was no official repository for Ubuntu translations. POT files were therefore obtained from upstream sources, which resulted in translations that did not match the strings in the Ubuntu packages. In addition, some Ubuntu-specific strings were not translated.
For future translation projects, Rosetta should be considered as the official repository of Ubuntu translation templates. This will ensure that templates:
- are up to date with respect to the Ubuntu sources
- include Ubuntu-specific strings (by regenerating POT files during builds, see LanguagePackRoadmap)
Enforceable Freeze Dates
Successful translation requires that freeze dates are planned and enforced. Experience from the Gnome Translation Project suggests that three milestones are critical to a translation effort: the upstream version freeze, the string notification period and the string freeze. The Ubuntu release schedule currently caters for an upstream version freeze date; however, this freeze was not properly enforced for the Hoary release.
In ReleaseCycle it was agreed to:
- Introduce an Ubuntu string freeze five weeks before the final release.
- Introduce a string freeze of the English help documents three weeks before the final release.
- Introduce an Ubuntu string change announcement period coinciding with the feature freeze.
- Perform automatic string change detection for string freeze violation checking and automated string change notification.
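Automatic string change detection could start from a comparison of the msgids in the previous and the new POT file of a package. The following is a minimal Python sketch that assumes simplified POT parsing (no msgctxt, plural forms or obsolete entries). The names pot_msgids, string_changes and change_notification are hypothetical, and delivering the notice (by email, for instance) is left out.

```python
def pot_msgids(pot_text: str) -> set[str]:
    """Collect the msgid strings from the text of a POT file.

    Simplified parser (an assumption for this sketch): it joins multi-line
    msgids but ignores msgctxt, plural forms and obsolete (#~) entries.
    """
    msgids, current, in_msgid = set(), "", False
    for raw in pot_text.splitlines():
        line = raw.strip()
        if line.startswith('msgid "'):
            current, in_msgid = line[7:-1], True
        elif in_msgid and line.startswith('"'):
            current += line[1:-1]          # continuation line of the msgid
        else:
            if in_msgid and current:       # skip the empty header msgid
                msgids.add(current)
            in_msgid = False
    if in_msgid and current:
        msgids.add(current)
    return msgids


def string_changes(old_pot: str, new_pot: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) msgids between two versions of a template.

    A changed string shows up as one removed and one added msgid.
    """
    old, new = pot_msgids(old_pot), pot_msgids(new_pot)
    return sorted(new - old), sorted(old - new)


def change_notification(package: str, old_pot: str, new_pot: str):
    """Build a plain-text notice for translators, or None if nothing changed."""
    added, removed = string_changes(old_pot, new_pot)
    if not added and not removed:
        return None
    lines = [f"Translatable string changes in {package}:"]
    lines += [f"  + {s}" for s in added]
    lines += [f"  - {s}" for s in removed]
    return "\n".join(lines)


OLD = 'msgid "Open"\nmsgstr ""\n\nmsgid "Save as..."\nmsgstr ""\n'
NEW = 'msgid "Open"\nmsgstr ""\n\nmsgid "Save "\n"As..."\nmsgstr ""\n'
print(change_notification("example-package", OLD, NEW))
```

In this example, the capitalisation change from "Save as..." to "Save As..." is reported as one added and one removed string, even though the new msgid is split over two lines.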
Additional Translation Tools
The following additional tools were suggested:
- An automatic system that compares the two most recent sets of translatable strings of a package and notifies the uploader of any changes. The uploader should then decide whether to revert the change or notify the translation and documentation teams (assigned to MartinPitt).
- Profiling of applications to determine the translation priority of strings, e.g.:
- Strings that make up the UI
- Strings that are rarely seen (for example, messages for rare error cases)
- Anything in between
- An Ubuntu glossary. The pilot project made use of the Gnome glossary, which was found to be fairly useful and reasonably sized. A glossary for Ubuntu is a planned part of Rosetta development and this should naturally improve Ubuntu translations.
- Adding more validation checks for strings in Rosetta (such as the tests available in the Translate Toolkit):
- inconsistent punctuation (periods, commas, colons, brackets, quotes, etc.)
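Below is a standalone sketch of a punctuation consistency check that reports warnings rather than errors. It is not the Translate Toolkit's implementation. The set of characters compared and the function name punctuation_warnings are assumptions for illustration.

```python
END_PUNCTUATION = set(".,:;!?")   # sentence-ending characters compared (assumption)
BRACKETS = "()[]{}"


def punctuation_warnings(msgid: str, msgstr: str) -> list[str]:
    """Return soft warnings about punctuation that differs between source and translation."""
    if not msgstr:
        return []  # untranslated: nothing to compare
    warnings = []
    src_end, dst_end = msgid.rstrip()[-1:], msgstr.rstrip()[-1:]
    # Warn when the final characters differ and at least one of them is punctuation.
    if src_end != dst_end and (src_end in END_PUNCTUATION or dst_end in END_PUNCTUATION):
        warnings.append(f"ending punctuation differs: {src_end!r} vs {dst_end!r}")
    # Brackets and double quotes should normally appear equally often in both strings.
    for ch in BRACKETS + '"':
        if msgid.count(ch) != msgstr.count(ch):
            warnings.append(f"count of {ch!r} differs")
    return warnings


print(punctuation_warnings("Size (bytes)", "Taille (octets"))  # ["count of ')' differs"]
print(punctuation_warnings("Name:", "Nom :"))                 # []
```

Languages whose scripts use a different sentence-ending character would trigger false warnings here. This is one reason such checks need per-language configuration.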
Rosetta currently supports hard error checking (TranslationValidation spec), which includes certain syntax and variable checks. The suggestion is that the additional checks could appear as warnings.
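Returning to the profiling suggestion in the list above (also proposed in the BoF notes as a wrapper around gettext() that counts string usage), here is a minimal Python sketch. It assumes an application that looks up its strings through Python's gettext module. The names counting_gettext and classify, and the threshold of 10, are hypothetical. Strings looked up outside the wrapper (for example, by C libraries) would not be counted.

```python
import gettext
from collections import Counter

# Hypothetical profiling hook: counts how often each msgid is looked up at runtime.
usage = Counter()


def counting_gettext(message: str) -> str:
    """Drop-in replacement for gettext.gettext that records each lookup."""
    usage[message] += 1
    return gettext.gettext(message)


_ = counting_gettext


def classify(msgids, usage, ui_threshold=10):
    """Split msgids into priority tiers from a profiling run.

    ui_threshold is an arbitrary example value: strings looked up at least that
    often are treated as UI strings; strings never looked up are treated as rare.
    """
    tiers = {"ui": [], "in_between": [], "rare": []}
    for msgid in msgids:
        count = usage.get(msgid, 0)
        if count >= ui_threshold:
            tiers["ui"].append(msgid)
        elif count == 0:
            tiers["rare"].append(msgid)
        else:
            tiers["in_between"].append(msgid)
    return tiers


# Simulated session: a menu label is drawn often, a dialog once, an error never.
for _i in range(12):
    _("File")
_("Preferences")
print(classify(["File", "Preferences", "Disk quota exceeded"], usage))
# {'ui': ['File'], 'in_between': ['Preferences'], 'rare': ['Disk quota exceeded']}
```

A string that never appears during a profiling session is not necessarily unimportant. It may belong to a code path that the session did not exercise, which is exactly the "rarely seen" category.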
Offline Translation
Offline translation is especially important in bandwidth-constrained countries and in situations where translators are not always connected to the Internet. The pilot project showed that sufficient tools exist for translating offline (on both Linux and Windows). The main issue is integrating offline translations with Rosetta:
- PO files should be exportable and importable (currently supported)
- POT files should be exportable (not currently supported)
It was suggested that a soft-locking mechanism (similar to wiki functionality) be added to Rosetta to allow translators to check out a PO file for a number of hours or days at a time.
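The following is a minimal sketch of such an expiring, advisory check-out. It assumes in-memory storage and explicit timestamps; a Rosetta implementation would presumably keep locks in its database instead. SoftLockRegistry and its methods are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SoftLock:
    translator: str
    expires: datetime


class SoftLockRegistry:
    """Advisory check-out of PO files that expires automatically."""

    def __init__(self) -> None:
        self._locks: dict[str, SoftLock] = {}

    def holder(self, po_file: str, now: datetime) -> Optional[str]:
        """Return who currently holds po_file, or None if it is free or expired."""
        lock = self._locks.get(po_file)
        if lock is not None and lock.expires > now:
            return lock.translator
        return None

    def check_out(self, po_file: str, translator: str, hours: float, now: datetime) -> bool:
        """Check out po_file for the given number of hours.

        Refuses if someone else holds an unexpired lock; the same translator may
        renew their own lock. The lock only records intent and blocks nothing else.
        """
        current = self.holder(po_file, now)
        if current is not None and current != translator:
            return False
        self._locks[po_file] = SoftLock(translator, now + timedelta(hours=hours))
        return True

    def release(self, po_file: str, translator: str) -> None:
        """Release a lock early, but only if the caller holds it."""
        lock = self._locks.get(po_file)
        if lock is not None and lock.translator == translator:
            del self._locks[po_file]


start = datetime(2005, 5, 2, 9, 0)
registry = SoftLockRegistry()
print(registry.check_out("gnome-panel/xh.po", "alice", 48, start))                       # True
print(registry.check_out("gnome-panel/xh.po", "bob", 24, start + timedelta(hours=3)))   # False
print(registry.check_out("gnome-panel/xh.po", "bob", 24, start + timedelta(hours=49)))  # True
```

Because locks expire on their own, a translator who goes offline and never returns blocks a file only until the check-out period ends.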
Another suggestion was to develop a custom Rosetta PyGtk application for offline translation.
TBC: Decision on way forward
UDU BOF Agenda
- Prerelease dependencies and timelines
- Process and freeze dates for:
- Ubuntu packages
- Integration and synchronization issues:
- Official sources
- Lag between upstream and Ubuntu
- Ubuntu-specific changes
- Ubuntu-specific strings
- Translating offline (and integration with Rosetta)
BoF notes (to be sorted and spec'ed out)
- Better estimation of work
- String freeze violation checking/notification
- Better string validation/warnings
- Prioritizing strings within a package (profiling)
- Release management
Notes to be sorted out:
- Statistics needed:
- Number of strings in the desktop release
- Number of words
- Number already translated, number fuzzy
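GNU gettext's msgfmt --statistics already reports translated, fuzzy and untranslated counts for a single PO file. The minimal Python sketch below shows the same classification so that it could be aggregated across a release. The simplified parser and the names po_entries and po_statistics are assumptions for illustration.

```python
def po_entries(po_text: str) -> list[tuple[str, str, bool]]:
    """Parse simple PO text into (msgid, msgstr, fuzzy) tuples.

    Simplified parser (an assumption for this sketch): it handles multi-line
    strings and the fuzzy flag, but ignores msgctxt, plural forms and
    obsolete (#~) entries.
    """
    entries = []
    msgid, msgstr, fuzzy = None, "", False
    pending_fuzzy, field = False, None
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("#,"):
            # Flags comment: applies to the entry whose msgid follows.
            pending_fuzzy, field = "fuzzy" in line, None
        elif line.startswith('msgid "'):
            if msgid is not None:
                entries.append((msgid, msgstr, fuzzy))
            msgid, msgstr, fuzzy = line[7:-1], "", pending_fuzzy
            pending_fuzzy, field = False, "msgid"
        elif line.startswith('msgstr "'):
            msgstr, field = line[8:-1], "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += line[1:-1]
        elif line.startswith('"') and field == "msgstr":
            msgstr += line[1:-1]
        else:
            field = None
    if msgid is not None:
        entries.append((msgid, msgstr, fuzzy))
    return entries


def po_statistics(po_text: str) -> dict[str, int]:
    """Count translated, fuzzy and untranslated messages, skipping the header."""
    stats = {"total": 0, "translated": 0, "fuzzy": 0, "untranslated": 0}
    for msgid, msgstr, fuzzy in po_entries(po_text):
        if msgid == "":
            continue  # the header entry has an empty msgid
        stats["total"] += 1
        if fuzzy:
            stats["fuzzy"] += 1
        elif msgstr:
            stats["translated"] += 1
        else:
            stats["untranslated"] += 1
    return stats


SAMPLE = """msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

msgid "Open"
msgstr "Ouvrir"

#, fuzzy
msgid "Save as..."
msgstr "Enregistrer"

msgid "Quit"
msgstr ""
"""
print(po_statistics(SAMPLE))
# {'total': 3, 'translated': 1, 'fuzzy': 1, 'untranslated': 1}
```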
Translation with non-professionals: lack of technical terms, lack of vocabulary variety -> provide a glossary
South Africa: little bandwidth -> Rosetta is not appropriate there
- Offline apps: Emacs PO mode, KBabel
Custom Rosetta PyGtk application?
- String freezes
Same time as PreviewFreeze
- Nice for the future: an official Ubuntu template with properly enforced freeze dates, rather than translating upstream CVS
- Breaking freezes 'really' hurts and breaks translations
- Enforce string freeze at the time of feature freeze
- Automatically check string freeze violations after that time
- Script integrated with buildd
- Catch accidental violations
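The automatic violation check could reduce to a comparison like the one below. A hypothetical buildd hook would pass in the msgids of the previous and the new upload of a package (for instance, extracted from their POT files). The function check_upload, its return values and the freeze date are assumptions for illustration. How such a check would be wired into buildd is not covered here.

```python
from datetime import date


def check_upload(old_msgids, new_msgids, upload_date: date, freeze_date: date):
    """Classify an upload against the string freeze.

    Returns (status, added, removed) where status is:
      "ok"        - no translatable strings changed
      "notify"    - strings changed before the freeze: announce them to translators
      "violation" - strings changed on or after the freeze date
    """
    added = sorted(set(new_msgids) - set(old_msgids))
    removed = sorted(set(old_msgids) - set(new_msgids))
    if not added and not removed:
        return "ok", added, removed
    if upload_date < freeze_date:
        return "notify", added, removed
    return "violation", added, removed


freeze = date(2005, 9, 1)  # hypothetical string freeze date
print(check_upload(["Open"], ["Open", "Close"], date(2005, 8, 20), freeze))
# ('notify', ['Close'], [])
print(check_upload(["Open"], ["Open file"], date(2005, 9, 5), freeze))
# ('violation', ['Open file'], ['Open'])
```

Returning the added and removed strings along with the status lets the same check both produce string change announcements before the freeze and catch accidental violations after it.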
Ubuntu-specific strings will appear in Rosetta (-> regenerate POT files during builds, see LanguagePackRoadmap)
- Need to talk with MDZ and Colin W about release management
- Need to coordinate with the doc team
- The doc team is dependent on feature freeze, screenshots not changing
- Separate the set of translatable strings into important and unimportant strings
- Proposal: wrapper around gettext() which counts string usage, separate UI and unimportant corner case strings
- Count how many different PO templates a string appears in?
- Error checking
- Hard errors
- Syntax (missing quote marks)
- Interpolations (%s)
- Soft errors
- Double spaces
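The split between hard errors and soft errors might look like the following minimal Python sketch. An interpolation mismatch is treated as a hard error and a new double space as a warning. Syntax errors such as missing quote marks belong to the PO file itself and would be caught by whatever parses it, so they are not covered here. The placeholder pattern and the names placeholders and check_translation are assumptions. The pattern compares placeholders as a multiset, so a translation that legitimately switches to positional forms such as %2$s would be flagged by this simplified check.

```python
import re
from collections import Counter

# printf-style conversions, including positional (%1$s) and Python named (%(name)s)
# forms. "%%" is matched first so that a literal percent sign is not read as a conversion.
PLACEHOLDER = re.compile(
    r"%%|%(?:\d+\$)?(?:\([^)]+\))?[-+#0]*\d*(?:\.\d+)?[sdiouxXeEfgGc]"
)


def placeholders(text: str) -> Counter:
    """Multiset of interpolation placeholders in a string, ignoring literal %%."""
    return Counter(p for p in PLACEHOLDER.findall(text) if p != "%%")


def check_translation(msgid: str, msgstr: str) -> tuple[list[str], list[str]]:
    """Return (hard_errors, soft_warnings) for one translated string."""
    errors, warnings = [], []
    if not msgstr:
        return errors, warnings  # untranslated strings are not checked
    if placeholders(msgid) != placeholders(msgstr):
        errors.append("interpolation mismatch")
    if "  " in msgstr and "  " not in msgid:
        warnings.append("double space")
    return errors, warnings


print(check_translation("Deleted %d files from %s", "Fichiers supprimés de %s"))
# (['interpolation mismatch'], [])
print(check_translation("Open file", "Ouvrir  le fichier"))
# ([], ['double space'])
```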
Rosetta now supports hard error checking (TranslationValidation spec); we should look at supporting warnings as well.
- Need to customise error checking on a per-language basis
- Hard errors