AddingNewLanguage

Before a new translation team can effectively use Launchpad for translations, the language needs a locale, a keyboard layout and, if necessary, a font. For many languages only some, or none, of these items are in place. This page describes what has to be created and submitted upstream to put that language support in place.

Locales

POSIX

The first step towards using translations on a Linux system is getting the language's locale defined in the glibc library. Defining a locale consists of creating a file that describes it and submitting that file to the glibc maintainers, so that it can be included in the library.

These are the basic steps to accomplish that:

  1. Check for existing locales. Check here whether a locale definition already exists for your language. If one exists, you don't need to do anything else and can skip the remaining steps.

  2. Choose the name of the locale. The format is language_REGION@modifier, where language is the two-letter ISO 639-1 code (or a three-letter ISO 639 code if no two-letter code exists), REGION is the ISO 3166 code of the region where the language is spoken, and modifier has no set syntax and can be used to specify advanced uses for the locale (e.g. a different script). Most locales will not need a modifier.

    • Examples: bn_IN is the locale for Bengali in India, it_CH is the locale for Italian in Switzerland.

  3. Create a file with that name. Once you have chosen the name, create a file with that name to hold the definition of your locale. If your language is spoken in several regions, you might have to create a separate file for each region.

    • Example: ca_AD, ca_ES, ca_FR, ca_IT are locale definitions for the Catalan language as spoken in the regions of Andorra, Spain, France and Italy.

  4. Define the locale. Now populate the file with the definition of your locale. Locale definition files have a specific syntax; consult the additional resources below to learn about it and how to define a new locale. Looking through the existing locale files in the glibc sources is a good way to get an idea of the format. Remember to reuse, too: the copy statement lets you include sections from locales with identical content.

  5. Test the locale definition. Once your locale file or files are ready, test them locally to make sure they are correct. The basic steps are copying the file to /usr/share/i18n/locales/, running the following command to generate the binary file used by applications, and then doing the actual test.

    localedef -i <inputfile> -c -f <charset> <locale>

    • Example (testing the Asturian locale with the date command; copying into /usr/share/i18n/locales/ and installing the compiled locale normally require root privileges):

              cp ast_ES /usr/share/i18n/locales/ast_ES
              localedef -i ast_ES -c -f ISO-8859-15 ast_ES
              LANG=ast_ES date

  6. Submit the locale to glibc. Finally, submit the file to the glibc maintainers by filing a bug on their bug tracker. If you don't have one yet, you'll need to create a glibc Bugzilla account before you can submit the request. The request can be as simple as:

    Summary: Please add the new language_REGION locale
    
    Description:
    Dear maintainers,
    
    Please find attached the locale definition for <locale> to be
    considered for inclusion in glibc.
    
    Thanks!
  7. [OPTIONAL] Request the addition in Ubuntu. Requests for new language additions to glibc have been known to take some time. While your request is being processed, you might want to request the addition in Ubuntu by filing a bug here.
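
Before creating any files, it can help to sanity-check the locale name chosen in step 2. The following is only a sketch: split_locale_name is a hypothetical helper, not part of glibc, and it assumes a two- or three-letter lowercase language code and a two-letter uppercase region code.

```shell
# Hypothetical helper (not part of glibc): split a locale name of the form
# language_REGION[@modifier] into its parts.
# Prints "language REGION modifier" (modifier is "-" when absent) and
# returns non-zero if the name does not have the expected shape.
# Note: plain POSIX sh has no local variables, so these names are global.
split_locale_name() {
    name=$1
    modifier=-
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    language=${name%%_*}
    region=${name#*_}
    # Assumption: language is 2 or 3 lowercase letters.
    case $language in
        [[:lower:]][[:lower:]]|[[:lower:]][[:lower:]][[:lower:]]) ;;
        *) return 1 ;;
    esac
    # Assumption: REGION is exactly 2 uppercase letters.
    case $region in
        [[:upper:]][[:upper:]]) ;;
        *) return 1 ;;
    esac
    # An "@" with nothing after it is rejected.
    [ -n "$modifier" ] || return 1
    printf '%s %s %s\n' "$language" "$region" "$modifier"
}
```

For instance, split_locale_name sr_RS@latin prints "sr RS latin" and split_locale_name ast_ES prints "ast ES -". To see which locales have already been generated on your own system, run locale -a.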

Additional resources on creating a glibc locale:

OpenOffice.org

To be able to use a new locale in OpenOffice.org additional steps are required.

Mozilla

Mozilla projects also require additional steps to use a new locale.

Fonts

Fontconfig and Orthographic Information

Before starting localization, it is useful to gather the language's orthographic information: the Unicode code points used to display the characters that make up the written language.

Fontconfig stores these in orth files. Orth files are named after the language's ISO 639 code: the two-letter code where one exists, otherwise a three-letter code. For instance, the orth file for Secwepemctsin would be shs.orth and the orth file for English would be en.orth.

Before collecting this information it would be good to download the upstream source files and look through the fc-lang directory for a list of the languages which have orthographic information. If your language is not found then you can submit your own information. Many of the North American Indigenous languages are missing orthographic information, as well as many other languages.

If your language is not included in fontconfig, you can collect the orthographic information and submit it to the upstream source. You can then talk with the Ubuntu fontconfig maintainers to see whether the next release will contain your information. If not, you can ask them to include a patched version of fontconfig until they bump the release.
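
One possible way to collect the code points is to extract them from a sample of text written in your language. The sketch below assumes UTF-8 input and the standard iconv, od, fold, sed and sort tools; codepoints is a hypothetical helper name, not a fontconfig tool.

```shell
# Hypothetical helper: print the distinct Unicode code points used in UTF-8
# text read from stdin, one uppercase hexadecimal value per line.
# Spaces, tabs and newlines in the input are ignored.
codepoints() {
    tr -d ' \t\n' |
        iconv -f UTF-8 -t UTF-32BE |   # 4 bytes per character, no byte-order mark
        od -An -v -tx1 |               # dump the bytes as hexadecimal
        tr -d ' \n' |                  # join everything into one hex string
        fold -w 8 |                    # 8 hex digits = one character
        sed -E 's/^0{2,4}//' |         # drop zero padding, keep at least 4 digits
        tr 'a-f' 'A-F' |
        LC_ALL=C sort -u               # sort and remove duplicates
}
```

Feeding it a text file on stdin (for example codepoints < sample.txt, where sample.txt is a placeholder name) gives a list that can serve as a starting point; it still needs to be checked by someone who knows the orthography, since a sample may miss rare characters or contain foreign ones.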

Fonts associated to the language

Once the orthographic information is part of Fontconfig, you can use fc-list to determine which fonts can be used to display your language.
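
As a rough sketch of such a query, fc-list accepts a pattern such as :lang=CODE followed by the properties to print. The wrapper name fonts_for_lang is made up for this example, and shs is used only because it is the language code mentioned above.

```shell
# Hypothetical wrapper around fc-list: list the font families that
# fontconfig reports as covering the given language code, without duplicates.
# $1 is the language code used for the orth file, e.g. "shs".
fonts_for_lang() {
    fc-list ":lang=$1" family | sort -u
}
```

Running fonts_for_lang shs prints one family per line. An empty result means that no installed font covers every code point listed for that language, or that fontconfig does not know the language yet.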

Keyboards

Some languages need special keyboard layouts so that all of their characters can be typed easily. There are various ways to provide one. A common and easy way is to create an xkb layout and submit it to xkeyboard-config at freedesktop.org.

Before you start creating a keyboard layout, you should read the submission rules. It is not too difficult to create a new layout. The easiest way is to copy an existing layout that is similar to the one you need and add the characters required to type in your language.

Additional Information



Translations/KnowledgeBase/AddingNewLanguage (last edited 2011-03-22 10:56:04 by 226)