AddingNewLanguage

Differences between revisions 8 and 13 (spanning 5 versions)
Revision 8 as of 2009-10-29 10:44:06
Size: 3026
Editor: 171
Comment: Added documentation on adding locales to glibc
Revision 13 as of 2011-03-22 10:56:04
Size: 8416
Editor: 226
Comment:
Line 6: Line 6:
Before a new translation team can effectively use Launchpad for translations, there must be a font, keyboard, and locale available to be used in Ubuntu.  There are many languages where some or none of these items are not in place. This page will describe a few things that must be submitted that will help the language support for a language.
||<tablestyle="float:right; font-size: 0.9em; width:40%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||

Before a new translation team can effectively use Launchpad for translations, there must be a '''locale''', a '''keyboard''' and, if necessary, a '''font''' for the given language. For many languages only some of these items, or none of them, are in place. This page describes what has to be created and submitted to get a language supported.
Line 12: Line 14:
 * Creating a glibc locale:
  * [[http://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/README;h=f05bf15cebb64284f0872357c8c68c8105e7c73f;hb=HEAD|Documentation]]
  * Example: https://bugzilla.redhat.com/show_bug.cgi?id=482881
The first step towards using translations on a Linux system is to have the language's locale defined in the glibc library. Defining a locale consists of creating a file that describes it and submitting that file to the glibc maintainers, so that it can be included in the library.

These are the basic steps to accomplish that:

 1. '''Check for existing locales'''. Check [[http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales;hb=HEAD|here]] whether there is already a locale definition for your language. If there is one, you don't need to do anything else and can skip all the subsequent steps.
 1. '''Choose the name of the locale'''. The format is `language_REGION@modifier`, where ''language'' is an [[http://www.loc.gov/standards/iso639-2/php/code_list.php|ISO 639-1 two-letter code]] (or the ISO 639-2 three-letter code if no two-letter code exists), ''REGION'' is an [[http://www.iso.org/iso/english_country_names_and_code_elements|ISO 3166 code]] representing the region where the language is spoken, and ''modifier'' has no set syntax and can be used for special cases (e.g. a different script). Most locales will not need a modifier.
  * Examples: `bn_IN` is the locale for Bengali in India, `it_CH` is the locale for Italian in Switzerland.
 1. '''Create a file with that name'''. Once the name has been chosen, you'll have to create a file with that name, which will contain the definition of your locale. Note that if your language is spoken in different regions, you might have to create different files, one for each region.
  * Example: `ca_AD`, `ca_ES`, `ca_FR`, `ca_IT` are locale definitions for the Catalan language as spoken in the regions of Andorra, Spain, France and Italy.
 1. '''Define the locale'''. At this point you'll have to populate the file with the definition of your locale. Locale definition files have a specific syntax. Consult the additional resources below to learn about this syntax and how to define the new locale. Looking through the [[http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales;hb=HEAD|available locale files in the glibc sources]] can be helpful to get an idea of the format. Also remember to reuse: use the `copy` statement to include sections from locales with identical content.
 1. '''Test the locale definition'''. Once your locale file or files are ready, you should test them locally to ensure they are correct. The basic steps are copying the file to `/usr/share/i18n/locales/`, running the following command to generate the binary file used by applications, and then running a program under the new locale.{{{
localedef -i inputfile -c -f <charset> <locale>
}}}

  * Example (testing the Asturian locale with the `date` command):{{{
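 # Install the locale source where localedef looks for it (needs root)
 sudo cp ast_ES /usr/share/i18n/locales/ast_ES
 # Compile it into the binary form used by applications (needs root)
 sudo localedef -i ast_ES -c -f ISO-8859-15 ast_ES
 # Run a locale-aware command under the new locale
 LANG=ast_ES date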
}}}
 1. '''Submit the locale to glibc'''. Finally, you'll need to submit the file to the glibc maintainers by [[http://sourceware.org/bugzilla/enter_bug.cgi?product=glibc|filing a bug on their bug tracker]]. If you don't have an account yet, you'll have to [[http://sourceware.org/bugzilla/createaccount.cgi|create a glibc Bugzilla account]] before you can submit the request. The request can be as simple as: {{{
Summary: Please add the new language_REGION locale

Description:
Dear maintainers,

Please find attached the locale definition for <locale> to be
considered for inclusion in glibc.

Thanks!

}}}
 1. '''[OPTIONAL] Request the addition in Ubuntu'''. Requests for new language additions to glibc have been known to take some time. While your request is being processed, you might want to request the addition in Ubuntu by filing a bug [[https://bugs.launchpad.net/ubuntu/+source/langpack-locales/+filebug|here]].
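
The naming scheme from the steps above can be illustrated with a small shell sketch. It only splits a name of the form `language_REGION@modifier` into its parts; `split_locale_name` is a hypothetical helper written for this page, not a glibc tool, and it does not check the parts against the ISO code lists.

```shell
# split_locale_name NAME
# Splits a locale name of the form language_REGION@modifier into its parts.
# Hypothetical helper for illustration only; it performs no validation.
split_locale_name() {
    name=$1

    # Everything after the first "@" is the optional modifier.
    modifier=
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac

    # Everything after the first "_" is the region code.
    region=
    case $name in
        *_*) region=${name#*_}; name=${name%%_*} ;;
    esac

    printf 'language=%s region=%s modifier=%s\n' "$name" "$region" "$modifier"
}
```

For example, `split_locale_name it_CH` reports the language `it` and the region `CH` with an empty modifier.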

Additional resources on creating a glibc locale:
 * [[http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.files/doc/aixfiles/Locale_Definition.htm#c5d71c0bob|IBM Locale Definition document]]
 * [[http://sourceware.org/git/?p=glibc.git;a=tree;f=localedata/locales;hb=HEAD|Available locale files in the glibc sources]] (click on the ''blob'' link to see the contents of the file)
 * [[http://web.archive.org/web/20061014011601/www.student.uit.no/~pere/linux/glibc/howto.html|Tips on writing a glibc locale file]]
 * [[http://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/README;h=f05bf15cebb64284f0872357c8c68c8105e7c73f;hb=HEAD|Glibc Documentation on localedef]]
 * [[http://translate.sourceforge.net/wiki/guide/locales/glibc|Translate Toolkit project documentation]]
 * Examples:
  * [[https://bugzilla.redhat.com/show_bug.cgi?id=482881|Submission of the Pashto locale]]
  * [[http://sourceware.org/bugzilla/show_bug.cgi?id=10824|Submission of the Chuvash locale]]
Line 18: Line 59:
To be able to use a new locale in Open``Office.org [[http://wiki.services.openoffice.org/wiki/How_to_submit_new_Locale_Data|additional steps]] are required.
Line 20: Line 63:
== Fontconfig and Orthographic Information ==

Useful information to gather before localization is the orthographic information. This includes the Unicode Code Points that are used to display the characters that make up a written language.

Fontconfig uses these in orth files. These orth files are named after the ISO-639-3 code. For instance the orth file for Secwepemctsin would be shs.orth and the orth file for English would be en.orth.

Before collecting this information it would be good to download the upstream source files and look through the fc-lang directory for a list of the languages which have orthographic information. If your language is not found then you can submit your own information. Many of the North American Indigenous languages are missing orthographic information, as well as many other languages.

If you're language is not included in fontconfig you can collect the orthographic information and submit it to the upstream source. You could then talk with the Ubuntu fontconfig maintainers to see if the next release will contain your information, if not then you can ask them to included a patched version of fontconfig until they bymp the release.

 * Example: http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ps_af.orth and http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ps_pk.orth
Mozilla projects also require [[https://wiki.mozilla.org/L10n:Starting_a_localization|additional steps]] to use a new locale.
Line 34: Line 67:
Once orthographic information is part of Fontconfig one can use fc-list to determine which fonts that can be used to display your language.
=== Fontconfig and Orthographic Information ===

Useful information to gather before localization is the orthographic information. This includes the '''Unicode Code Points''' that are used to display the characters that make up a written language.

Fontconfig uses these in '''orth files'''. Orth files are named after the language's ISO 639 code: the two-letter code where one exists, otherwise the three-letter [[http://en.wikipedia.org/wiki/ISO_639-3|ISO-639-3]] code. For instance, the orth file for Secwepemctsin would be `shs.orth` and the orth file for English would be `en.orth`.

Before collecting this information it would be good to download the upstream source files and look through the `fc-lang` directory for a list of the languages which have orthographic information. If your language is not found then you can submit your own information. Many of the North American Indigenous languages are missing orthographic information, as well as many other languages.

If your language is not included in fontconfig you can collect the orthographic information and submit it to the upstream source. You can then talk with the Ubuntu fontconfig maintainers to see if the next release will contain your information. If not, you can ask them to include a patched version of fontconfig until they bump the release.

 * Examples:
  * [[http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ps_af.orth|Orth file for Pashto in Afghanistan]]
  * [[http://cgit.freedesktop.org/fontconfig/tree/fc-lang/ps_pk.orth|Orth file for Pashto in Pakistan]]

=== Fonts associated to the language ===

Once orthographic information is part of Fontconfig, you can use `fc-list` to determine which fonts can be used to display your language.
Line 38: Line 87:
Some languages need special keyboard layouts to easily input all of their characters.   There are various ways to do this a common and easy way to is create a xkb layout. This is then submitted to xkeyboard-config at freedesktop.org.
Some languages need special keyboard layouts to easily input all of their characters. There are various ways to do this. A common and easy way is to create an '''xkb layout'''. This is then submitted to xkeyboard-config at freedesktop.org.
Line 40: Line 89:
Before you start creating a keyboard you should read the [[http://www.freedesktop.org/wiki/Software/XKeyboardConfig/Rules|submission rules]]. It is not to difficult to create a new layout.   The easiest way is to copy a keyboard that is similar to your keyboard with the added characters needed to type in your language.
Before you start creating a keyboard layout you should read the [[http://www.freedesktop.org/wiki/Software/XKeyboardConfig/Rules|submission rules]]. It is not too difficult to create a new layout. The easiest way is to copy an existing layout that is similar to the one you need and add the characters needed to type in your language.
Line 42: Line 91:
Put example here.
 * Reference:
  * [[http://hektor.umcs.lublin.pl/~mikosmul/computing/articles/custom-keyboard-layouts-xkb.html|Creating a custom XKB layout]]
Line 44: Line 94:
== Additional sources ==
== Additional Information ==
Line 46: Line 96:
 * http://live.gnome.org/TranslationProject/NewLanguage
 * [[http://live.gnome.org/TranslationProject/NewLanguage|How to add a new language to GNOME]]

Before a new translation team can effectively use Launchpad for translations, there must be a locale, a keyboard and, if necessary, a font for the given language. For many languages only some of these items, or none of them, are in place. This page describes what has to be created and submitted to get a language supported.

Locales

POSIX

The first step towards using translations on a Linux system is to have the language's locale defined in the glibc library. Defining a locale consists of creating a file that describes it and submitting that file to the glibc maintainers, so that it can be included in the library.

These are the basic steps to accomplish that:

  1. Check for existing locales. Check the locale files in the glibc sources to see whether there is already a locale definition for your language. If there is one, you don't need to do anything else and can skip all the subsequent steps.

  2. Choose the name of the locale. The format is language_REGION@modifier, where language is an ISO 639-1 two-letter code (or the ISO 639-2 three-letter code if no two-letter code exists), REGION is an ISO 3166 code representing the region where the language is spoken, and modifier has no set syntax and can be used for special cases (e.g. a different script). Most locales will not need a modifier.

    • Examples: bn_IN is the locale for Bengali in India, it_CH is the locale for Italian in Switzerland.

  3. Create a file with that name. Once the name has been chosen, you'll have to create a file with that name, which will contain the definition of your locale. Note that if your language is spoken in different regions, you might have to create different files, one for each region.

    • Example: ca_AD, ca_ES, ca_FR, ca_IT are locale definitions for the Catalan language as spoken in the regions of Andorra, Spain, France and Italy.

  4. Define the locale. At this point you'll have to populate the file with the definition of your locale. Locale definition files have a specific syntax. Consult the additional resources below to learn about this syntax and how to define the new locale. Looking through the available locale files in the glibc sources can be helpful to get an idea of the format. Also remember to reuse: use the copy statement to include sections from locales with identical content.

  5. Test the locale definition. Once your locale file or files are ready, you should test them locally to ensure they are correct. The basic steps are copying the file to /usr/share/i18n/locales/, running the following command to generate the binary file used by applications, and then running a program under the new locale.

    localedef -i inputfile -c -f <charset> <locale>
    • Example (testing the Asturian locale with the date command):

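              # Install the locale source where localedef looks for it (needs root)
              sudo cp ast_ES /usr/share/i18n/locales/ast_ES
              # Compile it into the binary form used by applications (needs root)
              sudo localedef -i ast_ES -c -f ISO-8859-15 ast_ES
              # Run a locale-aware command under the new locale
              LANG=ast_ES date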
  6. Submit the locale to glibc. Finally, you'll need to submit the file to the glibc maintainers by filing a bug on their bug tracker. If you don't have an account yet, you'll have to create a glibc Bugzilla account before you can submit the request. The request can be as simple as:

    Summary: Please add the new language_REGION locale
    
    Description:
    Dear maintainers,
    
    Please find attached the locale definition for <locale> to be
    considered for inclusion in glibc.
    
    Thanks!
  7. [OPTIONAL] Request the addition in Ubuntu. Requests for new language additions to glibc have been known to take some time. While your request is being processed, you might want to request the addition in Ubuntu by filing a bug against the langpack-locales package in Launchpad.
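
As a starting point for step 4, the following shell sketch writes a skeleton locale definition in which every category is simply copied from an existing locale. `write_locale_skeleton` is a hypothetical helper, and `xx_XX` and `en_US` in the usage note are placeholders. It assumes the usual `comment_char`/`escape_char` header and the `copy` statement found in the locale files in the glibc sources; a real locale would replace most of the `copy` lines with its own definitions.

```shell
# write_locale_skeleton FILE BASE
# Writes a skeleton locale definition to FILE in which every category is
# copied from the existing locale BASE. Hypothetical helper for illustration;
# the result is a starting point, not a finished locale.
write_locale_skeleton() {
    file=$1
    base=$2
    {
        printf 'comment_char %%\n'
        printf 'escape_char /\n\n'
        for category in LC_IDENTIFICATION LC_CTYPE LC_COLLATE LC_MONETARY \
                        LC_NUMERIC LC_TIME LC_MESSAGES LC_PAPER LC_NAME \
                        LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT; do
            # Each category is a block of the form: NAME ... END NAME
            printf '%s\ncopy "%s"\nEND %s\n\n' "$category" "$base" "$category"
        done
    } > "$file"
}
```

Calling `write_locale_skeleton xx_XX en_US` creates a file named `xx_XX` in the current directory, which can then be edited category by category and tested as described in step 5.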

Additional resources on creating a glibc locale:

OpenOffice.org

To be able to use a new locale in OpenOffice.org additional steps are required.

Mozilla

Mozilla projects also require additional steps to use a new locale.

Fonts

Fontconfig and Orthographic Information

Useful information to gather before localization is the orthographic information. This includes the Unicode Code Points that are used to display the characters that make up a written language.

Fontconfig uses these in orth files. Orth files are named after the language's ISO 639 code: the two-letter code where one exists, otherwise the three-letter ISO-639-3 code. For instance, the orth file for Secwepemctsin would be shs.orth and the orth file for English would be en.orth.

Before collecting this information it would be good to download the upstream source files and look through the fc-lang directory for a list of the languages which have orthographic information. If your language is not found then you can submit your own information. Many of the North American Indigenous languages are missing orthographic information, as well as many other languages.
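
Checking the fc-lang directory can be scripted. The sketch below assumes the fontconfig sources have already been downloaded and unpacked, and that the path to their fc-lang directory is passed in; `has_orth_file` and `list_orth_languages` are hypothetical helpers, not part of fontconfig.

```shell
# has_orth_file FC_LANG_DIR CODE
# Succeeds if FC_LANG_DIR contains CODE.orth. Hypothetical helper; FC_LANG_DIR
# is assumed to be the fc-lang directory of an unpacked fontconfig source tree.
has_orth_file() {
    [ -f "$1/$2.orth" ]
}

# list_orth_languages FC_LANG_DIR
# Prints the codes that already have an orth file, one per line.
list_orth_languages() {
    for path in "$1"/*.orth; do
        # If nothing matches, the pattern is left as-is and must be skipped.
        [ -e "$path" ] || continue
        basename "$path" .orth
    done
}
```

For example, `has_orth_file fontconfig/fc-lang shs || echo "no orth file yet"` prints the message only when `shs.orth` is missing from that directory.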

If your language is not included in fontconfig you can collect the orthographic information and submit it to the upstream source. You can then talk with the Ubuntu fontconfig maintainers to see if the next release will contain your information. If not, you can ask them to include a patched version of fontconfig until they bump the release.

Fonts associated to the language

Once orthographic information is part of Fontconfig, you can use fc-list to determine which fonts can be used to display your language.
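
A minimal sketch of such a query, assuming fontconfig's `fc-list` is installed and the language code matches the name of the orth file. `lang_pattern` and `fonts_for_language` are hypothetical wrappers; the `:lang=` pattern and the `family` element are standard `fc-list` arguments.

```shell
# lang_pattern CODE
# Builds the fontconfig pattern that selects fonts covering language CODE.
lang_pattern() {
    printf ':lang=%s' "$1"
}

# fonts_for_language CODE
# Prints the family names of installed fonts that fontconfig considers
# suitable for CODE, without duplicates. Requires fc-list to be installed.
fonts_for_language() {
    fc-list "$(lang_pattern "$1")" family | sort -u
}
```

Running `fonts_for_language ps` would list the installed font families that cover Pashto. An empty result suggests that no installed font covers the language's orthography.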

Keyboards

Some languages need special keyboard layouts to easily input all of their characters. There are various ways to do this. A common and easy way is to create an xkb layout. This is then submitted to xkeyboard-config at freedesktop.org.

Before you start creating a keyboard layout you should read the submission rules. It is not too difficult to create a new layout. The easiest way is to copy an existing layout that is similar to the one you need and add the characters needed to type in your language.
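
To give an idea of what a derived layout looks like, the shell sketch below writes a small XKB symbols file that includes the US layout and adds one extra character on the third level (AltGr) of the Q key. Everything in it is a placeholder chosen for illustration: the helper `write_example_layout`, the layout name, the key and the character (U+0294). A real submission has to follow the xkeyboard-config rules.

```shell
# write_example_layout FILE
# Writes a small XKB symbols file to FILE. Hypothetical helper; the layout
# name and the remapped key are placeholders, not a real submission.
write_example_layout() {
    cat > "$1" <<'EOF'
// Based on the US layout, with one extra character on AltGr+Q.
partial alphanumeric_keys
xkb_symbols "basic" {
    include "us(basic)"
    name[Group1] = "Example (custom)";

    // Levels: plain, Shift, AltGr, AltGr+Shift. U0294 stands for U+0294.
    key <AD01> { [ q, Q, U0294, U0294 ] };

    // Use the right Alt key to reach the third level.
    include "level3(ralt_switch)"
};
EOF
}
```

Starting from an `include` of an existing layout keeps the file short, because only the keys that differ need to be listed.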

Additional Information


CategoryTranslations

Translations/KnowledgeBase/AddingNewLanguage (last edited 2011-03-22 10:56:04 by 226)