BetterCJKSupportSpecification

Summary

This project aims to improve CJK (Chinese, Japanese, Korean) support in Ubuntu.

Rationale

As of Breezy, Ubuntu has no default input method, so ordinary CJK users cannot write in their native language on the Ubuntu desktop. In addition, the default configuration of various applications and of the desktop as a whole is poorly suited to Asian users and to users from certain countries. For example, the default desktop font size is simply too small for CJK users, especially Chinese users. To improve the experience for these users, some packages need to be patched, while others may need additional configuration.

Use cases

  • Chulsu installed Ubuntu on his laptop and opened Firefox to visit his favorite Korean web forum. He immediately wondered, first, "Why does this page look different from Firefox on Windows on my desktop?" and second, "How do I input Korean to write my reply on the forum?" He searched the Ubuntu Korean wiki and KLDP and asked his questions there. After several days, he had finally learned how to install font packages, how to configure .fonts.conf in his home directory, how to install and use a Korean input method, and so on. Now he is thinking, "Why is Linux so much more difficult than Windows? If all of this were installed and configured when I installed Ubuntu, that would be the way to go."
  • Yeonhee loves to listen to her favorite CDs while she works on her writing in OpenOffice. For her one-month trip to Jeju Island, she wanted to convert them to MP3, but couldn't find a conversion tool in the Ubuntu installed on her new laptop. So she converted her favorite songs, with MP3 tags, on Windows, and then opened Rhythmbox on her Ubuntu laptop to test them. Now she is looking at song names that are not displayed correctly in Korean and wondering, "How am I going to go on my trip?"

  • Miyoung wanted to try Linux for her class, but she had never used Linux before. Her classmate gave her an Ubuntu CD, so she was happy. On the way home, however, she worried about installing it on her desktop, which already had Windows installed, and decided: "OK, I am going to search the Ubuntu site for installation instructions. It would be great if I could find Korean guides there. I should learn English... Does this CD support Korean?"

Scope

  • We suggest splitting this spec into several parts, so that we can concentrate on them and complete them one by one, starting with the most important.
    • For SCIM as the default input method for CJK users, we suggest using InputMethods/SCIM to gather development ideas about improvements and specifications.

      • We need to work around the SCIM ABI bug before we go ahead with this choice.

  • In my opinion, the topics we should focus on first are the default input method for CJK users, FontConfig, and enabling embolden with patches.

  • Keep this spec as the main one to track overall scope, progress and implementation; add sub-specifications here.
  • Please add your ideas to specify the scope of the Dapper implementation. If it is too late, we can still use this specification for Dapper+1.

For localization sprint

Design

  1. Install a default input method such as scim, and start it automatically when the user starts X. Users should also be able to keep their own individual settings.
    • Useful links here for Korean Input Methods
    • SCIM should be the default input method: many IM engines are already based on it, and so far its language support is the best. Even Fedora and Mandriva use it by default.
  2. Better tuning of environment variables for CJK users in language-selector, the installer, etc.
  3. Tune the fontconfig settings to display CJK fonts better (e.g. more solid font outlines, bold type, bitmaps at medium font sizes). To achieve this, we need support from some font packages, such as ttf-arphic-uming/ukai.
    • Use ttf-arphic-uming/ukai by default, since these are the only packages that contain Hong Kong characters at all sizes.
    • Install xfonts-wqy for Simplified Chinese installations; ttf-newsung is not needed since it has already been included in uming/ukai.
    • Regarding fontconfig, Korean Linux users are discussing which font should replace ttf-baekmuk as the distribution default; currently the favorite is ttf-unfonts, followed by ttf-alee. KoreanTeam will provide an up-to-date BeautifyKoreanFonts once a font package has been decided on.

    • For Japanese, there is no completely free, high-quality font. Ubuntu uses the Kochi font, which is DFSG-free, but its quality is inferior to that of commercial Japanese fonts. This font issue is a barrier to wider use of completely free Linux distributions in Japan.
      1. Not exactly true. For Kochi Gothic & Mincho, you can change the fontconfig setting to display embedded bitmaps at font sizes 12-17 and 20-21 (you can install the Fontforge package to verify this). Once the setting is adjusted, they will look much better at those sizes. Also, OpenOffice.org 2 internally recognizes embedded bitmaps and looks good with these two font sets.

        • I think the majority of Japanese users prefer outlines, not bitmaps. In fact, Fedora and Mandriva don't use embedded bitmaps for Japanese; their developers chose outlines. Also, Kochi's outlines are of lower quality than those of IPA and IPA Mona.
          • Are you sure the decision to use outlines was an informed one among all three distro developers you mentioned? In general, very few people know that embedded bitmap fonts exist at all (even on Windows, because Windows uses embedded bitmaps by default). If, as you said, many Japanese users prefer outline fonts, sure, go ahead. CJK users can always switch embedded bitmaps on again in /etc/fonts/fonts.conf. I do know the situation is quite the opposite for Chinese users.
          • In Japan, unlike China and Korea, there is no clear tradition of choosing either outlines or bitmaps.

            I have been distributing a Japanese customized Ubuntu CD image for 9 months. I have had some requests to use outlines in legacy (GTK1 etc.) applications and OpenOffice.org, but no requests to use bitmaps in GTK2 applications. So I think many Japanese users prefer beautiful outlines in the tradition of Mac OS X. BTW, I was mistaken about the default setting on SuSE: the default for CJK on SuSE is now bitmaps. I'm removing that from above.

        • On OpenOffice.org, Kochi Gothic & Mincho look fine with bitmaps, as above. But if you print with these fonts, it turns into a nightmare: you'll see poor Japanese characters on the printed page.

      2. There are "IPA Font" and "IPA Mona Font", which are high-quality, free-to-use Japanese fonts. Unfortunately, these fonts are not completely free: the license demands that they be redistributed with one of the software packages specified by the supplier. At present, the Japanese Team supplies IPA Font and IPA Mona Font Debian packages for Breezy. Check out this blog for more detail. Note: the blog article is no more than its author's guess. The IPA Font and IPA Mona Font packages distributed by the Japanese Team include one of the software packages specified by the supplier, as the IPA Font license requires. Additionally, the packages listed in that blog are installer-helper packages, or are installed from the backport repository by their GUI setting helper (like Easy Ubuntu).

        • Interesting find: opening the ttf files from the IPA and IPA Mona font collections in Fontforge shows that these font sets carry embedded bitmaps at various sizes. In the "IPA Font" collection, all the fonts carry embedded bitmaps at sizes 12, 14 and 16. In the "IPA Mona Font" collection:

          • IPAMonaUIGothic (ipagui-mona.ttf) carries embedded bitmaps at sizes 12-16
          • The rest of the IPAMona collection all carry embedded bitmaps at sizes 10-16
      3. As a result of these findings, it is strongly recommended to turn on embedded bitmaps in the fontconfig settings by default. It is now known that Uming, Kochi, IPA and IPAMona will all benefit from this, and possibly some other free CJK fonts as well. If in doubt, inspect the font with Fontforge. (The program ftdump from the freetype2-demos package can also be used to see at which sizes bitmaps are available.)

        • I'm the author of IPA Mona Fonts. IPA Mona Fonts include bitmap fonts, as described above. But I think it's good to use outlines for Japanese by default, because normal users prefer outlines. A small number of Japanese users love bitmaps, so I added bitmaps at some sizes for them. They are not meant for all Japanese users.
  4. Enable font emboldening by default for CJK users
  5. Keyboard support for CJK native languages
    • Korean ( under survey : http://bbs.kldp.org/viewtopic.php?t=68034 )

      • This is not an Ubuntu-specific issue, but it is reported here too.
      • Two steps (setkeycodes and xmodmap) are needed to get the Hangul and Hangul_Hanja keys working.
        Command      Hangul  Hangul_Hanja  Keyboard Type       Exception
        setkeycodes  71 122  72 123        ps/2 type 106 keys  ps/2 type keyboard requires setkeycodes, seems this is a bug with 2.6 kernel.
        xmodmap      210     209           ps/2 type 106 keys
        xmodmap      209     210           usb 106 keys        Same MS Natural keyboard Pro(106 keys) tested for ps/2 and usb. In case of usb connection, setkeycodes not required, but xmodmap is in reverse.
        xmodmap      113     109           Laptop keyboard

    • handrake's patch, 30848, a mail posted to ubuntu-devel

  6. CJK users should be able to display the ID3 tags of their MP3 files correctly. Historically, tagging has been a mess: everybody uses their own legacy encoding for MP3 tags, because non-Western languages were not supported until very recent versions of the ID3 tag specification.
    • For applications which make use of GStreamer, setting GST_ID3_TAG_ENCODING can be an interim solution. There are more discussions on UTFEightCurrentProblems.

  7. (?) Allow users to read/write CJK on the console.
    • Or, when this is impossible, change $LANGUAGE to C automatically so that users won't see lots of junk on the console.
  8. Better support of CJK fonts in OpenOffice.org.

  9. Configure Firefox to print CJK correctly
    • This can probably be done on a per-language basis, in the Firefox language packs.
    • Firefox supports PS output, which should be converted to PDF by ps2pdf
      • Status: AbelCheung says that Firefox 1.5 seems to be OK

  10. Besides Firefox, printing support for CJK users also affects the following:
    • It should be possible to convert the installation guide to PS/PDF for CJK languages
    • printing man pages from X terminals
    • printing from xpdf-korean, groff, a2ps and gnome-u2ps : Check hard-coded font names
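
The two-step Korean keyboard setup from item 5 can be sketched as a small shell helper. This is only an illustration: the function name hangul_key_commands is hypothetical, the numbers are copied from the keycode values listed under item 5, and reading "71 122" as the arguments to a single setkeycodes call is an assumption. The helper prints the commands instead of running them, so they can be checked against the actual keyboard first.

```shell
#!/bin/sh
# Print (do not run) the commands for the Hangul and Hangul_Hanja keys.
# The values were measured on specific keyboards and may differ elsewhere.
hangul_key_commands() {
    case "$1" in
        ps2)
            # PS/2 keyboards first need scancodes mapped to kernel keycodes.
            echo "setkeycodes 71 122"
            echo "setkeycodes 72 123"
            hangul=210; hanja=209 ;;
        usb)
            # Over USB no setkeycodes step is needed; the keycodes are swapped.
            hangul=209; hanja=210 ;;
        laptop)
            hangul=113; hanja=109 ;;
        *)
            echo "usage: hangul_key_commands ps2|usb|laptop" >&2
            return 1 ;;
    esac
    # Bind the X keycodes to the Hangul and Hangul_Hanja keysyms.
    echo "xmodmap -e 'keycode $hangul = Hangul'"
    echo "xmodmap -e 'keycode $hanja = Hangul_Hanja'"
}
```

Calling hangul_key_commands ps2 prints four commands, while the usb and laptop variants print only the two xmodmap lines. Note that setkeycodes normally needs root privileges.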

Implementation

Code

Data preservation and migration

Packages affected

input methods:

  • As of 2005-12-29, nabi (0.15-2) supports im-switch; no extra setup is required for Korean input if you install nabi and im-switch. If you install nabi under the en_US locale, you can select nabi with "im-switch -s nabi", which creates ~/.xinput.d/en_US. It should work the same way for other locales.
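
As a rough sketch of the nabi setup described above, the helper below guesses where the per-user setting ends up and only calls im-switch when it is installed. The function names are hypothetical. The assumption that the file is named after the locale without its codeset is extrapolated from the single ~/.xinput.d/en_US example and should be verified on a real system.

```shell
#!/bin/sh
# Guess the per-user file written by "im-switch -s <method>" for a locale.
# Assumption: the file name is the locale with its codeset suffix removed.
xinput_config_path() {
    locale_name="${1:-${LANG:-C}}"
    # Drop the codeset, e.g. ko_KR.UTF-8 -> ko_KR.
    printf '%s/.xinput.d/%s\n' "${HOME:-}" "${locale_name%%.*}"
}

# Select nabi if im-switch is available, then report where to look.
select_nabi() {
    if command -v im-switch >/dev/null 2>&1; then
        im-switch -s nabi
    else
        echo "im-switch not found; install nabi and im-switch first" >&2
        return 1
    fi
    echo "expected config: $(xinput_config_path)"
}
```

For example, xinput_config_path ko_KR.UTF-8 prints the path ending in .xinput.d/ko_KR under the current home directory.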

font packages:

  • As of 2005-12-08, the ttf-arphic-uming/ukai packages have been moved to main, and the original Arphic fonts are obsolete.

freetype:

fontconfig:

  • Add a configuration file suitable for CJK fonts under /etc/fonts/conf.d/.
  • Font configuration for CJK has been implemented in language-selector instead of /etc/fonts/conf.d, as described in BetterCJKSupportSpecification/FontConfig.
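
For illustration only, a fontconfig fragment of the kind discussed in this spec could switch on embedded bitmaps, as the Design section recommends. The sketch below writes such a fragment to a path chosen by the caller. The function name write_embeddedbitmap_conf is hypothetical, and the spec does not prescribe a file name. The actual configuration shipped through language-selector may look different.

```shell
#!/bin/sh
# Write a fontconfig fragment that turns embedded bitmaps on for all fonts.
# $1 is the destination file; choosing it is left to the caller.
write_embeddedbitmap_conf() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <!-- Use the bitmap strikes embedded in a font when it has them. -->
  <match target="font">
    <edit name="embeddedbitmap" mode="assign">
      <bool>true</bool>
    </edit>
  </match>
</fontconfig>
EOF
}
```

Users who prefer outlines, as the Japanese comments in the Design section suggest, could write the same fragment with false instead of true in their own configuration.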

firefox:

openoffice.org (done):

im-switch:

  • Improve automatic configuration; currently ONLY those who know this package exists and ONLY those who understand the ins and outs can configure input method settings.
  • scim and skim support im-switch with Dapper.

language-selector:

  • It should set environment variables such as $LANGUAGE and $LANG according to real-life usage, not just to dummy values. For example, Hong Kong users mostly use the Taiwan translations, but they may have translations of their own; thus the correct setting is LANGUAGE=zh_HK:zh_TW.

  • Add a variable, say $CONSOLE_NOT_LOCALIZED, and define it for each language. In particular, set it to "yes" for all CJK languages, so that during bash startup it could redefine $LANGUAGE to C under console. (and console ONLY!)
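
The console fallback proposed above could be sketched as a function sourced during bash startup. CONSOLE_NOT_LOCALIZED is the variable name suggested in this spec. Detecting the text console by checking for TERM=linux is an assumption made for this sketch, and the function name is hypothetical.

```shell
# Reset LANGUAGE to C on the Linux text console only.
# Assumption: the text console can be recognised by TERM=linux.
reset_language_on_console() {
    if [ "${CONSOLE_NOT_LOCALIZED:-no}" = "yes" ] && [ "${TERM:-}" = "linux" ]; then
        LANGUAGE=C
        export LANGUAGE
    fi
}
```

With CONSOLE_NOT_LOCALIZED=yes and LANGUAGE=zh_HK:zh_TW, calling the function leaves LANGUAGE untouched inside an X terminal and changes it to C on the text console.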

rhythmbox:
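
Assuming Rhythmbox reads tags through GStreamer, the interim GST_ID3_TAG_ENCODING workaround from the Design section applies here. The sketch below picks a legacy encoding from the locale. The locale-to-encoding table is an illustrative guess rather than something taken from this spec, and both function names are hypothetical.

```shell
# Map a locale to a legacy encoding to try for ID3 tags.
# The table is an illustrative guess; adjust it to the tags actually in use.
id3_encoding_for_locale() {
    case "$1" in
        ko*)           echo "CP949" ;;
        ja*)           echo "SHIFT_JIS" ;;
        zh_CN*)        echo "GBK" ;;
        zh_TW*|zh_HK*) echo "BIG5" ;;
        *)             return 1 ;;
    esac
}

# Print an export line for the given (or current) locale.
# Prints nothing for locales that are not in the table.
id3_export_line() {
    enc="$(id3_encoding_for_locale "${1:-${LANG:-}}")" || return 0
    echo "export GST_ID3_TAG_ENCODING=$enc"
}
```

A Korean user could then start the player with, for example, GST_ID3_TAG_ENCODING=CP949 rhythmbox. Tags that are already stored as Unicode should not need this.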

totem-xine:

mplayer:

scim :

skim :

  • Done :
    • Enabled Trigger keys with CJK native keys 37687

language-support-?? :

  • Used to cover language-specific requirements, such as different scim module dependencies

gs-common(?) : lines 106-109 of /usr/share/defoma/scripts/gs.defoma

    if ($c eq 'truetype-cjk') {
      # FIXME: need to support the sub font id for the collection.
      print FFFF '/', $Id->{0}->[$i], ' << /FileType /TrueType /Path (', $f, ') /SubfontID ', '0', ' /CSI [(', $h[6], ') ', $hh{$h[6]}, "] >> ;\n";
    }

  • In our experience, line 108 triggers the complaint "use of uninitialized value ... /var/lib/defoma/scripts/gs.defoma".
  • 6614

Outstanding issues

BoF agenda and discussion

BetterCJKSupportSpecification (last edited 2008-08-06 16:22:10 by localhost)