BetterCJKSupportSpecification
Created: Date(2005-10-18T07:30:00Z) by ["Freeflying"]
Priority: NeedsPriority
People: NeedsLead, NeedsSecond
Contributors: ["Freeflying"], AbelCheung4, ["Atie"], JunKobayashi, ["Zero0w"]
- Interested:
Status: UbzSpecification, BrainDump (then DraftSpecification then EditedSpecification then ApprovedSpecification), DistroSpecification
- Branch:
- Malone bug:
- Packages affected:
- Depends:
- Dependents:
- BoF sessions: none yet
Summary
This project aims at improving CJK support in Ubuntu.
Rationale
As of Breezy, Ubuntu has no default input method, so ordinary CJK users cannot write in their native language on the Ubuntu desktop. In addition, the default configuration of various applications and of the desktop as a whole is poorly suited to Asian users and users from certain countries. For example, the default desktop font size is simply too small for CJK users, especially Chinese users. To improve the experience for these users, some packages need to be patched, while others may need additional configuration.
Use cases
- Chulsu installed Ubuntu on his laptop and opened Firefox to visit his favorite Korean web forum. He immediately wondered, first, "why does this page look different from Firefox on Windows on my desktop?" and second, "how do I input Korean to write my reply to the forum?" He started searching the Ubuntu Korean wiki and KLDP, and asked his questions there. After several days, he had learned how to install font packages, how to configure .fonts.conf in his home directory, how to install and use a Korean input method, and so on. Now he is thinking, "why is Linux so much more difficult than Windows? If all of this had been installed and configured when I installed Ubuntu, that would be the way to go."
- Yeonhee loves to listen to her favorite CDs while she works on her writing in OpenOffice. For her one-month trip to Jeju Island, she wanted to convert them to MP3, but could not find a conversion tool in the Ubuntu installed on her new laptop. So she converted her favorite songs, with MP3 music tags, on Windows, then opened Rhythmbox on her Ubuntu laptop to test them. Now she is looking at song names that are not displayed correctly in Korean: "how am I going to go on my trip?"
- Miyoung wanted to try Linux for her class, but she had never used Linux before. Her classmate gave her an Ubuntu CD, so she was happy. On the way home, however, she began to worry about installing from the CD onto her desktop, which already had Windows installed, and decided: "OK, I am going to search the Ubuntu site for installation help. It would be great if I could find Korean guides there. I should learn English... Does this CD support Korean?"
Scope
- Suggestion: split this spec into several parts and work through them one by one, starting with the most important.
- For SCIM as the default input method for CJK users, use ["InputMethods/SCIM"] to gather development ideas about improvements and specifications.
- We need to work around the ["SCIM"] ABI bug before we go ahead with this choice.
In my opinion, the default input method for CJK users, FontConfig, and enabling embolden with patches are the topics we should focus on first.
- Keep this spec as the main page for tracking overall scope, progress and implementation, and add sub-specifications here.
- ["BetterCJKSupportSpecification/FontConfig"] : Specify fonts.conf for CJK users to reflect into fontconfig
- Please add your ideas to help define the scope of the Dapper implementation. If it is too late, we can still use this specification for Dapper+1.
Design
- Install a default input method such as SCIM, and start it automatically when the user starts X. Users should also be able to keep their own individual settings.
- Useful links here for Korean Input Methods
- SCIM should be the default input method: many IM engines are already based on SCIM, and so far its language support is the best. Even Fedora and Mandriva use it by default.
- Better tuning of environment variables for CJK users in language-selector, the installer, etc.
[http://bugzilla.ubuntu.com/show_bug.cgi?id=20442 Bug report on language-selector]
- Tune the fontconfig settings to achieve better CJK font display (e.g. more solid font outlines, bold type, bitmaps at medium font sizes). To achieve this, we need support from some font packages, such as ttf-arphic-uming/ukai.
- Use ttf-arphic-uming/ukai by default, since these are the only packages that contain Hong Kong characters at all sizes.
- Install xfonts-wqy for Simplified Chinese installations; ttf-newsung is not needed since it has already been included in uming/ukai.
Regarding fontconfig, Korean Linux users are discussing which font should replace ttf-baekmuk as the distribution default; currently the favorite is ttf-unfonts, followed by ttf-alee. ["KoreanTeam"] will provide an up-to-date ["BeautifyKoreanFonts"] once a font package has been chosen.
- For Japanese, there is no completely free, high-quality font. Ubuntu uses the Kochi font, which is DFSG-free but inferior in quality to commercial Japanese fonts. This font issue is a barrier to wider use of completely free Linux distributions in Japan.
Not exactly true. For Kochi Gothic & Mincho, you can change the fontconfig setting to display embedded bitmap at font size 12-17 & 20-21 (you can install the Fontforge package to verify this). Once the setting is adjusted, they will look much better at those sizes. Also, OpenOffice.org2 internally recognizes embedded bitmap and will look nice on these two font sets.
There are "IPA Font" and "IPA Mona Font", which are high-quality, free-to-use Japanese fonts. Unfortunately, these fonts are not completely free: the license requires that they be redistributed together with one of the software packages specified by the supplier. At present, the Japanese Team supplies Debian packages of IPA Font and IPA Mona Font for Breezy. See this [http://ukai.org/b/log/debian/legal blog] for more detail. Note: the blog article is no more than its author's guess. The IPA Font and IPA Mona Font packages distributed by the Japanese Team include one of the software packages specified by the supplier, as the IPA Font license requires. Additionally, the packages listed in the blog are installer-helper packages, or are installed from the backport repository by a GUI setting helper (like Easy Ubuntu).
Interesting find: Using Fontforge to open and explore the ttf files from IPA and IPA Mona font collection, it seems that these font sets carry embedded bitmap at various sizes: For "IPA Font" collection: All the fonts carry embedded bitmap at size 12, 14, 16. For "IPA Mona Font" collection:
- IPAMonaUIGothic (ipagui-mona.ttf) carries embedded bitmap at size 12-16
- For the rest of IPAMona collection, they all carry embedded bitmap at size 10-16
As a result of these findings, it is strongly recommended to turn on embedded bitmaps in the fontconfig settings by default. Uming, Kochi, IPA and IPAMona are all known to benefit from this, and some other free CJK fonts may as well. If in doubt, inspect the font with Fontforge.
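The following is a minimal sketch of how per-font rules for embedded bitmaps might be generated, not a tested configuration. It uses Python's standard `xml.etree.ElementTree` to emit fontconfig `<match>` rules that set the `embeddedbitmap` property. The family names "Kochi Gothic" and "Kochi Mincho", and the reading of the size ranges 12-17 and 20-21 as pixel sizes, are assumptions taken from the discussion above; verify both against the actual fonts before using anything like this.

```python
import xml.etree.ElementTree as ET


def embedded_bitmap_rule(family, min_px, max_px):
    """Build one <match> element enabling embedded bitmaps for a size range."""
    match = ET.Element("match", target="font")
    # Restrict the rule to a single font family (name is an assumption).
    fam = ET.SubElement(match, "test", name="family")
    ET.SubElement(fam, "string").text = family
    # Restrict the rule to pixel sizes between min_px and max_px inclusive.
    for compare, px in (("more_eq", min_px), ("less_eq", max_px)):
        size = ET.SubElement(match, "test", name="pixelsize", compare=compare)
        ET.SubElement(size, "double").text = str(px)
    # Ask fontconfig to use the bitmaps embedded in the font file.
    edit = ET.SubElement(match, "edit", name="embeddedbitmap", mode="assign")
    ET.SubElement(edit, "bool").text = "true"
    return match


def build_config(rules):
    """Wrap a list of (family, min_px, max_px) rules in a <fontconfig> root."""
    root = ET.Element("fontconfig")
    for family, min_px, max_px in rules:
        root.append(embedded_bitmap_rule(family, min_px, max_px))
    return ET.tostring(root, encoding="unicode")


# Size ranges quoted in the text above for Kochi Gothic and Kochi Mincho.
RULES = [
    ("Kochi Gothic", 12, 17),
    ("Kochi Gothic", 20, 21),
    ("Kochi Mincho", 12, 17),
    ("Kochi Mincho", 20, 21),
]

config_xml = build_config(RULES)
print(config_xml)
```

The output could be saved as a file under /etc/fonts/conf.d/ or merged into a user's .fonts.conf; the exact file name and location are left open here.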
- CJK users should be able to display the ID3 tags of their MP3 files correctly. Historically, tagging has been a mess: everybody used their own legacy encoding for MP3 tags, because non-Western languages were not supported until very recent versions of the ID3 tag specification.
- For applications which make use of GStreamer, setting GST_ID3_TAG_ENCODING can be an interim solution. There are more discussions on ["UTFEightCurrentProblems"].
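To illustrate why a fallback encoding matters for legacy tags, here is a small Python sketch. It is not how GStreamer implements GST_ID3_TAG_ENCODING; the function name `decode_tag` and the candidate list are hypothetical. It only shows the general idea: try UTF-8 first, then a configured legacy encoding (EUC-KR in this example), instead of blindly treating the bytes as Latin-1.

```python
def decode_tag(raw, candidates=("utf-8", "euc-kr", "latin-1")):
    """Decode raw tag bytes using the first candidate encoding that fits.

    Returns a (text, encoding) pair. The candidate order is an assumption:
    UTF-8 is strict enough that legacy CJK bytes rarely decode by accident.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, but mark undecodable bytes.
    return raw.decode("latin-1", errors="replace"), "latin-1"


# A Korean song title as a legacy tagger might have stored it.
title = "아리랑"
raw = title.encode("euc-kr")

# Treating the bytes as Latin-1 produces the junk CJK users currently see.
mojibake = raw.decode("latin-1")
print(repr(mojibake))

# Trying the legacy encoding recovers the original title.
print(decode_tag(raw))
```

A per-language default for the legacy encoding (for example, one chosen by the language pack) would fit the same pattern.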
- (?) Allow users to read/write CJK under console.
- Or when this is impossible, change $LANGUAGE to C automatically so that users won't see lots of junk on console.
Better support of CJK fonts in OpenOffice.org.
- Firefly has a large patch which makes OOo2 much better for CJK.
- Virtual bold and italic style patch on OOo2 in progress
http://www.openoffice.org/issues/show_bug.cgi?id=18285
Status: The bug is officially fixed and verified, with the CWS module fakebold merged in the Sun's [http://eis.services.openoffice.org/EIS2/cws.rss.CWSAnnounceNewsFeed/mws?guid=2443 SRC680_m146] internal development build, to be released in OpenOffice.org 2.0.2 as a maintenance update. Ubuntu has merged the patch since OOo2.0.0m143-0ubuntu2 (see more below).
- Configure Firefox to print CJK correctly
- This can probably be done on a per-language basis, in the Firefox language packs.
- Firefox supports PS output, which can be converted to PDF with ps2pdf
- Printing support for CJK users affects more than Firefox:
- printing man pages from X terminals
- printing from xpdf-korean, groff, a2ps and gnome-u2ps : Check hard-coded font names
- It should be possible to convert the installation guide to PS/PDF for CJK languages
- Enable embolden font by default for CJK users
Debian unstable has freetype 2.1.10, and Dapper has it now too (2005/11/11); please take care of the next two items below.
- Build xft2, fontconfig, pango and cairo2 with embolden enabled
- Good news: with Dapper's libxft2 2.1.8, embolden is enabled without rebuilding them
- Update fontconfig to 2.3.92+cvs20051129: enable embeddedbitmap by default
- See more about this in ["BetterCJKSupportSpecification/FontConfig"]
cairo-ft patch : http://lists.freedesktop.org/archives/cairo/2005-September/005404.html
- This bug was found in Debian's freetype 2.1.10 package, and of course also applies to Dapper's:
Words in a sentence are individually displayed slanting upward to the right, like a series of slopes. This happens more often at larger sizes, such as web page headings, than at smaller ones, and more often in Konqueror and Opera than in Firefox. However, a Gentoo user who had compiled xorg-x11-7.0 rc1 showed a much nicer screen. Please see the screenshots at this link: http://bbs.kldp.org/viewtopic.php?t=65304&highlight= You can also compare the rendering quality of Akito's patch (top) and "embolden" (bottom) in this screenshot; the top one is much better. http://bbs.kldp.org/download.php?id=5319
Speaking of Akito's patch, he is known to have contributed [http://lists.gnu.org/archive/html/freetype-devel/2003-04/msg00071.html some fantastic improvements to freetype's autohinter] back in 2003; this is [http://www.kde.gr.jp/~akito/xft/patch_xft.html Akito's Japanese homepage].
freetype 2.1.10 has upgraded the autohinter to autofit, which is supposed to handle hinting by parsing the language region information inside a ttf file. It is unknown, however, whether Akito's patch has been properly merged into the autofit engine (it would be great if someone who speaks Japanese could contact Akito and ask him to check).
Here is another screenshot showing the embolden rendering problem in Konqueror (3.5.0-0ubuntu1 + fontconfig 2.3.2-1.1ubuntu1). http://bbs.kldp.org/viewtopic.php?p=336668#336668
We received this link: http://lists.freebsd.org/pipermail/freebsd-gnome/2005-July/011838.html In that patch, M_Y is redefined, which may solve this problem. ([https://launchpad.net/distros/ubuntu/+source/freetype/+bug/5560 Launchpad #5560])
Fedora's patches (https://www.redhat.com/archives/fedora-cvs-commits/2005-October/msg00281.html) have been tested with Dapper's freetype (2.1.10-1); please take a look at [https://launchpad.net/distros/ubuntu/+source/freetype/+bug/5560 #5560].
Dual width problem : https://bugs.freedesktop.org/show_bug.cgi?id=5345
- Keyboard support for CJK native languages
Korean ( under survey : http://bbs.kldp.org/viewtopic.php?t=68034 )
- This is not an Ubuntu-specific issue, but it is reported here too.
- Two steps (setkeycodes and xmodmap) are needed to get the Hangul and Hangul_Hanja keys working.
|| ||Hangul||Hangul_Hanja||Keyboard Type||Exception||
||setkeycodes||71 122||72 123||ps/2 type 106 keys||ps/2 type keyboard requires setkeycodes, seems this is a bug with 2.6 kernel.||
||xmodmap||210||209||ps/2 type 106 keys|| ||
||xmodmap||209||210||usb 106 keys||Same MS Natural keyboard Pro(106 keys) tested for ps/2 and usb. In case of usb connection, setkeycodes not required, but xmodmap is in reverse.||
||xmodmap||113||109||Laptop keyboard|| ||
[https://wiki.ubuntu.com/handrake handrake's patch], [https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/30848 30848], [https://lists.ubuntu.com/archives/ubuntu-devel/2006-March/016356.html a mail posted to ubuntu-devel]
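As a rough sketch, the keyboard table above can be turned into the commands a user would run. The Python below only builds command strings and runs nothing. The dictionary keys ("ps2-106", "usb-106", "laptop") and the function name are hypothetical, the numbers are copied verbatim from the table, and the assumption that the laptop keyboard needs no setkeycodes step is inferred only from the absence of a setkeycodes row for it.

```python
# Values copied from the table above; keys are illustrative labels only.
KEYMAPS = {
    "ps2-106": {
        # (scancode, keycode) pairs as listed for Hangul and Hangul_Hanja.
        "setkeycodes": [("71", "122"), ("72", "123")],
        "xmodmap": {"Hangul": 210, "Hangul_Hanja": 209},
    },
    "usb-106": {
        "setkeycodes": [],  # not required over USB, per the table
        "xmodmap": {"Hangul": 209, "Hangul_Hanja": 210},
    },
    "laptop": {
        "setkeycodes": [],  # assumption: no setkeycodes row in the table
        "xmodmap": {"Hangul": 113, "Hangul_Hanja": 109},
    },
}


def commands_for(keyboard):
    """Return the shell commands for one keyboard type, in the order to run them."""
    entry = KEYMAPS[keyboard]
    commands = []
    if entry["setkeycodes"]:
        # setkeycodes takes alternating scancode/keycode arguments.
        args = " ".join(f"{scancode} {keycode}" for scancode, keycode in entry["setkeycodes"])
        commands.append(f"setkeycodes {args}")
    for keysym, keycode in entry["xmodmap"].items():
        # Bind the X keycode to the Hangul or Hangul_Hanja keysym.
        commands.append(f'xmodmap -e "keycode {keycode} = {keysym}"')
    return commands


for name in KEYMAPS:
    print(name)
    for command in commands_for(name):
        print("   ", command)
```

setkeycodes normally needs root privileges and affects the console keyboard driver, while xmodmap runs per X session, which is why the two steps are kept separate.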
Implementation
Code
Data preservation and migration
Packages affected
input methods:
- As of 2005-12-29, nabi (0.15-2) supports im-switch, so no extra setup is required for Korean input if you install nabi and im-switch. If you install nabi under the en_US locale, you can select it with "im-switch -s nabi", which creates ~/.xinput.d/en_US . It should work the same way for other locales.
font packages:
- As of 2005-12-08, the ttf-arphic-uming/ukai packages have been moved to main, and the original Arphic fonts are obsolete.
freetype:
As of 2006-01-16, the 6 patches for freetype 2.1.10 are in REVU, http://revu.tauware.de/details.py?upid=1512
- As of 2006-01-19, freetype 2.1.10-1ubuntu1 contains the 6 patches above.
fontconfig:
- Add a configuration file suitable for CJK fonts under /etc/fonts/conf.d/.
firefox:
The printing issues are fixed in Firefox 1.5 (cf. https://bugzilla.mozilla.org/show_bug.cgi?id=190031).
openoffice.org (done):
- OOo2.0.1pre (src680-m137) included Firefly's patches - No extra patches required
By OOo2.0.0m143-0ubuntu2 : This is done as you see here http://bbs.kldp.org/viewtopic.php?p=335943#335943
im-switch:
- Improve automatic configuration; currently ONLY those who know this package exists and ONLY those who understand the ins and outs can configure input method settings.
language-selector:
It should set appropriate environment variables like $LANGUAGE and $LANG according to real life usage, and not just dummy settings. For example, Hong Kong people are using Taiwan translation mostly, but they may have their own; thus the correct setting is LANGUAGE=zh_HK:zh_TW.
- Add a variable, say $CONSOLE_NOT_LOCALIZED, and define it for each language. In particular, set it to "yes" for all CJK languages, so that during bash startup it could redefine $LANGUAGE to C under console. (and console ONLY!)
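The two language-selector ideas above can be sketched together in Python. This is an illustration of the proposed logic only, not language-selector code: the function name, the `FALLBACKS` table and the `CONSOLE_NOT_LOCALIZED` set are hypothetical, the ".UTF-8" suffix on LANG is an assumption, and how a real implementation detects a plain console (for example by checking TERM during shell startup) is left open.

```python
# Hypothetical per-locale translation fallbacks, e.g. Hong Kong users
# mostly rely on Taiwan translations but may have their own.
FALLBACKS = {
    "zh_HK": ["zh_TW"],
}

# Hypothetical stand-in for the proposed $CONSOLE_NOT_LOCALIZED variable:
# languages whose messages cannot be displayed on a plain console.
CONSOLE_NOT_LOCALIZED = {"zh", "ja", "ko"}


def session_environment(locale, on_console):
    """Return the LANG and LANGUAGE values for a session.

    locale is a name such as "zh_HK"; on_console is True for a plain
    text console and False for an X session or capable terminal.
    """
    language_code = locale.split("_")[0]
    # LANGUAGE is a colon-separated priority list used for message lookup.
    language = ":".join([locale] + FALLBACKS.get(locale, []))
    if on_console and language_code in CONSOLE_NOT_LOCALIZED:
        # Avoid unreadable junk on the console, and only on the console.
        language = "C"
    return {"LANG": f"{locale}.UTF-8", "LANGUAGE": language}


print(session_environment("zh_HK", on_console=False))
print(session_environment("zh_HK", on_console=True))
print(session_environment("de_DE", on_console=True))
```

Only LANGUAGE is overridden on the console in this sketch, so LANG and the character set settings stay the same in both cases.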
rhythmbox:
totem-xine:
mplayer:
scim :
- In progress :
- Done :
- scim-hangul : Requested inclusion of the newer scim-hangul 0.2.1, which has fewer bugs. (Launchpad #5534)
gs-common(?) : lines 106-109 of /usr/share/defoma/scripts/gs.defoma
- if ($c eq 'truetype-cjk') { } Line 108 produced the complaint "use of uninitialized value ... /var/lib/defoma/scripts/gs.defoma"
Outstanding issues