ConsolidateSpellingLibs

Revision 10 as of 2006-11-11 01:34:10 (editor 207): Review. Revision 12 as of 2008-06-04 15:11:15 (editor pd956a57b): update with notes from UDS Prague.

Summary

Reduce the number of spelling libraries used in main; modify or extend the applications that do not yet use the One True Spellchecker so that they do.

Rationale

Supporting only one implementation and set of dictionaries eases long-term maintenance. Also, words which the user teaches to the spellcheckers will be consistently available throughout the system.

Use cases

  • Jane spell-checks a document in OpenOffice and adds a few new words to her personal dictionary. Half an hour later she discusses the document with a colleague over ICQ in Gaim. Gaim's spell checker automatically knows about the newly added words.

  • Karl from Germany, who knows that Thunderbird uses ispell, wonders why Thunderbird cannot spell-check his e-mails with a German dictionary even though he has installed ingerman and iogerman. He also wonders why OpenOffice finds his dictionaries and Thunderbird doesn't. (Thunderbird uses an older version of ispell, and its dictionaries are installed via XPIs.)

tfheen: Please provide use cases that show how things are supposed to work, not how they are currently broken.

Scope

This affects all supported packages in Ubuntu which have spell-checking capability.

Design

Status quo

Four spell-checking libraries are currently in use in Intrepid, plus Voikko for Finnish:

  • ispell: still used by some applications, but being replaced by aspell. It is no longer used in Ubuntu main and has only four reverse dependencies in universe (spell, sqwebmail, liblingua-ispell-perl, gnumed-client).
  • aspell: ispell replacement. Its biggest users in main are GNOME and KDE; its other reverse dependencies in main are ekg, pan, php5-pspell, and python-gnome2-extras. It has about 20 reverse dependencies in universe.
  • myspell: ispell replacement. No longer used by anything in Ubuntu.
  • hunspell: fork/replacement for myspell whose dictionaries are compatible with myspell. Currently used by OpenOffice.org, Thunderbird, and Firefox.
  • Finnish currently uses the Voikko system, which computes the multitude of possible word forms from a common stem. Altaic (Turkish and related languages) and Finno-Ugric (Estonian, Finnish, Hungarian and relatives) languages, with their complex prefix and suffix systems, generally work poorly with the systems above. We will therefore continue to support Voikko for Finnish. Since the languages supported by Voikko and hunspell do not overlap, this does not cause major compatibility problems.

ispell and aspell provide a C interface; myspell and hunspell provide a C++ interface. The enchant library abstracts away the underlying spell checker (such as aspell or myspell) and can be used to move applications onto a common interface.

Dictionary support is split out into its own spec, ConsolidateDictionaries.

Goal in Intrepid

hunspell is the most modern implementation and is considered the best choice in the free software world.

  • Change GNOME, KDE, and other packages in main to use hunspell.
  • Drop ispell/aspell dictionaries from language-support-* and from main.

  • Demote ispell to universe. Remove myspell from the archive.
  • Drop myspell dictionaries from language-support-* where a hunspell dictionary is available, and keep the myspell dictionary for other languages (hunspell can read myspell-format dictionaries).

Implementation

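GNOME

  • Fedora now has patches to support hunspell through enchant; the http://fedoraproject.org/wiki/Releases/FeatureDictionary spec has pointers to Bugzilla.
  • Szilveszter Farkas's PPA (https://launchpad.net/~phanatic/+archive) has test packages with those patches applied.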

KDE

KDE 4 upstream already uses enchant/hunspell. The Fedora specification mentioned above has patches for KDE 3, but since Intrepid will only ship KDE 4.1, nothing in particular needs to be done for KDE for this specification.

Outstanding issues

The demotion of aspell has not been discussed. It would require porting PHP and ekg to hunspell, which has not happened upstream yet. We will therefore keep the aspell library itself in main for the time being.

ConsolidateSpellingLibs (last edited 2008-08-06 16:16:08 by localhost)