From Fedora Project Wiki

Fix the dictionary proliferation problem

Summary

Fix the proliferation of dictionaries in the OS.

Owners

Current status

  • Targeted release: Fedora 9
  • Last modified: 2008-04-07
  • Percentage of completion: 100%
  • This is complete: all major applications and the default GNOME/KDE spell checking now go through hunspell. All that remains is to package dictionaries for the lesser-used languages that do not yet have a sufficiently active Fedora-using language community to take up packaging a dictionary for their language.

Usage cases/rationale

We have separate dictionaries for each language for OpenOffice.org, Firefox, Thunderbird, and aspell (which GNOME and KDE use). This is dumb.

Benefit to Fedora

We get code reuse, a smaller distribution, and a decreased memory footprint.

Scope

Requires changing the OpenOffice.org, Thunderbird, Firefox, and dictionary packages.

Test Plan

Test spell checking in all apps.

Dependencies

None.

Details

  1. Split out hunspell from OpenOffice.org - rhbz#214764 complete
  2. Make OpenOffice.org use it - rhbz#214764 complete
  3. Split out the dictionaries into separate packages - rhbz#218769 (english) complete
  4. Make OpenOffice.org use system dictionaries - complete
  5. Make gedit/xchat use it, i.e. enchant. enchant by default already generally prefers using hunspell over aspell, just needs to be told where the dictionaries are - complete
  6. Make evolution use it, i.e. gnome-spell. gnome-spell can be patched to use enchant to achieve this - rhbz#426347 complete
  7. Make tomboy/pidgin use it, i.e. gtkspell. Same story as gnome-spell - rhbz#245888 complete
  8. Make Firefox (and other gecko apps) use it - rhbz#218762 complete, upstream state is now resolved
  9. Make KDE use enchant and/or hunspell - complete - KDE 4 already defaults to enchant in Sonnet. (For K3Spell, see "legacy KSpell" below.) The aspell backend was dropped entirely in Rawhide. For kdelibs3:
    • The legacy KSpell uses command-line spellcheckers. Kevin Kofler wrote a patch to support hunspell, and kde-settings in Rawhide was changed to make it the default.
    • The newer KSpell2 API is plugin-based and uses libraries. It is what KDE 4's Sonnet is based on. Kevin Kofler backported Sonnet's enchant backend. The aspell and ispell backends were dropped in Rawhide.
    See the fedora-devel-list message.
  10. Remove copy of hunspell from enchant - rhbz#426402 complete
  11. Remove copy of hunspell from xulrunner - complete
  12. Split enchant to have a separate enchant-aspell rpm to enable optionally removing the aspell support - rhbz#426402 complete
  13. Prefer hunspell over aspell as the default for install in comps. See the table below for the mismatch in language support - rhbz#439037 complete
  14. Repackage/replace the aspell dictionaries with hunspell dictionaries - 80% complete; see the table below for language support
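
Step 5 notes that enchant only needs to be told where the dictionaries are. As a rough illustration of what a shared dictionary directory looks like to a consumer, the sketch below lists the languages for which both a hunspell affix file (.aff) and word list (.dic) are present. The directory /usr/share/myspell and the function name find_hunspell_dictionaries are assumptions made for this example; this page does not specify either.

```python
from pathlib import Path


def find_hunspell_dictionaries(directory):
    """Return sorted language tags that have both an .aff and a .dic file.

    hunspell needs the pair: the .aff file holds the affix rules and the
    .dic file holds the word list.
    """
    root = Path(directory)
    if not root.is_dir():
        # A missing directory simply means no shared dictionaries were found.
        return []
    aff = {p.stem for p in root.glob("*.aff")}
    dic = {p.stem for p in root.glob("*.dic")}
    # Only tags with both halves of the pair are usable.
    return sorted(aff & dic)


if __name__ == "__main__":
    # Assumed location; adjust to wherever the system dictionaries live.
    for tag in find_hunspell_dictionaries("/usr/share/myspell"):
        print(tag)
```

Any application that reads the same directory sees the same set of languages, which is the point of the steps above.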

Optional

  1. Write an aspell compatibility layer so aspell apps can use the same dictionaries - no volunteer, so deferred. Is this necessary at all? All major desktop apps now work out of the box.
  2. Make vim use hunspell - rhbz#219777 patch available. Not necessary as long as vim continues not to use any spell checking, but preferred over introducing the built-in vim spellchecker, which uses yet another format that hunspell dictionaries would have to be converted to.

Dictionaries

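Language Support Matrix

| Language Code | Language | aspell | hunspell | notes |
| aa | Afar | | | |
| af | Afrikaans | aspell-af | hunspell-af | |
| am | Amharic | | available | |
| an | Aragonese | | | |
| ar | Arabic | aspell-ar | hunspell-ar | |
| as | Assamese | | | |
| ast | Asturian | | | |
| az | Azeri | | available | |
| be | Belarusian | | available | |
| ber | Amazigh | | | |
| bg | Bulgarian | aspell-bg | hunspell-bg | |
| bn | Bengali | aspell-bn | hunspell-bn | |
| bo | Tibetan | | | |
| br | Breton | aspell-br | | |
| bs | Bosnian | | | |
| byn | Blin | | | |
| ca | Catalan | aspell-ca | hunspell-ca | |
| crh | Crimean | | | |
| cs | Czech | aspell-cs | hunspell-cs | |
| csb | Kashubian | | available | |
| cy | Welsh | aspell-cy | hunspell-cy | |
| da | Danish | aspell-da | hunspell-da | |
| de | German | aspell-de | hunspell-de | |
| dz | Bhutanese | | | |
| el | Greek | aspell-el | hunspell-el | |
| en | English | aspell-en | hunspell-en | |
| es | Spanish | aspell-es | hunspell-es | |
| et | Estonian | | hunspell-ee | |
| eu | Basque | | available | |
| fa | Farsi | | available | |
| fi | Finnish | | | Finnish community has a parallel voikko-based set of dictionaries for OOo/xulrunner/enchant |
| fil | Filipino | | | Filipino is the official Tagalog dialect; see the tl entry for the Tagalog dictionary |
| fo | Faeroese | aspell-fo | hunspell-fo | |
| fr | French | aspell-fr | hunspell-fr | |
| fur | Furlan | | available | |
| fy | Frisian | | available | |
| ga | Irish | aspell-ga | hunspell-ga | |
| gd | Scots Gaelic | aspell-gd | hunspell-gd | |
| gez | Ge'ez | | | |
| gl | Galician | aspell-gl | hunspell-gl | |
| gu | Gujarati | aspell-gu | available | |
| gv | Manx | | | |
| ha | Hausa | | | |
| he | Hebrew | aspell-he | hunspell-he | |
| hi | Hindi | aspell-hi | hunspell-hi | |
| hr | Croatian | aspell-hr | hunspell-hr | |
| hsb | Upper Sorbian | | | |
| hu | Hungarian | | hunspell-hu | |
| hy | Armenian | | available | |
| id | Indonesian | aspell-id | hunspell-id | |
| ig | Igbo | | | |
| ik | Inupiaq | | | |
| is | Icelandic | aspell-is | available | |
| it | Italian | aspell-it | hunspell-it | |
| iu | Inuktitut | | | |
| ja | Japanese | | | |
| ka | Georgian | | | |
| kk | Kazakh | | | |
| kl | Greenlandic | | | |
| km | Khmer | | available | |
| kn | Kannada | | | |
| ko | Korean | | | |
| ku | Kurdish | | available | |
| kw | Cornish | | | |
| ky | Kyrgyz | | | |
| lg | Luganda | | | |
| li | Limburgish | | | |
| lo | Lao | | | |
| lt | Lithuanian | | hunspell-lt | |
| lv | Latvian | | available | |
| mai | Maithili | | | |
| mg | Malagasy | | available | |
| mi | Maori | | available | |
| mk | Macedonian | | available | |
| ml | Malayalam | aspell-ml | hunspell-ml | |
| mn | Mongolian | | available | |
| mr | Marathi | aspell-mr | hunspell-mr | |
| ms | Malay | | hunspell-ms | |
| mt | Maltese | | | |
| nb | Bokmaal | aspell-no | hunspell-nb | |
| nds | Lowlands Saxon | | | |
| ne | Nepali | | hunspell-ne | |
| nl | Dutch | aspell-nl | hunspell-nl | |
| nn | Nynorsk | aspell-no | hunspell-nn | |
| nr | Ndebele (Southern) | | available | |
| nso | Sotho (Northern) | | available | |
| oc | Occitan | | available | |
| om | Oromo | | | |
| or | Oriya | aspell-or | hunspell-or | |
| pa | Punjabi | aspell-pa | hunspell-pa | |
| pap | Papiamento | | | |
| pl | Polish | aspell-pl | hunspell-pl | |
| pt | Portuguese | aspell-pt | hunspell-pt | |
| ro | Romanian | | available | |
| ru | Russian | aspell-ru | hunspell-ru | |
| rw | Kinyarwanda | | available | |
| sa | Sanskrit | | | |
| sc | Sardinian | | | |
| se | Saami | | available | |
| shs | Secwepemctsin | | | |
| si | Sinhala | | | |
| sid | Sidama | | | |
| sk | Slovak | aspell-sk | hunspell-sk | |
| sl | Slovenian | aspell-sl | hunspell-sl | |
| so | Somali | | | |
| sq | Albanian | | | |
| sr | Serbian | aspell-sr | hunspell-sr | |
| ss | Swati | | available | |
| st | Sotho (Southern) | | available | |
| sv | Swedish | aspell-sv | hunspell-sv | |
| ta | Tamil | aspell-ta | hunspell-ta | |
| te | Telugu | aspell-te | | |
| tg | Tajik | | | |
| th | Thai | | hunspell-th | |
| ti | Tigrigna | | | |
| tig | Tigre | | | |
| tk | Turkmen | | | |
| tl | Tagalog | | available | |
| tn | Tswana | | available | |
| tr | Turkish | | | |
| ts | Tsonga | | available | |
| tt | Tatar | | available | |
| ug | Uyghur | | | |
| uk | Ukrainian | | available | |
| ur | Urdu | | available | |
| uz | Uzbek | | available | |
| ve | Venda | | available | |
| vi | Vietnamese | | available | |
| wa | Walloon | | available | |
| wo | Wolof | | | |
| xh | Xhosa | | available | |
| yi | Yiddish | | | |
| yo | Yoruba | | | |
| zh | Chinese | | | |
| zu | Zulu | | hunspell-zu | |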
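
Step 13 in the Details list refers to this table for the mismatch in language support between aspell and hunspell. A minimal sketch of that comparison follows, using a handful of rows copied from the matrix. The tuple layout and the names aspell_only and hunspell_only are illustrative assumptions. An entry of "available" is treated here as "no hunspell package yet", which is an interpretation of the table rather than something the page states.

```python
# A few rows copied from the matrix: (code, language, aspell, hunspell).
# None means no package is listed. Rows marked "available" in the table
# are treated as not yet packaged; that reading is an assumption.
ROWS = [
    ("af", "Afrikaans", "aspell-af", "hunspell-af"),
    ("br", "Breton", "aspell-br", None),
    ("gu", "Gujarati", "aspell-gu", None),   # listed as "available"
    ("hu", "Hungarian", None, "hunspell-hu"),
    ("is", "Icelandic", "aspell-is", None),  # listed as "available"
    ("te", "Telugu", "aspell-te", None),
    ("th", "Thai", None, "hunspell-th"),
]


def aspell_only(rows):
    """Codes of languages with an aspell package but no hunspell package."""
    return [code for code, _lang, aspell, hunspell in rows
            if aspell and not hunspell]


def hunspell_only(rows):
    """Codes of languages with a hunspell package but no aspell package."""
    return [code for code, _lang, aspell, hunspell in rows
            if hunspell and not aspell]


if __name__ == "__main__":
    print("aspell only:", aspell_only(ROWS))
    print("hunspell only:", hunspell_only(ROWS))
```

Run over the full table, the aspell-only languages would be the ones where switching the default to hunspell loses coverage until a hunspell dictionary is packaged.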

User experience

Should not affect user experience.

Contingency plan

Continue to ship older dictionaries.

Documentation

[1]

Release Notes

There is a new default spell checking back-end, hunspell, for both the GNOME and KDE desktops, as well as applications such as OpenOffice.org, Firefox, and other XULRunner-based applications. This common back-end includes a set of shared, multilingual dictionaries for use with hunspell. This feature uses a single set of common dictionaries regardless of the application, which gives consistent suggestions for misspelled words and uses less disk space by eliminating duplicate dictionaries.

Comments

Note that JDS is going down this route as well.

The OpenOffice.org hunspell dictionary list of working dictionaries

The mozilla hunspell dictionary list of tri-licensed dictionaries

How to build a dictionary

Language Codes: http://www.loc.gov/standards/iso639-2/php/code_list.php

A somewhat related issue.

Will help on adding Indic hunspell dictionaries in Fedora - paragn.

php5 and bluefish still link to aspell at least - kmaraas. (It's not practical for me to port everything, just the core default installed components and the default spell-checking solutions for the main desktop environments and applications - caolanm)