Firefox's spellcheck feature needs volunteers to fix a tokenization issue.
There are two bugs related to tokenization:
- Bug 434044 – The tokenization of words for spellcheck is wrong when there is a ZWJ/ZWNJ/ZWS in the word. – Reported: 2008-05-16 07:49 PDT by Santhosh Thottingal
- Bug 318040 – Spell checker flags words containing full stops (periods). – Reported: 2005-11-28 12:45 PDT by Joseph Wright
There is a bug in Firefox's spellcheck functionality that affects many Indian languages that use Zero Width [Non] Joiners inside words. Firefox uses Hunspell as its spelling checker, and so does OpenOffice. The bug does not occur in OpenOffice; the problem in Firefox lies in how words in editable text fields are tokenized before they are spellchecked. Firefox splits a word wherever it contains a ZWJ/ZWNJ, so the input to the spellchecker is a set of fragments rather than the actual word.
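To make the effect concrete, here is a minimal Python sketch of the two behaviours. It is not Firefox's tokenizer; the `tokenize` and `is_word_char` helpers and the sample string are my own illustration. It assumes a simple rule: letters, combining marks and numbers belong to a word, and everything else is a word boundary. The only difference between the two runs is whether ZWJ/ZWNJ count as part of the word.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def is_word_char(ch, keep_joiners):
    """Decide whether ch belongs inside a word (illustrative rule only)."""
    if ch in (ZWNJ, ZWJ):
        # This is the whole difference between the two behaviours.
        return keep_joiners
    # Letters (L*), combining marks (M*) and numbers (N*) are word characters.
    return unicodedata.category(ch)[0] in "LMN"


def tokenize(text, keep_joiners):
    """Split text into words; hypothetical helper, not a Firefox API."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch, keep_joiners):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


# Illustrative Malayalam-script sequence with a ZWJ in the middle
# (NA, NA, VIRAMA, ZWJ, MA); it is not one of the samples from the bug report.
sample = "\u0d28\u0d28\u0d4d\u200d\u0d2e"

broken = tokenize(sample, keep_joiners=False)  # joiner treated as a boundary
fixed = tokenize(sample, keep_joiners=True)    # joiner kept inside the word

print(len(broken), len(fixed))
```

With the joiner treated as a boundary, the spellchecker would be handed two fragments, neither of which is the word the user typed. With the joiner kept, it gets the whole word, including the joiner, which is what a Hunspell wordlist for such a language would contain.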
I have filed a bug against Firefox's spellchecker; you can see it here (bug #434044).
I have given some sample words with ZWJ/ZWNJ in Malayalam and Bengali (thanks to Runa). If your language uses ZWJ/ZWNJ, please comment or vote on the bug in Mozilla Bugzilla.
I found this while preparing a Malayalam spellcheck extension (a Hunspell wordlist) for Firefox. Many languages still do not have affix rules in place for aspell/Hunspell, which makes spellchecking less efficient, particularly for highly inflected or agglutinative languages like Malayalam.
Thanks to Németh László, the Hunspell developer, for helping me figure out the problem.