LibreOffice Malayalam spellchecker using mlmorph

A few months back, I wrote about the spellchecker based on a Malayalam morphology analyser. I was also trying to integrate that spellchecker with LibreOffice. It is not yet ready for any serious use, but if you are curious and would like to help me with its further development, please read on. Malayalam spellchecker – a morphology analyser based approach Blog post on spellchecker approach and pla Current status The LibreOffice spellchecker for Malayalam is available at https://gitlab. [Read More]

KDE spellchecker not working for Indian Languages

As I mentioned in my blog post on language detection, the Sonnet spellchecker of KDE is not working. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes parts of words to backend spellcheckers like aspell or hunspell. As a result, every word is reported as wrong. This is the logic used in Sonnet to recognize the word boundaries [Read More]

Language Detection and Spellcheckers

A few weeks back there was a discussion on the #indlinux IRC channel about automatic language detection. The idea is that spellcheckers and other language tools should not ask users to select a language. Instead, they should detect the language automatically. The idea is not new. There is a KDE bug here and Ubuntu has this as a brainstorm idea. It seems M$ Word already has this. A sample use case can be this: “While preparing a document in OpenOffice, I want to write in English as well as in Hindi. [Read More]
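One simple way to approximate this for mixed English and Indic text is to look at the Unicode script of each word. The sketch below is only an illustration and not how any of the tools mentioned above work: the helper names `script_of` and `detect_script` are made up for this example, and it relies on the fact that Unicode character names start with the script name (for example `MALAYALAM LETTER KA`). A script is not the same thing as a language (Devanagari is used for Hindi, Marathi and others), so this would only be a first step.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Return the first word of the Unicode character name, e.g. 'LATIN' or 'MALAYALAM'."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Character has no name in the Unicode database.
        return None

def detect_script(word):
    """Guess the script of a word from its letters; returns None if it has no letters."""
    counts = Counter()
    for ch in word:
        # Only count letters (general categories Lu, Ll, Lo, ...).
        if unicodedata.category(ch).startswith("L"):
            script = script_of(ch)
            if script is not None:
                counts[script] += 1
    return counts.most_common(1)[0][0] if counts else None

# Tag every word of a mixed sentence with its script.
sentence = "hello \u0939\u093f\u0928\u094d\u0926\u0940 \u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"
tagged = [(word, detect_script(word)) for word in sentence.split()]
```

A spellchecker front end could then pick a dictionary per word based on the detected script, instead of asking the user to choose one language for the whole document.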

Bug in Firefox Spellcheck

There is a bug in the Firefox spell check functionality that affects many Indian languages that use Zero Width [Non] Joiners in words. Firefox uses Hunspell as the spelling checker. OpenOffice also uses Hunspell. The bug is not present in OpenOffice; the problem in Firefox is with the tokenization of words in editable text fields before the spellcheck is done. Firefox splits a word wherever it contains a ZWJ/ZWNJ. Because of this, the input to the spellchecker is wrong: it is not the actual word. [Read More]
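The effect of such a tokenizer can be shown with a small sketch. This is not Firefox's code; the `tokenize` function and its `keep_joiners` flag are hypothetical and only contrast a tokenizer that treats ZWJ/ZWNJ as a word break with one that keeps them inside the word.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def is_word_char(ch, keep_joiners):
    """Letters, combining marks and digits belong to a word; joiners only if requested."""
    if ch in (ZWJ, ZWNJ):
        return keep_joiners
    return unicodedata.category(ch)[0] in ("L", "M", "N")

def tokenize(text, keep_joiners):
    """Split text into words, breaking at every character that is not a word character."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch, keep_joiners):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

# A Malayalam word written with a ZWNJ in the middle.
word = "\u0d24\u0d2e\u0d3f\u0d34\u0d4d" + ZWNJ + "\u0d28\u0d3e\u0d1f\u0d4d"

broken = tokenize(word, keep_joiners=False)  # two fragments, neither is the real word
intact = tokenize(word, keep_joiners=True)   # the single original word
```

With `keep_joiners=False` the spellchecker would receive two fragments and flag both; with `keep_joiners=True` it receives the word exactly as the user typed it.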

Malayalam Spellchecker

See the Aspell Malayalam spelling checker working in Gedit. This development version has only 4500 Malayalam words in the dictionary, which is not at all sufficient for Malayalam.

Compound word handling and soundslike features are yet to be developed. Snapshot from Anivar’s machine

Only Aspell, no space for others…

It seems that our work on our own spell checker does not have any importance other than learning. Aspell is light years ahead of us. There are ispell and myspell too. But we learned a lot about approximate string comparison, fast search on a big wordlist, candidate list generation, etc. Gora Mohanty gave me valuable insights on Aspell and on how to create the Aspell word list for Malayalam. But there are still problems with the compound words of Malayalam. [Read More]

Spell checker and Late night coding..

It was a wonderful weekend. Benzi and I were working on the spell checker for Malayalam. In April we had done a lot of research on this. We did the coding for the dictionary representation in the Binary Retrieval tree (TRIE). On Saturday night we coded the candidate list generation. It is a wonderful experience to code late at night: one laptop and two people coding! Everything worked fine. [Read More]
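For readers who have not seen the idea, here is a minimal sketch of a trie dictionary with candidate list generation. It is not the code we wrote that weekend; the class and method names are invented for this example, and the candidate search uses a common technique (walking the trie while carrying one row of the Levenshtein edit distance table) that may differ from what we actually did.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a dictionary word ends at this node

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def candidates(self, word, max_distance=1):
        """Return (dictionary_word, distance) pairs within max_distance edits of word."""
        results = []
        first_row = list(range(len(word) + 1))  # distance from the empty prefix
        for ch, child in self.root.children.items():
            self._search(child, ch, ch, word, first_row, max_distance, results)
        return sorted(results, key=lambda pair: (pair[1], pair[0]))

    def _search(self, node, ch, prefix, word, prev_row, max_distance, results):
        # Build the edit distance row for the trie prefix ending in ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            insert_cost = row[i - 1] + 1
            delete_cost = prev_row[i] + 1
            replace_cost = prev_row[i - 1] + (word[i - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.is_word and row[-1] <= max_distance:
            results.append((prefix, row[-1]))
        # Descend only if some cell is still within the allowed distance.
        if min(row) <= max_distance:
            for next_ch, child in node.children.items():
                self._search(child, next_ch, prefix + next_ch, word, row, max_distance, results)

dictionary = Trie(["spell", "spelt", "shell", "speller", "tree"])
suggestions = dictionary.candidates("spel", max_distance=1)
```

The trie lets all words sharing a prefix reuse the same edit distance rows, which is what makes the search fast on a big wordlist compared with comparing the misspelled word against every entry.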