As I mentioned in my blog post on language detection, the Sonnet spellchecker in KDE does not work for Indian languages. I read the Sonnet code and found that it fails to determine word boundaries in a sentence (or string buffer) and passes fragments of words to backend spellcheckers such as aspell or hunspell. As a result, every word is marked as wrong. This is the logic Sonnet uses to recognize word boundaries:
"Loop through the chars of the word, until the current char is not a letter anymore."
To decide whether a character is a letter, it uses the QChar::isLetter() function. This function fails for the matra signs (dependent vowel signs) of Indian languages.
[Screenshot from a text area in Konqueror]
For example, compile and run this small Qt program:
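```cpp
#include <QChar>
#include <cstdio>

int main()
{
    QChar letter;

    letter = QChar(0x0B85);  // அ (TAMIL LETTER A)
    std::printf("%d\n", letter.isLetter());

    letter = QChar(0x0940);  // ी (DEVANAGARI VOWEL SIGN II)
    std::printf("%d\n", letter.isLetter());

    return 0;
}
```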
This program prints 1 (true) for அ and 0 (false) for ी, because ी is classified as a spacing combining mark (Unicode category Mc) rather than a letter.
When I showed this to Sayamindu during foss.in, he pointed me to a bug in glibc. Even though the bug report is about Bengali, it applies to the other Indian languages as well. It is assigned to Pravin Satpute, who told me that he has a fix and will submit it to glibc soon.
But I wonder why this bug in KDE has gone unnoticed so far. Has nobody used spellcheck for Indian languages in KDE?!
Let me explain why this does not happen in the GNOME spellchecker if this is a glibc bug. In GNOME, word splitting is done in the application itself using the gtk_text_iter_* functions, and this iteration over words relies on Pango's word-boundary detection algorithms.
I have filed a bug in KDE to track it.