GNU C Library 2.26 has been released, and I am proud to have my name listed among the contributors. Two small patches for localedata were merged:
Bug 19922 – iso14651_t1_common: Define collation for Malayalam chillu characters
Python isalpha is buggy
This code
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    ml_string = u"സന്തോഷ് हिन्दी"
    for ch in ml_string:
        # print only the characters that isalpha() accepts
        if ch.isalpha():
            print(ch)
gives this output
    സ
    ന
    ത
    ഷ
    ह
    न
    द
So it fails for all the matra (vowel sign) characters of Indian languages. This is a known bug in glibc.
Does anybody know whether Python internally uses glibc functions for these basic string operations, or a separate character database like Qt does?
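One way to look into this from the Python side is to print each character's Unicode general category next to its isalpha() result. This is only a minimal sketch, assuming Python 3 (where every str is Unicode); the helper name classify is made up for this example, and unicodedata is the standard library module that exposes the character database shipped with Python.

```python
import unicodedata

def classify(text):
    """Return (character, general category, isalpha result) for each non-space character."""
    return [(ch, unicodedata.category(ch), ch.isalpha())
            for ch in text if not ch.isspace()]

ml_string = "സന്തോഷ് हिन्दी"
for ch, category, alpha in classify(ml_string):
    # Show the code point, since a bare combining mark is hard to read on its own.
    print("U+%04X  %s  isalpha=%s" % (ord(ch), category, alpha))
```

In this sample every character that isalpha() rejects is in one of the mark categories (Mn or Mc), and every accepted one is a letter. That suggests the result follows the character database bundled with Python, whatever glibc does.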
Yahoo search bug
None of the search engines handles Indian languages very well. Google removes the zero-width joiners and non-joiners that are used in many languages. Yahoo does not remove them, but a UI bug in the results page makes the results wrong.
See the image below:
The bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at the end of it.
KDE spellchecker not working for Indian Languages
As I mentioned in my blog post on language detection, KDE's Sonnet spellchecker does not work for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes fragments of words to backend spellcheckers such as aspell or hunspell. In the end, every word is reported as wrong. This is the logic used in Sonnet to recognize the word boundaries:
[Read More]
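This is not Sonnet's code. As a rough illustration of how this kind of failure can arise, here is a hypothetical tokenizer in Python 3 that ends a word at any character that is not a letter. The names split_words, letters_only and letters_and_marks are invented for this sketch. Because vowel signs and the virama are combining marks rather than letters, the letters-only rule cuts Indic words into pieces, while a rule that also accepts marks keeps them whole.

```python
import unicodedata

def split_words(text, is_word_char):
    """Collect maximal runs of characters accepted by is_word_char."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

def letters_only(ch):
    # Hypothetical naive rule: only letters belong to a word.
    return ch.isalpha()

def letters_and_marks(ch):
    # Also accept combining marks (general categories Mn, Mc, Me).
    return ch.isalpha() or unicodedata.category(ch).startswith("M")

sentence = "സന്തോഷ് हिन्दी"
print(split_words(sentence, letters_only))       # fragments of the two words
print(split_words(sentence, letters_and_marks))  # the two words, intact
```

A spellchecker backend that receives the fragments from the first call has no chance of finding them in its dictionary.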
Firefox spellcheck bugs…
The Firefox spellcheck feature needs some volunteers to fix the tokenization issue. There are two bugs related to tokenization:
- Bug 434044 – The tokenization of words for spellcheck is wrong when there is a ZWJ/ZWNJ/ZWS in the word. – Reported: 2008-05-16 07:49 PDT by Santhosh Thottingal
- Bug 318040 – Spell checker flags words containing full stops (periods) – Reported: 2005-11-28 12:45 PDT by Joseph Wright
10 GB /var/log/messages file
Again Fedora! 🙂
After installing the Linux kernel and the operating system, I installed some libraries and some small applications that I usually use. I have a 14 GB partition for Fedora 9. After installing all that software, when I rebooted the system today, GDM would not start. It kept restarting, and I could not get a session by pressing Ctrl + Alt + F1. Hmm… So I added single to the kernel arguments in GRUB and got a shell.
Bug in Firefox Spellcheck
There is a bug in the Firefox spellcheck functionality that affects many Indian languages that use Zero Width [Non] Joiners inside words. Firefox uses Hunspell as its spelling checker, and so does OpenOffice. The bug is not present in OpenOffice; the problem in Firefox lies in how words in editable text fields are tokenized before spellchecking. Firefox splits a word wherever it contains a ZWJ or ZWNJ, so the input to the spellchecker is a fragment and not the actual word.
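The effect can be sketched in a few lines of Python 3. This is not Firefox's tokenizer, only a hypothetical one (tokenize and is_word_char are names invented here) with a switch that decides whether ZWJ and ZWNJ count as part of a word. The sample word is the Malayalam spelling of "Tamil Nadu", which is commonly written with a ZWNJ in the middle.

```python
import unicodedata

ZWNJ = "\u200c"  # zero width non-joiner
ZWJ = "\u200d"   # zero width joiner

def is_word_char(ch, joiners_are_word_chars):
    if ch in (ZWNJ, ZWJ):
        return joiners_are_word_chars
    # Letters and combining marks always belong to a word.
    return unicodedata.category(ch)[0] in ("L", "M")

def tokenize(text, joiners_are_word_chars):
    """Split text into words; joiners act as separators unless the flag is set."""
    tokens, current = [], []
    for ch in text:
        if is_word_char(ch, joiners_are_word_chars):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

word = "തമിഴ്" + ZWNJ + "നാട്"
print(tokenize(word, joiners_are_word_chars=False))  # two fragments
print(tokenize(word, joiners_are_word_chars=True))   # one token, the whole word
```

With the flag off, the spellchecker would be asked about two fragments, neither of which is the word the user typed.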