<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Santhosh Thottingal &#187; Bugs</title>
	<atom:link href="http://thottingal.in/blog/category/bugs/feed/" rel="self" type="application/rss+xml" />
	<link>http://thottingal.in/blog</link>
	<description>/home/santhosh</description>
	<lastBuildDate>Mon, 14 Nov 2011 06:06:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python isalpha is buggy</title>
		<link>http://thottingal.in/blog/2009/03/29/python-isalpha-is-buggy/</link>
		<comments>http://thottingal.in/blog/2009/03/29/python-isalpha-is-buggy/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 00:29:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=65</guid>
		<description><![CDATA[This code #!/usr/bin/env python # -*- coding: utf-8 -*- ml_string=u"സന്തോഷ് हिन्दी" for ch in ml_string: if(ch.isalpha()): print ch gives this output സ ന ത ഷ ह न द It fails for all the matra signs of Indian languages. This is a known bug in glibc. Does anybody know whether Python internally uses glibc functions for these [...]]]></description>
			<content:encoded><![CDATA[<p>This code:</p>
<pre>
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Print only the characters that isalpha() accepts as letters.
ml_string = u"സന്തോഷ്  हिन्दी"
for ch in ml_string:
    if ch.isalpha():
        print(ch)
</pre>
<p>gives this output:</p>
<pre>
സ
ന
ത
ഷ
ह
न
द
</pre>
<p>It fails for all the matra signs (dependent vowel signs) of Indian languages. This is a <a href="https://bugzilla.redhat.com/show_bug.cgi?id=466912">known</a> <a href="https://bugzilla.redhat.com/show_bug.cgi?id=474124">bug</a> in glibc.<br />
Does anybody know whether Python internally uses glibc functions for these basic string operations, or whether it uses a separate character database like Qt does?</p>
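<p>As far as I can tell, CPython answers <code>isalpha()</code> on unicode strings from its own bundled Unicode character database rather than from glibc, and only code points in the letter categories (L*) count as alphabetic. The matra signs and the virama are combining marks (M*), which would explain the output above. The sketch below, written for Python 3, lists the category of each character. The names <code>is_word_char</code> and <code>describe</code> are illustrative and not part of any library.</p>

```python
import unicodedata

def is_word_char(ch):
    """Accept letters (L*) and combining marks (M*) such as matras and the virama."""
    return unicodedata.category(ch)[0] in ("L", "M")

def describe(text):
    """Return (char, category, isalpha, is_word_char) for each non-space character."""
    return [
        (ch, unicodedata.category(ch), ch.isalpha(), is_word_char(ch))
        for ch in text
        if not ch.isspace()
    ]

if __name__ == "__main__":
    for row in describe("സന്തോഷ് हिन्दी"):
        print(row)
```

<p>A check like <code>is_word_char</code> is only one possible workaround. Whether marks should count as "alphabetic" depends on what the caller needs.</p>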
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/03/29/python-isalpha-is-buggy/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Yahoo search bug</title>
		<link>http://thottingal.in/blog/2008/12/05/yahoo-search-bug/</link>
		<comments>http://thottingal.in/blog/2008/12/05/yahoo-search-bug/#comments</comments>
		<pubDate>Sat, 06 Dec 2008 02:54:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[yahoo]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=58</guid>
		<description><![CDATA[None of the search engines handles Indian languages very well. Google strips the zero-width joiners and non-joiners that many languages use. Yahoo does not strip them, but a UI bug in its results page renders the results incorrectly. See the image below: The bottom half of the image is the source code. We [...]]]></description>
			<content:encoded><![CDATA[<p>None of the search engines handles Indian languages very well. Google strips the zero-width joiners and non-joiners that many languages use. Yahoo does not strip them, but a UI bug in its results page renders the results incorrectly.<br />
See the image below:<br/><br />
<img src="http://pics.livejournal.com/santhoshtr/pic/0000ta1c" width="320" height="228" border='0'/><br />
<br/></p>
<p>The bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at its end. As a result, the word is rendered incorrectly on the page.<br />
This happens for all languages that use ZWJ, ZWNJ, ZWS, etc. To highlight the search result, the page breaks the word just before the ZWNJ/ZWJ and puts the closing bold tag there.</p>
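<p>A minimal sketch of one way to avoid this, in Python 3: before closing the bold tag, move the end of the match forward until the word really ends, treating ZWJ, ZWNJ and combining marks as part of the word. This is not Yahoo's code, and <code>is_word_char</code> and <code>highlight</code> are hypothetical names used only for illustration.</p>

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def is_word_char(ch):
    """Letters, combining marks, digits and the zero-width joiners belong to a word."""
    return ch in (ZWNJ, ZWJ) or unicodedata.category(ch)[0] in ("L", "M", "N")

def highlight(text, start, end):
    """Wrap text[start:end] in <b> tags, moving `end` forward to the end of the word."""
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[:start] + "<b>" + text[start:end] + "</b>" + text[end:]
```

<p>With this, a match that stops just before a ZWJ is extended past it, so the closing tag lands after the whole word.</p>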
<p>I showed this to <a href="http://t3.dotgnu.info/blog/">Gopal</a>, and he told me that he has filed a bug for it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/12/05/yahoo-search-bug/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>KDE spellchecker not working for Indian Languages</title>
		<link>http://thottingal.in/blog/2008/11/30/kde-spellchecker-not-working-for-indian-languages/</link>
		<comments>http://thottingal.in/blog/2008/11/30/kde-spellchecker-not-working-for-indian-languages/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 00:47:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[kde]]></category>
		<category><![CDATA[spell checker]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=57</guid>
		<description><![CDATA[As I mentioned in my blog post on Language detection, the Sonnet spellchecker of KDE does not work for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes parts of words to backend spellcheckers like aspell or hunspell. [...]]]></description>
			<content:encoded><![CDATA[<p>As I mentioned in my blog post on <a href="http://santhoshtr.livejournal.com/13832.html">Language detection</a>, the Sonnet spellchecker of KDE does not work for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes parts of words to backend spellcheckers like aspell or hunspell. As a result, every word is flagged as wrong. This is the logic Sonnet uses to recognize word boundaries:</p>
<blockquote><p>Loop through the chars of the word, until the current char is not a letter/ anymore.</p></blockquote>
<p>For this, it uses the QChar::isLetter() function, which returns false for the matra signs of our languages.</p>
<p>
A screenshot from a text area in Konqueror:</p>
<p>
<a href="http://pics.livejournal.com/santhoshtr/pic/0000rw6t/"><img src="http://pics.livejournal.com/santhoshtr/pic/0000rw6t" width="246" height="28" border='0'/></a>
</p>
<p>For example:</p>
<pre>
#include &lt;QtCore/QString&gt;
#include &lt;cstdio&gt;

int main() {
	// Use code points: a non-ASCII character in single quotes is a
	// multi-character constant when the source file is UTF-8.
	QChar letter(0x0B85);   // அ TAMIL LETTER A
	fprintf(stdout, "%d\n", letter.isLetter());
	letter = QChar(0x0940); // ी DEVANAGARI VOWEL SIGN II
	fprintf(stdout, "%d\n", letter.isLetter());
	return 0;
}
</pre>
<p>This program prints 1 (true) for அ and 0 (false) for ी.</p>
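<p>The word-boundary loop quoted above can be sketched in Python 3 to show the effect of the letter test. This is an illustration of the idea, not Sonnet's actual code. <code>letters_only</code> imitates an isLetter()-style check, while <code>letters_and_marks</code> is a hypothetical alternative that also accepts combining marks and ZWJ/ZWNJ.</p>

```python
import unicodedata

ZERO_WIDTH = ("\u200c", "\u200d")  # ZWNJ, ZWJ

def letters_only(ch):
    """Imitate an isLetter()-style test: only general categories L* count."""
    return unicodedata.category(ch).startswith("L")

def letters_and_marks(ch):
    """Also accept combining marks (M*) and the zero-width joiners."""
    return ch in ZERO_WIDTH or unicodedata.category(ch)[0] in ("L", "M")

def split_words(text, is_word_char):
    """Collect runs of characters for which is_word_char() is true."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

if __name__ == "__main__":
    print(split_words("हिन्दी", letters_only))       # fragments go to the backend
    print(split_words("हिन्दी", letters_and_marks))  # the whole word survives
```

<p>With the letters-only test, every matra or virama ends the current word, so the backend spellchecker only ever sees fragments.</p>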
<p>
When I showed this to <a href="http://sayamindu.randomink.org/ramblings/">Sayamindu</a> during <a href="http://foss.in">foss.in</a>, he pointed me to a <a href="https://bugzilla.redhat.com/show_bug.cgi?id=466912">bug in glibc</a>. Even though the bug is about Bengali, it applies to all languages. It is assigned to <a href="http://pravin-s.blogspot.com/">Pravin Satpute</a>, and he told me that he has a solution and will submit it to glibc soon.
</p>
<p>
But I wonder why this bug in KDE has gone unnoticed so far. Has nobody used the spellchecker for Indian languages in KDE?!
</p>
<p>
Let me explain why this does not happen in the GNOME spellchecker, even if this is a glibc bug. In GNOME, the word splitting is done in the application itself using the gtk_text_iter_* functions, and the iteration through words relies on Pango's word-boundary detection algorithms.</p>
<p>I have <a href="https://bugs.kde.org/show_bug.cgi?id=176537">filed a bug</a> in KDE to track it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/11/30/kde-spellchecker-not-working-for-indian-languages/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Firefox spellcheck bugs&#8230;</title>
		<link>http://thottingal.in/blog/2008/06/02/firefox-spellcheck-bugs/</link>
		<comments>http://thottingal.in/blog/2008/06/02/firefox-spellcheck-bugs/#comments</comments>
		<pubDate>Tue, 03 Jun 2008 05:16:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[firefox]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=44</guid>
		<description><![CDATA[The Firefox spellcheck feature needs some volunteers to fix the tokenization issue. There are two bugs related to tokenization: Bug 434044 – The tokenization of words for spellcheck is wrong when there is a ZWJ/ZWNJ/ZWS in the word. &#8211; Reported: 2008-05-16 07:49 PDT by Santhosh Thottingal Bug 318040 – Spell checker flags words containing full [...]]]></description>
			<content:encoded><![CDATA[<p>The Firefox spellcheck feature needs some volunteers to fix the tokenization issue. There are two bugs related to tokenization:</p>
<ol>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=434044">Bug 434044 – The tokenization of words for spellcheck is wrong when there is a ZWJ/ZWNJ/ZWS in the word.</a> &#8211; Reported: 2008-05-16 07:49 PDT by Santhosh Thottingal</li>
<li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=318040">Bug 318040 – Spell checker flags words containing full stops (periods)</a> &#8211; Reported: 2005-11-28 12:45 PDT by Joseph Wright</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/06/02/firefox-spellcheck-bugs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>10 GB /var/log/messages file</title>
		<link>http://thottingal.in/blog/2008/05/27/10-gb-varlogmessages-file/</link>
		<comments>http://thottingal.in/blog/2008/05/27/10-gb-varlogmessages-file/#comments</comments>
		<pubDate>Wed, 28 May 2008 06:11:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[fedora]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=42</guid>
		<description><![CDATA[Again Fedora! After the installation of linux kernel and linux operating system, I installed some libraries and a few small applications that I usually use&#8230; I have a 14 GB partition for Fedora 9. After installing all that software, when I rebooted the system today, GDM would not start. It kept restarting and [...]]]></description>
			<content:encoded><![CDATA[<p>Again Fedora! <img src='http://thottingal.in/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
After the installation of <a href="http://santhoshtr.livejournal.com/10581.html">linux kernel and linux operating system</a>, I installed some libraries and a few small applications that I usually use&#8230; I have a 14 GB partition for Fedora 9. After installing all that software, when I rebooted the system today, GDM would not start. It kept restarting, and I could not get a login session by pressing Ctrl + Alt + F1. Hmm&#8230; So I added <code>single</code> to the kernel arguments in GRUB and got a shell.<br />
To my surprise, df -a reported that the partition was 100% full! I had installed only a few applications, nothing close to 14 GB.<br />
So I tried to figure out what was taking up all the disk space, and I caught the culprit:<br />
/var/log/messages <img src='http://thottingal.in/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
Yes!</p>
<pre>
$ ls -l messages
-rw-------+ 1 root root 10450239682 2008-05-27 20:39 messages
</pre>
<p>OK, 9.7 GB. So who is writing to messages?</p>
<pre>
$ tail -n 100 messages
</pre>
<p>This gave me a hint. Some sample lines from the messages file:</p>
<pre>
May 27 20:39:23 thottingal gdm-simple-slave[2523]: DEBUG: GdmSignalHandler: Adding handler 5: signum=8 0x804c520
May 27 20:39:23 thottingal gdm-simple-slave[2523]: DEBUG: GdmSignalHandler: Registering for 8 signals
May 27 20:39:23 thottingal gdm-simple-slave[2523]: DEBUG: GdmSignalHandler: Adding handler 6: signum=1 0x804c520
May 27 20:39:23 thottingal gdm-simple-slave[2523]: DEBUG: GdmSignalHandler: Registering for 1 signals
</pre>
<p>GDM was writing all its debug messages to /var/log/messages. Can somebody help me figure out what is wrong with my GDM?<br />
(The [debug] section of /etc/gdm/custom.conf is empty.)</p>
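<p>Instead of eyeballing the output of tail, the lines can be counted per program. Below is a minimal Python 3 sketch, assuming the traditional syslog layout seen in the sample lines above ("Mon DD HH:MM:SS host program[pid]: message"). The names <code>count_by_program</code> and <code>top_writers</code> are made up for this example.</p>

```python
import re
from collections import Counter

# Assumed layout: "May 27 20:39:23 host program[pid]: message" (the pid is optional).
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+(\S+)\s+([^\s:\[]+)(?:\[\d+\])?:")

def count_by_program(lines):
    """Count log lines per program name; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

def top_writers(path, n=5):
    """Stream the file line by line, so even a huge log is never loaded whole."""
    with open(path, errors="replace") as handle:
        return count_by_program(handle).most_common(n)
```

<p>Calling <code>top_writers("/var/log/messages")</code> as root would then list the programs that write the most lines.</p>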
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/05/27/10-gb-varlogmessages-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bug in Firefox Spellcheck</title>
		<link>http://thottingal.in/blog/2008/05/18/bug-in-firefox-spellcheck/</link>
		<comments>http://thottingal.in/blog/2008/05/18/bug-in-firefox-spellcheck/#comments</comments>
		<pubDate>Mon, 19 May 2008 00:50:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[spell checker]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=40</guid>
		<description><![CDATA[There is a bug in Firefox's spell check functionality that affects many Indian languages that use zero-width [non-]joiners in words. Firefox uses Hunspell as its spelling checker. OpenOffice also uses Hunspell. The bug is not present in OpenOffice; the problem in Firefox lies in the tokenization of words in editable text fields [...]]]></description>
			<content:encoded><![CDATA[<p>There is a bug in Firefox's spell check functionality that affects many Indian languages that use zero-width [non-]joiners in words. Firefox uses Hunspell as its spelling checker. OpenOffice also uses Hunspell. The bug is not present in OpenOffice; the problem in Firefox lies in the tokenization of words in editable text fields before spellchecking. Firefox splits a word wherever it contains a ZWJ/ZWNJ. Because of this, the input to the spellchecker is a fragment rather than the actual word.<br />
I have filed a bug against the Firefox spellchecker, and you can see it <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=434044">here (bug #434044)</a>.<br />
I have given some sample words with ZWJ/ZWNJ in Malayalam and Bengali (thanks to <a href="http://runab.livejournal.com">Runa</a>). If your language uses ZWJ/ZWNJ, please comment or vote on the bug in Mozilla Bugzilla.</p>
<p>I found this while preparing a Malayalam spellcheck <a href="http://download.savannah.gnu.org/releases/smc/Spellchecker/">extension</a> for Firefox (a Hunspell word list). Many languages still do not have affix rules in place for aspell/hunspell, which makes spellchecking less effective, particularly for highly inflected, agglutinative languages like Malayalam.</p>
<p>Thanks to Németh László, the Hunspell developer, for helping me figure out the problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/05/18/bug-in-firefox-spellcheck/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
