<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Santhosh Thottingal &#187; Projects</title>
	<atom:link href="http://thottingal.in/blog/category/projects/feed/" rel="self" type="application/rss+xml" />
	<link>http://thottingal.in/blog</link>
	<description>/home/santhosh</description>
	<lastBuildDate>Mon, 23 Aug 2010 11:29:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Wikimania 2010, Poland</title>
		<link>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/</link>
		<comments>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/#comments</comments>
		<pubDate>Sat, 17 Jul 2010 09:25:44 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=276</guid>
		<description><![CDATA[I left Chennai on Wednesday (8th) and reached Frankfurt airport on Thursday morning. The rest of the Wikimania attendees from India (Shiju Alex, Tinu Cherian, Srinivas Gunta and Arjun Rao) had already reached the airport, and I joined them. We reached Gdansk Airport by 12.30 PM. Our accommodation was at a student hostel of Gdansk University. Language [...]]]></description>
			<content:encoded><![CDATA[<p>I left Chennai on Wednesday (8th) and reached Frankfurt airport on Thursday morning. The rest of the Wikimania attendees from India (Shiju Alex, Tinu Cherian, Srinivas Gunta and Arjun Rao) had already reached the airport, and I joined them. We reached Gdansk Airport by 12.30 PM. Our accommodation was at a student hostel of Gdansk University. Language was a big issue, since most people there do not understand English and speak only Polish. The lady at the reception of our hostel used Google Translate to communicate with us. <a href="http://en.wikipedia.org/wiki/Gda%C5%84sk">Gdansk</a> is a very beautiful city, with streets of tall brick buildings.</p>
<p>The conference started on Friday morning. Sue Gardner, Executive Director of the Wikimedia Foundation, talked about the strategies of the foundation, followed by a question-and-answer session with Wikimedia board members. We presented the Malayalam CD to Sue Gardner. She remembered the international free software conference she attended at Trivandrum in December 2008.<br />
Our workshop on offline Wikipedia versions started at 2.30. Martin Walker introduced the workshop to the participants. Manuel Schneider from German Wikipedia explained the Openzim format for compressed offline storage of Wikipedia and the available readers for desktop computers and mobile phones. Shiju Alex introduced the Malayalam Wikipedia offline version 1.0. I talked about the issues and solutions in providing an offline version, particularly of a non-Latin complex-script wiki, to users on CD-ROM or DVD. I demonstrated sample offline wikis in Hebrew, Tamil, Polish, English and Japanese created with the wiki2cd tool. There were a couple of questions on wiki2cd and Openzim. Kul Takanao Wadhwa and Tomasz Finc from the Wikimedia Foundation, who focus on offline wiki projects, attended the workshop, and we had a discussion after the talk.</p>
<div id="attachment_278" class="wp-caption alignright" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132.jpg"><img class="size-medium wp-image-278 " title="Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider" src="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132-300x225.jpg" alt="Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider" width="300" height="225" /></a><p class="wp-caption-text">Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider</p></div>
<p>The offline wiki workshop continued with the Pediapress team. They are the people behind the recently added book/PDF export feature of Wikipedia. Unfortunately, this feature requires a lot of improvement to work with Indian scripts.<br />
We met Gerard M, who focuses on language support in wikis. He has an amazing passion for our local language Wikipedias. We discussed the technical issues of the local language wikis. Siebrand joined the discussion and pointed out some improvements for the wiki2cd software.</p>
<div id="attachment_277" class="wp-caption alignleft" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5074.jpg"><img class="size-medium wp-image-277  " title="discussion with Siebrand, Gerard" src="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5074-300x225.jpg" alt="" width="300" height="225" /></a><p class="wp-caption-text">Discussion with Siebrand on wiki2cd improvements. From left to right: Tinu Cherian, myself, Gerard M, Siebrand</p></div>
<p>On the second day I met Volker Haas, the developer of the PDF export/books feature of Wikipedia. The library the extension uses to create PDFs is Reportlab, but it does not support complex scripts such as Indic or Arabic. We had a long discussion on possible solutions. We discussed the Reportlab code, the mwlib code, and the software I am writing nowadays to provide complex-script PDF rendering APIs. We will continue to try out some of the options we have to solve this issue soon.</p>
<p>Martin Walker, who presented the article selection process of English Wikipedia along with us in the workshop, invited Shiju and me to dinner with his family, and we went.</p>
<p>The third day started with a plenary session by Jimmy Wales. He talked about small-language Wikipedias and the issues they face. He emphasized the need for offline versions of Wikipedia to reach more people, and talked about the Malayalam Wikipedia offline version.</p>
<div id="attachment_278" class="wp-caption alignright" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132.jpg"><img class="size-medium wp-image-278 " title="Jimmy Wales with Malayalam wikipedia CD" src="http://upload.wikimedia.org/wikipedia/commons/4/4d/2010-07-11-gdansk-by-RalfR-001.jpg" alt="Jimmy Wales with Malayalam wikipedia CD" width="300" height="225" /></a><p class="wp-caption-text">Jimmy Wales with Malayalam wikipedia CD</p></div>
<p>Mayooranathan from Tamil Wikipedia presented the issues and statistics of the Tamil Wikipedia community. On Monday and Tuesday, we spent time roaming around the Old Town of Gdansk. We visited <a href="http://en.wikipedia.org/wiki/St._Mary%27s_Church,_Gda%C5%84sk">St. Mary&#8217;s Church</a>, the biggest brick church in the world. We climbed the 400 steps of the church tower. From the top of the church, one can see the entire city. We took a boat to Westerplatte on the Baltic Sea and visited <a href="http://www.bfr.pl/index.php?option=com_content&amp;task=view&amp;id=55&amp;Itemid=39">Wisłoujście Fortress</a>.</p>
<h2>Related posts:</h2>
<p>* Creating Malayalam Wikipedia CD: <a href="http://shijualex.wordpress.com/2010/04/24/creating-malayalam-wikipedia-cd/" target="_blank">http://shijualex.wordpress.com/2010/04/24/creating-malayalam-wikipedia-cd/</a><br />
* Wiki2cd: <a href="http://thottingal.in/blog/2010/04/17/mlwikioncd/" target="_blank">http://thottingal.in/blog/2010/04/17/mlwikioncd/</a><br />
* Wikipedia Signpost: <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-04-19/News_and_notes#Briefly" target="_blank">http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-04-19/News_and_notes#Briefly</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/04/best-of-malayalam-wikipedia.html" target="_blank">http://ultimategerardm.blogspot.com/2010/04/best-of-malayalam-wikipedia.html</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/04/cd-dowloaded-4390-in-10-days.html" target="_blank">http://ultimategerardm.blogspot.com/2010/04/cd-dowloaded-4390-in-10-days.html</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/07/malayalamwikipedia-success-story.html" target="_blank">http://ultimategerardm.blogspot.com/2010/07/malayalamwikipedia-success-story.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Attending Wikimania 2010</title>
		<link>http://thottingal.in/blog/2010/07/06/wikimania-2010/</link>
		<comments>http://thottingal.in/blog/2010/07/06/wikimania-2010/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 09:32:45 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=273</guid>
		<description><![CDATA[I will be attending Wikimania 2010 in Gdansk, Poland. This annual international conference of the Wikimedia community runs from July 9 to July 11. I will be presenting wiki2cd, the tool I wrote for Malayalam Wikipedia version 1.0, there in a joint workshop with Wikipedia offline developers. I will be joining Manuel Schneider, Shiju Alex, [...]]]></description>
			<content:encoded><![CDATA[<p>I will be attending <a href="http://wikimania2010.wikimedia.org" target="_blank">Wikimania 2010 in Gdansk, Poland</a>. This annual international conference of the Wikimedia community runs from July 9 to July 11.</p>
<p>I will be presenting wiki2cd, the tool I wrote for Malayalam Wikipedia version 1.0, there in a joint workshop with Wikipedia offline developers. I will be joining Manuel Schneider, Shiju Alex and Martin Walker in the workshop titled <a title="Submissions/Creating offline version of Wiki content - Solutions  and Challenges" href="http://wikimania2010.wikimedia.org/wiki/Submissions/Creating_offline_version_of_Wiki_content_-_Solutions_and_Challenges">Creating offline version of Wiki content – Solutions and Challenges</a>. Apart from this, I will be meeting the <a href="http://code.pediapress.com/" target="_blank">Pediapress team</a>, the team behind Wikipedia&#8217;s latest <a href="http://en.wikipedia.org/wiki/Help:Books" target="_blank">PDF/Book export feature</a>. This tool has some issues with Indic languages, mainly because its PDF rendering engine cannot render complex scripts.</p>
<p>Thanks to the <a href="http://www.wikimediafoundation.org/" target="_blank">Wikimedia Foundation</a> for granting me a scholarship to cover travel expenses.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/07/06/wikimania-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Malayalam Wikipedia releases selected articles on CD</title>
		<link>http://thottingal.in/blog/2010/04/17/mlwikioncd/</link>
		<comments>http://thottingal.in/blog/2010/04/17/mlwikioncd/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 05:03:47 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=250</guid>
		<description><![CDATA[As part of Malayalam Wikipedia Meetup 2010, Malayalam Wikipedia today releases 500 selected articles on a CD-ROM. This is the first time in India that a local-language Wikipedia has released its articles for offline use. I handled the technology part of the project. The idea was to get the selected articles in static [...]]]></description>
			<content:encoded><![CDATA[<p>As part of <a href="http://ml.wikipedia.org/wiki/Meetup-2010_April" target="_blank">Malayalam Wikipedia Meetup 2010</a>, Malayalam Wikipedia today releases 500 selected articles on a CD-ROM. This is the first time in India that a local-language Wikipedia has released its articles for offline use. I handled the technology part of the project.</p>
<p>The idea was to get the selected articles in static form onto the CD. But this is not as easy as one might imagine; it is not like saving each page from the browser to the local machine. These were the challenges:</p>
<ul>
<li>Automate the process of getting each page and the images in it. Wikipedia articles change frequently, so the program needs to fetch the latest version of each article from the wiki whenever it is executed.</li>
<li>Fix all the links and the CSS, JavaScript and image references so that everything resolves within the CD itself.</li>
<li>Provide a categorized index of the articles so that an article is easy to locate.</li>
<li>Provide a search over the article titles.</li>
<li>The ISO 9660 filesystem of CDs/DVDs has many limitations. There are restrictions on Unicode file names, file name length, directory depth, special characters in file names and so on. Wikipedia article and image names contain Unicode and special characters, and most of the time they exceed the file name length limit. To avoid all this, we have to rename most of the files and then fix the cross-references in all files.</li>
<li>It should work on all operating systems, so all the content should be presented with HTML, JavaScript and CSS. Since the content is in Malayalam, readers should have no problem reading it even if the required fonts are not installed on their machine (font embedding is required).</li>
</ul>
<p>Solving all these challenges manually is not the way to go. So I wrote a program that takes just the article titles, does all of the above tasks, and finally creates a repository ready for burning to CD-ROM.</p>
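<p>The renaming and cross-reference step from the list above can be sketched roughly as follows. This is only an illustration and not necessarily how wiki2cd does it: the helper names <code>safe_name</code> and <code>fix_links</code> are made up for this example, and the truncated SHA-1 digest is an assumed naming scheme chosen to stay within short, ASCII-only 8.3 style names.</p>

```python
import hashlib
import re


def safe_name(title, extension="htm", length=8):
    """Map an arbitrary Unicode title to a short ASCII file name.

    Assumption: a truncated SHA-1 hex digest is unique enough for a
    few hundred articles. A real tool should still check for clashes.
    """
    digest = hashlib.sha1(title.encode("utf-8")).hexdigest()[:length]
    return digest.upper() + "." + extension


def fix_links(html, name_map):
    """Rewrite href and src values that have an entry in name_map."""
    def replace(match):
        attribute, target = match.group(1), match.group(2)
        return '%s="%s"' % (attribute, name_map.get(target, target))

    return re.sub(r'(href|src)="([^"]*)"', replace, html)


titles = ["കേരളം", "മലയാളം"]
name_map = {title: safe_name(title) for title in titles}

# Only references to renamed files change; other targets are kept.
fragment = 'href="കേരളം" src="logo.png"'
print(fix_links(fragment, name_map))
```

<p>Because the new name depends only on the title, every page that refers to the same article ends up with the same rewritten reference, whatever order the pages are processed in.</p>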
<p>Wget disappointed me in fetching the content from the wiki. There is an <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411290" target="_blank">open bug</a> in wget which makes downloading non-Latin URLs impossible.</p>
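<p>One way to sidestep this kind of problem in Python is to percent-encode the title before requesting the page, so that the URL sent over the wire is plain ASCII. The sketch below only builds the URL; the function name <code>article_url</code> is an assumption for this example and not taken from wiki2cd.</p>

```python
from urllib.parse import quote


def article_url(title, base="http://ml.wikipedia.org/wiki/"):
    """Build an ASCII-only URL for a wiki article title.

    quote() encodes the title as UTF-8 and percent-escapes every
    byte that is not an unreserved ASCII character.
    """
    return base + quote(title.replace(" ", "_"))


print(article_url("കേരളം"))
```

<p>The resulting string can then be passed to any HTTP client, for example <code>urllib.request.urlopen</code>.</p>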
<p>Have a look at the CD content we created: <a href="http://thottingal.in/projects/mlwikioncd/wiki/" target="_blank">Malayalam Wikipedia Selected 500 Articles</a>. <a href="http://hiran.in" target="_blank">Hiran</a> helped me with the artwork.</p>
<p><a href="http://thottingal.in/blog/wp-content/uploads/2010/04/mlwikioncd.png"><img class="aligncenter size-medium wp-image-257" title="mlwikioncd" src="http://thottingal.in/blog/wp-content/uploads/2010/04/mlwikioncd-300x291.png" alt="The CD cover image designed by Hiran" width="300" height="291" /></a></p>
<p>Since the entire process is automated, the program can be used for any other language. I am releasing the program for the benefit of everybody. You can get it from <a href="http://github.com/santhoshtr/wiki2cd" target="_blank">here</a>. It is written in Python; jQuery was used for the UI. For details on usage, customization etc., read the <a href="http://wiki.github.com/santhoshtr/wiki2cd/" target="_blank">wiki page</a> of the project.</p>
<p>For those who can&#8217;t read Malayalam, here is a <a href="http://thottingal.in/projects/wiki2cd/samplewiki/" target="_blank">sample wiki</a> created by the wiki2cd program from 10 selected English Wikipedia articles.</p>
<p>The Malayalam Wikipedia community hopes that this is a big step towards reaching the majority of people who do not have internet access. If printed, these 500 articles would fill at least 5000 pages. The CD-ROM also includes information about commonly used free-software tools for Malayalam computing. Some writing tools and fonts are distributed on the same CD-ROM.</p>
<p>Thanks to Malayalam Wikipedia for giving me this great opportunity to work on this project.</p>
<p>The ISO image of the CD is available <a href="http://www.mlwiki.in/mlwikicd/img/MLWikipediaCD-2010.iso" target="_blank">here</a> for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/04/17/mlwikioncd/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Predictive text entry with ibus</title>
		<link>http://thottingal.in/blog/2010/03/12/predictive-text-entry-with-ibus/</link>
		<comments>http://thottingal.in/blog/2010/03/12/predictive-text-entry-with-ibus/#comments</comments>
		<pubDate>Fri, 12 Mar 2010 14:39:37 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[ibus]]></category>
		<category><![CDATA[predictive text entry]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=245</guid>
		<description><![CDATA[A few days back I came to know about this project: Text Prediction on GNOME, based on the GTK+ Input Method context. Basically, it is an input method with a text prediction feature. I had a similar project idea in May 2009 and had done some coding for it. The project was to have an [...]]]></description>
			<content:encoded><![CDATA[<p>A few days back I came to know about this project: <a href="http://www.joaquimrocha.com/2010/03/03/text-prediction-on-gnome/">Text Prediction on GNOME</a>, based on the <a title="GTK+ Input Method context" href="http://www.gtk.org/api/2.6/gtk/GtkIMContext.html" target="_blank">GTK+ Input Method context</a>. Basically, it is an input method with a text prediction feature.</p>
<p>I had a similar project idea in May 2009 and had done some coding for it. The project was to have an <a href="http://code.google.com/p/ibus">IBUS</a> input method which can do letter prediction as well as word prediction. The prediction is based on n-grams. Since it is based on ibus, it works in all desktop applications. You can see screenshots of the prototype <a href="http://smc.org.in/~santhosh/images/sulekha-proto1.png">here</a>, <a href="http://smc.org.in/~santhosh/images/sulekha-proto2.png">here</a> and <a href="http://smc.org.in/~santhosh/images/sulekha-proto3.png">here</a>.</p>
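<p>To give an idea of what n-gram based word prediction means, here is a minimal sketch using bigrams (pairs of adjacent words). It is not the ibus-sulekha code; the function names and the tiny training text are assumptions made up for this illustration.</p>

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    followers = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict(followers, word, count=3):
    """Return up to `count` of the most frequent followers of `word`."""
    ranked = followers[word].most_common(count)
    return [candidate for candidate, _ in ranked]


model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict(model, "the"))
```

<p>Letter prediction works the same way with characters in place of words, and longer n-grams simply use more than one preceding item as the lookup key.</p>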
<p>The core code was ready. It was written in Python and uses ibus-python. Unfortunately, I have not had time to spend on this project for a long while, and it is currently not among my top priorities. Since I see many people interested in auto-completion or predictive text entry, I uploaded the code here: <a href="http://github.com/santhoshtr/ibus-sulekha">http://github.com/santhoshtr/ibus-sulekha</a>. It is not in a working state as of now, but I would be happy if anybody is interested in taking it forward. I wrote a short document on the algorithm <a href="http://wiki.github.com/santhoshtr/ibus-sulekha/">here</a>; feel free to contact me if any help is required.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/03/12/predictive-text-entry-with-ibus/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Inkscape hyphenation extension</title>
		<link>http://thottingal.in/blog/2009/10/03/inkscape-hyphenation-extension/</link>
		<comments>http://thottingal.in/blog/2009/10/03/inkscape-hyphenation-extension/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 14:33:03 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[extensions]]></category>
		<category><![CDATA[hyphenation]]></category>
		<category><![CDATA[inkscape]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=231</guid>
		<description><![CDATA[One year back I wrote about how to use Inkscape as a workaround solution for DTP in indic scripts. Still we don&#8217;t have any DTP software which supports Indic scripts in Unicode. Scribus still does not have the Indic support. One issue with inkscape when used as DTP for indic script was, a few indic [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">One year back I wrote about <a href="http://thottingal.in/blog/2008/04/10/using-inkscape-for-dtp-in-indic-scripts/">how to use Inkscape as a workaround solution for DTP in Indic scripts</a>. We still don&#8217;t have any DTP software which supports Indic scripts in Unicode. <a href="http://www.scribus.net/">Scribus</a> still does not have Indic support.</p>
<p style="text-align: justify;">One issue with using Inkscape for DTP in Indic scripts is that some Indic scripts need hyphenation whenever text is justified. For example, Malayalam has lengthy words, and space is often wasted in lines if the text is not automatically hyphenated. But this feature was not available in Inkscape. There is a <a href="https://bugs.launchpad.net/inkscape/+bug/171140">wishlist bug</a> for adding this feature to Inkscape. I tried to develop an Inkscape extension to achieve this.</p>
<p style="text-align: justify;">It is built on top of the Python hyphenation code written by Wilbert Berendsen. The hyphenation rules, also called patterns, are those of TeX or Openoffice itself, so I can support any language which has TeX hyphenation rules. But since the hyphenation rules are language-specific, we first need a language selection mechanism for the text; only then can we select the rules and do the hyphenation. That is tricky to implement. Asking for the language of the text every time it is justified is not a good idea. Setting a language for the document is another choice, but what if the text contains multiple languages? For Indian languages it is very easy: we can automatically detect the script from the Unicode code points and load the rules accordingly. So, for the time being, my extension supports only English and all Indian languages.</p>
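<p style="text-align: justify;">Script detection from code points can be sketched as below. This is an illustration rather than the extension&#8217;s actual code: it relies on the fact that the nine major Indic scripts occupy consecutive 128 code point blocks starting at U+0900, and it assumes that anything else should fall back to the English rules. Note that a script is not always a language (Devanagari is used for both Hindi and Marathi), so a real implementation needs a default language per script.</p>

```python
# Unicode blocks U+0900 to U+0D7F, each 0x80 code points wide.
SCRIPTS = ["Devanagari", "Bengali", "Gurmukhi", "Gujarati", "Oriya",
           "Tamil", "Telugu", "Kannada", "Malayalam"]
INDIC = range(0x0900, 0x0D80)


def detect_script(word):
    """Return the script of the first Indic character in `word`.

    Assumption: text without any Indic character is treated as Latin,
    which here stands for the English hyphenation rules.
    """
    for character in word:
        code_point = ord(character)
        if code_point in INDIC:
            return SCRIPTS[(code_point - 0x0900) // 0x80]
    return "Latin"


for sample in ["മലയാളം", "தமிழ்", "hyphenation"]:
    print(sample, detect_script(sample))
```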
<p style="text-align: justify;">Download the extension from <a href="http://thottingal.in/projects/inkscape_hyphenation/inkscape-hyphenation.zip">http://thottingal.in/projects/inkscape_hyphenation/inkscape-hyphenation.zip</a>. On GNU/Linux machines, extract the zip file and copy its contents to the /usr/share/inkscape/extensions folder. On Windows, extract it to the [inkscape installation directory]\extensions folder. Then close and reopen Inkscape. You will see a menu item named Hyphenate in the Effects-&gt;Text menu. In the document, add a text field and enter text in any Indian language. Select the text and apply hyphenation with Effects-&gt;Text-&gt;Hyphenate. Then change the alignment of the text to justify. You will see the text hyphenated and occupying the maximum possible space in the text field.</p>
<p style="text-align: justify;">I got satisfactory results with Malayalam and Tamil. I did not test other languages. The following images illustrate a hyphenated, justified, two-column layout of text done in Inkscape.</p>
<div class="mceTemp" style="text-align: justify;">
<dl class="wp-caption alignnone" style="width: 417px;">
<dt class="wp-caption-dt"><a href="http://thottingal.in/projects/inkscape_hyphenation/hyphenated-inkscape.png"><img title="Malayalam Hyphenation In inkscape " src="http://thottingal.in/projects/inkscape_hyphenation/hyphenated-inkscape.png" alt="Malayalam Hyphenation In inkscape " width="407" height="574" /></a></dt>
<dd class="wp-caption-dd">Malayalam Hyphenation In inkscape </dd>
</dl>
</div>
<div class="mceTemp" style="text-align: justify;">
<dl class="wp-caption alignnone" style="width: 420px;">
<dt class="wp-caption-dt"><a href="http://thottingal.in/projects/inkscape_hyphenation/hyphenated-inkscape-tamil.png"><img title="Tamil Hyphenation in Inkscape" src="http://thottingal.in/projects/inkscape_hyphenation/hyphenated-inkscape-tamil.png" alt="Tamil Hyphenation in Inkscape" width="410" height="577" /></a></dt>
<dd class="wp-caption-dd">Tamil Hyphenation in Inkscape</dd>
</dl>
</div>
<p style="text-align: justify;">We had a discussion about this on the <a href="http://sourceforge.net/mailarchive/forum.php?thread_name=20090924155717.GC4250%40bowman.infotech.monash.edu.au&amp;forum_name=inkscape-devel">Inkscape mailing list</a>. Some developers suggested having this feature built in rather than as an extension. There are a few issues to be solved for that. One is language selection, as I explained. The other concerns the hyphenation character to be used. The <a href="http://www.unicode.org/unicode/reports/tr14/#SoftHyphen">Unicode standard insists on using the soft hyphen</a> (U+00AD) as the hyphenation character. This is an invisible character. For Malayalam, visible hyphens are not required. But some other languages require a hyphen sign where the word is broken at the end of the line. The rules for when the soft hyphen should be visible are not clear in Unicode&#8217;s specification. Pango never displays the soft hyphen. There is criticism of this specification of the soft hyphen:</p>
<ul style="text-align: justify;">
<li>Jukka Korpela, Soft hyphen (SHY) &#8211; a hard problem?  <a href="http://www.cs.tut.fi/%7Ejkorpela/shy.html" target="_blank">http://www.cs.tut.fi/~jkorpela/shy.html</a></li>
<li> Markus Kuhn, Unicode interpretation of SOFT HYPHEN breaks ISO 8859-1   compatibility. Unicode Technical Committee document L2/03-155R, June 2003. <a href="http://www.cl.cam.ac.uk/%7Emgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf" target="_blank">http://www.cl.cam.ac.uk/~mgk25/ucs/L2/03155r-kuhn-soft-hyphen.pdf</a></li>
</ul>
<p style="text-align: justify;">So I think either something needs to be done in the rendering engine, or Unicode needs to clarify the confusion. Openoffice and HTML rendering engines, however, always show the soft hyphen when it falls at the end of a line, which is not desired for some languages.</p>
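<p style="text-align: justify;">For illustration, inserting soft hyphens at known break positions takes only a few lines of Python. In this sketch the break positions are given by hand; in the extension they would come from the hyphenation patterns. The function name is an assumption made up for this example.</p>

```python
SOFT_HYPHEN = "\u00ad"


def insert_soft_hyphens(word, break_points):
    """Insert U+00AD before each character index in break_points."""
    pieces = []
    previous = 0
    for index in sorted(break_points):
        pieces.append(word[previous:index])
        previous = index
    pieces.append(word[previous:])
    return SOFT_HYPHEN.join(pieces)


hyphenated = insert_soft_hyphens("hyphenation", [2, 6])

# Show the otherwise invisible break opportunities as visible hyphens.
print(hyphenated.replace(SOFT_HYPHEN, "-"))
```

<p style="text-align: justify;">Whether a renderer then draws anything at a break it actually uses is exactly the open question described above.</p>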
<p style="text-align: justify;">Try this extension and let me know your comments. For small-scale DTP work such as pamphlets, notices and brochures, Inkscape is enough. But since Inkscape is not primarily DTP software and does not have paging support, it may not work well for books and large-scale DTP work.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/10/03/inkscape-hyphenation-extension/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New Hyphenation Pattern Extensions for Openoffice</title>
		<link>http://thottingal.in/blog/2009/08/15/ooo_hyphenation_extensions/</link>
		<comments>http://thottingal.in/blog/2009/08/15/ooo_hyphenation_extensions/#comments</comments>
		<pubDate>Sat, 15 Aug 2009 09:35:45 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[hyphenation]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=221</guid>
		<description><![CDATA[Openoffice Indic Natural Language group announces the availability of the following Openoffice hyphenation dictionary extensions. Malayalam Hyphenation Rules version 1.2 Kannada Hyphenation Rules version 1.1 Bengali Hyphenation Rules version 1.1 Hindi Hyphenation Rules version 1.1 Telugu Hyphenation Rules version 1.0 Tamil Hyphenation Rules version 1.0 Gujarati Hyphenation Rules version 1.0 Panjabi Hyphenation Rules version 1.0 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://wiki.services.openoffice.org/wiki/NLC/IndicGroup">Openoffice Indic Natural Language group</a> announces the availability of the following Openoffice hyphenation dictionary extensions.</p>
<ol>
<li><a title="http://extensions.services.openoffice.org/project/hyph_ml_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_ml_IN">Malayalam Hyphenation Rules version 1.2</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_kn_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_kn_IN">Kannada Hyphenation Rules version 1.1</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_bn_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_bn_IN">Bengali Hyphenation Rules version 1.1</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_hi_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_hi_IN">Hindi Hyphenation Rules version 1.1</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_te_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_te_IN">Telugu Hyphenation Rules version 1.0</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_ta_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_ta_IN">Tamil Hyphenation Rules version 1.0</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_gu_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_gu_IN">Gujarati Hyphenation Rules version 1.0</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_pa_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_pa_IN">Panjabi Hyphenation Rules version 1.0</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_or_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_or_IN">Oriya Hyphenation Rules version 1.0</a></li>
<li> <a title="http://extensions.services.openoffice.org/project/hyph_mr_IN" rel="nofollow" href="http://extensions.services.openoffice.org/project/hyph_mr_IN">Marathi Hyphenation Rules version 1.0</a></li>
</ol>
<p><a href="http://extensions.services.openoffice.org/project/dict_ml_IN">Spellchecker extension for Malayalam</a> is also ready.</p>
<p>A complete list of writing aids for Openoffice in Indic languages is available <a href="http://wiki.services.openoffice.org/wiki/NLC/IndicGroup">here</a>.</p>
<p>Hyphenation rules for languages other than Marathi are already packaged in Fedora 11. This release contains updates and bug fixes. Fedora 12 will contain these updates. These extensions are yet to be packaged for Debian/Ubuntu.</p>
<p>More details about hyphenation rules are <a href="http://thottingal.in/blog/tag/hyphenation/">available here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/08/15/ooo_hyphenation_extensions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Project Silpa Updates</title>
		<link>http://thottingal.in/blog/2009/08/11/project-silpa-updates/</link>
		<comments>http://thottingal.in/blog/2009/08/11/project-silpa-updates/#comments</comments>
		<pubDate>Tue, 11 Aug 2009 10:47:04 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=212</guid>
		<description><![CDATA[[Please read the Silpa project announcement  before reading this blogpost] Project silpa is getting ready for a 0.1 version. The web framework got many changes to support JSON based RPC calls from external applications. That means,  web/desktop applications can use the APIs of Silpa through RPC calls. Page rendering logic is moved from server to [...]]]></description>
			<content:encoded><![CDATA[<p>[Please read the <a href="http://thottingal.in/blog/2009/06/16/announcing-project-silpa/">Silpa project announcement </a> before reading this blogpost]</p>
<p><a href="http://smc.org.in/silpa">Project silpa</a> is getting ready for a 0.1 version.</p>
<ol>
<li>The web framework got many changes to support <a href="http://json-rpc.org/wiki/python-json-rpc">JSON based RPC </a>calls from external applications. That means web/desktop applications can use the APIs of Silpa through RPC calls.</li>
<li>Page rendering logic has moved from server to client. The web interface uses JavaScript-based synchronous <a href="http://code.google.com/p/json-xml-rpc">JSON based RPC </a>calls to get the results from the server. Jquery is used to render the page.</li>
<li>Uses the <a href="http://entrian.com/PyMeld/">PyMeld </a>templating engine for modules having a web interface (not all modules will have a web interface).</li>
<li>The framework is now a Python <a href="http://wsgi.org">WSGI </a>application. Initially it was plain CGI. WSGI reduces the response time and allows the server to run as a daemon.</li>
<li>Many new modules are being added. <a href="http://smc.org.in/silpa/Spellcheck">Spellchecker </a>: it is not based on aspell or hunspell, and I am going to try out some algorithms to get optimal suggestions. Not completed.</li>
<li>Soundex algorithm: web-based demo and APIs, as I explained in my <a href="http://thottingal.in/blog/2009/07/26/indicsoundex/">previous blog post</a>.</li>
<li><a href="http://smc.org.in/silpa/ApproxSearch">An inexact search algorithm</a> and its implementation, based on the visual and phonetic distance between two words, is getting ready. I will explain it in another blog post.</li>
<li>Hyphenation &#8211; <a href="http://smc.org.in/silpa/Hyphenate">Online tool </a>as well as APIs</li>
<li><a href="http://smc.org.in/silpa/NGram">N-gram for Indic languages</a>- API, web interface</li>
<li><a href="http://smc.org.in/silpa/apis.html">API documentation </a>is going on, but not completed. I have plans to make silpa as a python library for offline use too.</li>
<li>Moved from <a href="http://smc.org.in">SMC</a>&#8216;s git repo to a <a href="http://smc.org.in/silpa/source.html">seperate git repo</a>. After 0.1 baseline, I will create branches for stable and development.</li>
<li>Application is running on a git controlled deployment workflow. Thanks to <a href="http://joemaller.com">Joe Maller </a> for nice <a href="http://joemaller.com/2008/11/25/a-web-focused-git-workflow/">documentation on this</a>.</li>
</ol>
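<p>To give a rough idea of what a WSGI application answering JSON-based RPC calls can look like, here is a minimal sketch. It is not the Silpa code: the method name <code>ngram.ngrams</code>, the <code>METHODS</code> table and the <code>call</code> helper are all made up for illustration, and the request and reply shapes simply follow the JSON-RPC 1.0 convention (<code>method</code>, <code>params</code>, <code>id</code> in; <code>result</code>, <code>error</code>, <code>id</code> out).</p>

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def ngrams(text, n=2):
    """Return the character n-grams of text (hypothetical stand-in for a module)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical method table; the real Silpa method names are not given in this post.
METHODS = {"ngram.ngrams": ngrams}


def application(environ, start_response):
    """Minimal WSGI app that answers JSON-RPC 1.0 style POST bodies."""
    try:
        size = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(size).decode("utf-8"))
        handler = METHODS[request["method"]]
        result = handler(*request.get("params", []))
        reply = {"result": result, "error": None, "id": request.get("id")}
        status = "200 OK"
    except (KeyError, ValueError, TypeError) as exc:
        # Unknown method, malformed JSON, or wrong arguments.
        reply = {"result": None, "error": str(exc), "id": None}
        status = "400 Bad Request"
    body = json.dumps(reply, ensure_ascii=False).encode("utf-8")
    start_response(status, [("Content-Type", "application/json; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(method, *params):
    """Invoke the app in-process, the way a WSGI server would, and decode the reply."""
    payload = json.dumps({"method": method, "params": list(params), "id": 1})
    payload = payload.encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"REQUEST_METHOD": "POST",
                    "CONTENT_LENGTH": str(len(payload)),
                    "wsgi.input": io.BytesIO(payload)})
    seen_status = []
    chunks = application(environ, lambda status, headers: seen_status.append(status))
    return seen_status[0], json.loads(b"".join(chunks).decode("utf-8"))
```

<p>Calling <code>call("ngram.ngrams", "silpa", 2)</code> exercises the whole request path without a network. Because the application is a plain callable, the same function could be handed to any WSGI server, which is what makes running it as a long-lived daemon possible where CGI started a new process per request.</p>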
<p>That&#8217;s all for now! There are many things still to be done. Some of the modules do not support all languages as of now. If anybody is interested in contributing to the project, please contact me. Try out the application, read the code, and let me know your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/08/11/project-silpa-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Announcing Project Silpa</title>
		<link>http://thottingal.in/blog/2009/06/16/announcing-project-silpa/</link>
		<comments>http://thottingal.in/blog/2009/06/16/announcing-project-silpa/#comments</comments>
		<pubDate>Tue, 16 Jun 2009 15:13:53 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[silpa]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=150</guid>
		<description><![CDATA[Many of my friends already know about a project I am working on; this is the public announcement of it. The project is named Silpa, which may be read as an acronym of Swathanthra (Mukth, Free as in Freedom) Indian Language Processing Applications. It is a web framework and a set of applications for processing Indian languages in [...]]]></description>
			<content:encoded><![CDATA[<p>Many of my friends already know about a project I am working on; this is the public announcement of it.</p>
<p>The project is named Silpa, which may be read as an acronym of Swathanthra (Mukth, Free as in Freedom) Indian Language Processing Applications. It is a web framework and a set of applications for processing Indian languages in many ways. In other words, it is a platform for porting existing and upcoming language processing applications to the web.</p>
<p>Before going into the details, you can have a quick preview of the application here: <a href="http://smc.org.in/silpa" target="_blank">http://smc.org.in/silpa</a></p>
<p>The project is designed so that applications and utilities are added as plugins. The framework is written from scratch in Python. As you can see in the development version, a number of modules are already written. Most of them require some more work to be _complete_. The application is free software, and there is a link to the source code at the bottom of the application.</p>
<p>As it is meant to cover all languages of India, all modules should be capable of handling all scripts from India (sometimes English too). At the same time, the language of the input data is transparent, meaning the user need not say that _this_ is the language in which she is entering the data. Unlike desktop applications, which ask the user to specify the language along with the input data (for example, a spell checker), the modules should try to detect the language themselves. And where possible, modules try to process the data even if the input is in multiple Indic scripts.</p>
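<p>One simple way to approach this kind of detection is to look at the Unicode block each character falls in. The sketch below is only an illustration under that assumption, not the detection code used in Silpa, and the function names are made up. Note that it identifies the script rather than the language: Devanagari, for instance, is shared by Hindi, Marathi and others, so telling those apart needs more than code point ranges.</p>

```python
from collections import Counter

# Start of the Unicode block for each major Indic script.
# Each of these blocks spans 128 (0x80) code points.
SCRIPT_BLOCKS = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def script_of_char(ch):
    """Return the script name for one character, or None if it is not recognised."""
    code = ord(ch)
    for name, start in SCRIPT_BLOCKS.items():
        if start <= code < start + 0x80:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None


def script_of_word(word):
    """Return the script that most characters of the word belong to."""
    counts = Counter(s for s in map(script_of_char, word) if s)
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


def detect_scripts(text):
    """Label every whitespace-separated word, so mixed-script input is handled."""
    return [(word, script_of_word(word)) for word in text.split()]
```

<p>Working word by word is what lets a single input mix several scripts: each word gets its own label, and a module can then route it to the matching rules.</p>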
<p>The modules may be general purpose (e.g. dictionary, spellcheck, sort, transliteration, font conversion) or meant to demonstrate a technology or algorithm (e.g. hyphenation, stemmer, search algorithms).</p>
<p>Some of the modules are usable as of now, while others are in development. Feel free to try them out. User data will not be logged except when a crash occurs (in that case the user data and exception trace will be logged for later debugging).</p>
<p>This is also a call for contributors. You may propose ideas for new modules, suggest features, and so on. A few students showed interest in the project; unfortunately, Python is not a language in their college syllabus. So if you are good at Python and interested in contributing to the project, drop me a mail <img src='http://thottingal.in/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . There is no separate development version apart from the one at http://smc.org.in/silpa . All development happens there itself, and any change in the code is immediately available for use (or immediately starts crashing on user data)!</p>
<p>I will write later about some interesting algorithms I used for some of the modules. If you are curious about them, read the code!</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/06/16/announcing-project-silpa/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>&#8220;ക്ടാവ്&#8221; Slang converter is getting ready</title>
		<link>http://thottingal.in/blog/2009/03/31/slang-converter/</link>
		<comments>http://thottingal.in/blog/2009/03/31/slang-converter/#comments</comments>
		<pubDate>Wed, 01 Apr 2009 02:05:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/2009/03/31/%e0%b4%95%e0%b5%8d%e0%b4%9f%e0%b4%be%e0%b4%b5%e0%b5%8d-slang-converter-%e0%b4%a4%e0%b4%af%e0%b4%be%e0%b4%b1%e0%b4%be%e0%b4%b5%e0%b5%81%e0%b4%a8%e0%b5%8d%e0%b4%a8%e0%b5%81/</guid>
		<description><![CDATA[Friends, you all know about the interesting regional dialects of Kerala, don't you? Thiruvananthapuram, Kottayam, Thrissur, Shoranur, Palakkad, Kozhikode, Kannur, Wayanad and so on: we have many different varieties of Malayalam. They differ a great deal from print Malayalam. Wouldn't it be fun to have software that takes text in print Malayalam and the name of a place, and converts the text into the Malayalam spoken in that region? The project named &#8220;ക്ടാവ്&#8221; Slang converter is one such attempt. Take a look at the screenshot that accompanies this post. The development [...]]]></description>
			<content:encoded><![CDATA[<div class="separator" style="clear: both; text-align: left;"><a href="http://4.bp.blogspot.com/_yXi4s2T6Sz4/SdLLcgwdFEI/AAAAAAAAAIQ/ALCTNXIIOoM/s1600-h/slangConvertor.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://4.bp.blogspot.com/_yXi4s2T6Sz4/SdLLcgwdFEI/AAAAAAAAAIQ/ALCTNXIIOoM/s400/slangConvertor.png" /></a></div>
<p>Friends, <br />you all know about the interesting regional dialects of Kerala, don&#8217;t you? Thiruvananthapuram, Kottayam, Thrissur, Shoranur, Palakkad, Kozhikode, Kannur, Wayanad and so on: we have many different varieties of Malayalam. They differ a great deal from print Malayalam. Wouldn&#8217;t it be fun to have software that takes text in print Malayalam and the name of a place, and converts the text into the Malayalam spoken in that region?</p>
<p>The project named &#8220;ക്ടാവ്&#8221; Slang converter is one such attempt. Take a look at the screenshot that accompanies this post; it shows the development version. The conversion works from a small set of rules, using a technique called AMP (Ambiguous Language Processing), a new branch of Natural Language Processing. The code is Qt/C++, and Qt Creator was used to build the UI.</p>
<p>There is also a plan to build a way to search a Malayalam file across several slangs. That is, searching for ഗഡി should also find സുഹൃത്ത്, ചങ്ങാതി and the like (all of them words for &#8220;friend&#8221;). Likewise, if ഗഡി is misspelled, the spellchecker should offer സുഹൃത്ത്, ചങ്ങാതി and so on as suggestions; we need to do that feature too. This application is under the GPL v3 licence, and it needs help from people living in the various districts to extend and test its rules.</p>
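<p>The cross-slang search described above could be sketched roughly as follows. This is only an illustration in Python, not the project&#8217;s Qt/C++ code: the <code>EQUIVALENTS</code> table and the function names are made up, and the table holds just the one word group mentioned in this post. It also matches whole words exactly, so inflected forms would be missed.</p>

```python
# Groups of words treated as equivalent across dialects.
# Only this one group comes from the post; a real table would need many more.
EQUIVALENTS = [
    {"ഗഡി", "സുഹൃത്ത്", "ചങ്ങാതി"},
]


def expand(term):
    """Return every word considered equivalent to term, including term itself."""
    for group in EQUIVALENTS:
        if term in group:
            return group
    return {term}


def slang_search(term, text):
    """Return (position, word) pairs for words in text that match term or an equivalent."""
    wanted = expand(term)
    return [(i, word) for i, word in enumerate(text.split()) if word in wanted]
```

<p>The same <code>expand</code> lookup could feed a spellchecker&#8217;s suggestion list, since both features come down to mapping one word to its equivalents in other varieties.</p>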
<p>I hope you will pitch in.</p>
<p>Let me know your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2009/03/31/slang-converter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>KDE Indic Screensavers</title>
		<link>http://thottingal.in/blog/2008/12/21/kde-indic-screensavers/</link>
		<comments>http://thottingal.in/blog/2008/12/21/kde-indic-screensavers/#comments</comments>
		<pubDate>Mon, 22 Dec 2008 04:10:00 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[hack]]></category>
		<category><![CDATA[kde]]></category>
		<category><![CDATA[screensaver]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=61</guid>
		<description><![CDATA[I ported all of the Matrix screensavers with Indian language glyphs to KDE4. For details about the screensavers please read: Hacking the GLMatrix screensaver Screensavers in your language Download the binary packages: Deb package, and RPM package There are 6 screensavers in that package, for Malayalam, Hindi, Oriya , Bengali, Tamil and Gujarati. After installation, [...]]]></description>
			<content:encoded><![CDATA[<p>I ported all of the Matrix screensavers with Indian language glyphs to KDE4. For details about the screensavers  please read:
<ul>
<li><a href="http://santhoshtr.livejournal.com/7078.html">Hacking the GLMatrix screensaver</a></li>
<li><a href="http://santhoshtr.livejournal.com/13439.html">Screensavers in your language</a></li>
</ul>
<p>
Download the binary packages: <a href="http://download.savannah.gnu.org/releases/smc/Screensaver/kscreensavers-indic-matrix_1.0.0.deb">Deb package</a>, and <a href="http://download.savannah.gnu.org/releases/smc/Screensaver/kscreensavers-indic-matrix-1.0.1-2.i386.rpm">RPM package</a>
</p>
<p>
There are 6 screensavers in that package, for Malayalam, Hindi, Oriya, Bengali, Tamil and Gujarati. After installation, go to KDE System Settings -> Desktop -> Screensaver and select any of them.
</p>
<p>Screenshots(click to get the image in original size):<br/><br />
<a href="http://pics.livejournal.com/santhoshtr/pic/0000yg8c/"><img src="http://pics.livejournal.com/santhoshtr/pic/0000yg8c/s320x240" width="320" height="177" border='0'/></a><br />
<br/><br />
KDE Screensaver configuration for Hindi:<br />
<br/><br />
<a href="http://pics.livejournal.com/santhoshtr/pic/0000zdpy/"><img src="http://pics.livejournal.com/santhoshtr/pic/0000zdpy/s320x240" width="304" height="240" border='0'/></a><br/><br />
Enjoy&#8230;!</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2008/12/21/kde-indic-screensavers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>