<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Santhosh Thottingal &#187; Projects</title>
	<atom:link href="http://thottingal.in/blog/category/projects/feed/" rel="self" type="application/rss+xml" />
	<link>http://thottingal.in/blog</link>
	<description>/home/santhosh</description>
	<lastBuildDate>Sat, 10 Mar 2012 13:27:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>New version of Malayalam fonts released</title>
		<link>http://thottingal.in/blog/2012/03/10/new-version-of-malayalam-fonts-released/</link>
		<comments>http://thottingal.in/blog/2012/03/10/new-version-of-malayalam-fonts-released/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 13:24:05 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[SMC]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[webfonts]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=390</guid>
		<description><![CDATA[The Swathanthra Malayalam Computing project announced the release of a new version of its Malayalam Unicode fonts this week. This version brings many improvements to the popular Malayalam fonts Rachana and Meera, and some bug fixes for Dyuthi. The changes are listed below. The Meera font was small compared to other fonts. This was not really a [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://smc.org.in" target="_blank">Swathanthra Malayalam Computing</a> project <a href="http://lists.smc.org.in/pipermail/discuss-smc.org.in/2012-March/013428.html" target="_blank">announced the release</a> of a new version of its Malayalam Unicode fonts this week. This version brings many improvements to the popular Malayalam fonts Rachana and Meera, and some bug fixes for Dyuthi. The changes are listed below.</p>
<ol>
<li>The Meera font was small compared to other fonts. This was not really a problem in the GNOME environment, since <a href="http://www.freedesktop.org/software/fontconfig/" target="_blank">fontconfig</a> allows you to define a scaling factor to match the size of other fonts. But it was an issue in LibreOffice, KDE and especially Windows, where this kind of scaling does not work. Thanks to <a href="http://suruma.freeflux.net/" target="_blank">P Suresh</a> for reworking the glyphs and fixing this issue.</li>
<li>Rachana, Meera and Dyuthi used wrong glyphs as placeholder glyphs. <a href="https://savannah.nongnu.org/bugs/?35098" target="_blank">Bugs</a> <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661898" target="_blank">like</a> these are now fixed.</li>
<li>The virama (U+0D4D) had a wrong left side bearing (LSB), which caused wrong cursor positioning and glyph boundaries. That <a href="https://bugzilla.redhat.com/show_bug.cgi?id=616324" target="_blank">bug</a> is fixed.<br />
<a href="http://thottingal.in/blog/wp-content/uploads/2012/03/meera-virama-cursor.png"><img class="alignnone size-full wp-image-393" title="meera-virama-cursor" src="http://thottingal.in/blog/wp-content/uploads/2012/03/meera-virama-cursor.png" alt="" width="147" height="63" /></a></li>
<li>The atomic chillu code points introduced in Unicode 5.1 were missing from all the fonts that SMC maintains, because of SMC&#8217;s stand against that controversial decision by Unicode. The issues still exist, but since content using these code points is already present, we added the characters to the Meera and Rachana fonts to avoid difficulties for users.<br />
<a href="http://thottingal.in/blog/wp-content/uploads/2012/03/chillus.png"><img class="alignnone size-full wp-image-394" title="chillus" src="http://thottingal.in/blog/wp-content/uploads/2012/03/chillus.png" alt="" width="439" height="68" /></a></li>
<li>The rupee symbol was added to Meera and Rachana. Thanks to <a href="http://hiran.in" target="_blank">Hiran</a> for designing the sans and serif glyphs for it.<br />
<a href="http://thottingal.in/blog/wp-content/uploads/2012/03/rupee-meera.png"><img class="alignnone size-full wp-image-392" title="rupee-meera" src="http://thottingal.in/blog/wp-content/uploads/2012/03/rupee-meera.png" alt="" width="181" height="55" /></a></li>
<li>Dot reph (U+0D4E) &#8211; the glyph for this was already present in Meera but was not mapped to any Unicode code point. GSUB lookup tables were added for the glyph according to the Unicode specification.<br />
<a href="http://thottingal.in/blog/wp-content/uploads/2012/03/dotrepha.png"><img class="alignnone  wp-image-391" title="dotrepha" src="http://thottingal.in/blog/wp-content/uploads/2012/03/dotrepha.png" alt="" width="635" height="119" /></a></li>
</ol>
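<p>To make the chillu item in the list above concrete, here is a minimal Python sketch of the two encodings involved. It assumes the representation used before Unicode 5.1, where a chillu is written as base consonant + virama + zero width joiner. The mapping table and the function name <code>to_atomic_chillus</code> are only illustrative; they are not part of the SMC fonts or of any library.</p>

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu code point added in Unicode 5.1.
# Hypothetical helper table for illustration only.
ATOMIC_CHILLUS = {
    "\u0d23": "\u0d7a",  # NNA -> CHILLU NN
    "\u0d28": "\u0d7b",  # NA  -> CHILLU N
    "\u0d30": "\u0d7c",  # RA  -> CHILLU RR
    "\u0d32": "\u0d7d",  # LA  -> CHILLU L
    "\u0d33": "\u0d7e",  # LLA -> CHILLU LL
}


def to_atomic_chillus(text):
    """Replace each base + virama + ZWJ sequence with its atomic chillu."""
    for base, atomic in ATOMIC_CHILLUS.items():
        text = text.replace(base + VIRAMA + ZWJ, atomic)
    return text


# Chillu N written as a three-code-point sequence, then as one code point.
sequence_form = "\u0d28" + VIRAMA + ZWJ
atomic_form = to_atomic_chillus(sequence_form)

print(len(sequence_form), len(atomic_form))
print(unicodedata.name(atomic_form))
```

<p>Both forms exist in real content, which is why a font needs glyphs reachable from either encoding.</p>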
<p>For a more detailed description of the changes, see <a href="http://lists.smc.org.in/pipermail/discuss-smc.org.in/2012-February/013317.html" target="_blank">this</a> mail thread. There are some minor changes as well.</p>
<p>Thanks to Hussain K H (designer of both Meera and Rachana), P Suresh and Hiran for their valuable contributions. And thanks to the SMC community and font users for using the fonts and reporting bugs. We hope to bring this new version to your favorite GNU/Linux distros soon. Wikimedia&#8217;s <a href="https://www.mediawiki.org/wiki/Extension:WebFonts" target="_blank">WebFonts</a> extension uses the Meera font, and the font will be updated there soon. The next release of <a href="http://www.gnu.org/software/freefont/sources/" target="_blank">GNU FreeFont</a> is expected to update its Malayalam glyphs using Meera and Rachana for the freefont-sans and freefont-serif fonts respectively. We plan to bring these changes to the other fonts we maintain in their next versions. Some glyphs from the latest Unicode version are still missing in these fonts.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2012/03/10/new-version-of-malayalam-fonts-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SVG Fonts</title>
		<link>http://thottingal.in/blog/2011/08/20/svg-fonts/</link>
		<comments>http://thottingal.in/blog/2011/08/20/svg-fonts/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 15:56:53 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[svg]]></category>
		<category><![CDATA[webfonts]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=374</guid>
		<description><![CDATA[This post is some notes on the current state of SVG Fonts. SVG is not a webfont format. The purpose of SVG fonts is to be embedded inside of SVG documents  (or linked to them), similar to the way you would embed standard  TrueType or OpenType fonts in a PDF.  SVG fonts are text files [...]]]></description>
			<content:encoded><![CDATA[<p>This post collects some notes on the current state of SVG Fonts.</p>
<p>SVG is not a webfont format. The purpose of SVG fonts is to be embedded inside SVG documents (or linked to them), similar to the way you would embed standard TrueType or OpenType fonts in a PDF. SVG fonts are text files that contain the glyph outlines represented as standard SVG elements and attributes, as if they were single vector objects in the SVG image. Unlike the EOT, WOFF and TTF formats, an SVG font is an uncompressed plain text file.</p>
<p>Even though it is not a webfont format, some browsers accept SVG fonts in the CSS3 @font-face declaration.</p>
<p>Firefox and IE do not support SVG Fonts. The Mozilla Bugzilla entry about this has a lengthy discussion: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=119490">https://bugzilla.mozilla.org/show_bug.cgi?id=119490</a>. This is one of the reasons why Firefox does not score 100 in the <a title="ACID3" href="http://en.wikipedia.org/wiki/Acid3" target="_blank">Acid3</a> test &#8211; <a href="http://www.itcode.org/why-firefox-is-not-scoring-100-in-the-acid3-test">http://www.itcode.org/why-firefox-is-not-scoring-100-in-the-acid3-test</a>. Developers argue that WOFF is sufficient and that SVG Fonts do not give any advantage. Support for SVG Fonts in the web development and font communities has been declining for some time. There’s already been discussion, without objection, of dropping SVG fonts from the Acid3 test. The community has put forth a proposal in the SVG Working Group to give SVG Fonts optional status.</p>
<p>Browser support matrix for SVG Fonts: <a href="http://caniuse.com/#feat=svg-fonts">http://caniuse.com/#feat=svg-fonts</a>. IE and Firefox do not support them. WebKit-based browsers such as Chrome and Epiphany do. For browsers that do not support SVG features natively, there is a Flash-based JavaScript library named svgweb: <a href="http://code.google.com/p/svgweb/">http://code.google.com/p/svgweb/</a></p>
<p><span style="text-decoration: underline;">Limitations of svg fonts:</span></p>
<ul>
<li>Not all OpenType features are available in the SVG specification.</li>
<li>For example, Indic fonts require many OpenType features for correct rendering &#8211; see the OpenType specification for Malayalam: <a href="http://www.microsoft.com/typography/otfntdev/malayot/shaping.aspx">http://www.microsoft.com/typography/otfntdev/malayot/shaping.aspx</a></li>
<li>Even though some browsers support SVG Fonts, in practice they cannot render SVG fonts for complex scripts such as Indic ones. Here is a sample SVG file with the Meera font defined in it: <a href="http://thottingal.in/tests/svg/Meera-fontembedding.svg">http://thottingal.in/tests/svg/Meera-fontembedding.svg</a>. As you can see, the rendering is wrong.</li>
<li>The main drawback of SVG fonts is that there is no provision for font hinting. The SVG standard states: <em>&#8220;SVG fonts contain unhinted font outlines. Because of this, on many implementations there will be limitations regarding the quality and legibility of text in small font sizes. For increased quality and legibility in small font sizes, content creators may want to use an alternate font technology, such as fonts that ship with operating systems or an alternate WebFont format.&#8221;</em> &#8211; <a href="http://www.w3.org/TR/SVG/fonts.html">http://www.w3.org/TR/SVG/fonts.html</a></li>
</ul>
<p>There is an alternative proposal to use the OpenType features of the font and use SVG just for the glyphs: <a href="https://wiki.mozilla.org/SVGOpenTypeFonts">https://wiki.mozilla.org/SVGOpenTypeFonts</a></p>
<p>FontForge can be used to create SVG Fonts, but the resulting font works only for simple scripts like Latin, because FontForge fails to export the GPOS/GSUB tables to the SVG (bug report &#8211; <a href="http://sourceforge.net/mailarchive/message.php?msg_id=27964229">http://sourceforge.net/mailarchive/message.php?msg_id=27964229</a>). The GSUB issue can be solved either by hand-coding the Unicode sequences for glyphs or by writing an external script. But issues with more important OpenType features, such as vowel sign (matra) reordering, persist.</p>
<p>Even though SVG fonts, which define the font data inside the SVG itself, do not seem to attract much interest from developers, using webfonts inside SVG has some importance. Just as in web pages, webfonts can be used to render the text inside an SVG. The webfont format depends on the browser. Example: <a href="http://thottingal.in/tests/svg/Dyuthi-Webfont.svg">http://thottingal.in/tests/svg/Dyuthi-Webfont.svg</a> (have a look at the source code of the file).</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2011/08/20/svg-fonts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Malayalam Wikisource Offline version</title>
		<link>http://thottingal.in/blog/2011/06/11/malayalam-wikisource-offline-version/</link>
		<comments>http://thottingal.in/blog/2011/06/11/malayalam-wikisource-offline-version/#comments</comments>
		<pubDate>Sat, 11 Jun 2011 09:11:38 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Indic]]></category>
		<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Misc]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=358</guid>
		<description><![CDATA[The Malayalam Wikisource community today released the first offline version of Malayalam Wikisource during the 4th annual wiki meetup of Malayalam Wikimedians. To the best of our knowledge, this is the first time a Wikisource project has released an offline version. The Malayalam wiki community had released the first offline version of Malayalam Wikipedia a year ago. Releasing the [...]]]></description>
			<content:encoded><![CDATA[<p>The Malayalam Wikisource community today released the first offline version of <a href="http://ml.wikisource.org" target="_blank">Malayalam Wikisource</a> during the 4th annual wiki meetup of Malayalam Wikimedians. To the best of our knowledge, this is the first time a Wikisource project has released an offline version. The Malayalam wiki community <a href="http://thottingal.in/blog/2010/04/17/mlwikioncd/" target="_blank">had released</a> the first offline version of Malayalam Wikipedia a year ago.</p>
<p>Releasing an offline version of a Wikisource is a challenging project. I designed and implemented the technical aspects of the project, so let me share the details.</p>
<p>As you know, a Wikisource contains a lot of books. Each book varies in size and is divided into chapters or sections. There is no common pattern for books; each has its own structure. A novel is presented differently from a collection of poems by a poet. Wikisource also has religious books like the Bible, Quran, Bhagavat Geeta, Ramayana etc. Since books are meant for continuous reading over a long time, readability and how we present the lengthy chapters on screen also matter. Offline Wikipedia tools such as <a href="http://www.kiwix.org/" target="_blank">Kiwix</a> do not modify the layout of the content; they present it as it is shown in Wikipedia/Wikisource. <a href="https://github.com/santhoshtr/wiki2cd" target="_blank">The tool</a> we wrote last year for the Malayalam Wikipedia offline version also presents vertically scrollable content on the screen. Neither can be configured to give different presentation styles depending on the nature of the book.</p>
<p>What we wanted was a book-reader kind of application interface. Readers should be able to navigate easily to books and chapters. Chapter content can be very lengthy, and for reading it over a long time, one long vertically scrolled text is not a good idea. We also need to take care of the line width: if each line spans 80-90% of the screen, especially on a widescreen monitor, it strains the neck and eyes.</p>
<div id="attachment_361" class="wp-caption aligncenter" style="width: 405px"><a href="http://thottingal.in/blog/wp-content/uploads/2011/06/2011-06-09-19-29-211.png"><img class="size-large wp-image-361" title="2011-06-09-19-29-21" src="http://thottingal.in/blog/wp-content/uploads/2011/06/2011-06-09-19-29-211-1024x455.png" alt="" width="395" height="175" /></a><p class="wp-caption-text">Screenshot of Offline version. Click to enlarge</p></div>
<p>The selection of books for the offline version was done by the active Wikimedians at Wikisource. Some of the selected books were proofread by many volunteers within the last two weeks.</p>
<p>The tools used for extracting the HTML were ad hoc and adapted to give each book a good presentation, so there is not much to reuse here. Basically, our scripts extracted the HTML, took the content part alone using pyquery and removed some unwanted sections. The content was then added to predefined HTML templates with proper CSS for the UI. The CSS3 multi-column feature was used for the book-like interface. Since IE had not implemented this standard even in IE9, the book-like interface was not provided for that browser. Chrome versions below 12 could not support it either, because of these bugs: <a href="http://code.google.com/p/chromium/issues/detail?id=45840">http://code.google.com/p/chromium/issues/detail?id=45840</a> and <a href="http://code.google.com/p/chromium/issues/detail?id=78155">http://code.google.com/p/chromium/issues/detail?id=78155</a>. For easy navigation, mouse wheel support and page navigation buttons are provided. To solve the non-availability of the required fonts, webfonts were integrated, with a selection box to choose a favorite font. Readers can also select the font size to make reading comfortable.</p>
<p>Why static HTML? The variety of platforms and versions we need to support, the need for webfonts, complex script rendering, the effort required to develop and customize a UI, the relatively small size of the data, and avoiding any software installation on users&#8217; systems made us choose static HTML + jQuery + CSS. The downside is that we could not provide full-text search.</p>
<p>Apart from Wikisource, we also included a collection of copyleft images from Wikimedia Commons. Thanks to <a href="http://nishan-naseer.blogspot.com/" target="_blank">Nishan Naseer</a> for preparing a gallery application using jQuery. We selected four categories from Commons that are related to Kerala. We hope everybody will like the pictures and that they will serve as a small introduction to Wikimedia Commons.<br />
<a href="http://thottingal.in/blog/wp-content/uploads/2011/06/2011-06-11-09-22-06.png"><img class="aligncenter size-large wp-image-364" title="2011-06-11 09-22-06" src="http://thottingal.in/blog/wp-content/uploads/2011/06/2011-06-11-09-22-06-1024x474.png" alt="" width="453" height="209" /></a><br />
Even though the Python scripts are not ready for reuse in other projects, if anybody wants to have a look at them, please mail me. I am not publishing them, since the scripts do not make sense outside the context of each book and its existing presentation in Malayalam Wikisource.</p>
<p>The CD image is available for download <a href="http://www.mlwiki.in/cdimage/mlwikisource.iso" target="_blank">here</a> and one can also browse the CD content <a href="http://www.mlwiki.in/wikisrccd" target="_blank">here</a>.</p>
<p>Thanks to Shiju Alex for coordinating this project. And thanks to all Malayalam Wikisource volunteers for making this happen. We have included poems, folk songs, devotional songs, novel, grammar book, tales, and books on Hinduism, Islam, Christianity, Communism and Philosophy. With this release, it becomes the biggest offline digital archive of Malayalam books.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2011/06/11/malayalam-wikisource-offline-version/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Mediawiki Berlin hackathon</title>
		<link>http://thottingal.in/blog/2011/05/17/mediawiki-berlin-hackathon/</link>
		<comments>http://thottingal.in/blog/2011/05/17/mediawiki-berlin-hackathon/#comments</comments>
		<pubDate>Tue, 17 May 2011 16:16:36 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=353</guid>
		<description><![CDATA[I am just back from the MediaWiki Berlin Hackathon. From May 13 to 15, MediaWiki developers attended the hackathon, squashed many bugs and discussed many features. Members of the language committee had their first real-life meeting in parallel with the hackathon. It was a nice event: I learned a lot and talked to many awesome hackers and linguists. Milos [...]]]></description>
			<content:encoded><![CDATA[<p>I am just back from the <a href="http://www.mediawiki.org/wiki/Berlin_Hackathon_2011">MediaWiki Berlin Hackathon</a>. <a href="http://commons.wikimedia.org/wiki/File:Wikimedia_Hackathon_Berlin_2011_group_photo.jpg"><img class="size-medium wp-image-4184 alignright" title="Group photo at the Berlin Hackathon 2011" src="http://blog.wikimedia.org/wp-content/uploads/2011/05/Wikimedia_Hackathon_Berlin_2011_group_photo-300x143.jpg" alt="" width="300" height="143" /></a>From May 13 to 15, MediaWiki developers attended the hackathon, squashed many bugs and discussed many features. Members of the <a href="http://meta.wikimedia.org/wiki/Language_committee">language committee</a> had their first real-life meeting in parallel with the hackathon. It was a nice event: I learned a lot and talked to many awesome hackers and linguists.</p>
<ul>
<li><a title="User:Millosh" href="http://meta.wikimedia.org/wiki/User:Millosh">Milos Rancic</a> has written a summary of the discussions that happened during the language committee meeting here: <a href="http://lists.wikimedia.org/pipermail/foundation-l/2011-May/065537.html">http://lists.wikimedia.org/pipermail/foundation-l/2011-May/065537.html</a></li>
<li><a href="http://translatewiki.net/wiki/User:Nike">Niklas Laxström</a> and <a href="http://en.wikipedia.org/wiki/User:Siebrand">Siebrand</a> reviewed the <a href="http://www.mediawiki.org/wiki/Extension:WebFonts">WebFonts extension</a> and enabled it at <a href="http://translatewiki.net/">translatewiki.net</a>. I fixed a few bugs that Niklas reported in the extension.</li>
<li><a href="http://en.wikipedia.org/wiki/User:Purodha">Purodha Blissenbach</a> was very much interested in the WebFonts and Narayam extensions, and we discussed some of the features we need to add. They are recorded here: <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=28900">Bug 28900</a>, <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=28999">Bug 28999</a> and <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=29000">Bug 29000</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2011/05/17/mediawiki-berlin-hackathon/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Creating a new Language ecosystem- Sourashtra as example</title>
		<link>http://thottingal.in/blog/2011/05/07/language-ecosystem-sourashtra/</link>
		<comments>http://thottingal.in/blog/2011/05/07/language-ecosystem-sourashtra/#comments</comments>
		<pubDate>Sat, 07 May 2011 06:31:39 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[glibc]]></category>
		<category><![CDATA[sourashtra]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=347</guid>
		<description><![CDATA[Sourashtra is a language spoken by Sourashtra people living in South Tamilnadu and Gujarat in India. Originating from Brahmi and then Grandha, this language is the mother tongue of half a million people. But most of them are not familiar with the script of this language. Very few people can read and write the Sourashtra script. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Saurashtra_language" target="_blank">Sourashtra</a> is a language spoken by Sourashtra people living in South Tamilnadu and Gujarat in India. Originating from Brahmi and then Grandha, this language is the mother tongue of half a million people. But most of them are not familiar with <a href="http://en.wikipedia.org/wiki/Saurashtra_script" target="_blank">the script</a> of this language. Very few people can read and write the Sourashtra script. Sourashtra has the ISO 639-3 language code saz and the Unicode range U+A880 &#8211; U+A8DF.</p>
<p>Recently a Sourashtra Wikipedia project was started in the Wikimedia Incubator: <a href="http://incubator.wikimedia.org/wiki/Wp/saz" target="_blank">http://incubator.wikimedia.org/wiki/Wp/saz</a>, and MediaWiki localization <a href="http://ultimategerardm.blogspot.com/2011/03/saurashtra-language-from-india-new-to.html" target="_blank">started in translatewiki</a>. Since the language did not have any proper fonts or input tools, this was not going well.</p>
<p>When we add support for a new language in MediaWiki or start a Wikipedia in a new language, we need to develop the language technology ecosystem to support its growth. This ecosystem comprises Unicode code points for the script, proper fonts, rendering support, input tools, and the availability of these fonts and input tools in operating systems, or alternative ways to get them working there.</p>
<p>The Sourashtra language had a Unicode font developed by <a href="http://www.khenikeri.com/" target="_blank">Prabu M Rengachari</a>, itself named &#8216;Sourashtra&#8217;. The font <a href="http://khenikeri.blogspot.com/2011/01/test-sourashtra-unicode-font-versions.html" target="_blank">had problems</a> with browsers/operating systems, and we fixed it to work properly. The font was also not licensed properly. Prabu agreed to release it under the <a href="http://www.gnu.org/licenses/gpl-3.0.txt" target="_blank">GNU GPLv3</a> license with the <a href="http://www.gnu.org/licenses/gpl-faq.html#FontException" target="_blank">font exception</a>. He also agreed to rename the font to something other than the script name.</p>
<p>The font was <a href="http://khenikeri.blogspot.com/2011/04/pagul-web-font.html" target="_blank">renamed to Pagul</a>, meaning &#8216;Footstep&#8217; in Sourashtra, and is <a href="https://sourceforge.net/projects/pagul/" target="_blank">hosted on SourceForge</a>.</p>
<p>Once we had a font with a proper license, we wanted it to be available in operating systems. I filed a <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=623944" target="_blank">packaging request</a> in Debian. <a href="http://blog.copyninja.info/" target="_blank">Vasudev Kamath</a> of the Debian India Team packaged it, and it is now available in <a href="http://packages.debian.org/sid/fonts-pagul" target="_blank">Debian unstable</a> (sid). Parag Nemade of Fedora India <a href="https://bugzilla.redhat.com/show_bug.cgi?id=699587" target="_blank">packaged the font for Fedora</a>, and it will be available in the upcoming Fedora 15.</p>
<p>To add support for a new language in an operating system, we need <a href="http://en.wikipedia.org/wiki/Locale" target="_blank">a locale definition</a>. In GNU/Linux this is the glibc locale definition. With the help of Prabu, I prepared the saz_IN locale file for glibc and filed a <a href="https://bugzilla.redhat.com/show_bug.cgi?id=698346" target="_blank">bug report to add it to glibc</a>. I hope it will soon be part of glibc.</p>
<p>Well, all of this was possible because it was GNU/Linux and free software. Things are a bit more difficult on the other side, in the proprietary operating system world. There is nothing we can do with those operating systems. Since there is no &#8216;market&#8217; for these minority languages, adding support for them will not become a priority for those companies. Users will see squares or question marks when they visit the Sourashtra Wikipedia.</p>
<p>We are working on a solution for this: not only for Sourashtra, but a common solution for all languages. We are developing a WebFonts extension for MediaWiki that provides font embedding in wiki pages, so that users do not need the fonts installed on their computers. The extension is <a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/WebFonts" target="_blank">in development</a> and one can preview it in <a href="http://thottingal.in/wiki/" target="_blank">my test wiki</a>. For Sourashtra, we added webfonts support (<a href="http://thottingal.in/wiki/index.php?title=Sourashtra&amp;setlang=saz" target="_blank">preview</a>).</p>
<p>Input tools need to be developed and packaged. For MediaWiki, we can easily add this support with the help of the Narayam extension.</p>
<p>With the <a href="http://silpa.org.in" target="_blank">silpa project</a>, I added server-side PDF/PNG/SVG <a href="http://silpa.org.in/Render" target="_blank">rendering support</a> for Sourashtra as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2011/05/07/language-ecosystem-sourashtra/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Cross Language Approximate Search on Indic Languages- A demo</title>
		<link>http://thottingal.in/blog/2011/04/03/cross-language-approximate-search-on-indic-languages-a-demo/</link>
		<comments>http://thottingal.in/blog/2011/04/03/cross-language-approximate-search-on-indic-languages-a-demo/#comments</comments>
		<pubDate>Sun, 03 Apr 2011 11:27:39 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[silpa]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=335</guid>
		<description><![CDATA[A demo of cross language approximate search in Indic text: The Malayalam word സാമ്പാര്‍ is compared against a paragraph from http://ml.wikipedia.org/wiki/Sambar. In the bottom half, words marked in yellow color are search results. You can see that a Kannada word ಸಾಂಬಾರ್‍ is matched for Malayalam word. And that is why this is called cross-language. The [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">A demo of cross language approximate search in Indic text:<br />
<a href="http://thottingal.in/images/silpaappoximatesearch-demo-1.png"><img class="aligncenter" src="http://thottingal.in/images/silpaappoximatesearch-demo-1.png" alt="click to enlarge" /></a><br />
The Malayalam word സാമ്പാര്‍ is compared against a paragraph from <a href="http://ml.wikipedia.org/wiki/Sambar">http://ml.wikipedia.org/wiki/Sambar</a>.<br />
In the bottom half, the words marked in yellow are the search results.<br />
You can see that the Kannada word ಸಾಂಬಾರ್‍ is matched for the Malayalam word, which is why this is called cross-language.<br />
Inflections of the word സാമ്പാര്‍, such as സാമ്പാറും and സാമ്പാറു, are also found as results.<br />
This is the kind of search we need in Indic languages, not just the letter by letter comparison we do for English.</p>
<p style="text-align: center;">Another example, showing all inflected forms of the noun പാലക്കാട് and the same word written in Tamil, Telugu and Hindi. The search shows the results in those languages too. &#8211; <a href="http://thottingal.in/images/silpaappoximatesearch-demo-2.png"><img class="aligncenter" src="http://thottingal.in/images/silpaappoximatesearch-demo-2.png" alt="click to enlarge" /></a></p>
<p>You can try it here: <a href="http://silpa.org.in/ApproxSearch">http://silpa.org.in/ApproxSearch</a></p>
<p>This is a <a href="http://en.wikipedia.org/wiki/Fuzzy_string_searching">fuzzy string search</a> application. It illustrates the combined use of <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">edit distance</a> and the <a href="http://silpa.org.in/Soundex">Indic Soundex</a> algorithm.</p>
<p>By combining &#8220;written like&#8221; (edit distance) and &#8220;sounds like&#8221; (soundex) comparisons, we get efficient approximate string searching. This application is capable of cross-language string search too. That means you can search for Hindi words in Malayalam text: if any Malayalam word is an approximate transliteration of the Hindi word, or sounds like it, it is returned as an approximate match. The &#8220;written like&#8221; algorithm used here is a bigram average algorithm: the number of bigrams common to the two strings, divided by the average number of bigrams in the two strings, gives a factor between 0 and 1. Similarly, the soundex algorithm also gives a weight. By selecting the words whose comparison weight is more than the threshold weight (which is 0.6), we get the search results.</p>
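<p>The bigram average described above can be sketched roughly as follows. This is only an illustration and not the code running on silpa.org.in: the names <code>bigrams</code>, <code>bigram_average</code> and <code>approximate_matches</code> are made up for this example, bigrams are taken over Unicode code points, and the soundex weight is left out entirely.</p>

```python
def bigrams(word):
    """Return the list of adjacent character pairs in word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_average(a, b):
    """Shared bigrams divided by the average bigram count of both strings."""
    first, second = bigrams(a), bigrams(b)
    if not first or not second:
        return 0.0
    remaining = list(second)
    common = 0
    for pair in first:
        # Count each bigram of the second string at most once.
        if pair in remaining:
            remaining.remove(pair)
            common += 1
    average = (len(first) + len(second)) / 2.0
    return common / average


# Threshold weight mentioned in the post.
THRESHOLD = 0.6


def approximate_matches(query, words, threshold=THRESHOLD):
    """Keep the words whose comparison weight is more than the threshold."""
    return [w for w in words if bigram_average(query, w) > threshold]
```

<p>For example, "night" and "nacht" share only the bigram "ht" out of an average of 4 bigrams, so the weight is 0.25 and the pair falls below the threshold.</p>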
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2011/04/03/cross-language-approximate-search-on-indic-languages-a-demo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dictionary Jabber Buddy Bots</title>
		<link>http://thottingal.in/blog/2010/11/20/dictionary-jabber-buddy-bots/</link>
		<comments>http://thottingal.in/blog/2010/11/20/dictionary-jabber-buddy-bots/#comments</comments>
		<pubDate>Sat, 20 Nov 2010 17:24:05 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Indic]]></category>
		<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[SMC]]></category>
		<category><![CDATA[bots]]></category>
		<category><![CDATA[dictionary]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[xmpp]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=308</guid>
		<description><![CDATA[Recently we released two Jabber buddy bots for dictionary lookup. By adding eng.mal.dict@gmail.com as a chat contact one can ask for the meaning of an English word in Malayalam by just sending a chat message. Similarly for English-Hindi or Hindi-English dictionary, we have another bot eng.hin.dict@jabber.org. Both of these dictionaries use Dict databases based on  [...]]]></description>
			<content:encoded><![CDATA[<p>Recently we released two Jabber buddy bots for dictionary lookup. By adding eng.mal.dict@gmail.com as a chat contact, one can ask for the Malayalam meaning of an English word by just sending a chat message. Similarly, for the English-Hindi or Hindi-English dictionary, we have another bot, eng.hin.dict@jabber.org. Both of these dictionaries use Dict databases based on the <a title="DICT" href="http://en.wikipedia.org/wiki/DICT" target="_blank">DICT protocol</a>.</p>
<p>Both of these bots were well received by users. We have 8000+ users for the English-Malayalam dictionary. Online blogs and media also gave them good publicity. Thanks a lot!</p>
<p><a title="Swathanthra Malayalam Computing" href="http://smc.org.in" target="_blank">SMC</a> developers Rajeesh Nambiar, Ershad, Ragsagar, and Sarath Lakshman helped in improving the program. You can get the source code from <a href="http://git.savannah.gnu.org/cgit/smc.git/tree/bots" target="_blank">here</a>. It is a small program written using a Python XMPP library.</p>
<p>We wrote these programs a year ago, back in December 2009. We could not launch them for the public since we did not have a server to host them. Web hosting providers usually do not allow programs like this to run on their servers. Recently <a href="http://netdotnet.com" target="_blank">netdotnet.com</a> provided a VPS for SMC, and we could launch the bots from that server.</p>
<p>The English-Hindi dictionary is reasonably big, but the English-Malayalam one is very small, with only ~10k words. So we added a Malayalam Wiktionary backend for the bot.</p>
<p>Here is a video on how to use the English-Hindi bot, prepared by <a href="http://varunverma.org/blogs/translate-inside-your-google-chat-window/" target="_blank">Varun Verma</a>:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/1znJAHisf5M&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/1znJAHisf5M&amp;feature"></embed></object></p>
<ul>
<li>An article about the English-Malayalam bot on Epathram.com <a href="http://epathram.com/column-itsit/11/03/225654-english-malayalam-dictionary-in-google-chat.html" target="_blank">here</a>.</li>
<li>A blog post by Sailesh in Hindi <a href="http://emadad.hindyugm.com/2010/10/know-hindi-meanings-while-chatting.html" target="_blank">http://emadad.hindyugm.com/2010/10/know-hindi-meanings-while-chatting.html</a></li>
</ul>
<p>We can start this kind of bot for other languages too, if we have dictionaries with free-software-compatible licenses. If you are interested, please contact me.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/11/20/dictionary-jabber-buddy-bots/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wikimania 2010, Poland</title>
		<link>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/</link>
		<comments>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/#comments</comments>
		<pubDate>Sat, 17 Jul 2010 09:25:44 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=276</guid>
		<description><![CDATA[I left Chennai on Wednesday (8th) and reached Frankfurt airport on Thursday morning. The rest of the people from India attending Wikimania (Shiju Alex, Tinu Cherian, Srinivas Gunta, and Arjun Rao) had already reached the airport, and I joined them. We reached Gdansk Airport by 12.30 PM. Our accommodation was at a student hostel of Gdansk University. Language [...]]]></description>
			<content:encoded><![CDATA[<p>I left Chennai on Wednesday (8th) and reached Frankfurt airport on Thursday morning. The rest of the people from India attending Wikimania (Shiju Alex, Tinu Cherian, Srinivas Gunta, and Arjun Rao) had already reached the airport, and I joined them. We reached Gdansk Airport by 12.30 PM. Our accommodation was at a student hostel of Gdansk University. Language was a big issue, since most people there did not understand English and knew only Polish. The lady at the reception of our hostel used the Google Translate tool to communicate with us. <a href="http://en.wikipedia.org/wiki/Gda%C5%84sk">Gdansk</a> is a very beautiful city, with streets of tall brick buildings.</p>
<p>The conference started on Friday morning. Sue Gardner, Executive Director of the Wikimedia Foundation, talked about the foundation's strategies, followed by a question-and-answer session with the Wikimedia board members. We presented the Malayalam CD to Sue Gardner. She remembered the international free software conference she attended in Trivandrum in December 2008.<br />
Our workshop on offline Wikipedia versions started at 2.30. Martin Walker introduced the workshop to the participants. Manuel Schneider from German Wikipedia explained the Openzim format for compressed offline storage of Wikipedia and the readers available for desktop computers and mobile phones. Shiju Alex introduced the Malayalam Wikipedia offline version 1.0. I talked about the issues and solutions in providing an offline version, particularly of a wiki in a non-Latin complex script, to users on CD-ROM or DVD. I demonstrated sample offline wikis in Hebrew, Tamil, Polish, English, and Japanese created with the wiki2cd tool. There were a couple of questions on wiki2cd and Openzim. Kul Takanao Wadhwa and Tomasz Finc from the Wikimedia Foundation, who focus on offline wiki projects, attended the workshop, and we had a discussion after the talk.</p>
<div id="attachment_278" class="wp-caption alignright" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132.jpg"><img class="size-medium wp-image-278 " title="Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider" src="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132-300x225.jpg" alt="Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider" width="300" height="225" /></a><p class="wp-caption-text">Offline wikipedia people: myself, Shiju Alex, Martin Walker, Manuel Schneider</p></div>
<p>The offline wiki workshop continued with the Pediapress team. They are the people behind the recently added book/PDF export feature of Wikipedia. Unfortunately, this feature requires a lot of improvement before it works with Indian scripts.<br />
We met Gerard M, who focuses on the language support of wikis. He has an amazing passion for our local language Wikipedias. We discussed the technical issues of the local language wikis. Siebrand joined the discussion and pointed out some improvements for the wiki2cd software.</p>
<div id="attachment_277" class="wp-caption alignleft" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5074.jpg"><img class="size-medium wp-image-277  " title="discussion with Siebrand, Gerard" src="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5074-300x225.jpg" alt="" width="300" height="225" /></a><p class="wp-caption-text">Discussion with Siebrand on wiki2cd improvements. From left to right: Tinu Cherian, myself, Gerard M, Siebrand</p></div>
<p>On the second day I met Volker Haas, the developer of the PDF export/books feature of Wikipedia. The library the extension uses to create PDFs is Reportlab, but it does not support complex scripts such as Indic or Arabic. We had a long discussion on possible solutions and went through the Reportlab code, the mwlib code, and the software I am currently writing to provide complex-script PDF rendering APIs. We will soon try out some of the options we have for solving this issue.</p>
<p>Martin Walker, who presented the article selection process of English Wikipedia alongside us in the workshop, invited Shiju and me to dinner with his family, and we went.</p>
<p>The third day started with a plenary session by Jimmy Wales. He talked about small-language Wikipedias and the issues they face. He emphasized the need for offline versions of Wikipedia to reach more people and talked about the Malayalam Wikipedia offline version.</p>
<div id="attachment_278" class="wp-caption alignright" style="width: 310px"><a href="http://thottingal.in/blog/wp-content/uploads/2010/07/IMG_5132.jpg"><img class="size-medium wp-image-278 " title="Jimmy Wales with Malayalam wikipedia CD" src="http://upload.wikimedia.org/wikipedia/commons/4/4d/2010-07-11-gdansk-by-RalfR-001.jpg" alt="Jimmy Wales with Malayalam wikipedia CD" width="300" height="225" /></a><p class="wp-caption-text">Jimmy Wales with Malayalam wikipedia CD</p></div>
<p>Mayooranathan from Tamil Wikipedia presented the issues and statistics of the Tamil Wikipedia community. On Monday and Tuesday, we spent our time roaming around the Old Town of Gdansk. We visited <a href="http://en.wikipedia.org/wiki/St._Mary%27s_Church,_Gda%C5%84sk">St. Mary's Church</a>, the biggest brick church in the world, and climbed the 400 steps of its tower. From the top of the church, one can see the entire city. We also went by boat to Westerplatte, on the Baltic Sea coast, and visited <a href="http://www.bfr.pl/index.php?option=com_content&amp;task=view&amp;id=55&amp;Itemid=39">Wisłoujście Fortress</a>.</p>
<h2>Related posts:</h2>
<p>* Creating Malayalam Wikipedia CD: <a href="http://shijualex.wordpress.com/2010/04/24/creating-malayalam-wikipedia-cd/" target="_blank">http://shijualex.wordpress.com/2010/04/24/creating-malayalam-wikipedia-cd/</a><br />
* Wiki2cd:<a href="http://thottingal.in/blog/2010/04/17/mlwikioncd/" target="_blank"> http://thottingal.in/blog/2010/04/17/mlwikioncd/</a><br />
* Wikipedia Signpost: <a href="http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-04-19/News_and_notes#Briefly" target="_blank">http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-04-19/News_and_notes#Briefly</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/04/best-of-malayalam-wikipedia.html" target="_blank">http://ultimategerardm.blogspot.com/2010/04/best-of-malayalam-wikipedia.html</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/04/cd-dowloaded-4390-in-10-days.html" target="_blank">http://ultimategerardm.blogspot.com/2010/04/cd-dowloaded-4390-in-10-days.html</a><br />
* Gerard&#8217;s Blog: <a href="http://ultimategerardm.blogspot.com/2010/07/malayalamwikipedia-success-story.html" target="_blank">http://ultimategerardm.blogspot.com/2010/07/malayalamwikipedia-success-story.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/07/17/wikimania-2010-poland/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Attending Wikimania 2010</title>
		<link>http://thottingal.in/blog/2010/07/06/wikimania-2010/</link>
		<comments>http://thottingal.in/blog/2010/07/06/wikimania-2010/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 09:32:45 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[Indic]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=273</guid>
		<description><![CDATA[I will be attending  Wikimania 2010,  Gdansk, Poland.  This annual international conference of the Wikimedia community is from July 9 to July 11. I will be presenting wiki2cd, the tool I wrote for Malayalam wikipedia version 1.0 there in a joint workshop with wikipedia offline developers.  I will be joining with Manuel Schneider,  Shiju Alex, [...]]]></description>
			<content:encoded><![CDATA[<p>I will be attending  <a href="http://wikimania2010.wikimedia.org" target="_blank">Wikimania 2010,  Gdansk, Poland</a>.  This annual international conference of the Wikimedia community is from July 9 to July 11.</p>
<p>I will be presenting wiki2cd, the tool I wrote for Malayalam Wikipedia version 1.0, there in a joint workshop with Wikipedia offline developers. I will be joining Manuel Schneider, Shiju Alex, and Martin Walker in the workshop titled <a title="Submissions/Creating offline version of Wiki content - Solutions  and Challenges" href="http://wikimania2010.wikimedia.org/wiki/Submissions/Creating_offline_version_of_Wiki_content_-_Solutions_and_Challenges">Creating offline version of Wiki content – Solutions and Challenges</a>. Apart from this, I will be meeting the <a href="http://code.pediapress.com/" target="_blank">Pediapress team</a>, the team behind Wikipedia&#8217;s latest <a href="http://en.wikipedia.org/wiki/Help:Books" target="_blank">PDF/Book export feature</a>. This tool has some issues with Indic languages, mainly because its PDF rendering engine cannot render complex scripts.</p>
<p>Thanks to the <a href="http://www.wikimediafoundation.org/" target="_blank">Wikimedia Foundation</a> for granting me a scholarship to cover travel expenses.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/07/06/wikimania-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Malayalam Wikipedia releases selected articles on CD</title>
		<link>http://thottingal.in/blog/2010/04/17/mlwikioncd/</link>
		<comments>http://thottingal.in/blog/2010/04/17/mlwikioncd/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 05:03:47 +0000</pubDate>
		<dc:creator>Santhosh</dc:creator>
				<category><![CDATA[Malayalam]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://thottingal.in/blog/?p=250</guid>
		<description><![CDATA[As part of Malayalam Wikipedia Meetup 2010, Malayalam Wikipedia today releases 500 selected articles on a CD-ROM. This is the first time in India that a local language Wikipedia has released its articles for offline use. I handled the technology part of the project. The idea was to get the selected articles in static [...]]]></description>
			<content:encoded><![CDATA[<p>As part of <a href="http://ml.wikipedia.org/wiki/Meetup-2010_April" target="_blank">Malayalam Wikipedia Meetup 2010</a>, Malayalam Wikipedia today releases 500 selected articles on a CD-ROM. This is the first time in India that a local language Wikipedia has released its articles for offline use. I handled the technology part of the project.</p>
<p>The idea was to get the selected articles onto the CD in static form. But this is not as easy as one might imagine. It is not like saving each page from the browser to the local machine. The challenges were:</p>
<ul>
<li>Automate the process of getting each page and the images in it. Wikipedia articles change frequently, so the program needs to fetch the latest version of each article from the wiki whenever it is executed.</li>
<li>Fix all the links, CSS, JavaScript, and image references so that they all resolve within the CD itself.</li>
<li>Provide a categorized index of the articles so that an article can be located easily.</li>
<li>Provide a search over the article titles.</li>
<li>The ISO 9660 filesystem of CD/DVD has lots of limitations: there are restrictions on Unicode file names, file name length, directory depth, special characters in file names, and so on. Wikipedia article and image names contain Unicode and special characters, and most of the time they exceed the file name length limit. To avoid all these problems, we have to rename most of the files and then fix the cross-references in all files.</li>
<li>It should work on all operating systems, so all the content should be presented with HTML, JavaScript, and CSS. Since the content is in Malayalam, it should be readable even if the user does not have the required fonts on her/his machine (font embedding required).</li>
</ul>
<p>Manually solving all these challenges is not the way to go. So I wrote a program, which just takes the article titles and does all the above tasks and finally creates a repository ready for burning to CD ROM.</p>
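<p>To give an idea of one of these tasks, the renaming for ISO 9660 could be sketched like this. It is a simplified illustration under my own assumptions, not the actual wiki2cd code: the helper names <code>safe_name</code> and <code>rewrite_references</code> are hypothetical, the new name is simply the first 8 hex characters of an MD5 digest plus the original extension, and references are fixed by plain string replacement.</p>

```python
import hashlib
import os


def safe_name(original, length=8):
    """Derive a short, ASCII-only file name that keeps the original extension."""
    stem, extension = os.path.splitext(original)
    # Hashing the Unicode stem gives a stable name made of hex digits only.
    digest = hashlib.md5(stem.encode("utf-8")).hexdigest()[:length]
    return digest + extension.lower()


def rewrite_references(html, rename_map):
    """Replace every old file name found in the page with its new name."""
    # Longest names first, so a name contained in a longer one is not
    # replaced before the longer one has been handled.
    for old in sorted(rename_map, key=len, reverse=True):
        html = html.replace(old, rename_map[old])
    return html
```

<p>The same input always produces the same name, so the rename map can be rebuilt on every run. A real tool would also have to handle digest collisions and percent-encoded references.</p>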
<p>Wget disappointed me in fetching the content from the wiki. There is an <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411290" target="_blank">open bug</a> in wget which makes the download of non-Latin URLs impossible.</p>
<p>Have a look at the CD content we created: <a href="http://thottingal.in/projects/mlwikioncd/wiki/" target="_blank">Malayalam Wikipedia Selected 500 Articles</a>. <a href="http://hiran.in" target="_blank">Hiran</a> helped me with the artwork.</p>
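<p>One way to avoid trouble with non-Latin URLs is to percent-encode the article title before requesting the page. A minimal sketch, assuming Python 3 and its standard <code>urllib.parse</code> module; the function <code>article_url</code> is a hypothetical helper and not necessarily how wiki2cd does it. It only builds the URL and does not download anything.</p>

```python
from urllib.parse import quote

# Base address of Malayalam Wikipedia articles.
BASE = "http://ml.wikipedia.org/wiki/"


def article_url(title, base=BASE):
    """Build an ASCII-only URL for a wiki article title."""
    # MediaWiki uses underscores in place of spaces in page names.
    # quote() encodes the title as UTF-8 and escapes every non-ASCII byte.
    return base + quote(title.replace(" ", "_"))
```

<p>An ASCII title such as "Sambar" passes through unchanged, while each Malayalam character becomes a sequence of %XX escapes.</p>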
<p><a href="http://thottingal.in/blog/wp-content/uploads/2010/04/mlwikioncd.png"><img class="aligncenter size-medium wp-image-257" title="mlwikioncd" src="http://thottingal.in/blog/wp-content/uploads/2010/04/mlwikioncd-300x291.png" alt="The CD cover image designed by Hiran" width="300" height="291" /></a></p>
<p>Since the entire process is automated, the program can be used for any other language. I am releasing the program for the benefit of everybody. You can get the program from <a href="http://github.com/santhoshtr/wiki2cd" target="_blank">here</a>. It is written in Python, and jQuery was used for the UI. For details on usage, customization, and so on, read the <a href="http://wiki.github.com/santhoshtr/wiki2cd/" target="_blank">wiki page</a> of the project.</p>
<p>For those who can&#8217;t read Malayalam, here is a <a href="http://thottingal.in/projects/wiki2cd/samplewiki/" target="_blank">sample wiki </a>created  by the wiki2cd program from English wikipedia by selecting 10 articles.</p>
<p>The Malayalam Wikipedia community hopes that this is a big step towards reaching the majority of people, who do not have internet access. If printed, these 500 articles would fill at least 5000 pages. The CD-ROM also includes information about commonly used free software tools for Malayalam computing. Some writing tools and fonts are distributed on the same CD-ROM.</p>
<p>Thanks to Malayalam Wikipedia for giving me this great opportunity to work on this project.</p>
<p>The ISO image of the CD is available <a href="http://www.mlwiki.in/mlwikicd/img/MLWikipediaCD-2010.iso" target="_blank">here</a> for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thottingal.in/blog/2010/04/17/mlwikioncd/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>
