{ Author Archives }

Malayalam Wikisource Offline version

The Malayalam Wikisource community today released the first offline version of Malayalam Wikisource during the 4th annual wiki meetup of Malayalam Wikimedians. To the best of our knowledge, this is the first time a Wikisource project has released an offline version. The Malayalam wiki community had released the first version of Malayalam Wikipedia a year ago. Releasing the […]


MediaWiki Berlin Hackathon

I am just back from the MediaWiki Berlin Hackathon. From May 13 to 15, MediaWiki developers gathered at the hackathon, squashed many bugs and discussed many features. The language committee held its first in-person meeting in parallel with the hackathon. It was a nice event: I learned a lot and talked to many awesome hackers and linguists. Milos […]

Creating a New Language Ecosystem: Sourashtra as an Example

Sourashtra is a language spoken by the Sourashtra people living in southern Tamil Nadu and in Gujarat, India. Originating from Brahmi and later Grantha, this language is the mother tongue of half a million people. But most of them are not familiar with its script: very few people can read and write in the Sourashtra script. […]

Cross-Language Approximate Search on Indic Languages: A Demo

A demo of cross-language approximate search in Indic text: the Malayalam word സാമ്പാര്‍ is compared against a paragraph of text. In the bottom half, the words highlighted in yellow are the search results. You can see that the Kannada word ಸಾಂಬಾರ್‍ is matched for the Malayalam word, which is why this is called cross-language search. The […]
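The excerpt does not say how the demo matches words across scripts. One plausible approach, assumed here rather than taken from the post, relies on Unicode giving the main Indic scripts (Devanagari through Malayalam) parallel 128-code-point blocks: map every letter to a common block, drop the ZWJ/ZWNJ joiners, then compare words by edit distance. The helpers `normalize` and `approximate_matches` and the threshold `max_distance=3` are hypothetical. The block mapping is only approximate (Tamil, for example, lacks many letters), and spelling differences such as an anusvara versus an explicit nasal plus virama still cost edits.

```python
# Assumption: letters at the same offset in parallel Indic blocks are "the same"
# letter. This holds broadly for Devanagari..Malayalam (U+0900..U+0D7F).
INDIC_START = 0x0900   # start of the Devanagari block
INDIC_END = 0x0D7F     # end of the Malayalam block
BLOCK_SIZE = 0x80
JOINERS = {"\u200c", "\u200d"}  # ZWNJ and ZWJ: rendering hints, ignored here


def normalize(word: str) -> str:
    """Map every Indic character to the Devanagari block and drop joiners."""
    out = []
    for ch in word:
        if ch in JOINERS:
            continue
        cp = ord(ch)
        if INDIC_START <= cp <= INDIC_END:
            cp = INDIC_START + (cp - INDIC_START) % BLOCK_SIZE
        out.append(chr(cp))
    return "".join(out)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def approximate_matches(query: str, text: str, max_distance: int = 3):
    """Return the words of `text` within `max_distance` edits of `query`."""
    q = normalize(query)
    return [w for w in text.split() if levenshtein(q, normalize(w)) <= max_distance]


malayalam = "\u0d38\u0d3e\u0d2e\u0d4d\u0d2a\u0d3e\u0d30\u0d4d\u200d"  # Malayalam "sambar"
kannada = "\u0cb8\u0cbe\u0c82\u0cac\u0cbe\u0cb0\u0ccd\u200d"          # Kannada "sambar"
print(levenshtein(normalize(malayalam), normalize(kannada)))  # 3
print(approximate_matches(malayalam, "dosa " + kannada + " idli"))
```

The two spellings differ by an anusvara versus MA + virama and by BA versus PA, giving a distance of 3. The fixed threshold is a simplification: a real search would probably scale it with word length.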

Tamil Collation in GLIBC

A few months back, we started fixing the collation rules of Indian languages in the GNU C Library. Pravin Satpute prepared patches for many languages, and I prepared patches for Malayalam and Tamil. Later, Pravin enhanced the Tamil patch. You can read the rules used for Malayalam collation here [PDF document]. The Tamil patch was applied upstream, […]
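Applications see these glibc collation rules through the locale's LC_COLLATE category. Below is a minimal Python sketch of sorting words that way, assuming a glibc system where the `ta_IN.UTF-8` locale has been generated. The helper name `collate` is hypothetical. `locale.strxfrm` turns each string into a key whose ordinary comparison follows the active collation rules. If the locale is missing, the sketch falls back to plain code-point order.

```python
import locale


def collate(words, locale_name="ta_IN.UTF-8"):
    """Sort words using the C library's LC_COLLATE rules for `locale_name`."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words)  # locale not installed: code-point order only
    try:
        # strxfrm maps each string to a key that compares per the locale's rules
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore the old setting


# Tamil words: kaal, amma, ilai (escaped to keep the source ASCII-safe)
tamil_words = ["\u0b95\u0bbe\u0bb2\u0bcd", "\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0b87\u0bb2\u0bc8"]
print(collate(tamil_words))
```

`setlocale` changes process-wide state, so this pattern is not safe in multithreaded programs. There, a dedicated collation library would be a better fit.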

Identifiers In Indic Languages

Recently, while preparing a critique of the IDN Policy for the Malayalam language prepared by CDAC, I noticed that ICANN does not allow control characters in domain names. Some time back, I had noticed that Python 3 also does not allow control characters in identifiers. This blog post attempts to analyze the issue by looking at the […]
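The excerpt does not name the characters. For Malayalam, the ones at stake are presumably ZERO WIDTH JOINER (U+200D) and ZERO WIDTH NON-JOINER (U+200C), which Unicode classifies as format characters (general category Cf). That is an assumption here. The hypothetical helper below lists such characters in a word. It uses the older chillu spelling RA + VIRAMA + ZWJ as its example. Whether Python accepts that word as an identifier depends on the Unicode version bundled with the interpreter, so the sketch only prints that result.

```python
import unicodedata


def format_characters(text: str):
    """Return (index, code point, name) for each format (Cf) character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


# Malayalam chillu RR in its older encoding: RA + VIRAMA + ZWJ
word = "\u0d30\u0d4d\u200d"
print(format_characters(word))  # [(2, 'U+200D', 'ZERO WIDTH JOINER')]
# The identifier check depends on the interpreter's Unicode data version.
print(word.isidentifier(), unicodedata.unidata_version)
```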

Dictionary Jabber Buddy Bots

Recently we released two Jabber buddy bots for dictionary lookup. By adding the first bot as a chat contact, one can ask for the Malayalam meaning of an English word just by sending a chat message. Similarly, we have another bot for English-Hindi or Hindi-English lookup. Both of these dictionaries use Dict databases based on […]
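The post does not show the bots' code. A natural way to serve Dict databases is through a dictd server speaking the DICT protocol (RFC 2229, default port 2628). The minimal sketch below shows how a bot might query one. The helpers `build_define`, `parse_definitions` and `lookup` are hypothetical, as is the assumption of a local dictd server. The parser follows the RFC: each definition starts with a `151` status line and ends with a line holding a single `.`. Lines that begin with a period are dot-stuffed.

```python
import socket


def build_define(word: str, database: str = "*") -> bytes:
    """RFC 2229 DEFINE command; database "*" searches every database."""
    return f'DEFINE {database} "{word}"\r\n'.encode("utf-8")


def parse_definitions(lines):
    """Collect definition bodies from DICT response lines (CRLF stripped)."""
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):  # a definition text follows
                current = []
        elif line == ".":                # end of this definition
            definitions.append("\n".join(current))
            current = None
        else:                            # undo dot-stuffing
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def lookup(word, host="localhost", port=2628, database="*"):
    """Hypothetical usage: needs a dictd server reachable at host:port."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rwb")
        stream.readline()                # 220 greeting banner
        stream.write(build_define(word, database))
        stream.write(b"QUIT\r\n")        # server answers, then closes
        stream.flush()
        lines = [raw.decode("utf-8").rstrip("\r\n") for raw in stream]
    return parse_definitions(lines)
```

The chat side would then only need to pass each incoming message to `lookup` and send the joined definitions back as the reply.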

Indic Language Computing Workout, Pune

On 22nd August, I conducted a workout session with Praveen on Indic language computing at the Red Hat office in Pune. The plan was to solve some of the issues in the Devanagari support of the encoding converter Payyans. But most of the time was spent introducing the concepts of Indic language computing to the participants. Project Silpa […]
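The excerpt does not list the Devanagari issues. One well-known difficulty for any converter from ASCII-encoded fonts to Unicode is that legacy Devanagari fonts store the short-i vowel sign (ि) before the consonant, where it appears visually. Unicode stores it after the consonant. The sketch below shows a table-driven conversion with that reordering step. The mapping table is hypothetical and is not Payyans' actual rule set. Conjuncts, where ि precedes a whole consonant cluster, are deliberately not handled.

```python
# Hypothetical ASCII-font mapping (illustrative only, not a real font's table).
MAPPING = {
    "d": "\u0915",  # KA
    "e": "\u092e",  # MA
    "j": "\u0930",  # RA
    "k": "\u093e",  # vowel sign AA
    "f": "\u093f",  # vowel sign I: stored BEFORE its consonant in legacy fonts
}
PRE_BASE_SIGNS = {"\u093f"}


def map_glyphs(text, mapping=MAPPING):
    """Greedy longest-match replacement of ASCII glyph codes."""
    keys = sorted(mapping, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:  # no mapping entry: copy the character unchanged
            out.append(text[i])
            i += 1
    return out


def reorder_pre_base(chars):
    """Move a pre-base vowel sign after the single character that follows it."""
    result, i = [], 0
    while i < len(chars):
        if chars[i] in PRE_BASE_SIGNS and i + 1 < len(chars):
            result.extend([chars[i + 1], chars[i]])
            i += 2
        else:
            result.append(chars[i])
            i += 1
    return result


def convert(text):
    return "".join(reorder_pre_base(map_glyphs(text)))


print(convert("fd"))  # KA followed by vowel sign I
```

Keeping the mapping as data and the reordering as a separate pass lets other scripts reuse the same mapping loop and change only the reordering rules.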

Wikimania 2010, Poland

I left Chennai on Wednesday (8th) and reached Frankfurt airport on Thursday morning. The rest of the people from India attending Wikimania (Shiju Alex, Tinu Cherian, Srinivas Gunta and Arjun Rao) had already reached the airport, and I joined them. We reached Gdansk Airport by 12:30 PM. Our accommodation was at a student hostel of Gdansk University. Language […]


Attending Wikimania 2010

I will be attending Wikimania 2010 in Gdansk, Poland. This annual international conference of the Wikimedia community runs from July 9 to July 11. There I will present wik2cd, the tool I wrote for Malayalam Wikipedia version 1.0, in a joint workshop with Wikipedia offline developers. I will be joining Manuel Schneider, Shiju Alex, […]
