Markov chain for Malayalam

I have been trying to generate a Markov chain for Malayalam content. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.(wikipedia). For natural language, it represents a probabilistic model of words- the probability that one word can come after another word. This model can be prepared by feeding large amount of text to system that learns the probabilities of each words. [Read More]
markov 

Chilanka version 1.400 released

A new version of Chilanka typeface is available now. Version 1.400 is available for download from SMC’s font download and preview site smc.org.in/fonts For users, there is not much changes, but the source and code build system got a major upgrade. Source code updated to UFO format from fontforge sfd format. This allows to work with modern font editors. Use cubic beziers for master design, generate OTF along with TTF. The original drawings for Chilanka was using cubic beziers. [Read More]
fonts 

Lexicon Curation for Mlmorph

One of the key components of Mlmorph is its lexicon. The lexicon contains the root words categorized as nouns, verbs, adjectives, adverbs etc. These are the components used with morphological rules to generate the vocabulary of Malayalam. I collected initial lexicon with about 100,000 words from various sources such as Wikipedia, CLDR and many targeted web crawls. One problem with such collected words is they often contains spelling mistakes. Secondly, classifying these words is not possible without the tedious task of a person going through each and every words. [Read More]

LibreOffice Malayalam spellchecker using mlmorph

A few months back, I wrote about the spellchecker based on Malayalam morphology analyser. I was also trying to intergrate that spellchecker with LibreOffice. It is not yet ready for any serious usage, but if you are curious and would like to help me in its further development, please read on. Malayalam spellchecker – a morphology analyser based approach Blog post on spellchecker approach and pla Current status The libreoffice spellchecker for Malayalam is available at https://gitlab. [Read More]

Gayathri – New Malayalam typeface

Swathanthra Malayalam Computing is proud to announce Gayathri – a new typeface for Malayalam. Gayathri is designed by Binoy Dominic, opentype engineering by Kavya Manohar and project coordination by Santhosh Thottingal. This typeface was financially supported by Kerala Bhasha Institute, a Kerala government agency under cultural department. This is the first time SMC work with Kerala Government to produce a new Malayalam typeface. Gayathri is a display typeface, available in Regular, Bold, Thin style variants. [Read More]

Swanalekha input method now available for Windows and Mac

The Swanalekha transliteration based Malayalam input method is now available in Windows and Mac platforms. Thanks to Ramesh Kunnappully, who wrote the keyman implementation. I wrote this input method in 2008. At those days SCIM was the popular input method for Linux. Later it was rewritten for M17N and used with either IBus or FCITX. A few years later, this input method was made to available in Android using Indic keyboard. [Read More]

Malayalam morphology analyser – First release

I am happy to announce the first version of Malayalam morphology analyser. After two years of development, I tagged version 1.0.0 . In this release In this release, mlmorph can analyse and generate malayalam words using the morpho-phonotactical rules defined and based on a lexicon. We have a test corpora of Fifty thousand words and 82% of the words in it are recognized by the analyser. A python interface is released to make the usage of library very easy for developers. [Read More]

The many forms of ചിരി ☺️

This is an attempt to list down all forms of Malayalam word ചിരി(meaning: ☺️, smile, laugh). For those who are unfamiliar with Malayalam, the language is a highly inflectional Dravidian language. I am actively working on a morphology analyser(mlmorph) for the language as outlined in one of my previous blogpost. I prepared this list as a test case for mlmorph project to evaluate the grammar rule coverage. So I thought of listing it here as well with brief comments. [Read More]

Talk on ‘Malayalam orthographic reforms’ at Grafematik 2018

Santhosh and I presented a paper on ‘Malayalam orthographic reforms: impact on language and popular culture’ at Graphematik conference held at IMT Atlantique, Brest, France. Our session was chaired by Dr. Christa Dürscheid. The paper we presented is available here. The video of our presentation is available in youtube. Grafematik is a conference, first of its kind, bringing together disciplines concerned with writing systems and their representation in written communication. There were lot of interesting talks on various scripts around the world, their digital representation, role of Unicode, typeface design and so on. [Read More]