Foreign word detection in mlmorph

The test corpus for Malayalam morphological analysis has many foreign words. They are written either in a non-Malayalam script or in Malayalam script. For example, “ഇലക്ട്രിസിറ്റി”, “ഡോക്സ്”, “ഇന്റർമീഡിയറ്റ്”, “അബ്സ്ട്രാക്റ്റ്”, “ഇല്ലസ്ടേഷൻ”, “ഇല്ലിറ്ററേറ്റ്”, “റെക്കോർഡ്”, “procrastination”, “唐宸禹”. These are all foreign words, and it is useless to analyse them using mlmorph. Since mlmorph works from a root-word lexicon, it is practically impossible to have them all in the lexicon. So there should be a way to identify such words easily and tag them with the FW (foreign word) part of speech. [Read More]
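The easier half of the problem, words written in a non-Malayalam script, can be illustrated with a Unicode range check. This is a minimal sketch and not mlmorph's actual method; the function names are hypothetical, and it deliberately does nothing for foreign words written in Malayalam script such as “റെക്കോർഡ്”, which need a different approach.

```python
# Malayalam Unicode block: U+0D00 to U+0D7F.
MALAYALAM_START = 0x0D00
MALAYALAM_END = 0x0D7F


def is_malayalam_char(ch):
    """Return True if the character lies in the Malayalam Unicode block."""
    return MALAYALAM_START <= ord(ch) <= MALAYALAM_END


def looks_foreign_by_script(word):
    """Return True when the word contains letters, none of them Malayalam.

    Hypothetical helper: it only covers words in another script.
    Digits and punctuation alone are not treated as foreign words.
    """
    letters = [ch for ch in word if ch.isalpha()]
    return bool(letters) and not any(is_malayalam_char(ch) for ch in letters)


if __name__ == "__main__":
    for word in ["procrastination", "唐宸禹", "റെക്കോർഡ്"]:
        tag = "FW" if looks_foreign_by_script(word) else "needs further analysis"
        print(word, tag)
```

A word that passes this check can be tagged FW without consulting the lexicon at all; everything else still has to go through a separate step.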

What is a good input method?

As more and more people enter the Malayalam digital world, the lack of formal training in Malayalam computing is becoming more apparent. People sometimes just search the web for input methods, ask friends, or use whatever comes with the devices they have. Since I am myself the author of two input methods, people sometimes ask me too. This essay is about the characteristics of a good input method, to help people make the right choice. [Read More]

Stuttgart Finite State Transducer (SFST) formalism support for VS Code

I just published a VS Code language extension that adds syntax highlighting for the Stuttgart Finite State Transducer (SFST) formalism.

I learned how to write a language extension when I attempted the OpenType feature file support. So I thought of applying that learning to SFST, which I regularly use for the Malayalam morphology analyser project.

OpenType feature file support for VS Code

I just published a VS Code language extension to support OpenType feature files in the Adobe “AFDKO” format. The extension provides syntax highlighting and code snippet support. (Screenshot from Amiri font) The syntax highlighting patterns for AFDKO are based on the opentype-feature-bundle for the Atom editor by Kenneth Ormandy, which is in turn based on Brook Elgie’s original TextMate bundle. The code snippets are based on the snippets prepared by Simon Cozens for AFDKO-SublimeText. [Read More]

Malayalam Spellchecker version 1.1.1 released

A new version of the Malayalam spellchecker based on mlmorph is available as a Python library.

Install the library

    $ pip install mlmorph_spellchecker

Sample usage

    >>> from mlmorph_spellchecker import SpellChecker
    >>> spellchecker = SpellChecker()
    >>> word = "ഉച്ഛാരണം"
    >>> spellchecker.spellcheck(word)
    False
    >>> spellchecker.candidates(word)
    ['ഉച്ചാരണം']
    >>> spellchecker.spellcheck("ചിത്രകാരൻ")
    True

The new version adds a database of commonly mistaken Malayalam words for quick checks and correction. If the given word is present in that common list, the spellcheck result and the correction suggestions are based on that database. [Read More]
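The lookup-first idea can be sketched roughly as follows. This is an illustration only, not the library's implementation: `COMMON_MISTAKES`, `spellcheck_with_common_list`, and the `check` and `suggest` callables are hypothetical names, and the single table entry is taken from the sample session above.

```python
# Hypothetical table of common misspellings mapped to their corrections.
COMMON_MISTAKES = {"ഉച്ഛാരണം": "ഉച്ചാരണം"}


def spellcheck_with_common_list(word, check, suggest, common=COMMON_MISTAKES):
    """Return (is_correct, candidates) for a word.

    The common-mistakes table is consulted first. Only words that are
    not in the table are passed to the slower `check` and `suggest`
    callables, which stand in for a full analyser-based checker.
    """
    if word in common:
        return False, [common[word]]
    if check(word):
        return True, []
    return False, suggest(word)


if __name__ == "__main__":
    # Stub callables for demonstration; a real checker would go here.
    known_words = {"ചിത്രകാരൻ"}
    result = spellcheck_with_common_list(
        "ഉച്ഛാരണം",
        check=lambda w: w in known_words,
        suggest=lambda w: [],
    )
    print(result)
```

The design choice being illustrated is simply that a table hit short-circuits the more expensive morphological analysis.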

Manjari version 1.910 released

A new version of the Manjari typeface is available now. The new version adds about 25 Latin glyphs that are considered important by Google Fonts checks. Manjari is now integrated with the Fontbakery font quality check in its CI. Some bugs reported by Fontbakery are also fixed. It is available at the SMC website. The change log is available on GitLab. SMC has also started publishing font releases on a new release file server, releases. [Read More]

Tamil Computing Virtual Meetup

Today (August 09, 2020), Tamil Virtual Academy organized a virtual meetup on Tamil computing and its roadmap. This full-day event had 18 sessions presented by various people working on Tamil computing. The event was chaired by T. Udhayachandran IAS, Director of TVU. I was also invited to the program. I talked about potential collaboration between the Tamil and Malayalam computing communities to solve common problems. Such collaboration, built on open source, helps accelerate language computing in both languages. [Read More]

Manjari - 4th anniversary

A rough drawing I did on 20 November 2014 and shared with my friends as a new font idea. I got this concept from my explorations of perfect curves in the Malayalam script after I released the Chilanka font. From then on, I spent all my free time making it as perfect as I could, until I released the Manjari typeface on 23 July 2016. I also took two months off from my job in 2016 to complete this work. [Read More]

Morphology analyser based spellchecker - Web version

I prepared a web frontend for the Malayalam spellchecker based on the Malayalam morphology analyser. It is available at https://morph.smc.org.in/spellcheck.html. I had written an article about its technology two years ago. There is also an incomplete extension for LibreOffice. The spellchecker is available as an API too. If you want to use it, please refer to the minimal code snippet available at codepen. The quality of the spellcheck and the suggestions it provides depends on the completeness of the mlmorph project. [Read More]