How to type Malayalam using Keyman 10 and Mozhi

This is a quick tutorial on installing the Mozhi input method on Windows 10.

Mozhi is a transliteration-based keyboard for Malayalam. For example, you can type malayaalam to get മലയാളം. We will use Keyman as the input tool. Keyman is an open source input tool now developed by SIL. It supports a lot of languages, and Mozhi Malayalam is one of them.
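
To give a rough idea of what a transliteration keyboard does behind the scenes, here is a minimal Python sketch. It is not Mozhi's actual rule set and not how Keyman implements it: the tables are a tiny hypothetical subset that I made up for illustration, and the function name transliterate is my own. It only shows the general idea: Latin letters are matched (longest match first), a vowel typed after a consonant becomes a vowel sign, and two consonants in a row are joined with a virama (്).

```python
# Toy tables: a hypothetical subset for illustration, NOT the real Mozhi rules.
CONSONANTS = {"k": "ക", "n": "ന", "m": "മ", "l": "ല"}
INDEPENDENT_VOWELS = {"a": "അ", "aa": "ആ", "i": "ഇ"}
# The inherent vowel "a" needs no sign after a consonant.
VOWEL_SIGNS = {"a": "", "aa": "ാ", "i": "ി"}
VIRAMA = "്"


def transliterate(latin: str) -> str:
    """Convert Latin text to Malayalam using the toy tables above."""
    out = []
    i = 0
    after_consonant = False  # True while the last consonant still awaits a vowel
    while i < len(latin):
        two, one = latin[i:i + 2], latin[i]
        # Longest match first, so "aa" wins over "a".
        key = two if two in VOWEL_SIGNS else one
        if key in VOWEL_SIGNS:
            # After a consonant use the dependent sign, otherwise the full vowel.
            out.append(VOWEL_SIGNS[key] if after_consonant else INDEPENDENT_VOWELS[key])
            after_consonant = False
        elif key in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # join consonant clusters, e.g. m + m
            out.append(CONSONANTS[key])
            after_consonant = True
        else:
            # Unknown characters (spaces, punctuation) pass through unchanged.
            if after_consonant:
                out.append(VIRAMA)
            out.append(key)
            after_consonant = False
        i += len(key)
    if after_consonant:
        out.append(VIRAMA)  # a trailing bare consonant keeps its virama
    return "".join(out)


print(transliterate("amma"))   # → അമ്മ
print(transliterate("maala"))  # → മാല
```

The real keyboard has many more rules (chillu letters, anusvara, conjuncts, capital letters for other sounds), so treat this only as a picture of the mechanism.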

Step 1: Download Keyman Desktop with Mozhi Malayalam keyboard

Go to https://keyman.com/keyboards/mozhi_malayalam. There you will see the following download options. Select the first one, as shown below, and download the installer to your computer. The file is about 20 MB.

Keyman 10 Desktop download page.

Step 2: Installation

Double-click the downloaded file to start the installation. The installer looks like this:

Keyman 10 Desktop installer

Click on the Install Keyman Desktop button. You will see the screen below.

Keyman 10 Desktop welcome page.

Press the “Start Keyman” button. The installation will run and the keyboard will then start.

Step 3: Choose Mozhi input method

You will see a small icon at the bottom of your screen, near where the time is displayed.

Click on that to choose Mozhi.

Keyboard selection

Once you choose Mozhi, you can type in Manglish anywhere and you will see Malayalam. To learn the typing scheme, click on “Keyboard Usage” as shown above.

Step 4: Start typing in Malayalam

You can type Malayalam directly in any application, with no copy and paste needed. Just start typing, as you would in English. Make sure to use a good Malayalam font. You can get them from https://smc.org.in/fonts/

Using Mozhi in LibreOffice. Notice the font used is Manjari. What I typed is “ippOL enikk malayaalam ezhuthaanaRiyaam”

It is your language and your pen

Photo by Joe Shillington on Unsplash

Google recently added voice typing support for more languages, and Malayalam is among them. The speech recognition is of good quality, and I see a lot of positive comments in my social media stream. Many people have started using it as their primary input mechanism. This is without any doubt a big step for Malayalam users. The technical difficulties of writing Malayalam on mobile devices are being reduced a lot. This will lead to more content being generated, which is one of the stated goals of Google’s Next Billion Users project. The cloud API for speech recognition will help Android developers build innovative new apps around the speech recognition feature.

Google added a handwriting-based input method for many of these languages in 2015. It was also well received by the Malayalam user community, and many chose it as their primary input method on mobile devices.

Google’s machine learning based language tools, including machine translation, are well-engineered projects and take language technology forward. For a language like Malayalam, with relatively little language processing technology, this is a big boost. There is not even a competing product in the areas mentioned above.

All of the above technologies are closed source software, completely controlled by Google. Google’s open source strategy is a complicated one. Google supports and uses open source to gain the maximum out of it: a pragmatic corporate exploitation. Machine learning based technologies are hard to fit into the traditional open source definition. For an ML-based service provider, the training toolkit might be open source, TensorFlow for example. At the same time, the training data and models might be closed and secret. So the system can basically be reproduced only by the owners of the data and those who have enough processing capacity. These emerging trends in language technology are also hard for individual open source developers to catch up with, because of resourcing issues (data, processing capacity).

Is this model good for language?

Think about this. With no competition, the Android operating system with Google’s technology platform is undoubtedly becoming the default presence on the mobile devices of Malayalam speakers. The new language technologies are quickly being accepted as the one and only way to convey a person’s expressions to the digital world. No, it is not an exaggeration. The availability and quality of these tools are clearly winning over the mass of users. There is no formal education for Malayalam typing. People discover and try anything that is available. For a person new to the digital world, handwriting was the easiest method to input Malayalam. Now it is speech recognition. And that will be the one and only way these users know to enter Malayalam content. And these tools are fully owned and controlled by Google, with no alternatives.

The open source alternatives for input methods are still at the stage of traditional typing keyboards. Along with their peers, they did win a large user base, and they reached users even before Google entered. For example, the Indic Keyboard has 1.4 million installations and is actively improved by contributors for 23 languages. But I don’t see any open source project on par with handwriting and speech recognition based input methods. As a developer working on Indic language technology based on free software, I see this as a failure of the open source community.

I contacted a few academic researchers working on speech recognition and handwriting recognition and asked what they think about these products by Google. For them, it has become more difficult to convince others of the value of their research. ‘Well, we have products from Google that do this, and thousands are using them. Why do you want to work on it again?’ This question can’t be answered easily.

But to me, all of these products, and their nature as described above, strongly emphasize the need for free software alternatives. The mediation by closed source systems of one of the fundamental language computing tasks, input, with no alternatives puts the whole language, and hence its users, at heavy risk. Input method technologies, speech recognition, handwriting recognition: all of these are core to language technology. These technologies and the science behind them should be owned by the language’s speakers. People should be able to study and innovate on top of this technology, and to build mechanisms for expressing their language that are free from any corporate control.

I don’t want to imply or spread fear and uncertainty that Google will one day start charging for these services or shut down the tools. That is not my concern. The language tools I mentioned are not to be built just to face that situation. They are to be developed as fundamental communication tools for the people in the digital age: built, owned, learned, used and maintained by the people.

Updated Swanalekha JavaScript Library

Six years back I wrote a JavaScript version of the popular Swanalekha input method. Friends were requesting a web-based version that they could use on Windows platforms too. Swanalekha was initially written as a SCIM input method and later as an m17n input method, readily available on the GNU/Linux platform. I had provided a bookmarklet version too. Later, Rajiv Nair and Nishan Nasir wrote Chrome and Firefox extensions.

Yesterday I noticed that the updated Marunadan Malayali news portal is using my JavaScript library to power the Malayalam input method in its search boxes. I don’t track who else uses these libraries and tools, or how many do. But I quickly realized how bad the code is. Or, in other words, how much better I can write it today. Six years is a long span of time anyway.

So I did some cleanup and rewriting, and added documentation and an example. Here it is: http://thottingal.in/projects/swanalekha/swanalekha-ml.html

Code: https://github.com/smc/input-methods/tree/master/swanalekha-js

Enjoy.