Procrustes Analysis Based Handwriting Recognition

Many months ago, I started an experiment to see whether Malayalam handwriting recognition can be done without machine learning. This blog post explains the approach, the work done so far, and the results. Handwriting recognition can be done while the user is writing (called online handwriting recognition) or on a sample that somebody wrote in the past (offline recognition). Online and offline recognition are different problems because online recognition can capture additional details such as pen up, pen down, pen movement directions and rotations. [Read More]
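
For readers unfamiliar with the term in the title, here is a minimal sketch of two-dimensional Procrustes analysis. It is an illustration only, not the code used in the experiment. It assumes each stroke has already been resampled to the same number of (x, y) points in the same drawing order, and the function names `normalize` and `procrustes_distance` are hypothetical.

```python
import math


def normalize(points):
    """Center a stroke on the origin and scale it to unit size.

    Points are treated as complex numbers (x + iy), which makes
    rotation and scaling a single complex multiplication.
    """
    zs = [complex(x, y) for x, y in points]
    centroid = sum(zs) / len(zs)
    centered = [z - centroid for z in zs]
    norm = math.sqrt(sum(abs(z) ** 2 for z in centered))
    if norm == 0:
        raise ValueError("stroke has no extent")
    return [z / norm for z in centered]


def procrustes_distance(a, b):
    """Residual after the best translation, scaling and rotation of b onto a.

    Returns a value between 0 (same shape) and 1 (no similarity).
    Reflections are not considered in this sketch.
    """
    if len(a) != len(b):
        raise ValueError("strokes must have the same number of points")
    na, nb = normalize(a), normalize(b)
    # The complex inner product captures the optimal rotation and scale.
    inner = sum(p * q.conjugate() for p, q in zip(na, nb))
    return max(0.0, 1.0 - abs(inner) ** 2)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, doubled in size and shifted.
moved_square = [(5, 5), (5, 7), (3, 7), (3, 5)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]

same_shape = procrustes_distance(square, moved_square)
different_shape = procrustes_distance(square, line)
```

In a recognizer built on this idea, a written stroke would be compared against stored reference strokes and the reference with the smallest distance chosen. The distance ignores where the stroke was written, how large it is and how it is rotated.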

Translating HTML content using a plain text supporting machine translation engine

At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system that helps translate articles from one language to another. The tool is now deployed in several Wikipedias, and people are creating new articles successfully. The ContentTranslation tool provides machine translation as one of its translation tools, so that editors can use the output as an initial version to improve upon. We use Apertium as the machine translation backend and are planning to support more machine translation services soon. [Read More]

Cross Language Approximate Search on Indic Languages: A Demo

A demo of cross-language approximate search in Indic text: the Malayalam word സാമ്പാര്‍ is compared against a paragraph from http://ml.wikipedia.org/wiki/Sambar. In the bottom half, the words marked in yellow are the search results. You can see that the Kannada word ಸಾಂಬಾರ್‍ is matched for the Malayalam word, which is why this is called cross-language. Inflections of the word സാമ്പാര്‍, such as സാമ്പാറും and സാമ്പാറു, are also found as results. [Read More]