Translating HTML content using a plain text supporting machine translation engine

At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system that helps translate articles from one language to another. The tool is now deployed on several Wikipedias, and people are successfully creating new articles with it. ContentTranslation offers machine translation as one of its translation tools, so that editors can use the output as an initial version to improve upon. We use Apertium as the machine translation backend and are planning to support more machine translation services soon. [Read More]
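One simple way to feed HTML to an engine that only accepts plain text is to send each text node to the engine separately and copy the tags through unchanged. The sketch below shows that baseline using Python's standard `html.parser`. It is not necessarily how ContentTranslation does it, and `toy_translate` is a hypothetical word-for-word stand-in for a real plain-text backend such as Apertium.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable


class TextNodeTranslator(HTMLParser):
    """Rebuild HTML while passing each text node through a plain-text translator."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.parts: list[str] = []

    @staticmethod
    def _attrs(attrs) -> str:
        # Re-escape attribute values, since the parser has already unescaped them.
        return "".join(
            f' {name}="{escape(value, quote=True)}"' if value is not None else f" {name}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as-is.
        self.parts.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        core = data.strip()
        if not core:
            self.parts.append(data)  # whitespace-only text is kept verbatim
            return
        # Keep surrounding whitespace so spacing around inline tags survives.
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        self.parts.append(lead + escape(self.translate(core), quote=False) + trail)


def toy_translate(text: str) -> str:
    # Hypothetical stand-in for a plain-text MT engine.
    glossary = {"Hello": "Hola", "world": "mundo"}
    return " ".join(glossary.get(word, word) for word in text.split())


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    parser = TextNodeTranslator(translate)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(translate_html("<p>Hello <b>world</b></p>", toy_translate))
```

This baseline shows the core problem. Each inline segment ("Hello ", "world") is translated without its neighbours, so a real engine loses sentence context and cannot reorder words across tags. The sketch also drops comments and doctypes, and it would translate the contents of `<script>` and `<style>` elements.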

Making of Keraleeyam font: From ASCII to Unicode

Keraleeyam is a new Unicode Malayalam font designed for titles. It was originally designed in 2005, with ASCII encoding, for ‘Keraleeyam’, a magazine supporting environmental movements in Kerala, and was distributed along with the Rachana editor software. The OpenType feature tables of a Unicode Malayalam font are complex, with diverse rules for ligature formation and glyph positioning. Because it was ASCII-encoded, the original Keraleeyam contained no such rules, and adding them manually for each glyph would have been a herculean task. [Read More]
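To avoid writing each ligature rule by hand, the rules can be generated from a table. The sketch below writes OpenType feature-file (`.fea`) ligature substitutions from a mapping of component glyph sequences to ligature glyphs. The glyph names, the `pres` feature tag and the `mlm2` script tag are illustrative assumptions, and this is not necessarily how Keraleeyam was actually converted.

```python
# Hypothetical glyph names; a real font uses its own naming scheme.
LIGATURES: dict[tuple[str, ...], str] = {
    ("ka", "virama", "ka"): "k_k",
    ("ka", "virama", "ssa"): "k_ss",
}


def build_feature(ligatures: dict[tuple[str, ...], str],
                  feature: str = "pres", script: str = "mlm2") -> str:
    """Return feature-file text with one ligature substitution per mapping entry."""
    lines = [f"languagesystem {script} dflt;", "", f"feature {feature} {{"]
    # List longer component sequences first; order matters when one
    # sequence is a prefix of another.
    for components, ligature in sorted(ligatures.items(), key=lambda item: -len(item[0])):
        lines.append(f"    sub {' '.join(components)} by {ligature};")
    lines.append(f"}} {feature};")
    return "\n".join(lines)


print(build_feature(LIGATURES))
```

The generated text can then be compiled into a font, for example with fontTools' `feaLib` (`addOpenTypeFeaturesFromString`). Which feature each conjunct rule belongs in depends on the Malayalam shaping specification.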

New handwriting style font for Malayalam: Chilanka

A new handwriting-style font for Malayalam is in development. The font is named “Chilanka” (ചിലങ്ക). This is an alpha release. You can try the font on this editable page, which has the font embedded: http://smc.org.in/downloads/fonts/chilanka/tests/
Download the latest version: http://smc.org.in/downloads/fonts/chilanka/Chilanka.ttf
Font license: free license, SIL Open Font License (OFL)
Source code: https://github.com/smc/Chilanka
Tools used for drawing: Inkscape and FontForge
Chilanka (ചിലങ്ക) is a musical anklet [Read More]

HOWTO: Wacom Bamboo CTH301K in Debian

This is a short guide to getting the Wacom Bamboo CTH301K working in Debian. At the time of writing, I use Debian Sid with Linux kernel 3.16, but this should also work with recent Ubuntu releases (14.04 or 14.10) and newer kernels. The Wacom Bamboo CTH301K is an entry-level touch pad with a stylus: you can use it as a mouse or as a drawing pad with the stylus. It also supports multitouch gestures such as pinch-to-zoom. [Read More]

Updated Swanalekha JavaScript Library

Six years ago I wrote a JavaScript version of the popular Swanalekha input method, because friends were asking for a web-based version that they could also use on Windows. Swanalekha was initially written as a SCIM input method and later as an m17n input method, both readily available on GNU/Linux. I also provided a bookmarklet version. Later, Rajiv Nair and Nishan Nasir wrote Chrome and Firefox extensions. Yesterday I noticed that the updated Marunadan Malayali news portal is using my JavaScript library to power the Malayalam input method in its search boxes. [Read More]

Video of our presentation from 7th Multilingual Workshop by W3C

Video of our presentation at the 7th Multilingual Workshop by W3C, held in Madrid, Spain, on May 7-8: https://www.youtube.com/embed/_tNancNqFIQ
Talk: “Best Practices on the Design of Translation” by Pau Giner, David Chan and Santhosh Thottingal.
Abstract: Wikipedia is one of the most multilingual projects on the web today. To provide access to knowledge for everyone, Wikipedia is available in more than 280 languages. However, the coverage and depth of topics vary from language to language. [Read More]

Typesetting Malayalam using XeTeX

XeTeX is an extension of TeX with built-in support for Unicode and OpenType. In this tutorial, we will learn how to typeset Malayalam using XeTeX. With some learning effort, we can produce high-quality typesetting with it.
Installing XeTeX
XeTeX is packaged for all major GNU/Linux distributions, and the installation method depends on your distribution. For ease of installation and configuration, we suggest using TeX Live 2012 or later, either the standalone TeX Live distribution or the version from your distribution’s package manager. [Read More]