Google recently added voice typing support for more languages, and Malayalam is among them. The speech recognition quality is good, and I see a lot of positive comments in my social media stream. Many people have started using it as their primary input mechanism. This is without doubt a big step for Malayalam users. The technical difficulties of writing Malayalam on mobile devices are shrinking considerably. This will lead to more content being generated, which is one of the stated goals of Google’s Next Billion Users project.
[Read More]
Malayalam collation updates in Glibc
GNU C Library 2.26 has been released, and I am proud to have my name listed among the contributors. Two small patches for localedata were merged:
Bug 19922 – iso14651_t1_common: Define collation for Malayalam chillu characters
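To illustrate why chillu characters need explicit collation rules, here is a minimal Python sketch. It assumes, purely for illustration, that a chillu should sort right after its base consonant, as if it were the consonant followed by a virama. The mapping table and the `chillu_sort_key` helper are hypothetical and cover only a subset of chillus; the actual rules are the ones defined in the glibc patch.

```python
# Illustrative subset: atomic chillu -> base consonant + virama (U+0D4D).
# This mapping is an assumption made for the example, not the glibc rule set.
CHILLU_TO_BASE = {
    "\u0d7a": "\u0d23\u0d4d",  # chillu NN -> NNA + virama
    "\u0d7b": "\u0d28\u0d4d",  # chillu N  -> NA + virama
    "\u0d7d": "\u0d32\u0d4d",  # chillu L  -> LA + virama
    "\u0d7e": "\u0d33\u0d4d",  # chillu LL -> LLA + virama
}


def chillu_sort_key(word):
    """Hypothetical sort key: expand each chillu before comparing."""
    return "".join(CHILLU_TO_BASE.get(ch, ch) for ch in word)


# NA (U+0D28), PA (U+0D2A) and chillu N (U+0D7B)
words = ["\u0d7b", "\u0d2a", "\u0d28"]

# Plain code point order pushes the chillu to the end of the alphabet.
by_code_point = sorted(words)

# With the key, chillu N sorts next to NA, before PA.
by_key = sorted(words, key=chillu_sort_key)

print(by_code_point)
print(by_key)
```

Without such rules, words containing chillus end up after every other Malayalam letter, because the atomic chillus sit at the end of the Malayalam Unicode block.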
Detailed font reports using fontreport tool
The Google i18n team developed a tool to create detailed reports of fonts. The tool, named fontreport, produces a multi-page PDF showing the Unicode coverage of the font, the glyphs it contains, the OpenType features it supports, the available ligatures, and the glyph substitutions. Optionally, the tool can also create plain text reports. The PDF is generated using TeX.
Manjari font report generated using the fontreport tool
I found it very useful for creating reports for the dozen fonts I maintain with the Swathanthra Malayalam Computing community.
[Read More]
Proposal for Malayalam language subtags for orthography variants rejected
The Internet Engineering Task Force (IETF) – Languages is responsible for the registration of language tags, subtags and script variants. These registered language tags are used in a wide set of internet standards and applications to identify and annotate language uniquely.
Recently Sascha Brawer (currently working at Google) submitted a proposal to register two new language subtags for Malayalam to denote its orthography variants. Malayalam orthography diverged in 1971, when the Kerala government decided on a script reform.
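For readers unfamiliar with how variant subtags fit into a language tag, here is a simplified Python sketch of splitting a tag into its parts. The `parse_language_tag` helper is hypothetical and ignores extended language subtags, extensions and private use. The example tags use already registered subtags such as `1901` (traditional German orthography); the proposed Malayalam subtags are not shown, since the excerpt does not name them.

```python
import re


def parse_language_tag(tag):
    """Simplified split of a language tag into language, script, region, variants."""
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None,
              "region": None, "variants": []}
    for sub in parts[1:]:
        no_later_part = result["region"] is None and not result["variants"]
        if (re.fullmatch(r"[A-Za-z]{4}", sub)
                and result["script"] is None and no_later_part):
            result["script"] = sub.title()      # four letters, e.g. Mlym
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub) and no_later_part:
            result["region"] = sub.upper()      # two letters or three digits
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            result["variants"].append(sub.lower())  # e.g. an orthography variant
        else:
            raise ValueError(f"unsupported subtag: {sub!r}")
    return result


print(parse_language_tag("ml-Mlym-IN"))
print(parse_language_tag("de-CH-1901"))
```

An orthography variant for Malayalam would occupy the same position as `1901` does in the German example.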
[Read More]
A short story of one lakh Wikipedia articles
At the Wikimedia Foundation, I am working on a project to help people translate articles from one language to another. The project started in 2014 and went to production in 2015.
Over the last year, a total of 100,000 new articles were created across many languages. A new article gets translated every five minutes, and more than 2,000 articles are translated per week.
The 100,000th Wikipedia page created with Content Translation is in Spanish, for the song ‘Crying, Waiting, Hoping’
[Read More]
Internationalized Top Level Domain Names in Indian Languages
Medianama recently published a news report, “ICANN approves Kannada, Malayalam, Assamese & Oriya domain names”, which says:
ICANN (Internet Corporation for Assigned Names and Numbers) has approved four additional proposed Indic TLDs (top level domain names), in Malayalam, Kannada, Assamese and Oriya languages. The TLDs are yet to be delegated to NIXI (National Internet exchange of India). While Malayalam, Kannada and Oriya will use their own scripts, Assamese TLDs will use the Bengali script.
[Read More]
Fontconfig language matching
I had to spend a few hours debugging a problem where fontconfig did not identify a font for a language. Following the tradition of sharing knowledge acquired the hard way, let me note it down here for search engines.
The font that I am designing now has three style variants: thin, regular and bold. All have the same family name. So if you set this family for any purpose, the thin, regular or bold version will be picked depending on context.
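One way to inspect what fontconfig picks for a given family, style and language is to query `fc-match` with a pattern. The following Python sketch builds such a pattern and runs the command if it is installed. The family name `MyFont` is a placeholder, not the font discussed here, and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_pattern(family, lang=None, style=None):
    """Build a fontconfig pattern string such as 'MyFont:style=Bold:lang=ml'."""
    pattern = family
    if style:
        pattern += f":style={style}"
    if lang:
        pattern += f":lang={lang}"
    return pattern


def match_font(pattern):
    """Return fc-match's best match for the pattern, or None if fc-match is missing."""
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(
        ["fc-match", "-f", "%{family} | %{style} | %{file}\n", pattern],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


# 'MyFont' is a placeholder family name used only for this example.
for style in ("Thin", "Regular", "Bold"):
    pattern = build_pattern("MyFont", lang="ml", style=style)
    print(pattern, "->", match_font(pattern))
```

If the printed file is not the expected one for a style, the font's declared language coverage or style metadata is a reasonable place to start looking.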
[Read More]
Indic hyphenation patterns relicensed
The hyphenation rules for Indian languages that I maintain are now relicensed under the much more permissive MIT license. Licensing was a blocker for getting them added to Android and Firefox.
Translating HTML content using a plain text supporting machine translation engine
At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system to help translate articles from one language to another. The tool is deployed on several Wikipedias now, and people are creating new articles successfully.
The ContentTranslation tool provides machine translation as one of its translation tools, so that editors can use it as an initial version to improve upon. We have used Apertium as the machine translation backend and are planning to support more machine translation services soon.
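To make the problem in the title concrete, here is a naive baseline in Python: pass only the text nodes of an HTML fragment through a plain-text translation function and keep the tags untouched. This is a sketch for illustration, not the approach used in ContentTranslation. The `translate` callable is a stand-in for a real engine, and `str.upper` is used as a fake translator.

```python
from html import escape
from html.parser import HTMLParser


class TextNodeTranslator(HTMLParser):
    """Copy tags through unchanged and translate each text node separately."""

    def __init__(self, translate):
        super().__init__()
        self.translate = translate  # plain-text translation function (assumed)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Leave whitespace-only nodes alone; re-escape translated text.
        if data.strip():
            data = self.translate(data)
        self.out.append(escape(data, quote=False))


def translate_html(html_text, translate):
    parser = TextNodeTranslator(translate)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


# str.upper stands in for a machine translation engine.
print(translate_html("<p>Hello <b>world</b></p>", str.upper))
```

The weakness of this baseline is that each fragment is translated without its sentence context, and word order can change between languages, so inline markup such as links and bold text cannot simply stay in place.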
[Read More]
Video of our presentation from 7th Multilingual Workshop by W3C
Video of our presentation from 7th Multilingual Workshop by W3C, Madrid, Spain, May 7-8
https://www.youtube.com/embed/_tNancNqFIQ
Best Practices on the Design of Translation, by Pau Giner, David Chan and Santhosh Thottingal.
Abstract: Wikipedia is one of the most multilingual projects on the web today. In order to provide access to knowledge to everyone, Wikipedia is available in more than 280 languages. However, the coverage of topics and detail varies from language to language.
[Read More]