A few days back I posted an experiment on natural language querying for Wikipedia by generating questions and answers. I suggested that building such a collection of questions and answers can help natural language answering. One missing piece was actually suggesting an answer for a new question that is not part of the QA set for an article. As a continuation of that experiment, I explored various options for answering questions. [Read More]
Natural language question answering in Wikipedia - an exploration
In this blog post I explain the prospects of providing questions and answers as an additional content format in Wikipedia, along with a human-in-the-loop approach for that and a prototype. Introduction Wikipedia is a hub for curiosity, with people visiting the site in search of answers to their questions. However, they typically arrive at Wikipedia via intermediaries such as search engines, which direct them to the relevant article. While Wikipedia’s keyword-based search function can be helpful, it may not be sufficient for addressing more complex natural language queries. [Read More]
One million Wikipedia articles by translation
I am happy to share some news from my work at Wikimedia Foundation. The Wikipedia article translation system, known as Content Translation, has reached the milestone of one million articles created. This has been my major project at WMF since 2015, and I am the lead engineer for the project. The Content Translation system helps Wikipedia editors quickly translate and publish articles from one language wiki to another. This way, the knowledge gap between different languages is reduced. [Read More]
Wikipedia turns eighteen. And four lakh translations
Today is Wikipedia's eighteenth birthday. English Wikipedia, with fifty-eight lakh articles, and Malayalam Wikipedia, with around sixty thousand articles, continue their journey amid many limitations and challenges.
Although Wikipedia exists in 292 languages, the amount of content is not evenly distributed across them. For the past four years, my main job at Wikimedia Foundation has been leading the technology behind the system that translates articles between languages with the help of machine translation and other aids.
Yesterday, the number of articles newly added with the help of this system reached four lakh.
A short story of one lakh Wikipedia articles
At Wikimedia Foundation, I am working on a project to help people translate articles from one language to another. The project started in 2014 and went to production in 2015. Over the last year, a total of 100,000 new articles were created across many languages. A new article gets translated every five minutes, with 2000+ articles translated per week. The 100,000th Wikipedia page created with Content Translation is in Spanish, for the song ‘Crying, Waiting, Hoping’ [Read More]
Translating HTML content using a plain text supporting machine translation engine
At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system that helps translate articles from one language to another. The tool is now deployed in several Wikipedias and people are creating new articles successfully. The ContentTranslation tool provides machine translation as one of its translation tools, so that editors can use it as an initial version to improve upon. We used Apertium as the machine translation backend and are planning to support more machine translation services soon. [Read More]
Talk at Wikimania 2014
I presented my team's Content Translation project at Wikimania 2014 in London. Here is the video of the presentation.
English and many other languages have only 2 plural forms: singular if the count is one, and plural for anything else, including zero. But some other languages have more than 2 plural forms. Arabic, for example, has 6 plural forms, sometimes referred to as the ‘zero’, ‘one’, ‘two’, ‘few’, ‘many’ and ‘other’ forms. The integers 11-26, 111 and 1011 take the ‘many’ form, while 3, 4, ... 10 take the ‘few’ form. When preparing interface messages for application user interfaces, grammatically correct sentences are a must. [Read More]
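As a rough illustration of the plural categories described above, here is a minimal Python sketch. It assumes the integer plural rules for Arabic as published in Unicode CLDR (zero, one, two, then ‘few’ when the last two digits are 3-10, ‘many’ when they are 11-99, and ‘other’ for the rest). The function names are made up for this example and are not part of any library.

```python
def english_plural_category(n: int) -> str:
    """English has two forms: 'one' for exactly 1, 'other' for everything else."""
    return "one" if n == 1 else "other"


def arabic_plural_category(n: int) -> str:
    """Plural category for a non-negative integer in Arabic.

    Assumption: follows the CLDR integer rules for Arabic.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    last_two_digits = n % 100
    if 3 <= last_two_digits <= 10:
        return "few"
    if 11 <= last_two_digits <= 99:
        return "many"
    # Remaining cases are numbers ending in 00, 01 or 02, such as 100 or 101.
    return "other"


if __name__ == "__main__":
    for count in (0, 1, 2, 3, 10, 11, 26, 100, 111, 1011):
        print(count, arabic_plural_category(count))
```

An interface message would then need one variant per category, picked at display time from the count, rather than a single singular/plural pair.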
W3C Workshop at Madrid
I will be speaking at the upcoming W3C workshop in Madrid. The workshop is on 7-8 May 2014 and the theme is “New Horizons for the Multilingual Web”. I will be co-presenting with Pau Giner and David Chan from the Wikimedia Foundation Language engineering team on best practices of translation at Wikipedia. The talk will cover the design (from both technical and user experience perspectives) of the translation tools, and their expected impact on Wikipedia and the Web as a whole. [Read More]
Collaboratively edited documentation for Indic font developers
Fonts are one of the integral building blocks for providing multilingual support for digital content. Today, OpenType fonts are the format of choice. With the increasing need to support languages beyond the Latin script, the TrueType font specification was extended to include elements for the more elaborate writing systems that exist. This effort was jointly undertaken in the 1990s by Microsoft and Adobe. The outcome was the OpenType Specification, a successor to the TrueType font specification. [Read More]