Wikipedia

Natural language question answering in Wikipedia - an exploration - Part 4

Posted on September 19, 2023 | Santhosh Thottingal

I wrote about my exploration of natural language querying for Wikipedia in the previous three blog posts. In Part 1, I suggested that building a collection of questions and answers can help natural language answering. One missing piece was suggesting an answer to a new question that is not part of the QA set for an article. In Part 2, I tried using distilbert-base-cased-distilled-squad with ONNX optimization to answer the questions. [Read More]

wikipedia

Natural language question answering in Wikipedia - an exploration - Part 3

Posted on July 21, 2023 | Santhosh Thottingal

I wrote about my exploration of natural language querying for Wikipedia in the previous two blog posts. In Part 1, I suggested that building a collection of questions and answers can help natural language answering. One missing piece was suggesting an answer to a new question that is not part of the QA set for an article. In Part 2, I tried using distilbert-base-cased-distilled-squad with ONNX optimization to answer the questions. [Read More]

wikipedia

Natural language question answering in Wikipedia - an exploration - Part2

Posted on March 25, 2023 | Santhosh Thottingal

A few days back I posted an experiment on natural language querying for Wikipedia by generating questions and answers. I suggested that building such a collection of questions and answers can help natural language answering. One missing piece was suggesting an answer to a new question that is not part of the QA set for an article. As a continuation of that experiment, I explored various options for answering questions. [Read More]

wikipedia

Natural language question answering in Wikipedia - an exploration

Posted on March 10, 2023 | Santhosh Thottingal

In this blog post I explain the prospects of providing questions and answers as an additional content format in Wikipedia, and a human-in-the-loop approach for that, with a prototype. Introduction: Wikipedia is a hub for curiosity, with people visiting the site in search of answers to their questions. However, they typically arrive at Wikipedia via intermediaries such as search engines, which direct them to the relevant article. While Wikipedia’s keyword-based search function can be helpful, it may not be sufficient for addressing more complex natural language queries. [Read More]

wikipedia

One million Wikipedia articles by translation

Posted on October 22, 2021 | Santhosh Thottingal

I am happy to share some news from my work at the Wikimedia Foundation. The Wikipedia article translation system, known as Content Translation, has reached a milestone of one million articles created. This has been my major project at WMF since 2015, and I am the lead engineer for the project. The Content Translation system helps Wikipedia editors quickly translate and publish articles from one language wiki to another. This way, the knowledge gap between different languages is reduced. [Read More]

Wikipedia

Wikipedia turns eighteen, with four hundred thousand translations

Posted on January 15, 2019 | Santhosh Thottingal

Today is Wikipedia's eighteenth birthday. English Wikipedia, with 5.8 million articles, and Malayalam Wikipedia, with about sixty thousand articles, continue their journey amid many limitations and challenges.

Although Wikipedia exists in 292 languages, the amount of content is not evenly distributed across them. For the past four years at the Wikimedia Foundation, my main job has been to lead the technology behind the system that translates articles between languages with the help of machine translation and other aids.

Yesterday, the number of new articles added with the help of this system reached 400,000.

wikipedia

A short story of one lakh Wikipedia articles

Posted on July 16, 2016 | Santhosh Thottingal

At the Wikimedia Foundation, I am working on a project to help people translate articles from one language to another. The project started in 2014 and went to production in 2015. Over the last year, a total of 100,000 new articles were created across many languages. A new article is translated every five minutes, and more than 2,000 articles are translated per week. The 100,000th Wikipedia page created with Content Translation is in Spanish, for the song ‘Crying, Waiting, Hoping’ [Read More]

wikimedia wikipedia

Translating HTML content using a plain text supporting machine translation engine

Posted on February 9, 2015 | Santhosh Thottingal

At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system that helps translate articles from one language to another. The tool is now deployed in several Wikipedias, and people are creating new articles successfully. The ContentTranslation tool provides machine translation as one of its translation tools, so that editors can use it as an initial version to improve upon. We used Apertium as the machine translation backend and are planning to support more machine translation services soon. [Read More]

algorithms machine translation mediawiki wikimedia wikipedia

Talk at Wikimania 2014

Posted on August 18, 2014 | Santhosh Thottingal

I presented my team's Content Translation project at Wikimania 2014 in London. Here is the video of the presentation.

talks wikimania wikimedia wikipedia

Parsing CLDR plural rules in javascript

Posted on May 24, 2014 | Santhosh Thottingal

English and many other languages have only two plural forms: singular if the count is one, and plural for anything else, including zero. Some other languages have more than two plural forms. Arabic, for example, has six plural forms, sometimes referred to as the ‘zero’, ‘one’, ‘two’, ‘few’, ‘many’, and ‘other’ forms. Integers such as 11-26, 111, and 1011 take the ‘many’ form, while 3, 4, ..., 10 take the ‘few’ form. When preparing interface messages for application user interfaces, grammatically correct sentences are a must. [Read More]
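The plural categories described above can be illustrated with a small sketch. This is not the rule parser from the post: it hard-codes the CLDR cardinal rules for Arabic and English, restricted to non-negative integers, and the function names `arabicPluralCategory` and `englishPluralCategory` are hypothetical names chosen for this example.

```javascript
// Illustration only: hand-written CLDR cardinal plural categories for
// non-negative integers. A real implementation would parse the CLDR rule
// strings instead of hard-coding them. Function names are hypothetical.

function assertNonNegativeInteger(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError('This sketch only handles non-negative integers');
  }
}

// Arabic: six categories, mostly decided by the last two digits (n mod 100).
function arabicPluralCategory(n) {
  assertNonNegativeInteger(n);
  const mod100 = n % 100;
  if (n === 0) return 'zero';
  if (n === 1) return 'one';
  if (n === 2) return 'two';
  if (mod100 >= 3 && mod100 <= 10) return 'few';   // 3..10, 103..110, ...
  if (mod100 >= 11 && mod100 <= 99) return 'many'; // 11..99, 111..199, ...
  return 'other';                                  // 100, 101, 102, 200, ...
}

// English: 'one' for exactly one, 'other' for everything else, including zero.
function englishPluralCategory(n) {
  assertNonNegativeInteger(n);
  return n === 1 ? 'one' : 'other';
}

// Print the category of a few sample counts in both languages.
for (const n of [0, 1, 2, 3, 10, 11, 26, 100, 111, 1011]) {
  console.log(n, englishPluralCategory(n), arabicPluralCategory(n));
}
```

A message system would use the returned category to pick the matching translated form of an interface message for a given count.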

cldr javascript mediawiki nodejs plural wikipedia