English and many other languages have only two plural forms: singular when the count is one, and plural for everything else, including zero.
But some languages have more than two plural forms. Arabic, for example, has six, usually referred to as the ‘zero’, ‘one’, ‘two’, ‘few’, ‘many’ and ‘other’ forms. The integers 11-26, 111 and 1011 take the ‘many’ form, while 3, 4, ... 10 take the ‘few’ form.
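To make the Arabic categories concrete, here is a minimal sketch for non-negative integers. The function name arabicPluralForm is made up for illustration, the ranges are my reading of the CLDR integer rules for Arabic, and fractions are not handled.

```javascript
// Hypothetical helper: plural category of a non-negative integer in Arabic.
// The ranges are assumed from the CLDR integer rules; fractions are ignored.
function arabicPluralForm(n) {
  const lastTwo = n % 100;
  if (n === 0) return 'zero';
  if (n === 1) return 'one';
  if (n === 2) return 'two';
  if (lastTwo >= 3 && lastTwo <= 10) return 'few';
  if (lastTwo >= 11 && lastTwo <= 99) return 'many';
  return 'other';
}
```

With this sketch, 11, 26, 111 and 1011 all land in 'many', and 3 to 10 land in 'few', matching the examples above.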
When preparing messages for an application's user interface, grammatically correct sentences are a must. “Found 1 results” and “Found 1 result(s)” are bad interface messages. If the language is English, or another language with similar plural forms, a developer can get away with an if condition that chooses between two messages.
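For English, that if condition could look like the sketch below. The function name resultsMessage and the message strings are only illustrative.

```javascript
// Illustrative only: choose between the two English plural forms.
function resultsMessage(count) {
  if (count === 1) {
    return 'Found 1 result';
  }
  // Everything else, including zero, takes the plural form.
  return 'Found ' + count + ' results';
}
```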
But that approach does not scale if we want to deal with a lot of languages. Some applications come with their own plural handling mechanism, typically a module that tells you the plural form for a given number and language. The plural forms of each language, and the rules to determine them, are defined in CLDR. CLDR defines the plural rules in a markup language named LDML and publishes updated collections frequently.
If you look at the CLDR plural rules table you can easily understand this. The rules are defined in a particular syntax. For example, the Russian plural rules are given below.
One needs to substitute the value of the number for the variable in the above expressions and evaluate them. If an expression evaluates to true, the corresponding plural form should be used.
So an expression like n = 0 or n != 1 and n mod 100 = 1..19, mapped to ‘few’, holds true when n is 0, 119, 219 or 319. So we say that these numbers are of the ‘few’ plural form.
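Written out by hand for integers, that expression can be sketched as follows. This is not the parser's own code; ruleHolds is a hypothetical name, and I am assuming that "and" binds tighter than "or" and that the range 1..19 only matches whole numbers.

```javascript
// Hand-written equivalent of: n = 0 or n != 1 and n mod 100 = 1..19
// Assumption: "and" binds tighter than "or"; the range matches integers only.
function ruleHolds(n) {
  const rem = n % 100;
  const inRange = Number.isInteger(rem) && rem >= 1 && rem <= 19;
  return n === 0 || (n !== 1 && inRange);
}
```

Calling ruleHolds with 0, 119, 219 or 319 gives true, while 1 and 20 give false.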
But in the Russian example given above we don’t see n; instead we see variables such as v and i. The meaning of these variables is defined in the standard as:
| Symbol | Value |
|---|---|
| n | absolute value of the source number (integer and decimals). |
| i | integer digits of n. |
| v | number of visible fraction digits in n, with trailing zeros. |
| w | number of visible fraction digits in n, without trailing zeros. |
| f | visible fractional digits in n, with trailing zeros. |
| t | visible fractional digits in n, without trailing zeros. |
Keeping these definitions in mind, the expression v = 0 and i % 10 = 1 and i % 100 != 11 evaluates to true for 1, 21, 31, 41 and so on, and to false for 11. In other words, the numbers 1, 21 and 31 are of the plural form “one” in Russian.
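The operand definitions and the Russian “one” expression can be tried out with a small sketch like the one below. The helper names operands and isRussianOne are hypothetical, and the input is assumed to be a number written in plain decimal notation (no exponents), passed as a string so that trailing zeros stay visible.

```javascript
// Derive the CLDR operands from a number written as a decimal string.
// The string form matters: '1.0' has v = 1 while '1' has v = 0.
function operands(source) {
  const text = String(source).replace(/^-/, '');
  const [intPart, fracPart = ''] = text.split('.');
  const trimmed = fracPart.replace(/0+$/, '');
  return {
    n: Math.abs(Number(source)),
    i: parseInt(intPart, 10),
    v: fracPart.length,
    w: trimmed.length,
    f: fracPart === '' ? 0 : parseInt(fracPart, 10),
    t: trimmed === '' ? 0 : parseInt(trimmed, 10),
  };
}

// The Russian "one" expression: v = 0 and i % 10 = 1 and i % 100 != 11
function isRussianOne(source) {
  const { v, i } = operands(source);
  return v === 0 && i % 10 === 1 && i % 100 !== 11;
}
```

For instance, operands('1.50') yields i = 1, v = 2, w = 1, f = 50 and t = 5, and isRussianOne returns true for '21' but false for '11' and for '1.0'.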
CLDRPluralRuleParser is that evaluator. I wrote this parser when we at the Wikimedia Foundation wanted data-driven plural rule evaluation for the 300+ languages we support. It started as a free-time project in June 2012. Later it became part of MediaWiki core to support front-end internationalization. We also wanted a PHP version to support interface messages constructed on the server side. Tim Starling wrote a PHP CLDR plural rule evaluator.
The node module comes with a command line interface, just to experiment with rules.
$ cldrpluralruleparser 'n is 1' 0
License: Initially the module was licensed under the GPL, but following collaboration discussions between Wikimedia, cldrjs, jQuery.globalize and moment.js, it was decided to change the license to MIT.
Browsers provide an option to choose the preferred languages in which websites should be shown, often named “Accept language”.
These preference values allow websites to deliver a suitable language version to the user.
navigator.language does exist, but it does not give the correct values. In Chrome it gives the browser’s UI language, which differs from what is meant by accept-languages. From Firefox 5 onwards, this property’s value is based on the Accept-Language header. It returns a single string, whereas accept-language is usually a list of languages in order of preference.
The good news is, a patch just landed in Firefox to support
It returns an array of language tags representing the user’s preferred languages, with the most preferred language first.
The most preferred language is the one returned by
Now that it has landed in Firefox, Blink developers are also considering an implementation.
This will definitely improve the web experience for users and help internationalization developers a lot.
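As a rough sketch of how a site might use such a preference list, assuming the array-valued property is exposed as navigator.languages and falling back to the single navigator.language otherwise. The function names, the fallback argument and the simple base-language matching are my own illustration, not part of any browser API.

```javascript
// Read the ordered preference list from a navigator-like object.
// Assumption: the array property is navigator.languages; older browsers
// only offer the single string navigator.language.
function browserPreferences(nav) {
  if (nav && Array.isArray(nav.languages) && nav.languages.length > 0) {
    return Array.from(nav.languages);
  }
  return nav && nav.language ? [nav.language] : [];
}

// Return the first preferred language that the site supports.
// Matching is deliberately naive: exact tag first, then the base language.
function pickLanguage(preferred, supported, fallback) {
  for (const tag of preferred) {
    const lower = tag.toLowerCase();
    const base = lower.split('-')[0];
    if (supported.includes(lower)) return lower;
    if (supported.includes(base)) return base;
  }
  return fallback;
}
```

In a page script this could be called as pickLanguage(browserPreferences(navigator), ['en', 'ml'], 'en').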
I will be speaking at the upcoming W3C workshop in Madrid. The workshop is on 7-8 May 2014 and the theme is “New Horizons for the Multilingual Web”.
I will be co-presenting with Pau Giner and David Chan from the Wikimedia Foundation Language Engineering team on best practices for translation at Wikipedia. It will cover the design (from both technical and user experience perspectives) of the translation tools, and their expected impact on Wikipedia and the Web as a whole.
Ubuntu Trusty Tahr is going to be released on April 17th 2014.
The font is already available in Debian. In both Ubuntu and Debian you can install the font with:
sudo apt-get install fonts-meera-taml
Thanks Vasudev for packaging it for Debian.
For an advanced logging system for nodejs applications, winston is very helpful. Winston is a multi-transport async logging library for node.js. Like well-known logging systems such as log4j, it lets us configure log levels, and it allows multiple logging targets to be defined, such as file, console and database.
I wanted to configure logging for the usual nodejs production and development environments. In development mode I am more interested in debug-level logging, while in production I am more interested in higher-level logs.
I am sharing my singleton logger instance setup code.
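The original setup code is not reproduced here, so what follows is only a minimal sketch of the idea, not my actual configuration. It assumes the winston 3.x API (createLogger, transports.Console, transports.File); the level choices, the file name app.log and the function names are hypothetical. The winston module is passed in as an argument so that the sketch itself has no hard dependency.

```javascript
// Sketch only: assumes the winston 3.x API; names and levels are illustrative.
let instance = null;

// Pick a threshold from the conventional NODE_ENV values.
function levelFor(env) {
  return env === 'production' ? 'warn' : 'debug';
}

// Build the logger once and reuse it afterwards (a simple singleton).
function getLogger(winston, env) {
  if (instance) {
    return instance;
  }
  const level = levelFor(env);
  const transports = [new winston.transports.Console({ level })];
  if (env === 'production') {
    // 'app.log' is a hypothetical file name.
    transports.push(new winston.transports.File({ filename: 'app.log', level }));
  }
  instance = winston.createLogger({ level, transports });
  return instance;
}
```

In an application this would be called as getLogger(require('winston'), process.env.NODE_ENV), and every module asking for the logger gets the same instance back.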
This is based on the RFC: https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format
A lot of extensions were also migrated to the new localisation format, thanks to Siebrand Mazeland.
If you are interested in seeing some sample JSON files, see https://gerrit.wikimedia.org/r/#/c/122787/ , claimed to be the “largest patch set in the history of MediaWiki”.
I use Brackets for web development. I have tried several other IDEs, but Brackets is my current favorite. A few things I like about it are listed below:
- It is free software licensed under the MIT License
- Availability of a large number of extensions
Some extensions I use with Brackets are:
- Markdown Preview for easy editing of markdown
- Brackets Git for git integration
- Themes for Brackets for the Monokai Darksoda theme I use
- Brackets Linux UI
- Interactive Linter for realtime JSHint/JSLint/CoffeeLint reports in Brackets as you work on your code
- WD Minimap for SublimeText like code overview
- Beautify for automatic code formatting as you save using jsbeautify
There was an enhancement bug for this. I wrote a patch for handling a project-specific .jsbeautifyrc and Martin Zagora merged it into the repo. Here is my .jsbeautifyrc for MediaWiki: https://gist.github.com/santhoshtr/9867861
Brackets is in active development and I look forward to more features. The most important bug I would like to see fixed, one that all the code editors I tried suffer from, Brackets included, is the lack of pain-free complex script editing and rendering. Brackets uses CodeMirror for the code editor and I have reported this issue. It is not trivial to fix, and the root cause is related to the core design. Along with JS, CSS, HTML, PHP and so on, I have to work with files containing all kinds of natural language text, so this feature is important to me.
- uni0D7B(ൻ) + uni0D4D(്) + uni0D31(റ) => ൻ + ് + റ
- uni0D28(ന) + uni0D4D(്) + uni200D(ZWJ) + uni0D31(റ) => ന് + റ
- uni0D28(ന) + uni0D4D(്) + uni0D31(റ) => ന് + റ
The first one is what is defined in Unicode chapter 09 section 9.9 [pdf]. The second is what Microsoft Kartika used to use for /nta/ as a bug. The last one is what all other fonts follow. If this is what standards can achieve, what can I say?
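To see why this matters for software, the three spellings can be written out as code point sequences. The sketch below assumes that uni200D in the list is U+200D (zero width joiner); the object keys and helper names are just labels I made up.

```javascript
// The three competing encodings of /nta/ as code point sequences.
// Assumption: uni200D in the list above is U+200D (zero width joiner).
const sequences = {
  unicodeStandard: [0x0D7B, 0x0D4D, 0x0D31],
  kartika: [0x0D28, 0x0D4D, 0x200D, 0x0D31],
  otherFonts: [0x0D28, 0x0D4D, 0x0D31],
};

// Build the actual string from a list of code points.
function toText(codePoints) {
  return String.fromCodePoint(...codePoints);
}

// Describe a string as 'U+XXXX U+XXXX ...' for debugging.
function describe(text) {
  return Array.from(text, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  }).join(' ');
}
```

The three strings built this way are all different, so a plain string comparison or search for one spelling will not find text typed with another.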