Parsing CLDR plural rules in javascript

English and many other languages have only 2 plural forms. Singular if the count is one and anything else is plural including zero.

But for some other languages, the plural forms are more than 2. Arabic, for example has 6 plural forms, sometimes referred as ‘zero’, ‘one’, ‘two’, ‘few’, ‘many’, ‘other’ forms. Integers 11-26, 111, 1011 are of ‘many’ form, while 3,4,..10 are ‘few’ form.

While preparing the interface messages for application user interfaces, grammatically correct sentences are must. “Found 1 results” or “Found 1 result(s)” are bad interface messages. For a developer, if the language in the context is English or languages having similar plural forms, it may be a matter of an if condition to conditionally choose one of the messages.

But that approach is not scalable if we want to deal with lot of languages. Some applications come with their own plural handling mechanism, probably by a module that tells you the plural form, given a number, and language. The plural forms per language and the rules to determine it is defined in CLDR. CLDR defines the plural rules in a markup language named LDML and releases the collections frequently.

If you look at the CLDR plural rules table you can easily understand this. The rules are defined in a particular syntax. For example, the Russian plural rules are given below.

One need to pass the value of the number to the variable in the above expressions and evaluate. If the expression evaluates to a boolean true, then the corresponding plural form should be used.

So, an expression like  n = 0 or n != 1 and n mod 100 = 1..19 mapped to ‘many’ holds true if the value of n=0,119, 219, 319. So we say that they are of ‘few’ plural form.

But in the Russian example given above, we don’t see n, but we see variables v, i etc. The meaning of these variables are defined in the standard as:

<th>
  Value
</th>
<td>
  absolute value of the source number (integer and decimals).
</td>
<td>
  integer digits of n.
</td>
<td>
  number of visible fraction digits in n, <em>with</em> trailing zeros.
</td>
<td>
  number of visible fraction digits in n, <em>without</em> trailing zeros.
</td>
<td>
  visible fractional digits in n, <em>with</em> trailing zeros.
</td>
<td>
  visible fractional digits in n, <em>without</em> trailing zeros.
</td>
Symbol
n
i
v
w
f
t

Keeping these definitions in mind, the expression v = 0 and i % 10 = 1 and i % 100 != 11 evaluates true for 1,21,31, 41 etc and false for 11. In other words, number 1,21,31 are of plural form “one” in Russian.

A module to support the plural forms for any language can manually(or semi automatically) convert this expressions to programming language one time and use it. Twitter-cldr a CLDR abstraction library by twitter follows this method. It converted the above given plural rules to the following javascript expression using a compiler.

This works. But CLDR updates the plural rules in every releases. Most of the time, it contains additional language support. Sometimes the rules are changed or improved too. The maintainer of the module need to recompile them to javascript expression in such cases.

If we can write a compiler to generate javascript from this expressions, can’t we write a parser-evaluator for the expressions? So that we just need to pass the rule and the number to that evaluator  and it returns the plural form?

CLDRPluralRuleParser

[ CLDR Plural Rule Evaluator][3]
CLDR Plural Rule Evaluator

CLDRPluralRuleParser is that evaluator. I wrote this parser when we at Wikimedia foundation wanted a data driven plural rule evaluation for the 300+ languages we support. It started as a free time project in June 2012. Later it became part of MediaWiki core to support front-end internationalization. We wanted a PHP version also to support interface messages constructed at server side. Tim Starling wrote a PHP CLDR plural rule evaluator.

It is javascript library that takes the standardized plural rule and an integer and returning true or false depending on whether the rule pass for the given integer. It is written with UMD/common.js pattern and available as a node module too.

The node module comes with command line interface, just to experiment with rules.

$ cldrpluralruleparser 'n is 1' 0

false<br />

The module does not self contain the plural rules collection or data. Developers need to have that collection as an xml or json inside the application and need to pass to the module. In that sense, one cannot offload the whole i18n message processing task to this module. For a more handy internationalization with javascript, that takes care of plural, gender, grammar etc, you may consider jquery.i18n which contains CLDRPluralRuleParser.

An example showing how to use the CLDR supplied plural rule data and this library is included in the repository. You can play with that application here.

License: Initially the license of the module was GPL, but as per some of the collaboration discussion between Wikimedia, cldrjs, jQuery.globalize, moment.js, it was decided to change the license to MIT.

comments powered by Disqus