NotoSansMalayalam and nta

NotoSansMalayalam has the following ligature rules for ന്റ (nta), all of which use the Akhand OpenType feature:

uni0D7B(ൻ) + uni0D4D(്) + uni0D31(റ) => ൻ + ് + റ
uni0D28(ന) + uni0D4D(്) + uni200D(ZWJ) + uni0D31(റ) => ന്‍ + റ
uni0D28(ന) + uni0D4D(്) + uni0D31(റ) => ന് + റ

The first one is what is defined in Unicode chapter 09, section 9.9 [pdf]. The second is what Microsoft Kartika used for /nta/ as a bug. The last one is what all other fonts follow. [Read More]
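To make the three sequences easier to compare, here is a minimal Python sketch that spells them out as explicit code points and prints the character names from the Unicode character database. The dictionary labels and the `describe` helper are illustrative names, not anything defined by the font or the post.

```python
import unicodedata

# The three /nta/ input sequences, written as explicit code points.
# The keys are descriptive labels chosen for this sketch only.
NTA_SEQUENCES = {
    "chillu n + virama + rra": "\u0D7B\u0D4D\u0D31",
    "na + virama + U+200D + rra": "\u0D28\u0D4D\u200D\u0D31",
    "na + virama + rra": "\u0D28\u0D4D\u0D31",
}


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, sequence in NTA_SEQUENCES.items():
        print(label)
        for line in describe(sequence):
            print("   ", line)
```

Running it lists the code points of each sequence, which is a quick way to check which of the three forms a given piece of text actually contains.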

Spurious glyphs in NotoSansMalayalam

NotoSansMalayalam is a font released by the Google internationalization team under the Noto project. I was checking the Malayalam glyphs and noticed a number of spurious glyphs in the font. This is interesting because the font attempts to provide a minimal Malayalam font with a reduced glyph set. Despite that goal, about 10% of the glyphs are either non-existent Malayalam glyphs (glyphs with a dot under consonants) or rarely used glyphs (glyphs with U+0D62 MALAYALAM VOWEL SIGN VOCALIC L) [Read More]

Collaboratively edited documentation for Indic font developers

Fonts are one of the integral building blocks for providing multilingual support for digital content. Today, OpenType fonts are the format of choice. With the increasing need to support languages beyond the Latin script, the TrueType font specification was extended to include elements for the more elaborate writing systems that exist. This effort was jointly undertaken in the 1990s by Microsoft and Adobe. The outcome was the OpenType Specification, a successor to the TrueType font specification. [Read More]

Hyphenation in web

This is a follow-up to a four-year-old blog post about hyphenation. Hyphenation allows the controlled splitting of words to improve the layout of paragraphs, typically splitting words at syllabic or morphemic boundaries and visually indicating the split (usually with a hyphen). I wrote about how a webpage can use the Hyphenator JavaScript library to achieve hyphenation for text with the 'justify' style. Along with the hyphenation rules I wrote for many Indian languages, this solution works and some websites already use it. [Read More]

New version of Malayalam fonts released

The Swathanthra Malayalam Computing project announced the release of a new version of its Malayalam Unicode fonts this week. This version has many improvements for the popular Malayalam fonts Rachana and Meera. The Dyuthi font has some bug fixes. I am listing the changes below. The Meera font was small compared to other fonts. This was not really a problem in the GNOME environment, since fontconfig allows you to define a scaling factor to match the size of other fonts. [Read More]

SVG Fonts

This post is some notes on the current state of SVG Fonts. SVG is not a webfont format. The purpose of SVG fonts is to be embedded inside of SVG documents (or linked to them), similar to the way you would embed standard TrueType or OpenType fonts in a PDF. SVG fonts are text files that contain the glyph outlines represented as standard SVG elements and attributes, as if they were single vector objects in the SVG image. [Read More]

Malayalam Wikisource Offline version

The Malayalam Wikisource community today released the first offline version of Malayalam Wikisource during the 4th annual wiki meetup of Malayalam Wikimedians. To the best of our knowledge, this is the first time a Wikisource project has released an offline version. The Malayalam wiki community had released the first offline version of Malayalam Wikipedia one year back. Releasing the offline version of a Wikisource is a challenging project. The technical aspects of the project were designed and implemented by me. [Read More]

Mediawiki Berlin hackathon

I am just back from the Mediawiki Berlin Hackathon. From May 13 to 15, Mediawiki developers attended the hackathon, squashed many bugs and discussed many features. Members of the language committee had their first real-life meeting in parallel with the hackathon. It was a nice event: I learned a lot and talked to many awesome hackers and linguists. Milos Rancic has written a summary of the discussions that happened during the language committee meeting here: http://lists. [Read More]

Creating a new Language ecosystem- Sourashtra as example

Sourashtra is a language spoken by the Sourashtra people living in South Tamil Nadu and Gujarat in India. Originating from Brahmi and then Grandha, this language is the mother tongue of half a million people. But most of them are not familiar with the script of this language. Very few people know how to read and write the Sourashtra script. Sourashtra has an ISO 639-3 language code, saz, and the Unicode range U+A880 – U+A8DF. Recently the Sourashtra Wikipedia project was started in the Wikimedia incubator: http://incubator. [Read More]
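As a small illustration of what the Unicode range means in practice, the following Python sketch checks whether characters fall inside U+A880 to U+A8DF. The function names `is_saurashtra` and `saurashtra_ratio` are hypothetical helpers made up for this example; they are not part of any library.

```python
# Bounds of the Unicode range quoted above (inclusive).
SAURASHTRA_START = 0xA880
SAURASHTRA_END = 0xA8DF


def is_saurashtra(ch):
    """True if the single character ch lies in U+A880..U+A8DF."""
    return SAURASHTRA_START <= ord(ch) <= SAURASHTRA_END


def saurashtra_ratio(text):
    """Fraction of non-whitespace characters that lie in the range."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_saurashtra(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    sample = "\uA882\uA883 abc"  # two characters in the range, three Latin letters
    print(saurashtra_ratio(sample))
```

A check like this could be used, for example, to guess whether a piece of text is written in the Sourashtra script or in another script such as Tamil or Latin.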

Cross Language Approximate Search on Indic Languages- A demo

A demo of cross-language approximate search in Indic text: the Malayalam word സാമ്പാര്‍ is compared against a paragraph from http://ml.wikipedia.org/wiki/Sambar. In the bottom half, words marked in yellow are search results. You can see that the Kannada word ಸಾಂಬಾರ್‍ is matched for the Malayalam word, which is why this is called cross-language. Inflections of the word സാമ്പാര്‍, such as സാമ്പാറും and സാമ്പാറു, are also found as results. [Read More]