Creating a new language ecosystem: Sourashtra as an example

Sourashtra is a language spoken by the Sourashtra people living in South Tamil Nadu and Gujarat in India. Its script originated from Brahmi and then Grantha, and the language is the mother tongue of about half a million people. Most of them, however, are not familiar with the script; very few people can read and write it. Sourashtra has the ISO 639-3 language code saz and the Unicode range U+A880 to U+A8DF.
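As a small illustration of what the Unicode range means in practice, the following Python sketch checks whether characters fall inside U+A880 to U+A8DF. The function names is_sourashtra and sourashtra_ratio are made up for this example; only the range itself comes from the text above.

```python
# Bounds of the Sourashtra block, as given in the text above.
SOURASHTRA_START = 0xA880
SOURASHTRA_END = 0xA8DF


def is_sourashtra(ch):
    """Return True if the single character ch lies in the Sourashtra block."""
    return SOURASHTRA_START <= ord(ch) <= SOURASHTRA_END


def sourashtra_ratio(text):
    """Return the fraction of non-whitespace characters in the Sourashtra block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_sourashtra(c) for c in chars) / len(chars)
```

A check like this could be used, for example, to guess whether a page actually contains text in the script and therefore needs a Sourashtra font.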

Recently the Sourashtra Wikipedia project was started in the Wikimedia Incubator at http://incubator.wikimedia.org/wiki/Wp/saz and MediaWiki localization was started on translatewiki. Since the language did not have proper fonts or input tools, this was not going well.

When we add support for a new language in MediaWiki or start a Wikipedia in a new language, we need to develop the language technology ecosystem that supports its growth. This ecosystem comprises Unicode code points for the script, proper fonts, rendering support, input tools, and the availability of these fonts and input tools in operating systems, or alternate ways to get them working there.

The Sourashtra language had a Unicode font developed by Prabu M Rengachari, itself named ‘Sourashtra’. The font had problems in some browsers and operating systems, and we fixed it so that it works properly. The font was also not properly licensed. Prabu agreed to release it under the GNU GPLv3 license with the font exception. He also agreed to rename the font to something other than the name of the script.

The font was renamed to Pagul, meaning ‘Footstep’ in Sourashtra, and is hosted on SourceForge.

Once we had a font with a proper license, we wanted it to be available in operating systems. I filed a packaging request in Debian. Vasudev Kamath of the Debian India Team packaged it, and it is now available in Debian unstable (sid). Parag Nemade of Fedora India packaged the font for Fedora, and it will be available in the upcoming Fedora 15.

To add support for a new language in an operating system, we need a locale definition. In GNU/Linux this is the glibc locale definition. With the help of Prabu, I prepared the saz_IN locale file for glibc and filed a bug report asking for it to be added. I hope it will soon be part of glibc.
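Whether a given locale is installed on a system can be probed from a program. The sketch below uses Python's standard locale module; the helper name locale_available is made up for this example, and the name "saz_IN.UTF-8" assumes the locale would be installed with a UTF-8 charset, which may not match how a distribution ships it.

```python
import locale


def locale_available(name):
    """Return True if the C library accepts the locale called name."""
    previous = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the locale is not installed or not known to the C library.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)  # always restore


if __name__ == "__main__":
    # "saz_IN.UTF-8" is an assumed name; the result depends on the system.
    print("saz_IN available:", locale_available("saz_IN.UTF-8"))
```

On a system where the saz_IN definition has not been installed, this simply reports that the locale is unavailable.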

All of this was possible because it was GNU/Linux, that is, free software. Things are more difficult on the other side, in the world of proprietary operating systems. There is nothing we can do with those operating systems ourselves. Since there is no ‘market’ for these minority languages, adding support for them will not become a priority for those companies. Users will see squares or question marks when they visit the Sourashtra Wikipedia.

We are working on a solution for this, not only for Sourashtra but as a common solution for all languages. We are developing a webfonts extension for MediaWiki that embeds fonts in wiki pages, so that users do not need to have the fonts installed on their computers. The extension is in development and one can preview it in my test wiki. For Sourashtra, we have added webfonts support (preview).
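One common way to embed a font in a web page is a CSS @font-face rule. The Python sketch below only generates such a rule as text; it is not the extension's code, and the extension may well work differently. The function name font_face_rule and the example.org URL are placeholders invented for this example.

```python
def font_face_rule(family, url, fmt="woff", unicode_range=None):
    """Build a CSS @font-face rule that loads the font family from url."""
    lines = [
        "@font-face {",
        f"  font-family: '{family}';",
        f"  src: url('{url}') format('{fmt}');",
    ]
    if unicode_range:
        # Limits the font to the given code points, so the browser only
        # downloads it when the page contains such characters.
        lines.append(f"  unicode-range: {unicode_range};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder URL; a real deployment would point at the hosted font file.
    print(font_face_rule("Pagul", "https://example.org/fonts/Pagul.woff",
                         unicode_range="U+A880-A8DF"))
```

A page that includes the generated rule and sets font-family to Pagul for Sourashtra text would let the browser fetch the font on demand instead of relying on an installed copy.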

Input tools need to be developed and packaged. For MediaWiki, we can easily add this support with the help of the Narayam extension.
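The basic idea behind a transliteration-style input tool can be sketched in a few lines of Python: typed Latin keys are replaced by letters of the script, preferring the longest matching key sequence. The key assignments below are invented for illustration and are not a real Sourashtra layout, and the code is not Narayam's rule format.

```python
import unicodedata

# Hypothetical key mapping, for illustration only (not an actual layout).
KEYMAP = {
    "a": unicodedata.lookup("SAURASHTRA LETTER A"),
    "aa": unicodedata.lookup("SAURASHTRA LETTER AA"),
    "k": unicodedata.lookup("SAURASHTRA LETTER KA"),
    "kh": unicodedata.lookup("SAURASHTRA LETTER KHA"),
}


def transliterate(text, keymap=KEYMAP):
    """Replace key sequences found in keymap, trying the longest match first."""
    longest = max(len(key) for key in keymap)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in keymap:
                out.append(keymap[chunk])
                i += len(chunk)
                break
        else:
            # No mapping starts here: keep the typed character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the longest match first means that typing "kh" yields the single letter mapped to "kh" rather than the letter for "k" followed by an untouched "h".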

With the SILPA project, I have also added server-side PDF/PNG/SVG rendering support for Sourashtra.