Blogs -

Openoffice Indic Regional Language group

Posted on May 27, 2009 | Santhosh Thottingal

We have just formed an Indic Regional Language group for OpenOffice, in line with the OpenOffice Native Language Consortium plans. The objectives of such groups can be read here. Basically, the group is meant to improve coordination among Indic languages and make the OpenOffice experience in our languages better. The announcement of this group is here. Thanks to Charles-H. Schulz, we got a mailing list: indic@native-lang.openoffice.org. To subscribe, log in to http://native-lang.openoffice.org [Read More]

openoffice

In solidarity

Posted on May 13, 2009 | Santhosh Thottingal

politics

“ക്ടാവ്” Slang converter is in the making

Posted on April 1, 2009 | Santhosh Thottingal

Friends, you all know about the interesting regional dialects of Kerala, don't you? Thiruvananthapuram, Kottayam, Thrissur, Shoranur, Palakkad, Kozhikode, Kannur, Wayanad and so on: we have many distinct varieties of Malayalam. They differ considerably from printed Malayalam. Wouldn't it be fun to have software that takes printed Malayalam and the name of a place, and converts the text into the Malayalam spoken in that region? The project named “ക്ടാവ്” Slang converter is such an attempt. Have a look at the accompanying screenshot; it shows the development version. It works on the basis of a few rules, using a technique called AMP(Ambiguous Language Processing), a new branch of Natural Language Processing. [Read More]

Python isalpha is buggy

Posted on March 30, 2009 | Santhosh Thottingal

This code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Print every character of a mixed Malayalam/Hindi string that isalpha() accepts.
ml_string = u"സന്തോഷ്  हिन्दी"
for ch in ml_string:
    if ch.isalpha():
        print(ch)

gives this output

സ
ന
ത
ഷ
ह
न
द

So isalpha() fails for all matra (vowel sign) characters of Indian languages. This is a known bug in glibc.

Does anybody know whether Python internally uses glibc functions for these basic string operations, or uses a separate character database like Qt does?
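One way to look into this is the standard unicodedata module, which exposes the Unicode character database that ships with Python. The sketch below prints the general category of every character in the test string; the vowel signs and the virama are classified as marks (categories starting with M), not letters (categories starting with L). The helper is_letter_or_mark is a hypothetical name introduced only for this illustration, and treating marks as word characters is an assumption about what the caller wants, not a fix to isalpha() itself.

```python
# -*- coding: utf-8 -*-
import unicodedata


def is_letter_or_mark(ch):
    """Hypothetical helper: accept letters (L*) and combining marks (M*)."""
    return unicodedata.category(ch)[0] in ("L", "M")


text = u"സന്തോഷ്  हिन्दी"

# Show the code point, general category, and isalpha() result per character.
for ch in text:
    if not ch.isspace():
        print("U+%04X %s isalpha=%s" % (ord(ch), unicodedata.category(ch), ch.isalpha()))

# Keep letters and marks together, dropping only the spaces.
kept = u"".join(ch for ch in text if is_letter_or_mark(ch))
print(kept)
```

With this check the vowel signs stay attached to their consonants, so a word like സന്തോഷ് is not broken apart.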

python

N-gram Visualization Experiment

Posted on March 29, 2009 | Santhosh Thottingal

The following image shows a visualization, generated with python-graphviz, of the N-gram representation of the first paragraph of this article from Hindi Wikipedia. The image represents the possible paths through which a sentence can be constructed if we start from the word भारत.
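A minimal sketch of the idea, assuming a plain bigram model: count which word follows which, then write the counts out as a Graphviz DOT graph. The sample sentence is a made-up placeholder, not the Wikipedia paragraph, and the function names bigram_edges and to_dot are hypothetical. The sketch only builds the DOT text, which can then be rendered with the Graphviz tools; it also assumes the words contain no double quotes.

```python
# -*- coding: utf-8 -*-
from collections import defaultdict


def bigram_edges(text):
    """Count how often each word is directly followed by another word."""
    words = text.split()
    edges = defaultdict(int)
    for first, second in zip(words, words[1:]):
        edges[(first, second)] += 1
    return dict(edges)


def to_dot(edges):
    """Serialize the bigram counts as a Graphviz DOT digraph."""
    lines = ["digraph ngram {"]
    for (first, second), count in sorted(edges.items()):
        lines.append('    "%s" -> "%s" [label="%d"];' % (first, second, count))
    lines.append("}")
    return "\n".join(lines)


# Placeholder text, used only to exercise the functions.
sample = u"भारत एक देश है भारत एक गणराज्य है"
edges = bigram_edges(sample)
dot = to_dot(edges)
print(dot)
```

Every path that starts at the node भारत and follows the arrows is a word sequence the model considers possible.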


experiment

Localization: What are we missing?

Posted on March 26, 2009 | Santhosh Thottingal

[This blog post is a kind of self-criticism, written without forgetting the valuable contributions that l10n communities make.] Some observations on localized desktops in Indian languages: Not all localization team members try the application they translate at least once before working on the PO file. Result: somebody who does the localization without understanding what the application does, and without trying the en_US interface, misses the context of the strings. [Read More]

localization

Updates…

Posted on January 11, 2009 | Santhosh Thottingal

Praveen prepared videos from the Matrix screensavers in 6 languages. This video is translated to Malayalam. For those who are interested in how to do that, refer to this. I prepared the glibc collation table for Malayalam, but there are still some more bugs to be fixed. We friends are working on adding the Saka year system to the KDE calendar system, and it is almost ready. And here is the video: Saka calendar in KDE. The Dict-based English-Malayalam dictionary is in development and we are ready for a beta release. [Read More]

Malayalam Collation Order

Posted on January 1, 2009 | Santhosh Thottingal

Details of the glibc (GNU C Library) collation prepared for free operating systems are given below. Please share your comments. The Malayalam collation is prepared on the basis of the following rules. Follow the order of the alphabet. Treat the anusvara as the vowel-less form of മ and place it immediately before മ, as in പംപ < പമ്പ. Treat every consonant as its vowel-less form combined with the vowel അ. That is, ത is the vowel-less consonant ത് combined with അ: ത = ത് + അ, താ = ത് + ആ, and so on. From this it follows that ത് < ത. [Read More]
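The rules quoted so far can be sketched as a sort key. This is only an illustration, not the glibc locale definition: it assumes that Unicode code point order is an acceptable stand-in for the order of the alphabet, it ignores chillus and other special cases, and the name collation_key and the ranking numbers are hypothetical choices made for this sketch.

```python
# -*- coding: utf-8 -*-
ANUSVARA = u"\u0D02"
VIRAMA = u"\u0D4D"
MA = u"\u0D2E"
FIRST_VOWEL_SIGN = 0x0D3E


def is_consonant(ch):
    return u"\u0D15" <= ch <= u"\u0D39"


def is_vowel_sign(ch):
    return u"\u0D3E" <= ch <= u"\u0D4C"


def collation_key(word):
    """Build a list of (letter, rank) pairs following the rules above.

    Rank 0 is the vowel-less consonant, rank 1 the consonant with the
    inherent vowel, rank 2 and up the consonant with a vowel sign.
    The anusvara is keyed as MA with rank -1, just before vowel-less MA.
    """
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else u""
        if ch == ANUSVARA:
            key.append((ord(MA), -1))
            i += 1
        elif is_consonant(ch) and nxt == VIRAMA:
            key.append((ord(ch), 0))
            i += 2
        elif is_consonant(ch) and nxt and is_vowel_sign(nxt):
            key.append((ord(ch), 2 + ord(nxt) - FIRST_VOWEL_SIGN))
            i += 2
        elif is_consonant(ch):
            key.append((ord(ch), 1))
            i += 1
        else:
            # Independent vowels and anything else: plain code point order.
            key.append((ord(ch), 0))
            i += 1
    return key


words = [u"താ", u"ത", u"ത്", u"പമ്പ", u"പംപ"]
for w in sorted(words, key=collation_key):
    print(w)
```

A plain code point sort would put ത before ത്, because ത is a prefix of ത്; the key above reverses that pair, as the rules require.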

glibc locale

KDE Indic Screensavers

Posted on December 22, 2008 | Santhosh Thottingal

I ported all of the Matrix screensavers with Indian language glyphs to KDE4. For details about the screensavers, please read: Hacking the GLMatrix screensaver, and Screensavers in your language. Download the binary packages: Deb package and RPM package. There are 6 screensavers in that package, for Malayalam, Hindi, Oriya, Bengali, Tamil and Gujarati. After installation, go to KDE System Settings->Desktop->Screensaver and select any of these. Screenshots (click to get the image in original size): [Read More]

hack kde screensaver

Hyphenation of Indian Languages in Webpages

Posted on December 17, 2008 | Santhosh Thottingal

In my last blog post I explained hyphenation of Indian language text in OpenOffice. In this post I will explain how hyphenation can be done in web pages. As I explained, the importance of hyphenation comes into the picture when we justify the text. The length of the lines is controlled by the parent tags…. Unicode has defined a special character called the soft hyphen for hyphenation, denoted by U+00AD. In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). [Read More]
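To make the role of the soft hyphen concrete, here is a minimal sketch, written in Python rather than the JavaScript a web page would use. It assumes the break positions are already known (for example from hyphenation patterns) and only shows how U+00AD is placed into a word. The function name insert_soft_hyphens and the break positions for the sample word are hypothetical.

```python
# -*- coding: utf-8 -*-
SOFT_HYPHEN = u"\u00AD"


def insert_soft_hyphens(word, break_positions):
    """Insert U+00AD before each given character index inside the word."""
    pieces = []
    last = 0
    for pos in sorted(set(break_positions)):
        # Ignore positions at the edges or outside the word.
        if 0 < pos < len(word):
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


word = u"hyphenation"
marked = insert_soft_hyphens(word, [2, 6])
print(repr(marked))
```

The soft hyphen stays invisible unless the browser breaks the line at that point, so the marked text renders like the original until a break is needed.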

hack hyphenation javascript web