N-gram Visualization Experiment

The following image shows a python-graphviz visualization of the N-gram representation of the first paragraph of this article from Hindi Wikipedia. It shows the possible paths along which a sentence can be constructed starting from the word भारत.
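The post's own code is not shown here, so the following is only a minimal sketch of the idea, assuming a bigram (N=2) model and whitespace tokenization. The helper names bigram_edges and to_dot are hypothetical. The sketch emits Graphviz DOT text using only the standard library; words containing double quotes would need escaping first.

```python
from collections import defaultdict


def bigram_edges(text):
    """Count how often each word is immediately followed by another word."""
    words = text.split()
    edges = defaultdict(int)
    for first, second in zip(words, words[1:]):
        edges[(first, second)] += 1
    return dict(edges)


def to_dot(edges):
    """Render the bigram counts as Graphviz DOT source (a directed graph)."""
    lines = ["digraph ngram {"]
    for (first, second), count in sorted(edges.items()):
        lines.append(f'  "{first}" -> "{second}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "a b a b c"  # stand-in for the paragraph text
    print(to_dot(bigram_edges(sample)))
```

Following the outgoing edges from one node (for example भारत) then enumerates the sentences the model can build from that word.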


Localization: What are we missing?

[This blog post is a kind of self-criticism, written without forgetting the valuable contributions that l10n communities make.] Some observations on localized desktops in Indian languages: Not all localization team members try the application they translate at least once before working on the PO file. Result: if somebody localizes an application without understanding what it does and without trying the en_US interface, they miss the context of the strings. [Read More]

Updates…

Praveen prepared videos from the Matrix screensavers in 6 languages. This video is translated to Malayalam. For those who are interested in how to do that, refer to this. I prepared the glibc collation table for Malayalam, but some more bugs remain to be fixed. My friends and I are working on adding the Saka year system to the KDE calendar system, and it is almost ready. Here is the video: Saka calendar in KDE. A dict-based English-Malayalam dictionary is in development, and we are ready for a beta release. [Read More]

Malayalam Collation Order

The details of the glibc (GNU C Library) collation order prepared for free operating systems are given below. Please share your comments. The Malayalam collation order has been prepared on the basis of the following rules. Follow the order of the alphabet. Treat the anusvara as the vowel-less form of മ and place it immediately before മ, as in പംപ < പമ്പ. Treat each consonant as its vowel-less form combined with the vowel അ. That is, ത is the vowel-less consonant ത് combined with അ: ത = ത് + അ, താ = ത് + ആ, and so on. From this it follows that ത് < ത. [Read More]
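The rules above can be sketched as a sort key. This is not the glibc locale definition itself, only an illustration under stated assumptions: ranks are derived from Unicode code points (which follow the alphabet order for the basic letters), the function name collation_key is hypothetical, and characters such as chillus or digits are passed through without special handling.

```python
VIRAMA = "\u0d4d"    # vowel-less marker (chandrakkala)
ANUSVARA = "\u0d02"
MA = "\u0d2e"        # the consonant ma
A = "\u0d05"         # the independent vowel a

# Dependent vowel signs mapped to the independent vowels they stand for.
SIGN_TO_VOWEL = {
    "\u0d3e": "\u0d06", "\u0d3f": "\u0d07", "\u0d40": "\u0d08",
    "\u0d41": "\u0d09", "\u0d42": "\u0d0a", "\u0d43": "\u0d0b",
    "\u0d46": "\u0d0e", "\u0d47": "\u0d0f", "\u0d48": "\u0d10",
    "\u0d4a": "\u0d12", "\u0d4b": "\u0d13", "\u0d4c": "\u0d14",
}


def is_consonant(ch):
    return "\u0d15" <= ch <= "\u0d39"


def collation_key(word):
    """Expand each consonant into (consonant, vowel) ranks.

    A vowel-less consonant contributes only its own rank, so it is a
    prefix of (and sorts before) the same consonant with any vowel.
    Ranks are doubled so the anusvara can sit just before ma.
    """
    key = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if is_consonant(ch):
            key.append(2 * ord(ch))
            if nxt == VIRAMA:            # vowel-less form: no vowel rank
                i += 2
                continue
            if nxt in SIGN_TO_VOWEL:     # explicit vowel sign
                key.append(2 * ord(SIGN_TO_VOWEL[nxt]))
                i += 2
                continue
            key.append(2 * ord(A))       # inherent vowel a
        elif ch == ANUSVARA:
            key.append(2 * ord(MA) - 1)  # immediately before ma
        else:
            key.append(2 * ord(ch))
        i += 1
    return tuple(key)


if __name__ == "__main__":
    pampa_anusvara = "\u0d2a\u0d02\u0d2a"     # pa + anusvara + pa
    pampa_ma = "\u0d2a\u0d2e\u0d4d\u0d2a"     # pa + ma + virama + pa
    print(sorted([pampa_ma, pampa_anusvara], key=collation_key))
```

Python compares tuples element by element and treats a shorter prefix as smaller, which is what gives ത് < ത without any extra rule.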

KDE Indic Screensavers

I ported all of the Matrix screensavers with Indian language glyphs to KDE4. For details about the screensavers, please read: Hacking the GLMatrix screensaver; Screensavers in your language. Download the binary packages: Deb package and RPM package. There are 6 screensavers in the package, for Malayalam, Hindi, Oriya, Bengali, Tamil and Gujarati. After installation, go to KDE System Settings -> Desktop -> Screensaver and select any of them. Screenshots (click to get the image in original size): [Read More]

Hyphenation of Indian Languages in Webpages

In my last blog post I explained hyphenation of Indian language text in OpenOffice. In this post I will explain how hyphenation can be done in web pages. As I explained, hyphenation becomes important when we justify text. The length of the lines is controlled by the parent tags…. Unicode defines a special character for hyphenation called the soft hyphen (U+00AD), written in HTML as &shy;. In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). [Read More]
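As a minimal sketch of the idea, assuming the word has already been split into syllables by some other means (the syllable list and the helper names below are hypothetical), soft hyphens can be inserted at the allowed break points like this:

```python
SOFT_HYPHEN = "\u00ad"  # invisible unless the browser breaks the line here


def soft_hyphenate(syllables):
    """Join syllables with soft hyphens to mark the allowed break points."""
    return SOFT_HYPHEN.join(syllables)


def strip_soft_hyphens(text):
    """Recover the plain word, e.g. before searching or spellchecking."""
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    word = soft_hyphenate(["hy", "phen", "a", "tion"])
    print(len(word))                 # 3 soft hyphens longer than the plain word
    print(strip_soft_hyphens(word))
```

The resulting string can be placed in a page as it is; a browser shows a visible hyphen only at a position where it actually breaks the line.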

Hyphenation of Indian Languages and OpenOffice

What is hyphenation? Hyphenation is the process of inserting hyphens between the syllables of a word so that, when the text is justified, the available space is used as fully as possible. Hyphenation is an important feature that DTP software provides. For Indian languages there is no good DTP software available. XeTeX is the only choice for working with Unicode and professional-quality page layout, but XeTeX and DTP are not exactly the same. Inkscape can be used as a temporary solution. [Read More]

Yahoo search bug

None of the search engines handle Indian languages very well. Google removes the zero-width joiners and non-joiners that are used in many languages. Yahoo does not remove them, but a UI bug in its web page makes the results wrong. See the image below: the bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at its end. [Read More]

KDE spellchecker not working for Indian Languages

As I mentioned in my blog post on language detection, the Sonnet spellchecker of KDE is not working. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes parts of words to backend spellcheckers like aspell or hunspell. As a result, every word is reported as wrong. This is the logic used in Sonnet to recognize the word boundaries [Read More]
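The Sonnet code itself is cut off in this excerpt, so the following is not its logic, only a Python sketch of how this class of bug can arise. It assumes a tokenizer that treats only "letter" characters as part of a word. Indic vowel signs and the anusvara are combining marks, not letters, so such a tokenizer cuts words apart. The function names are hypothetical.

```python
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"


def split_words(text, is_word_char):
    """Split text into maximal runs of characters accepted by is_word_char."""
    words, current = [], ""
    for ch in text:
        if is_word_char(ch):
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def naive_words(text):
    # Letters only: combining marks end the word prematurely.
    return split_words(text, str.isalpha)


def mark_aware_words(text):
    # Also accept combining marks (categories Mn, Mc, Me) and the joiners.
    def is_word_char(ch):
        return (ch.isalpha()
                or unicodedata.category(ch).startswith("M")
                or ch in (ZWNJ, ZWJ))
    return split_words(text, is_word_char)


if __name__ == "__main__":
    word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"  # the word "Malayalam" in Malayalam script
    print(naive_words(word))       # two fragments
    print(mark_aware_words(word))  # one whole word
```

A backend such as aspell or hunspell that receives the fragments from the first function would naturally reject them, which matches the symptom described above.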

YouTube to MPEG or Ogg video conversion

Here is a two-line method to convert a YouTube video to an Ogg video. Find clive and ffmpeg2theora in your package manager and install them.
$ clive http://in.youtube.com/watch?v=6JeZ5oeAEyU
(replace this with the YouTube address you want). It will create an flv file.
Convert it to an MPEG video file (replace the file name with that of the flv file the previous command created):
$ ffmpeg -i AmericaAmerica.flv AmericaAmerica.mpg
Convert it to an Ogg video file:
$ ffmpeg2theora AmericaAmerica.mpg
Done. You can see the . [Read More]
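For repeated conversions, the two conversion commands can be assembled in a small script. This is only a sketch: the helper name conversion_commands is hypothetical, the download step is left out, and the commands are merely built here, not executed.

```python
from pathlib import Path


def conversion_commands(flv_path):
    """Build the argument lists for the two conversion steps.

    Step 1: ffmpeg -i <name>.flv <name>.mpg
    Step 2: ffmpeg2theora <name>.mpg
    """
    flv = Path(flv_path)
    mpg = flv.with_suffix(".mpg")
    return [
        ["ffmpeg", "-i", str(flv), str(mpg)],
        ["ffmpeg2theora", str(mpg)],
    ]


if __name__ == "__main__":
    for command in conversion_commands("AmericaAmerica.flv"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True) once ffmpeg and ffmpeg2theora are installed.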