Blogs

Hyphenation of Indian Languages in Webpages

Posted on December 17, 2008 | Santhosh Thottingal

In my last blog post I explained hyphenation of Indian language text in OpenOffice. In this blog post I will explain how hyphenation can be done in webpages. As I explained, hyphenation becomes important when we justify text. The length of the lines is controlled by the parent tags…. Unicode defines a special character called the soft hyphen (U+00AD) for hyphenation, written in HTML as &shy;. In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). [Read More]
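A browser only shows a soft hyphen when it actually breaks the line there, so the job on the server side is to insert U+00AD (&shy;) at permissible break points. The sketch below is a minimal illustration under a deliberately naive, hypothetical rule, not the author's method or any real hyphenation dictionary: a break is allowed before a letter that starts a new syllable, never directly after a virama (so conjuncts stay together), and never before a word-final dead consonant. The function names `naive_indic_hyphenate` and `to_html` are made up for this example.

```python
import html
import unicodedata

SOFT_HYPHEN = "\u00ad"


def _is_virama(ch: str) -> bool:
    # Virama names look like "DEVANAGARI SIGN VIRAMA" or "MALAYALAM SIGN VIRAMA".
    return unicodedata.name(ch, "").endswith("SIGN VIRAMA")


def naive_indic_hyphenate(word: str) -> str:
    """Insert soft hyphens using a naive, hypothetical syllable rule."""
    out = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        out.append(ch)
        if i == last:
            break
        nxt = word[i + 1]
        # Do not split off a word-final consonant + virama (e.g. the final "ഷ്").
        final_dead = i + 2 == last and _is_virama(word[last])
        if not _is_virama(ch) and unicodedata.category(nxt) == "Lo" and not final_dead:
            out.append(SOFT_HYPHEN)
    return "".join(out)


def to_html(text: str) -> str:
    """Escape text for HTML and mark break points with the &shy; entity."""
    words = (naive_indic_hyphenate(w) for w in html.escape(text).split(" "))
    return " ".join(words).replace(SOFT_HYPHEN, "&shy;")


print(to_html("हिन्दी"))  # हि&shy;न्दी
```

A real implementation would use proper language-specific hyphenation patterns; the rule above only shows where the &shy; entities end up in the markup.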

Hyphenation of Indian Languages and Openoffice

Posted on December 14, 2008 | Santhosh Thottingal

What is hyphenation? Hyphenation is the process of inserting hyphens between the syllables of a word so that, when the text is justified, the available space is used as fully as possible. Hyphenation is an important feature that DTP software provides. For Indian languages, no good DTP software is available. XeTeX is the only choice for working with Unicode and professional-quality page layout, but XeTeX is not exactly the same as DTP software. Inkscape can be used as a temporary solution. [Read More]

hack hyphenation openoffice
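OpenOffice's hyphenator works from TeX-style hyphenation patterns, which are applied with Liang's algorithm: each pattern carries digits between letters, the highest digit at each position wins, and odd values mark allowed breaks while even values forbid them. The sketch below is a minimal pure-Python version of that algorithm; the four Malayalam patterns are hypothetical toy patterns invented for this example, not the ones from the post or any shipped dictionary, and `left_min`/`right_min` (minimum characters before and after a break) are assumed defaults.

```python
import re


def parse_pattern(pattern):
    """Split a TeX-style pattern like '2യാ' into ('യാ', [2, 0, 0])."""
    letters = re.sub(r"[0-9]", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch in "0123456789":
            values[pos] = int(ch)  # digit sits before letters[pos]
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=1, right_min=2):
    """Liang-style hyphenation: odd priority between two letters means 'may break'."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word + "."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            values = table.get(padded[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # points[c + 1] is the priority of the gap just before word[c].
    for c in range(max(1, left_min), len(word) - right_min + 1):
        if points[c + 1] % 2 == 1:
            pieces.append(word[start:c])
            start = c
    pieces.append(word[start:])
    return "-".join(pieces)


# Hypothetical toy patterns: break before ല, യ, ള, but never before യാ.
TOY_PATTERNS = ["1ല", "1യ", "1ള", "2യാ"]
print(hyphenate("മലയാളം", TOY_PATTERNS))  # മ-ലയാ-ളം
```

Dropping the even-valued `"2യാ"` pattern gives `മ-ല-യാ-ളം`, which shows how a higher even digit suppresses a break that a lower odd digit would allow.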

Yahoo search bug

Posted on December 6, 2008 | Santhosh Thottingal

None of the search engines handles Indian languages very well. Google removes the zero width joiners and non-joiners that are used in many of these languages. Yahoo does not remove them, but a UI bug in its results page makes the results wrong. See the image below: the bottom half of the image is the source code. We can clearly see that the closing bold tag is placed in the middle of the word instead of at its end. [Read More]

Bugs yahoo
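The screenshot alone does not reveal why Yahoo's closing tag lands mid-word. One way to avoid this class of bug is to match while ignoring ZWJ/ZWNJ, map the match back to offsets in the original string, and then extend the end of the match past any combining marks (vowel signs, virama, anusvara) and joiners, so that `</b>` never separates a sign from its letter. This is a minimal sketch of that idea; `highlight` is a hypothetical helper, not Yahoo's code.

```python
import html
import unicodedata

ZERO_WIDTH = {"\u200c", "\u200d"}  # ZWNJ and ZWJ


def _attaches_to_previous(ch):
    # Combining marks (Mn/Mc) and ZWJ/ZWNJ belong to the preceding letter,
    # so a tag must never be placed just before them.
    return ch in ZERO_WIDTH or unicodedata.category(ch) in ("Mn", "Mc")


def highlight(text, query):
    """Bold the first match of query, ignoring ZWJ/ZWNJ while matching."""
    kept, index_map = [], []  # index_map: stripped index -> original index
    for i, ch in enumerate(text):
        if ch not in ZERO_WIDTH:
            kept.append(ch)
            index_map.append(i)
    stripped = "".join(kept)
    needle = "".join(ch for ch in query if ch not in ZERO_WIDTH)
    pos = stripped.find(needle) if needle else -1
    if pos < 0:
        return html.escape(text)
    start = index_map[pos]
    end = index_map[pos + len(needle) - 1] + 1
    # Move the closing tag past marks that belong to the last matched letter.
    while end < len(text) and _attaches_to_previous(text[end]):
        end += 1
    return (html.escape(text[:start]) + "<b>" + html.escape(text[start:end])
            + "</b>" + html.escape(text[end:]))


print(highlight("സന്തോഷ്", "സന്ത"))  # <b>സന്തോ</b>ഷ്
```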

KDE spellchecker not working for Indian Languages

Posted on December 1, 2008 | Santhosh Thottingal

As I mentioned in my blog post on language detection, KDE's Sonnet spellchecker does not work for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes fragments of words to backend spellcheckers such as aspell or hunspell. As a result, every word is reported as wrong. This is the logic Sonnet uses to recognize word boundaries [Read More]

kde spell checker
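The failure described above is easy to reproduce with any word splitter that treats only letters as word characters: Indic vowel signs, viramas and anusvaras are combining marks (Unicode categories Mn/Mc), so the word is cut at every sign. The Python sketch below only illustrates that failure mode and a possible fix; Sonnet itself is C++/Qt, and these function names are hypothetical.

```python
import unicodedata


def split_words(text, is_word_char):
    """Split text into maximal runs of characters accepted by is_word_char."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def letters_only(ch):
    # Naive rule: only Unicode letters (L*) count as word characters.
    return unicodedata.category(ch).startswith("L")


def letters_and_marks(ch):
    # Also keep combining marks (M*) and ZWNJ/ZWJ inside the word.
    return unicodedata.category(ch)[0] in "LM" or ch in ("\u200c", "\u200d")


print(split_words("മലയാളം ഭാഷ", letters_only))       # ['മലയ', 'ള', 'ഭ', 'ഷ']
print(split_words("മലയാളം ഭാഷ", letters_and_marks))  # ['മലയാളം', 'ഭാഷ']
```

With the first rule a backend spellchecker would receive fragments such as `മലയ` and `ള`, which is consistent with every word being flagged as wrong.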

YouTube to MPEG or Ogg video conversion

Posted on November 23, 2008 | Santhosh Thottingal

Here is a quick method to convert a YouTube video to Ogg Theora video. Find clive, ffmpeg and ffmpeg2theora in your package manager and install them. Download the video (replace the address with that of the YouTube video you want):

$ clive http://in.youtube.com/watch?v=6JeZ5oeAEyU

It will create an flv file. Convert it to an MPEG video file (replace AmericaAmerica.flv with the name of the flv file the previous command created):

$ ffmpeg -i AmericaAmerica.flv AmericaAmerica.mpg

Convert it to an Ogg video file:

$ ffmpeg2theora AmericaAmerica.mpg

Done. You can see the . [Read More]
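The three commands above can be chained in a small script. This is a minimal sketch that assumes you already know the name of the flv file clive will produce (the script cannot predict it) and that the `.mpg` file simply reuses that name; `conversion_commands` and `run` are hypothetical helpers, and only the command lines shown in the post are used.

```python
import shlex
import subprocess
from pathlib import Path


def conversion_commands(url, flv_file):
    """Build the clive -> ffmpeg -> ffmpeg2theora command lines from the post."""
    mpg_file = str(Path(flv_file).with_suffix(".mpg"))  # assumed naming
    return [
        ["clive", url],
        ["ffmpeg", "-i", flv_file, mpg_file],
        ["ffmpeg2theora", mpg_file],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


run(conversion_commands("http://in.youtube.com/watch?v=6JeZ5oeAEyU", "AmericaAmerica.flv"))
```

The default `dry_run=True` only prints the commands, so you can check the file names before running `run(..., dry_run=False)`.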

Dhvani 0.94 Released

Posted on November 16, 2008 | Santhosh Thottingal

A new version of Dhvani, the Indian language text-to-speech system, is now available. The new version comes with the following improvements and features:

- Support for 11 languages: Hindi, Panjabi, Gujarati, Marathi, Bengali, Oriya, Telugu, Kannada, Tamil, Malayalam and Pashto (Afghanistan)
- Pitch and tempo modification for speech
- Direct Ogg Vorbis speech output, with WAV as an optional output format
- C/C++ APIs so that applications can use Dhvani as a shared library
- A generic driver for Speech Dispatcher, and integration with Orca through Speech Dispatcher
- Python binding through Speech Dispatcher
- An improved language detection algorithm

Dhvani documentation is available here. [Read More]

dhvani

Language Detection and Spellcheckers

Posted on November 14, 2008 | Santhosh Thottingal

A few weeks back there was a discussion on the #indlinux IRC channel about automatic language detection. The idea is that spellcheckers, or any other language tools, should not ask users to select a language; instead, they should detect the language automatically. The idea is not new. There is a KDE bug here, and Ubuntu has this as a Brainstorm idea. It seems M$ Word already has this. A sample use case can be this: “While preparing a document in Openoffice, I want to write in English as well as in Hindi. [Read More]

language computing spell checker
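For Indian languages, a first approximation of language detection is to look at which Unicode block each character of a word falls in, since most of these languages use their own script. This minimal sketch assumes that script implies language, which is only partly true: Devanagari is also used for Marathi and Nepali, and the Bengali block also covers Assamese. The block ranges are from the Unicode standard; the language codes and function names are illustrative choices, and this is not how Dhvani or any particular spellchecker implements detection.

```python
from collections import Counter

# Unicode block ranges (inclusive) and an assumed default language per script.
SCRIPT_BLOCKS = [
    (0x0900, 0x097F, "hi"),  # Devanagari (also Marathi, Nepali, ...)
    (0x0980, 0x09FF, "bn"),  # Bengali (also Assamese)
    (0x0A00, 0x0A7F, "pa"),  # Gurmukhi
    (0x0A80, 0x0AFF, "gu"),  # Gujarati
    (0x0B00, 0x0B7F, "or"),  # Oriya
    (0x0B80, 0x0BFF, "ta"),  # Tamil
    (0x0C00, 0x0C7F, "te"),  # Telugu
    (0x0C80, 0x0CFF, "kn"),  # Kannada
    (0x0D00, 0x0D7F, "ml"),  # Malayalam
]


def script_language(ch):
    """Guess a language for one character from its Unicode block."""
    cp = ord(ch)
    for start, end, lang in SCRIPT_BLOCKS:
        if start <= cp <= end:
            return lang
    if ch.isascii() and ch.isalpha():
        return "en"
    return None


def detect_language(word):
    """Majority vote over the characters of a word; None if nothing matched."""
    votes = Counter(lang for lang in map(script_language, word) if lang)
    return votes.most_common(1)[0][0] if votes else None


def tag_words(text):
    """Pair every whitespace-separated word with its detected language."""
    return [(w, detect_language(w)) for w in text.split()]


print(tag_words("Dhvani हिन्दी മലയാളം"))
```

Detecting per word, rather than per document, is what lets a spellchecker hand English words and Hindi words in the same document to different dictionaries.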

Gedit plugin for showing unicode codepoints

Posted on November 12, 2008 | Santhosh Thottingal

While working with Unicode text, we often need the Unicode code points of the text for debugging. With Python it is very easy to get them, as the following (Python 2) examples illustrate: `"സന്തോഷ്".decode("utf-8")` returns `u'\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d'`, or `text = u"സന്തോഷ്"; print repr(text)` prints `u'\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d'`. But this means opening a Python console and typing or pasting the text. How can we make it easier? What if pressing the F12 key after selecting some text showed the code points? [Read More]

gedit hack plugin
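The core of such a plugin is a function from the selected text to a readable list of code points. This minimal Python 3 sketch shows only that part; it does not use the gedit plugin API, and `codepoints`/`describe` are hypothetical names. In Python 3, `repr()` shows the characters themselves, so `ascii()` is the closest equivalent of the Python 2 `repr` output above.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def describe(text):
    """Return one 'U+XXXX NAME' line per character, using Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


print(codepoints("സന്തോഷ്"))  # U+0D38 U+0D28 U+0D4D U+0D24 U+0D4B U+0D37 U+0D4D
print("\n".join(describe("സന്തോഷ്")))
print(ascii("സന്തോഷ്"))  # Python 3 counterpart of Python 2's repr() output
```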

Screensavers in your language

Posted on October 27, 2008 | Santhosh Thottingal

I had written a blog post about hacking the glmatrix screensaver to use the glyphs of our languages. Now I have those screensavers in the following languages:

- Hindi: Deb Package, RPM
- Gujarati: Deb Package, RPM
- Bengali: Deb Package, RPM
- Oriya: Deb Package, RPM
- Tamil: Deb Package, RPM
- Malayalam: Deb Package, RPM

Try it and enjoy! PS: I used the default fonts of Fedora 9 for these. [Read More]

hack screensaver

Swanalekha M17N based Input Method for 11 Languages

Posted on October 27, 2008 | Santhosh Thottingal

Swanalekha is an input method originally designed for Malayalam. It works with SCIM as well as m17n. The input method scheme is transliteration based, and it has a unique feature: a candidate list menu (which I will explain shortly). Now I have extended it to 10 other Indian languages. Before explaining how Swanalekha differs from other phonetic or transliteration-based input methods, let me explain some characteristics of transliteration. Transliteration-based input methods have followed a strict one-to-one mapping from English letters to the letters of an Indian language. [Read More]

scim swanalekha input_methods
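The difference between a strict one-to-one mapping and a candidate list can be sketched with a greedy longest-match transliterator whose table maps each Latin key to several candidates, the first being the default. The table below is a tiny hypothetical example for illustration only; it is not Swanalekha's real mapping, and it does not model the m17n engine.

```python
# Hypothetical mini table (not Swanalekha's): key -> candidates, default first.
TABLE = {
    "k": ["ക്"],
    "ka": ["ക", "കാ"],
    "m": ["മ്", "ം"],
    "ma": ["മ", "മാ"],
    "l": ["ല്", "ൽ"],
    "la": ["ല", "ള"],
}
MAX_KEY = max(map(len, TABLE))


def segment(text):
    """Greedy longest-match split of Latin input into (key, candidates) pairs."""
    result, i = [], 0
    while i < len(text):
        for size in range(min(MAX_KEY, len(text) - i), 0, -1):
            key = text[i:i + size]
            if key in TABLE:
                result.append((key, TABLE[key]))
                i += size
                break
        else:
            # No key matches: pass the character through unchanged.
            result.append((text[i], [text[i]]))
            i += 1
    return result


def transliterate(text):
    """Pick the default (first) candidate for every segment."""
    return "".join(candidates[0] for _, candidates in segment(text))


print(transliterate("kamala"))  # കമല
for key, candidates in segment("kamala"):
    print(key, candidates)  # e.g. la ['ല', 'ള']: the alternatives a menu would offer
```

A one-to-one scheme would only ever produce the default column; the extra candidates per key are what a candidate list menu would let the user choose from.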