{ Monthly Archives } November 2008

KDE spellchecker not working for Indian Languages

As I mentioned in my blog post on language detection, Sonnet, the KDE spellchecker, is not working for Indian languages. I read the Sonnet code and found that it fails to determine the word boundaries in a sentence (or string buffer) and passes fragments of words to backend spellcheckers such as aspell or hunspell. […]
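To make the kind of failure described above concrete, here is a minimal Python sketch. It is not Sonnet's code, and the cause it demonstrates is an assumption made purely for illustration: a tokenizer that accepts only letters as word characters will break a Malayalam word at its vowel signs and virama, while one that also accepts combining marks and the zero-width joiners keeps the word whole. The function names are hypothetical.

```python
import unicodedata

# Zero-width non-joiner and joiner, which occur inside Indic words.
ZERO_WIDTH_JOINERS = {"\u200c", "\u200d"}


def _split(text, is_word_char):
    """Split text into runs of characters accepted by is_word_char."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def naive_words(text):
    """Illustrative (assumed) failure mode: only letters count as word characters."""
    return _split(text, lambda ch: ch.isalpha())


def mark_aware_words(text):
    """Letters, combining marks and zero-width joiners all count as word characters."""
    def is_word_char(ch):
        return unicodedata.category(ch)[0] in ("L", "M") or ch in ZERO_WIDTH_JOINERS
    return _split(text, is_word_char)


# The Malayalam word used elsewhere on this page, written with escapes.
word = "\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d"
print(len(naive_words(word)))       # 3 fragments
print(len(mark_aware_words(word)))  # 1 word
```

A backend spellchecker handed the three fragments would flag each of them, even though the full word is spelled correctly.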


Youtube to MPEG or Ogg video conversion

Here is a two-line method to convert a YouTube video to an Ogg Theora video. Locate clive and ffmpeg2theora in your package manager and install them, then download the video (replace the address with that of the video you want):

$ clive http://in.youtube.com/watch?v=6JeZ5oeAEyU

This creates an flv file. Convert it to an MPEG video file:

$ ffmpeg -i AmericaAmerica.flv AmericaAmerica.mpg

Convert that to an Ogg video file:

$ ffmpeg2theora AmericaAmerica.mpg […]
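The commands above can also be chained from a small Python script. This is only a sketch: the helper names `build_commands` and `run_pipeline` are hypothetical, and the caller is assumed to know the name of the flv file that clive writes, since that name depends on the video. Only the command-line forms shown in the post are used.

```python
import shutil
import subprocess


def build_commands(url, flv_name, base_name):
    """Return the three command lines from the post as argument lists."""
    mpg_name = base_name + ".mpg"
    return [
        ["clive", url],                         # download the video as flv
        ["ffmpeg", "-i", flv_name, mpg_name],   # flv -> MPEG
        ["ffmpeg2theora", mpg_name],            # MPEG -> Ogg
    ]


def run_pipeline(url, flv_name, base_name):
    """Run each step in order, stopping at the first failure."""
    for cmd in build_commands(url, flv_name, base_name):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError("%s is not installed" % cmd[0])
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("http://in.youtube.com/watch?v=6JeZ5oeAEyU", "AmericaAmerica.flv", "AmericaAmerica")` would run the same steps as the shell commands, provided all three tools are installed.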

Dhvani 0.94 Released

A new version of Dhvani, the Indian Language Text to Speech System, is available now. The new version comes with the following improvements and features: support for 11 languages (Hindi, Punjabi, Gujarati, Marathi, Bengali, Oriya, Telugu, Kannada, Tamil, Malayalam and Pashto (Afghanistan)); pitch and tempo modification for speech; direct Ogg Vorbis speech output and optional WAV output format […]


Language Detection and Spellcheckers

A few weeks back there was a discussion on the #indlinux IRC channel about automatic language detection. The idea is that spellcheckers, or any other language tools, should not ask users to select a language; instead, they should detect the language automatically. The idea is not new. There is a KDE bug here and Ubuntu has this as […]
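As a rough illustration of automatic detection, the Python sketch below guesses the dominant script of a piece of text from the Unicode character names. This is an assumption-laden heuristic, not the approach of any particular spellchecker: the function name `dominant_script` is hypothetical, the first word of a character name is only an approximation of its script, and a script identifies a language only when it is not shared (Devanagari, for example, is used by both Hindi and Marathi).

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script prefix among letters and marks, or None."""
    counts = Counter()
    for ch in text:
        # Ignore spaces, digits and punctuation; only letters and marks vote.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "MALAYALAM LETTER SA" -> "MALAYALAM"
            counts[name.split(" ")[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dominant_script("\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d"))  # MALAYALAM
print(dominant_script("hello world"))                                  # LATIN
```

A tool could use such a guess to pick a dictionary without asking the user, falling back to a manual choice when the script is ambiguous.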


Gedit plugin for showing unicode codepoints

While working with Unicode text, it is often necessary to get the Unicode code points of the text for debugging. With Python this is very easy, as the following examples illustrate.

>>> "സന്തോഷ്".decode("utf-8")
u'\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d'

or

>>> text = u"സന്തോഷ്"
>>> print repr(text)
u'\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d'

Well, but we need to take python console […]
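The examples above use Python 2 syntax. For readers on Python 3, where strings are already Unicode, a roughly equivalent sketch follows. The helper name `codepoints` is hypothetical, and the word is written with escapes so that the expected output can be checked against the code points shown above.

```python
def codepoints(text):
    """Return the code points of text in the usual U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


word = "\u0d38\u0d28\u0d4d\u0d24\u0d4b\u0d37\u0d4d"
print(" ".join(codepoints(word)))
# prints: U+0D38 U+0D28 U+0D4D U+0D24 U+0D4B U+0D37 U+0D4D
```

The built-in `ascii()` gives the escaped form directly, much like `repr` did for unicode strings in Python 2.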
