Hyphenation of Indian Languages in Webpages

In my last blog post I explained hyphenation of Indian language text in OpenOffice. In this post I will explain how hyphenation can be done in webpages.

As I explained, hyphenation becomes important when we justify text. The length of the lines is controlled by the parent elements. Unicode defines a special character for hyphenation called the soft hyphen, U+00AD. In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
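Once a page is parsed, all three HTML spellings of the soft hyphen become the same single character, U+00AD, in the DOM and in JavaScript strings. The small sketch below shows that equivalence. The helper names are our own, not part of any library.

```javascript
const SOFT_HYPHEN = "\u00AD"; // U+00AD SOFT HYPHEN (decimal 173)

// Turn any of the HTML spellings of the soft hyphen
// (&shy;, &#173;, &#xAD;) into the actual U+00AD character.
function decodeSoftHyphens(html) {
  return html.replace(/&shy;|&#0*173;|&#[xX]0*[aA][dD];/g, SOFT_HYPHEN);
}

// Serialize U+00AD back to the named entity, which is visible in source code.
function encodeSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join("&shy;");
}

// Remove soft hyphens, e.g. before comparing or searching text.
function stripSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join("");
}
```

Because the character is invisible unless a line actually breaks there, `stripSoftHyphens` is handy when hyphenated text has to be compared with the original words.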

User agents (browsers) can break a line wherever a soft hyphen occurs. So if we have a JavaScript implementation that inserts soft hyphens inside words according to language-specific rules, we can achieve hyphenation in webpages too.
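To see why inserting soft hyphens is enough, here is a toy model of what a browser does with them: a greedy line filler that may break a word only at a soft hyphen, and draws a visible "-" only where such a break actually happens. It measures width in characters (UTF-16 code units), which is a large simplification of real text layout, and `breakLines` is a hypothetical name of our own.

```javascript
const SHY = "\u00AD"; // soft hyphen

// Greedily fill lines of at most maxWidth characters. A word may be split
// only at a soft hyphen; a visible "-" is added only at such a split.
function breakLines(text, maxWidth) {
  const lines = [];
  let line = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const pieces = word.split(SHY); // syllable chunks between soft hyphens
    let i = 0;
    while (i < pieces.length) {
      const sep = line === "" ? "" : " ";
      let placed = false;
      // Try the longest run of remaining pieces that fits on this line.
      for (let j = pieces.length; j > i; j--) {
        const hyphen = j < pieces.length ? "-" : "";
        const chunk = sep + pieces.slice(i, j).join("") + hyphen;
        if (line.length + chunk.length <= maxWidth) {
          line += chunk;
          if (hyphen) { lines.push(line); line = ""; } // broke inside the word
          i = j;
          placed = true;
          break;
        }
      }
      if (!placed) {
        if (line !== "") {
          // Nothing fits after the current content: start a new line.
          lines.push(line);
          line = "";
        } else {
          // A single piece is wider than the line: let it overflow.
          const hyphen = i + 1 < pieces.length ? "-" : "";
          line = pieces[i] + hyphen;
          if (hyphen) { lines.push(line); line = ""; }
          i += 1;
        }
      }
    }
  }
  if (line !== "") lines.push(line);
  return lines;
}
```

For example, `breakLines("a hy\u00ADphen\u00ADation test", 10)` gives `["a hyphen-", "ation test"]`, while the same text without soft hyphens gives `["a", "hyphenation", "test"]`: without break opportunities inside the word, the first line is left almost empty, which is exactly the gap that justification would have to stretch.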

Hyphenator is a project that does exactly this: “Hyphenator.js brings client-side hyphenation of HTML-Documents on to every browser by inserting soft hyphens using hyphenation patterns and Frank M. Liang's hyphenation algorithm commonly known from LaTeX and Openoffice.”
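In outline, Liang's algorithm works like this. Each pattern, such as `hy3ph`, is a string of letters with digits between them. Every substring of the word, padded with `.` at both ends, is looked up among the patterns; each match applies its digits to the gaps between letters, keeping the maximum at each gap. Odd values mark allowed breaks and even values forbid them. The sketch below is a minimal, unoptimized version of that idea (real implementations use a trie or similar structure); it is not Hyphenator.js's actual code, and the pattern list in the usage example is only a tiny illustrative subset.

```javascript
const SHY = "\u00AD"; // soft hyphen inserted at each allowed break

// Split a Liang pattern such as "hy3ph" into its letters ("hyph") and the
// digit that precedes each letter plus one trailing slot ([0, 0, 3, 0, 0]).
function parsePattern(pattern) {
  const letters = [];
  const values = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      values[values.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      values.push(0);
    }
  }
  return { key: letters.join(""), values };
}

// Index patterns by their letter string for direct lookup.
function buildTable(patterns) {
  const table = new Map();
  for (const p of patterns) {
    const { key, values } = parsePattern(p);
    table.set(key, values);
  }
  return table;
}

// Insert soft hyphens into `word`: match every substring of ".word."
// against the table, keep the maximum digit at each gap, and allow a break
// where that maximum is odd, respecting the left/right minimum lengths.
function hyphenate(word, table, leftMin = 2, rightMin = 2) {
  const chars = Array.from(word);
  const padded = [".", ...chars.map((c) => c.toLowerCase()), "."];
  const points = new Array(padded.length + 1).fill(0);
  for (let i = 0; i < padded.length; i++) {
    for (let j = i + 1; j <= padded.length; j++) {
      const values = table.get(padded.slice(i, j).join(""));
      if (!values) continue;
      for (let k = 0; k < values.length; k++) {
        points[i + k] = Math.max(points[i + k], values[k]);
      }
    }
  }
  // points[m + 1] is the value of the gap between chars[m - 1] and chars[m].
  let out = "";
  for (let m = 0; m < chars.length; m++) {
    const allowed = m >= leftMin && chars.length - m >= rightMin;
    if (allowed && points[m + 1] % 2 === 1) out += SHY;
    out += chars[m];
  }
  return out;
}
```

With `table = buildTable(["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"])`, `hyphenate("hyphenation", table)` returns `"hy\u00ADphen\u00ADation"`, and `hyphenate("hyphenation", table, 3)` returns `"hyphen\u00ADation"` because the larger left minimum suppresses the break after "hy".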

Hyphenator had not been tested with any non-Latin languages so far. I tried adding support for Indian languages, and the result was satisfactory. I used the same rules I defined for OpenOffice. Unlike Latin-script languages, Indian languages need only a small number of hyphenation patterns, and because of that the performance is good.
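Why is the pattern list for Indian languages so short? Roughly, because the rules are stated per character class instead of being learned from a word list, so a script needs only about one pattern per vowel or sign; TeX's English patterns, by comparison, number in the thousands. The sketch below generates such patterns for Devanagari. The rule set is an assumption for illustration and may not match the actual OpenOffice/SMC pattern files: break on both sides of an independent vowel (`1V1`), never before but possibly after a dependent vowel sign (`2M1`), never on either side of the virama (`2्2`), and never before candrabindu, anusvara or visarga (`2X`). The function names are hypothetical.

```javascript
// Expand an inclusive code-point range into one-character strings.
function range(from, to) {
  const out = [];
  for (let cp = from; cp <= to; cp++) out.push(String.fromCodePoint(cp));
  return out;
}

// Build Liang-style patterns from character classes (assumed rules).
function indicPatterns({ vowels, vowelSigns, virama, marks }) {
  return [
    ...vowels.map((v) => `1${v}1`),     // break before and after independent vowels
    ...vowelSigns.map((m) => `2${m}1`), // no break before a sign, break after it
    `2${virama}2`,                      // never split around the virama
    ...marks.map((m) => `2${m}`),       // never break before these marks
  ];
}

const devanagari = {
  vowels: range(0x0905, 0x0914),     // independent vowels अ .. औ
  vowelSigns: range(0x093e, 0x094c), // dependent vowel signs ा .. ौ
  virama: "\u094D",                  // ्
  marks: range(0x0901, 0x0903),      // candrabindu, anusvara, visarga
};

const patterns = indicPatterns(devanagari);
```

This yields 35 patterns for the whole script. Since even values win over odd ones at the same gap, `2ं` overrides the `1` that a vowel sign would place before an anusvara. Note that with only these rules a break can occur only next to a vowel or vowel sign, so a word written entirely with inherent vowels, such as कमल, gets no break points.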

I have added support for Malayalam, Tamil, Hindi, Oriya, Kannada, Telugu, Bengali, Gujarati and Panjabi. You can see a working example here. (I wanted to embed an example in this post, but LiveJournal does not allow JavaScript inside the blog body.) The column layout is done with CSS. Try resizing the browser window, and try a print preview too.

Don’t forget to read the source code of that page; it is very simple. If you want hyphenation in your webpage, all you need to do is include the JavaScript as the example does. You need to set the lang attribute on the elements so that the patterns for that language can be loaded. I have placed the new language patterns temporarily in the SMC download area, and I will ask the author of Hyphenator to include them upstream. The code is available here.
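The lang attribute is what tells the script which pattern set to load. The sketch below illustrates the general idea: find the nearest lang value on the element or its ancestors, reduce a value such as `ml-IN` to its primary subtag `ml`, and map that to a pattern file, loading each file only once. The `patterns/<code>.js` path layout, the language list and the helper names are hypothetical and do not describe Hyphenator.js's actual internals; the codes are the ISO 639-1 codes of the nine languages mentioned above.

```javascript
// Hypothetical set of languages that have a pattern file.
const SUPPORTED = new Set(["ml", "ta", "hi", "or", "kn", "te", "bn", "gu", "pa"]);

// lang is inherited: walk up from the element until a lang value is found.
// For DOM elements, el.lang is "" when the attribute is absent.
function effectiveLang(el) {
  for (let n = el; n; n = n.parentNode) {
    if (n.lang) return n.lang;
  }
  return null;
}

// Map a lang value ("ml", "ml-IN", "HI-in") to a pattern file, or null.
function patternFileFor(langAttr, baseUrl = "patterns/") {
  if (!langAttr) return null;
  const primary = langAttr.trim().toLowerCase().split("-")[0];
  return SUPPORTED.has(primary) ? `${baseUrl}${primary}.js` : null;
}

// Distinct pattern files needed for a list of lang values.
function patternFilesNeeded(langAttrs) {
  const files = new Set();
  for (const lang of langAttrs) {
    const file = patternFileFor(lang);
    if (file) files.add(file);
  }
  return [...files];
}
```

In a browser, `effectiveLang` would be called on real DOM elements; here it works equally well on plain objects with `lang` and `parentNode` properties, which keeps the sketch testable outside a page.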

Update (18-Dec-2008): Thanks to Mathias Nater, the author of Hyphenator, the patterns have been added upstream.

5 thoughts on “Hyphenation of Indian Languages in Webpages”

  1. Empty Squares

    In the hyphenated text, I got the second-to-last language section as empty squares. What could be the reason?
    Mahesh Mangalat
