Hyphenation in web

This is a follow up of a 4 year old blog post about hyphenation. Hyphenation allows the controlled splitting of words to improve the layout of paragraphs, typically splitting words at syllabic or morphemic boundaries and visually indicating the split (usually with a hyphen).

I wrote about how a webpage can use Hyphenator javascript library to achieve hyphenation for a text with ‘justify‘ style. Along with the hyphenation rules I wrote for many Indian languages, this solution works and some websites already use it. The Hyphenator library helps to insert Soft hyphens in appropriate positions inside the text.

[Example showing the difference between Malayalam text hyphenated and not hyphenated. You can see lot of line space wasted with white space in non-hyphenated text][1]
Example showing the difference between Malayalam text hyphenated and not hyphenated. You can see lot of line space wasted with white space in non-hyphenated text

 

More recently browsers such as Firefox, Safari and Chrome have begun to support the CSS3 hyphens property, with hyphenation dictionaries for a range of languages, to support automatic hyphenation.

For hyphenation to work correctly, the text must be marked up with language information, using the language tags described earlier. This is because hyphenation rules vary by language, not by script. The description of the hyphens property in CSS says “Correct automatic hyphenation requires a hyphenation resource appropriate to the language of the text being broken. The user agents is therefore only required to automatically hyphenate text for which the author has declared a language (e.g. via HTML lang or XML xml:lang) and for which it has an appropriate hyphenation resource.”

CSS Example

-webkit-hyphens: auto;
-moz-hyphens: auto;
-ms-hyphens: auto;
-o-hyphens: auto;
hyphens: auto;

Browser Compatibility

  • Chrome 13+ with -webkit prefix
  • Firefox 6.0+ with -moz prefix
  • IE 10+ with -ms prefix.

Hyphenation rules

CSS Text Level 3 does not define the exact rules for hyphenation, however user agents are strongly encouraged to optimize their line-breaking implementation to choose good break points and appropriate hyphenation points.

Firefox has hyphenation rules for about 40 languages. A complete list of languages supported in FF and IE is available at Mozilla wiki

You can see that none of the Indian languages are listed there. Hyphenation rules can be reused from the TeX hyphenation rules.  Jonathan Kew was importing the hyphenation rules from TeX and I had requested importing the hyphenation rules for Indian languages too.  But that was more than a year back, not much progress in that. Apparently there was a licensing issue with derived work but looks like it is resolved already.

CSS4 Text

While this is all well and good, it doesn’t provide the fine grain control you may require to get professional results. For this CSS4 Text introduce more features.

  • Limiting the number of hyphens in a row using hyphenate-limit-lines. This property is currently supported by IE10 and Safari, using the -ms- and -webkit- prefix respectively.
  • Limiting the word length, and number of characters before and after the hyphen using hyphenate-limit-chars
  • Setting the hyphenation character using hyphenate-character. Helps to override the default soft hyphen character

More reading

PS: Sometimes hyphenation can be very challenging. For example hyphenating the 746 letter long name of Wolfe+585, Senior.

comments powered by Disqus