Hyphenation of Indian languages

The latest version of Firefox, Firefox 97, supports hyphenation of Indian languages. Six years ago I filed a bug report asking Firefox to include the hyphenation patterns I prepared, and that bug report is now resolved.

Hyphenation is the process of inserting hyphens between the syllables of a word so that, when text is justified, the available line space is used as fully as possible.

The following languages are supported:

  • Assamese
  • Bengali
  • Gujarati
  • Hindi
  • Kannada
  • Malayalam
  • Marathi
  • Odia
  • Panjabi
  • Tamil
  • Telugu

I have written several articles about hyphenating Indian languages in various applications. Now that Firefox supports it too, I would like to summarize the state of hyphenation support across applications here.

Web browsers

In web browsers, hyphenation is controlled by the CSS hyphens property. When the HTML elements carry the proper lang attribute, the following CSS declarations justify the text and hyphenate it:

text-align: justify;
hyphens: auto;
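Because hyphens: auto only takes effect when the browser knows an element's language, one way to apply these declarations is to scope them with the :lang() selector. This is a minimal sketch, assuming the page sets lang on the html element or on individual elements. The codes shown (ml for Malayalam, ta for Tamil, hi for Hindi) are only examples; the other languages in the list above follow the same pattern.

```css
/* Sketch: justify and hyphenate paragraphs only where the language is
   declared in the markup, e.g. <html lang="ml"> or <p lang="ta">.
   The selectors are illustrative; add one per language you publish in. */
p:lang(ml),
p:lang(ta),
p:lang(hi) {
  text-align: justify;
  hyphens: auto;
}
```

With this rule, a paragraph marked up as <p lang="ml"> is justified and hyphenated, while paragraphs in other languages keep their default layout.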

All major browsers support the hyphens property, but the availability of hyphenation patterns for each language varies across browsers. As mentioned above, Firefox added hyphenation for Indian languages in Firefox 97. Chromium added support for Indian languages in August 2020, so all Chromium-based browsers, including Brave, Chrome and Edge, have it too.

Visible hyphens are not commonly used in Indian languages, but the CSS above will produce a visible hyphen at each word break. CSS provides the hyphenate-character property to control this, but Firefox does not support it. Chromium-based browsers support the prefixed property -webkit-hyphenate-character, and setting it to an empty string suppresses the visible hyphen.

text-align: justify;
hyphens: auto;
-webkit-hyphenate-character: '';
[Figure: the same text with no hyphenation, with visible hyphens, and with invisible hyphens]
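The declarations above can also be written as a complete rule that lists the standard hyphenate-character property next to the prefixed one. This is a hedged sketch: the p:lang(ml) selector is only an example, and including the unprefixed property assumes that a browser implementing it treats the empty string the same way the prefixed property does in Chromium. Browsers skip declarations they do not recognize, so listing both does no harm.

```css
/* Sketch: hyphenate Malayalam paragraphs without drawing a visible hyphen.
   -webkit-hyphenate-character: the prefixed property used by Chromium-based browsers.
   hyphenate-character: the standard property, included on the assumption
   that implementations accept the same empty string. */
p:lang(ml) {
  text-align: justify;
  hyphens: auto;
  -webkit-hyphenate-character: '';
  hyphenate-character: '';
}
```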

For other browsers, or older browser versions, the JavaScript library Hyphenopoly can be used.

LibreOffice

Linux distributions provide hyphen-* packages that contain the hyphenation patterns for each language. Once these packages are installed, LibreOffice can use them for hyphenation.

Android

Since March 2018, Android has shipped these same hyphenation patterns.

XeLaTeX

The polyglossia package provides hyphenation for XeLaTeX.

I have written a tutorial on how to use hyphenation with XeLaTeX typesetting.

Adobe InDesign

InDesign CC 2018 comes with Hunspell hyphenation dictionaries. I have written a tutorial on using hyphenation in InDesign. I am not sure about hyphenation support in the latest versions of InDesign.

Scribus

In 2017, Scribus added hyphenation patterns for Malayalam, and later for the other languages too. I have written a tutorial on Scribus and hyphenation.

Useful resources

Feedback

The hyphenation rules are based on input from native speakers and language experts, but I do not claim they are 100% accurate. Also, these rules consider only characters and their context within a word. There is a valid argument that hyphenation should also account for any change in meaning caused by the words formed when a longer word is split; that is beyond the scope of these patterns. However, some of the applications listed above let you supply exception dictionaries (for example, see the Adapting Hyphenation section in the polyglossia manual).

Please contact me, or use the issue tracker in the source code repository, to report bugs or suggest improvements. Thanks in advance.
