Inkscape hyphenation extension

A year ago I wrote about how to use Inkscape as a workaround for DTP in Indic scripts. We still do not have any DTP software that supports Indic scripts in Unicode; Scribus still lacks Indic support.

One issue with using Inkscape for DTP in Indic scripts is that some of these scripts need hyphenation whenever text is justified. Malayalam, for example, has lengthy words, and a lot of space is wasted in each line if the text is not hyphenated automatically. Inkscape does not have this feature, although there is a wishlist bug asking for it. I tried to develop an Inkscape extension to achieve this.

It is built on top of the Python hyphenation code written by Wilbert Berendsen. The hyphenation rules, also called patterns, are the ones from TeX or OpenOffice, so I can support any language that has TeX hyphenation rules. But since hyphenation rules are language specific, we first need a way to determine the language of the text; only then can we select the rules and hyphenate. This is tricky to implement. Asking for the language every time text is justified is not a good idea. Setting a language for the whole document is another option, but what if the text contains multiple languages? For Indian languages it is much easier: we can detect the script automatically from the Unicode code points and load the rules accordingly. So for the time being, my extension supports only English and all Indian languages.
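The script detection described above can be sketched in a few lines of Python. This is a minimal illustration, not the extension's actual code: the function name `detect_language` and the language codes in the table are assumptions of mine. The block ranges are the standard Unicode blocks for these scripts. Note that a script does not always identify a single language (Devanagari, for example, is used for Hindi, Marathi and others), so the codes below are only a plausible default.

```python
# Unicode block ranges for some Indic scripts.
# The language codes are illustrative assumptions: one script can serve
# several languages, so a real implementation may need a finer choice.
SCRIPT_RANGES = [
    (0x0900, 0x097F, "hi"),  # Devanagari
    (0x0980, 0x09FF, "bn"),  # Bengali
    (0x0A00, 0x0A7F, "pa"),  # Gurmukhi
    (0x0A80, 0x0AFF, "gu"),  # Gujarati
    (0x0B00, 0x0B7F, "or"),  # Oriya
    (0x0B80, 0x0BFF, "ta"),  # Tamil
    (0x0C00, 0x0C7F, "te"),  # Telugu
    (0x0C80, 0x0CFF, "kn"),  # Kannada
    (0x0D00, 0x0D7F, "ml"),  # Malayalam
]


def detect_language(word, default="en"):
    """Return a language code based on the first Indic character in word.

    Falls back to `default` when no character belongs to a known block.
    """
    for ch in word:
        codepoint = ord(ch)
        for start, end, lang in SCRIPT_RANGES:
            if start <= codepoint <= end:
                return lang
    return default
```

Calling such a function once per word, rather than once per document, would let a text that mixes English and an Indian language pick the right rules for each word.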

Download the extension from http://thottingal.in/projects/inkscape_hyphenation/inkscape-hyphenation.zip . On GNU/Linux machines, extract the zip file and copy its contents to the /usr/share/inkscape/extensions folder. On Windows, extract it to the [inkscape installation directory]\extensions folder. Then close and reopen Inkscape. You will see a new item named Hyphenate in the Effects->Text menu. In the document, add a text field and enter text in any Indian language. Select the text and apply hyphenation with Effects->Text->Hyphenate, then change the alignment of the text to justify. You will see the text hyphenated and occupying the maximum possible space in the text field.

I got satisfactory results with Malayalam and Tamil. I did not test other languages. The following images illustrate a hyphenated, justified, two-column text layout done in Inkscape.

[Figure: Malayalam hyphenation in Inkscape]

[Figure: Tamil hyphenation in Inkscape]

We had a discussion about this on the Inkscape mailing list. Some developers suggested building this feature into Inkscape rather than shipping it as an extension. There are a few issues to be solved before that can happen. One is language selection, as I explained. The other concerns the hyphenation character to be used. The Unicode standard calls for the soft hyphen (U+00AD) as the hyphenation character. This is an invisible character. For Malayalam, visible hyphens are not required, but some other languages need a hyphen sign where the word is broken at the end of the line. The rules for when the soft hyphen should be visible are not clear in the Unicode specification. Pango never displays the soft hyphen. There is criticism of this specification of the soft hyphen.
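To make the role of the soft hyphen concrete, here is a small Python sketch of how break points could be marked in a word. It is an illustration under assumptions, not the extension's code: the function name `insert_soft_hyphens` is hypothetical, and the break positions are taken as given (in practice they would come from the hyphenation patterns).

```python
SOFT_HYPHEN = "\u00AD"  # U+00AD, invisible unless the renderer shows it


def insert_soft_hyphens(word, positions):
    """Insert a soft hyphen before each character index in positions.

    positions are offsets into the original word where a line break is
    allowed; they are assumed to come from a hyphenation routine.
    """
    parts = []
    previous = 0
    for position in sorted(positions):
        parts.append(word[previous:position])
        previous = position
    parts.append(word[previous:])
    return SOFT_HYPHEN.join(parts)


# Example with hand-picked (hypothetical) break positions.
marked = insert_soft_hyphens("hyphenation", [2, 6])
```

Because the soft hyphen is an ordinary character in the string, removing it again with `marked.replace(SOFT_HYPHEN, "")` gives back the original word. Whether the break is drawn with a visible hyphen is then entirely up to the rendering engine.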

So I think either something needs to be done in the rendering engine, or Unicode needs to clear up the confusion. OpenOffice and HTML rendering engines, on the other hand, always show the soft hyphen as a visible hyphen at the end of the line, which is not desired for some languages.

Try this extension and let me know your comments. For small-scale DTP work such as pamphlets, notices and brochures, Inkscape is enough. But since Inkscape is not primarily DTP software and does not have paging support, it may not work well for books and large-scale DTP work.
