LibreOffice Malayalam Hyphenation

I developed and released a hyphenation extension for Malayalam for OpenOffice years ago. LibreOffice was born later. Even though LibreOffice supports OpenOffice extensions, its extension repository was created from scratch, so the old extensions were not present in it.

Now I have uploaded the Malayalam hyphenation extension to the LibreOffice extension repository too. This post explains installation and configuration step by step.

All Operating systems

  • Download the extension and save it anywhere on your computer.
  • In LibreOffice, select Tools -> Extension Manager from the menu bar.
  • In the Extension Manager dialog click Add.
  • A file browser window opens. Navigate to the folder where you saved the LibreOffice extension file(s) on your system. The extension’s files have the file extension ‘OXT’.
  • Find and select the extension you want to install and click Open.
  • If this extension is already installed, you’ll be prompted to press OK to confirm whether to overwrite the current version by the new one, or press Cancel to stop the installation.
  • Next, you are asked whether to install the extension only for your user or for all users. If you choose the Only for me option, the extension will be installed only for your user. If you choose For all users, you need system administrator rights, and the extension will be available to all users. In general, choose Only for me, which does not require administrator rights on the operating system.

Debian and Ubuntu

The above steps work on Debian and Ubuntu too, but there is a better way: use your package manager to install the hyphen-ml package. This installs hyphenation not only for LibreOffice, but also for typesetting packages like LaTeX.

Using the hyphenation

  • To automatically hyphenate the current or selected paragraphs, choose Format – Paragraph, and then click the Text Flow tab.

  • To manually hyphenate a single word, click in the word where you want to add the hyphen, and then press Ctrl+Hyphen(-).
  • To manually hyphenate text in a selection, select the text that you want to hyphenate, and then choose Tools – Language – Hyphenation.

For detailed help, read the LibreOffice hyphenation documentation.

A hyphenated paragraph

Known Issues

Malayalam and several other languages do not use a visible hyphen (-) at the end of a line when a word is broken. Currently there is no way to control this in LibreOffice.

I have also developed hyphenation patterns for 10 other Indian languages. I have yet to upload them to the LibreOffice repository, but they are readily available in Debian and Ubuntu. You can install them by choosing the matching hyphen-* package.

 

FOSS migration of electronic circuit simulation lab

My proposal for migrating the basic electronic circuit simulation lab to the FOSS tool eSim has been approved. The source code and documentation of the experiments can now be downloaded from here.

eSim is an open source EDA tool for circuit design, simulation, analysis and PCB design. eSim is developed by FOSSEE (Free and Open Source Software for Education), an initiative of MHRD, Govt. of India. FOSSEE promotes the migration of labs in educational institutions from proprietary tools to FOSS-only ones through lab migration projects.

I am really happy to have become a part of this project. You can read my previous post on eSim usage here.

Fontconfig language matching

I had to spend a few hours debugging a problem where fontconfig did not identify a font for a language. Following the tradition of sharing knowledge acquired the hard way, let me note it down here for search engines.

The font that I am designing now has three style variants: thin, regular and bold. All have the same family name. So if you set this family for whatever purpose, the thin, regular or bold version will be picked depending on context. Regular is expected by default. Also, when you pick the font from a font selector, you would expect Regular to be selected by default.

The problem I faced was that Bold, instead of Regular, was getting selected as the default. In font selectors, Bold was listed first.

On GNU/Linux systems, this font matching and selection is done by fontconfig. I started with fc-match:

$ fc-match MyFont
MyFontBold.otf: "MyFont" "Bold"

So that confirmed the problem. After fiddling with the OS/2 properties, asking on the fontconfig mailing list, and reading the fontconfig documentation, I found that the lang property fontconfig calculates for the Regular variant of the font does not include ‘en’:

$ fc-list MyFont : family : style : lang
MyFont:style=Bold:lang=aa|ay|bi|br|ch|en|es|eu|fj|fur|gd|gl|gv|ho|ia|id|ie|io|it|mg|ml|nl|nr|nso|oc|om|pt|rm|so|sq|ss|st|sw|tl|tn|ts|uz|vo|xh|yap|zu|an|fil|ht|jv|kj|kwm|li|ms|ng|pap-an|pap-aw|rn|rw|sc|sg|sn|su|za
MyFont:style=Regular:lang=aa|ay|bi|br|ch|da|de|es|et|eu|fi|fj|fo|fur|fy|gl|ho|ia|id|ie|io|is|it|ki|lb|mg|ml|nb|nds|nl|nn|no|nr|nso|ny|om|rm|sma|smj|so|ss|st|sv|sw|tl|tn|ts|uz|vo|vot|xh|yap|zu|an|fil|ht|jv|kj|kwm|li|ms|na|ng|pap-an|pap-aw|rn|rw|sc|sg|sn|su|za
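
Long lang lists like these are hard to compare by eye. The following is a minimal sketch of how two lists could be compared mechanically; it is not part of fontconfig. The helper name missing_langs is hypothetical, and the two sample lists are shortened stand-ins for the fc-list output above.

```shell
# Shortened sample lang lists (assumed for illustration)
bold_langs="aa|ay|en|es|ml"
regular_langs="aa|ay|da|es|ml"

# Print languages that appear in the first list but not in the second.
missing_langs() {
    first=$(mktemp)
    second=$(mktemp)
    # Split on '|' so each language sits on its own line, then sort for comm
    printf '%s\n' "$1" | tr '|' '\n' | sort -u > "$first"
    printf '%s\n' "$2" | tr '|' '\n' | sort -u > "$second"
    # comm -23 keeps only lines unique to the first file
    comm -23 "$first" "$second"
    rm -f "$first" "$second"
}

missing_langs "$bold_langs" "$regular_langs"   # prints: en
```

Called with the full Bold and Regular lists, the same helper would list every language that Bold covers and Regular does not, ‘en’ among them.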

I tried to find out how fontconfig calculates the languages supported by a font. The minimum set of code points that a font must include for fontconfig to declare that it supports a given language is defined in the fontconfig library. You can find these sets in the source code. For example, the mandatory code points for English (the font must have glyphs for them) are defined in the en.orth file. I cross-checked each code point and one was indeed missing from my regular variant, while the bold version had everything. When I added it, everything started working normally.

Later, fontconfig developer Akira TAGOH told me that I can also use fc-validate to check the language coverage:

$ fc-validate --lang=en MyFont.otf
MyFont.otf:0 Missing 1 glyph(s) to satisfy the coverage for en language

And after adding the missing glyph:

$ fc-validate --lang=en MyFont.otf
MyFont.otf:0 Satisfy the coverage for en language

And now fc-match lists Regular as the default style:

$ fc-match MyFont
MyFont.otf: "MyFont" "Regular"
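
To keep this from regressing, the default style could be checked in a script. The following is a rough sketch that assumes fc-match prints one line in the format shown above; style_of is a hypothetical helper, not a fontconfig tool.

```shell
# Extract the style (the second quoted field) from one line of fc-match output,
# e.g.  MyFont.otf: "MyFont" "Regular"  ->  Regular
style_of() {
    # Splitting on double quotes puts the family in $2 and the style in $4
    printf '%s\n' "$1" | awk -F'"' '{ print $4 }'
}

style_of 'MyFont.otf: "MyFont" "Regular"'   # prints: Regular
```

In a font build script this could be used as style_of "$(fc-match MyFont)" and compared against Regular.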

Typesetting Malayalam using XeTeX

XeTeX is an extension of TeX with built-in support for Unicode and OpenType. In this tutorial, we are going to learn how to typeset Malayalam using XeTeX. With some learning effort, we can produce high quality typesetting using XeTeX. 

Installing XeTeX

XeTeX is packaged for all popular GNU/Linux distros. The installation method depends on your distro. For ease of installation and configuration, we suggest using TeX Live 2012 or later, either as the standalone TeX Live distribution or installed from your distribution’s package manager. Windows and OSX versions are also available.

The following packages are required for a working XeTeX environment on your computer. Note that these packages are relatively large, so downloading them will take time and bandwidth.

  1. texlive-xetex
  2. texlive-latex-extra
  3. texlive-lang-indic

You also need reasonably good Unicode-compatible Malayalam fonts. These fonts also come with GNU/Linux distros. Search for Malayalam fonts in your package manager and install them if they are not already installed. Example fonts: Meera, Rachana.

Creating documents using XeTeX

A simple document for learning how to use XeTeX is given below.

Using a text editor like gedit or kate, create a new file with .tex as the file extension, for example example.tex. Copy the following content into that file and save it.

\documentclass[11pt]{article}
\usepackage{fontspec}
\usepackage{polyglossia}
\setdefaultlanguage{malayalam}
\setmainfont[Script=Malayalam, HyphenChar="00AD]{Rachana}
% The line above sets the hyphenation character to U+00AD (soft hyphen),
% because a visible hyphen is not used for Malayalam
\lefthyphenmin=3
\righthyphenmin=4
\title{\textbf{സ്വർണം}}
\author{മലയാളം വിക്കിപീഡിയ}
\date{}
\begin{document}

\maketitle

\section{സ്വർണം}

മൃദുവും തിളക്കമുള്ളതുമായ മഞ്ഞലോഹമാണ് സ്വർണം. വിലയേറിയ ലോഹമായ സ്വർണം, നാണയമായും, ആഭരണങ്ങളുടെ രൂപത്തിലും നൂറ്റാണ്ടുകളായി മനുഷ്യൻ ഉപയോഗിച്ചു പോരുന്നു. 
ചെറിയ കഷണങ്ങളും തരികളുമായി സ്വതന്ത്രാവസ്ഥയിൽത്തന്നെ പ്രകൃതിയിൽ ഈ ലോഹം കണ്ടുവരുന്നു. ലോഹങ്ങളിൽ വച്ച് ഏറ്റവും നന്നായി രൂപഭേദം വരുത്താവുന്ന ലോഹമാണിത്.
\footnote{http://www.webelements.com/webelements/elements/text/Au/key.html "Key properties of gold" (in ഇംഗ്ലീഷ്). ശേഖരിച്ചത് 2007-06-18.}

\section{ഗുണങ്ങൾ}
സ്വർണത്തിന്റെ അണുസംഖ്യ 79-ഉം പ്രതീകം Au എന്നുമാണ്. ഔറം എന്ന ലത്തീൻ വാക്കിൽ നിന്നാണ് Au എന്ന പ്രതീകം ഉണ്ടായത്.
ഏറ്റവും നന്നായി രൂപഭേദം വരുത്താൻ സാധിക്കുന്ന ലോഹമാണ് സ്വർണ്ണം. ഒരു ഗ്രാം സ്വർണ്ണം അടിച്ചു പരത്തി ഒരു ചതുരശ്രമീറ്റർ വിസ്തീർണ്ണമുള്ള ഒരു തകിടാക്കി മാറ്റാൻ സാധിക്കും. 
അതായത് 0.000013 സെന്റീമീറ്റർ വരെ ഇതിന്റെ കനം കുറക്കാൻ കഴിയും. അതു പോലെ വെറും 29 ഗ്രാം സ്വർണ്ണം ഉപയോഗിച്ച് 100 കിലോ മീറ്റർ നീളമുള്ള കമ്പിയുണ്ടാക്കാനും സാധിക്കും. 

\section{ചരിത്രം}
ചരിത്രാതീത കാലം മുതൽക്കേ അറിയപ്പെട്ടിരുന്ന അമൂല്യലോഹമാണ്‌ സ്വർണ്ണം. ഒരുപക്ഷേ മനുഷ്യൻ ആദ്യമായി ഉപയോഗിച്ച ലോഹവും ഇതുതന്നെയായിരിക്കണം.
ബി.സി.ഇ. 2600 ലെ ഈജിപ്ഷ്യൻ ഹീറോഗ്ലിഫിക്സ് ലിഖിതങ്ങളിൽ ഈജിപ്തിൽ സ്വർണ്ണം സുലഭമായിരുന്നെന്ന് പരാമർശിക്കുന്നുണ്ട്.
ചരിത്രം പരിശോധിച്ചാൽ ഈജിപ്തും നുബിയയുമാണ്‌ ലോകത്തിൽ ഏറ്റവുമധികം സ്വർണ്ണം ഉല്പ്പാദിപ്പിച്ചിരുന്ന മേഖലകൾ. ബൈബിളിലെ പഴയ നിയമത്തിൽ സ്വർണ്ണത്തെപ്പറ്റി പലവട്ടം പരാമർശിക്കുന്നുണ്ട്.

\end{document}

Now compile this document to generate a PDF:

xelatex example.tex

Output of the above content can be seen here.

This is only a very basic introduction to using XeTeX with Malayalam. For more detail, please refer to any of the tutorials freely available on the internet. Example: https://en.wikibooks.org/wiki/LaTeX