I recently spent a few hours debugging a problem where fontconfig did not identify a font for a language. Following the tradition of sharing knowledge acquired the hard way, let me note it down here for search engines.
The font I am currently designing has three style variants: thin, regular and bold. All three share the same family name, so when you select this family for any purpose, the thin, regular or bold version is picked depending on the context.
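To illustrate how one family name can resolve to different styles, here is a deliberately simplified TypeScript sketch that picks the variant whose weight is closest to the requested weight. Assumptions: the family name `MyFont` is hypothetical; the weights are fontconfig's `FC_WEIGHT_THIN` (0), `FC_WEIGHT_REGULAR` (80) and `FC_WEIGHT_BOLD` (200) constants; and real fontconfig matching scores many more properties (language coverage, slant, width and so on), so this is not its actual algorithm.

```typescript
// Simplified model: choose a style from one family by nearest weight only.
// "MyFont" is a hypothetical family name; weights are fontconfig's
// FC_WEIGHT_THIN (0), FC_WEIGHT_REGULAR (80) and FC_WEIGHT_BOLD (200).
interface FontStyle {
  family: string;
  style: string;
  weight: number;
}

const variants: FontStyle[] = [
  { family: "MyFont", style: "Thin", weight: 0 },
  { family: "MyFont", style: "Regular", weight: 80 },
  { family: "MyFont", style: "Bold", weight: 200 },
];

// Return the variant of `family` whose weight is closest to `requestedWeight`.
// On a tie, the variant listed first wins (strict < comparison).
function pickVariant(
  fonts: FontStyle[],
  family: string,
  requestedWeight: number,
): FontStyle | undefined {
  let best: FontStyle | undefined;
  for (const font of fonts) {
    if (font.family !== family) continue;
    if (
      best === undefined ||
      Math.abs(font.weight - requestedWeight) < Math.abs(best.weight - requestedWeight)
    ) {
      best = font;
    }
  }
  return best;
}

console.log(pickVariant(variants, "MyFont", 0)?.style);   // Thin
console.log(pickVariant(variants, "MyFont", 80)?.style);  // Regular
console.log(pickVariant(variants, "MyFont", 180)?.style); // Bold (180 is FC_WEIGHT_DEMIBOLD)
```

A request for semibold (180) has no exact match, so the nearest variant, Bold, is returned; this is the kind of context-dependent choice the paragraph above describes.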
Leap second
This coming June 30 is special: that day will be 24 hours and one second long. The extra second is called a leap second, or in Malayalam adhika nimisham ("extra moment"). You probably won't notice it on an ordinary wristwatch or wall clock. And what is one second worth to us anyway? But this extra second cannot be dismissed so easily. Because it is very likely to cause problems in computers and devices that need accuracy down to the second, technical experts around the world are preparing to deal with 11:59:60 PM on June 30 (23:59:60 UTC), the moment that comes after 23:59:59 but is not yet July 1.
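A small TypeScript sketch shows why such a moment is awkward for software: JavaScript's `Date` ignores leap seconds, so it has no way to represent 23:59:60. Assumption: the year 2015 is used for illustration (a leap second was inserted at the end of June 30, 2015, UTC); the post itself does not name the year.

```typescript
// A leap-second day has one more second than a normal day.
const SECONDS_PER_DAY = 24 * 60 * 60;      // 86400 on a normal day
const leapDaySeconds = SECONDS_PER_DAY + 1; // 86401 on a leap-second day

// Date.UTC months are 0-based: 5 = June, 6 = July.
// Asking for second 60 does not produce 23:59:60; the overflow is simply
// carried into the next minute, which is midnight on July 1.
const leapInstant = Date.UTC(2015, 5, 30, 23, 59, 60);
const julyFirst = Date.UTC(2015, 6, 1, 0, 0, 0);
const june30 = Date.UTC(2015, 5, 30);

console.log(leapDaySeconds);                      // 86401
console.log(new Date(leapInstant).toISOString()); // 2015-07-01T00:00:00.000Z
console.log(leapInstant === julyFirst);           // true

// Date still counts exactly 86400 seconds for June 30, leap second or not.
console.log((julyFirst - june30) / 1000);         // 86400
```

The extra second silently disappears from the timeline, which is exactly the kind of mismatch that software needing second-level accuracy has to guard against.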
HOWTO: Wacom Bamboo CTH301K in Debian
This is a short guide to getting the Wacom Bamboo CTH301K working in Debian. At the time of writing, I use Debian Sid with Linux kernel 3.16, but this should also work with recent Ubuntu releases (14.04 or 14.10) and newer kernels.
The Wacom Bamboo CTH301K is an entry-level touchpad with a stylus: you can use it as a mouse, or as a drawing pad with the stylus. It also supports multitouch gestures such as pinch-to-zoom.
Configurable node logger with winston
For advanced logging in Node.js applications, winston is very helpful. Winston is a multi-transport asynchronous logging library for Node.js. As with well-known logging systems like log4j, you can configure log levels, and winston lets you define multiple logging targets such as a file, the console or a database.
I wanted to configure logging according to the usual Node.js split between production and development environments. In development mode I am mostly interested in debug-level logging, while in production I care about higher-severity logs.
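The environment-dependent threshold can be sketched in TypeScript as below. Assumptions: the level names and priorities are winston's default npm-style levels (lower number means more severe), `info` is a hypothetical choice of production threshold, and any environment other than `"production"` is treated as development. The sketch deliberately does not import winston; in a real setup the chosen level would be passed as the `level` option when creating the logger (for example with `winston.createLogger` in winston 3).

```typescript
// winston's default npm-style levels: lower number = more severe.
const LEVELS = { error: 0, warn: 1, info: 2, http: 3, verbose: 4, debug: 5, silly: 6 } as const;
type Level = keyof typeof LEVELS;

// Assumption: "production" gets a higher-severity threshold ("info");
// everything else (development, test, unset) gets "debug".
function levelForEnv(env: string | undefined): Level {
  return env === "production" ? "info" : "debug";
}

// A message is emitted when it is at least as severe as the threshold,
// i.e. when its numeric priority is less than or equal to the threshold's.
function shouldLog(threshold: Level, messageLevel: Level): boolean {
  return LEVELS[messageLevel] <= LEVELS[threshold];
}

const devLevel = levelForEnv("development");
const prodLevel = levelForEnv("production");
console.log(devLevel, shouldLog(devLevel, "debug"));   // debug true
console.log(prodLevel, shouldLog(prodLevel, "debug")); // info false
console.log(shouldLog(prodLevel, "error"));            // true
```

In a Node.js program the environment would typically come from `levelForEnv(process.env.NODE_ENV)`.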
NotoSansMalayalam and nta
NotoSansMalayalam has the following ligature rules for ന്റ (nta), all of which use the Akhand (akhn) OpenType feature:
uni0D7B(ൻ) + uni0D4D(്) + uni0D31(റ) => ൻ + ് + റ
uni0D28(ന) + uni0D4D(്) + uni200D(ZWJ) + uni0D31(റ) => ന് + റ
uni0D28(ന) + uni0D4D(്) + uni0D31(റ) => ന് + റ
The first is the sequence defined in the Unicode Standard, Chapter 9, Section 9.9 (PDF). The second is what Microsoft's Kartika font used for /nta/, which was a bug. The last is what all other fonts follow.
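Writing the three input sequences out as code points makes the difference concrete. This TypeScript sketch prints each sequence and checks that Unicode normalization (NFC) does not fold them into one another, which is why the font needs a separate ligature rule for each. The object keys are hypothetical labels that follow the post's description.

```typescript
// The three input sequences for ന്റ (nta), written as escapes so the
// code points are visible.
const sequences = {
  atomicChillu: "\u0D7B\u0D4D\u0D31",     // ൻ + ് + റ (Unicode's encoding)
  zwjSequence: "\u0D28\u0D4D\u200D\u0D31", // ന + ് + ZWJ + റ (Kartika's bug)
  plainVirama: "\u0D28\u0D4D\u0D31",      // ന + ് + റ (other fonts)
};

// Render a string as a list of U+XXXX code points.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

for (const [name, text] of Object.entries(sequences)) {
  console.log(name, codePoints(text).join(" "));
}
// atomicChillu U+0D7B U+0D4D U+0D31
// zwjSequence U+0D28 U+0D4D U+200D U+0D31
// plainVirama U+0D28 U+0D4D U+0D31

// The atomic chillu ൻ has no decomposition, so NFC keeps all three distinct.
const normalized = new Set(Object.values(sequences).map((s) => s.normalize("NFC")));
console.log(normalized.size); // 3
```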
Hyphenation on the web
This is a follow-up to a four-year-old blog post about hyphenation. Hyphenation allows the controlled splitting of words to improve the layout of paragraphs, typically splitting words at syllabic or morphemic boundaries and visually indicating the split (usually with a hyphen).
In that post I wrote about how a web page can use the Hyphenator JavaScript library to hyphenate justified text. Together with the hyphenation rules I wrote for many Indian languages, this solution works, and some websites already use it.
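Pattern-based hyphenation of this kind can be sketched in a few lines. The TypeScript below is a minimal implementation of Franklin Liang's pattern algorithm (the approach TeX uses), not Hyphenator's own code. Assumptions: the handful of English patterns in `PATTERNS` are toy data chosen to reproduce the classic "hy-phen-ation" split, whereas real pattern sets, such as rules for Indian languages, are far larger; the result is joined with soft hyphens (U+00AD), which a browser renders only where it actually breaks the line.

```typescript
// Toy Liang-style patterns: digits between letters score the gap at that spot.
const PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"];

// Parse "hen5at" into letters "henat" and gap scores [0, 0, 0, 5, 0, 0].
function buildTable(patterns: string[]): Map<string, number[]> {
  const table = new Map<string, number[]>();
  for (const pattern of patterns) {
    const letters = pattern.replace(/[0-9]/g, "");
    const points: number[] = new Array(letters.length + 1).fill(0);
    let index = 0;
    for (const ch of pattern) {
      if (ch >= "0" && ch <= "9") points[index] = Number(ch);
      else index++;
    }
    table.set(letters, points);
  }
  return table;
}

// Apply every matching pattern, keep the maximum score per gap, and break
// at odd scores, leaving at least leftMin / rightMin letters at the ends.
function hyphenate(word: string, table: Map<string, number[]>, leftMin = 2, rightMin = 2): string[] {
  const padded = "." + word.toLowerCase() + ".";
  const values: number[] = new Array(padded.length + 1).fill(0);
  for (let start = 0; start < padded.length; start++) {
    for (let end = start + 1; end <= padded.length; end++) {
      const points = table.get(padded.slice(start, end));
      if (!points) continue;
      points.forEach((p, k) => {
        values[start + k] = Math.max(values[start + k], p);
      });
    }
  }
  const pieces: string[] = [];
  let last = 0;
  for (let j = leftMin; j <= word.length - rightMin; j++) {
    // The leading "." shifts indices by one: values[j + 1] scores the gap before word[j].
    if (values[j + 1] % 2 === 1) {
      pieces.push(word.slice(last, j));
      last = j;
    }
  }
  pieces.push(word.slice(last));
  return pieces;
}

const table = buildTable(PATTERNS);
const parts = hyphenate("hyphenation", table);
console.log(parts.join("-")); // hy-phen-ation
const soft = parts.join("\u00AD"); // ready to insert into the page's text
```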
Malayalam Wikisource Offline version
The Malayalam Wikisource community today released the first offline version of Malayalam Wikisource during the fourth annual wiki meetup of Malayalam Wikimedians. To the best of our knowledge, this is the first time a Wikisource project has released an offline version. The Malayalam wiki community had released the first offline version of Malayalam Wikipedia a year earlier.
Releasing an offline version of a Wikisource is a challenging project. I designed and implemented the technical aspects of the project.
On Machine Translation and God
I was reading the article “Why Can’t a Computer Translate More Like a Person?” by Alan K. Melby. It discusses the challenges machine translation (MT) technology faces in reaching an acceptable quality of translation. Melby explains why machine translation programs need to be sensitive to culture, and the article lists a number of examples where MT can go wrong if context, culture and so on are not taken into account.
PDFBox: Extract Text from PDF
Recently I had to extract text from PDF files in order to index the content with Apache Lucene. Apache PDFBox was the obvious choice of Java library. Apache PDFBox is an open source Java library for working with PDF files: it can create new PDF documents, manipulate existing ones and extract content from them. PDFBox also includes several command-line utilities. At the time of writing, no recent build of PDFBox was available.