Experimenting with eSim: A tool for Electronic Circuit Simulation

I did not have much exposure to open source Electronic Design Automation (EDA) tools during my undergraduate course in Electronics and Communication Engineering. My institute had proprietary EDA tools in the lab, and all my experience was limited to them. I must confess I never tried to explore the FOSS world for alternatives until I needed to offer a lab course on basic circuit simulation.

Web searches took me to the design suite eSim. It is an open source EDA tool for circuit design, simulation, analysis and PCB design. It is an integrated tool built using open source software such as KiCad and Ngspice. eSim is released under the GPL. Its GUI guides the user through the steps of schematic creation, netlist generation, PCB design and simulation. The eSim source code is hosted at https://github.com/FOSSEE/eSim .

eSim is developed by FOSSEE (Free and Open Source Software for Education), an initiative of MHRD, Govt. of India. FOSSEE promotes the migration of labs in educational institutions from proprietary tools to FOSS-only ones through lab migration projects. The source code of lab experiments is crowdsourced from faculty and students under the lab migration project, and is made available by FOSSEE under the Creative Commons Attribution-ShareAlike 4.0 International Licence.

My proposal for migrating the basic electronics lab to eSim is under review. The eSim team provided good technical support in resolving various experimental issues. The user’s guide for carrying out the experiments proposed under this project is published here, under the Creative Commons Attribution-ShareAlike 4.0 India Licence. This guide provides solutions to specific simulation problems using eSim. Experimental procedures are explained with screenshots.

Have a look and share your suggestions. If you have ideas for improving the content, feel free to contribute. Git repository of the user guide: https://github.com/kavyamanohar/e-design-simulation-guide


Leap second (അധിക നിമിഷം)

The coming June 30 is special: that day will be 24 hours and one second long. The extra second is called a leap second (അധിക നിമിഷം). You may not notice it on an ordinary wristwatch or wall clock. And anyway, what is one second worth to us? But this extra second cannot be dismissed so easily. Because it is very likely to cause problems in computers and devices that need accuracy at the level of seconds, technologists around the world have prepared themselves to face the instant 23:59:60 on June 30, the moment that is not yet July 1.

Where did this extra second come from? Very briefly, the adjustment is needed because the speed of the Earth's rotation is not the same at all times. Movements of the Earth's crustal plates, or earthquakes, are a major cause of the slowing of the Earth's rotation. Our timekeeping divides the day, based on the Earth's rotation, into 24 hours, each hour into 60 minutes, and each minute into 60 seconds; this can be called astronomical time. But the precise definition of the second is not based on these divisions. The scientific and official definition of the second is 9,192,631,770 times the period of the radiation corresponding to the transition between the two hyperfine levels of the ground state of a caesium-133 atom.

Clocks all over the world keep time according to the Coordinated Universal Time (UTC) standard. Time in the time zones, and timekeeping on computers, is reckoned from it. UTC is the timekeeping system accepted by the scientific world, based on Greenwich Mean Time. India's time zone is written as UTC+5:30, meaning five and a half hours ahead of Greenwich time. Since 1972, UTC has followed International Atomic Time, which is based on the radiation of the caesium atom.

Let me make clear that the extra second on June 30 that I mentioned at the beginning is in UTC. In India it will actually be 5:30 AM on July 1.

The everyday notion of time is based on the cycle of day and night. Since UTC also serves everyday needs, seconds are added like this now and then so that it keeps the accuracy of atomic time while at the same time staying in step with the Earth's rotation. The adjustment happening on June 30, 2015 is the 26th of its kind. The last leap second came on June 30, 2012.

To be precise, on June 30, after 23:59:59, instead of the time becoming July 1, 00:00:00, the clock will hold at June 30, 23:59:60. Only after that does July begin.

The leap second becomes a troublemaker in several ways. In computers, the linear sequencing of all kinds of operations is based on timestamps. It is the operating system that generates these ticks, serving the applications running on top of it. Needless to say, the count of ticks is used to compute minutes, hours, days and so on. Starting with the confusion of whether 23:59:60 belongs to June 30 or July 1, there is no telling what kinds of problems these can cause. The Linux kernel had a mechanism to handle this, but during the 2012 leap second it did not work properly. The New York Stock Exchange has already announced that it will halt operations for about an hour on June 30.
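
One way to see the problem: Unix/POSIX time simply has no slot for a 60th second, and most programming environments silently normalize it away. A quick illustration in JavaScript:

// 23:59:60 cannot be represented; the out-of-range seconds field is
// normalized into the next minute, i.e. into July 1.
const leapInstant = new Date(Date.UTC(2015, 5, 30, 23, 59, 60));
console.log(leapInstant.toISOString()); // 2015-07-01T00:00:00.000Z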

Large websites have already prepared to face the leap second. Wikipedia will temporarily suspend synchronization of its servers with UTC and run them on the hardware clock; once the leap second has passed, the servers will be re-synchronized with UTC in stages. Google uses a different method: it stretches the seconds approaching the leap second slightly, so that the few extra milliseconds added to many seconds together amount to one full second, while the arrival of a separate new second is avoided.
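
The arithmetic behind that “smearing” is simple. Here is a minimal sketch; the 20-hour window and the linear ramp are illustrative assumptions, not Google’s actual parameters:

// How much extra time (in ms) the smeared clock has absorbed at `nowMs`,
// for a leap second ending at `leapMs`, smeared over `windowMs`.
function smearOffsetMs(nowMs, leapMs, windowMs) {
  if (nowMs <= leapMs - windowMs) return 0;  // before the window
  if (nowMs >= leapMs) return 1000;          // extra second fully absorbed
  // inside the window: spread the 1000 extra ms linearly
  return 1000 * (nowMs - (leapMs - windowMs)) / windowMs;
}

const leap = Date.UTC(2015, 6, 1);       // the leap instant, in POSIX time
const smearWindow = 20 * 3600 * 1000;    // hypothetical 20-hour window
// Halfway through the window, half of the extra second has been absorbed:
console.log(smearOffsetMs(leap - smearWindow / 2, leap, smearWindow)); // 500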

Discussions on somehow avoiding this headache altogether have also begun. Even if we account for the leap second here on Earth, our space observations still need the astronomical clock. A leap second can only be decided about six months in advance. It is the International Earth Rotation and Reference Systems Service (IERS) that decides when a leap second is needed.

Further reading: https://en.wikipedia.org/wiki/Leap_second

Translating HTML content using a plain-text-supporting machine translation engine

At Wikimedia, I am currently working on the ContentTranslation tool, a machine-aided translation system to help translate articles from one language to another. The tool is deployed on several Wikipedias now and people are creating new articles successfully.

The ContentTranslation tool provides machine translation as one of its translation tools, so that editors can use it as an initial version to improve upon. We use Apertium as the machine translation backend and plan to support more machine translation services soon.

A big difference when editing with ContentTranslation is that it does not involve wiki markup. Instead, editors edit rich text: essentially contenteditable HTML elements. This also means that what you translate are the HTML sections of articles.

The HTML contains all the markup that a typical Wikipedia article has. This means the machine translation operates on HTML content. But not all MT engines support HTML content.

Some MT engines, such as Moses, output subsentence alignment information directly, showing which source words correspond to which target words.

$ echo 'das ist ein kleines haus' | moses -f phrase-model/moses.ini -t
this is |0-1| a |2-2| small |3-3| house |4-4|

The Apertium MT engine does not translate formatted text faithfully. Markup such as HTML tags is treated as a form of blank space. This can lead to semantic changes (if words are reordered), or syntactic errors (if mappings are not one-to-one).

$ echo 'legal <b>persons</b>' | apertium en-es -f html
Personas <b>legales</b>
$ echo 'I <b>am</b> David' | apertium en-es -f html
Soy</b> David 

Other MT engines exhibit similar problems. This makes it challenging to provide machine translations of formatted text. This blog post explains how this challenge is tackled in ContentTranslation.

As we saw in the examples above, a machine translation engine can cause the following errors in the translated HTML. The errors are listed in descending order of severity.

  1. Corrupt markup – If the machine translation engine is unaware of the HTML structure, it can potentially move the HTML tags around at random, corrupting the markup in the MT result.
  2. Wrongly placed annotations – The two examples given above illustrate this. It is more severe if the content includes links and the link targets are swapped or placed at random in the MT output.
  3. Missing annotations – Sometimes the MT engine may eat up some tags during the translation process.
  4. Split annotations – During translation, a single word can be translated into more than one word. If the source word carries markup, say an <a> tag, will the MT engine wrap both words in the <a> tag, or apply it to each word separately?

All of the above issues can cause a bad experience for translators.

Apart from potential issues with markup transfer, there is another aspect to sending HTML content to MT engines. Compared to the plain text version of a paragraph, the HTML version is bigger in size (bytes). Most of this extra content is tags and attributes, which should be unaffected by the translation, so sending them is unnecessary bandwidth usage. If the MT engine is metered (non-free, with API access measured and limited), we are not being economical.

An outline of the algorithm we use to transfer markup from the source content to the translated content is given below.

  1. The input HTML content is converted into a LinearDoc, with inline markup (such as bold and links) stored as attributes on a linear array of text chunks. This linearized format is convenient for important text manipulation operations, such as reordering and slicing, which are challenging to perform on an HTML string or a DOM tree.
  2. Plain text sentences (with all inline markup stripped away) are sent to the MT engine for translation.
  3. The MT engine returns a plain text translation, together with subsentence alignment information (saying which parts of the source text correspond to which parts of the translated text).
  4. The alignment information is used to reapply markup to the translated text.

This makes sure that MT engines translate only plain text; markup is applied as a post-MT processing step.
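
To make step 4 concrete, here is a minimal, self-contained sketch of reapplying a single annotation using token-level alignment. The names and the alignment format are illustrative assumptions (real engines report alignment as spans, as in the Moses output above), not ContentTranslation’s actual API:

// Reapply one annotation to the translated tokens, using token-level
// alignment pairs [sourceIndex, targetIndex]. Illustrative sketch only;
// it assumes the annotated tokens survived the translation.
function reapplyMarkup(targetTokens, alignment, annotation) {
  // annotation: { tag, start, end } over source token indices (end exclusive)
  const targetIdxs = alignment
    .filter(([s]) => s >= annotation.start && s < annotation.end)
    .map(([, t]) => t);
  const from = Math.min(...targetIdxs);
  const to = Math.max(...targetIdxs);
  return targetTokens
    .map((tok, i) => {
      if (i === from) tok = `<${annotation.tag}>` + tok;
      if (i === to) tok = tok + `</${annotation.tag}>`;
      return tok;
    })
    .join(' ');
}

// 'legal <b>persons</b>' -> Spanish, with adjective and noun reordered:
console.log(reapplyMarkup(
  ['personas', 'legales'],
  [[0, 1], [1, 0]],               // legal -> legales, persons -> personas
  { tag: 'b', start: 1, end: 2 }  // <b> covered source token 1 ('persons')
)); // -> '<b>personas</b> legales'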

When the MT engine does not return alignment information, the algorithm essentially does a fuzzy match to find the target locations in the translated text where the annotations should be applied. Here too, the content given to the MT engine is plain text only.

The steps are given below.

  1. For the text to translate, find the text of inline annotations like bold, italics, links etc. We call these subsequences.
  2. Pass the full text and the subsequences to the plain text machine translation engine. Use some delimiter so that we can map between the array of source items (full text and subsequences) and the array of translated items, as sketched below.
  3. The translated full text will contain the translations of the subsequences somewhere within it. To locate a subsequence translation in the full text translation, use an approximate search algorithm.
  4. The approximate search algorithm returns the start position and the length of the match. To that range we map the annotation from the source HTML.
  5. The approximate match involves calculating the edit distance between words in the translated full text and the translated subsequence. It is not strings that are searched, but n-grams with n = the number of words in the subsequence. Each word in the n-gram is matched independently.
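
A minimal sketch of the batching in step 2 follows. The delimiter and the function names are illustrative assumptions; the real service handles this mapping differently:

// Batch the full text and its subsequences into one MT request, using a
// delimiter the engine should pass through unchanged (an assumption).
const DELIMITER = '\n.\n';

function packForMt(fullText, subsequences) {
  return [fullText, ...subsequences].join(DELIMITER);
}

function unpackFromMt(mtOutput) {
  const [fullTranslation, ...subTranslations] = mtOutput.split(DELIMITER);
  return { fullTranslation, subTranslations };
}

// For <p>Es <s>además</s> de Valencia.</p> (example 1 below):
packForMt('Es además de Valencia.', ['además']); // goes to the engine
console.log(unpackFromMt('A més de València.\n.\na més'));
// -> { fullTranslation: 'A més de València.', subTranslations: ['a més'] }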

To see the whole thing working, let us try the algorithm on some example sentences.

  1. Translating the Spanish sentence <p>Es <s>además</s> de Valencia.</p> to Catalan: The plain text version is Es además de Valencia. and the subsequence with the annotation is además. We give both the full text and the subsequence to MT. The full text translation is A més de València. and the word además is translated as a més. We search for a més in the full text translation. The search succeeds and the <s> tag is applied, resulting in <p>És <s>a més</s> de València.</p>. The search performed in this example is a plain text exact search. But the following example illustrates why it cannot be an exact search.
  2. Translating an English sentence <p>A <b>Japanese</b> <i>BBC</i> article</p> to Spanish: The full text translation is Un artículo de BBC japonés. One of the subsequences, Japanese, gets translated as Japonés. The case of the initial letter differs, and the search should be smart enough to identify japonés as a match for Japonés. The word order change between source text and translation is already handled by the algorithm. The following example illustrates that it is not just case changes that happen.
  3. Translating <p>A <b>modern</b> Britain.</p> to Spanish: The plain text version gets translated as Una Gran Bretaña moderna. and the annotated word modern gets translated as Moderno. We need to match moderna and Moderno. We get <p>Una Gran Bretaña <b>moderna</b>.</p>. This is a case of word inflection: a single letter at the end of the word changes.
  4. Now let us see an example where the subsequence is more than one word, and a case of nested subsequences. Translating the English sentence <p>The <b>big <i>red</i></b> dog</p> to Spanish: Here, the subsequence big red is in bold, and inside it, red is in italics. In this case we need to translate the full text, the subsequence big red, and red. So we have El perro rojo grande as the full translation, and Rojo grande and Rojo as the translations of the subsequences. Rojo grande needs to be located first and the bold tag applied; then we search for Rojo and apply the italics. We get <p>El perro <b><i>rojo</i> grande</b></p>.
  5. How does it work with heavily inflected languages like Malayalam? Suppose we translate <p>I am from <a href="x">Kerala</a></p> to Malayalam. The plain text translation is ഞാന്‍ കേരളത്തില്‍ നിന്നാണു്. And the subsequence Kerala gets translated to കേരളം. So we need to match കേരളം and കേരളത്തില്‍. They differ by an edit distance of 7, and the changes are at the end of the word. This shows that we will require language-specific tailoring to achieve reasonable output.

The algorithm for the approximate string match can be a simple Levenshtein distance, but what would be the acceptable edit distance? That must be configurable per language module. And the following example illustrates that just doing edit-distance-based matching won't work.

Translating <p>Los Budistas no <b>comer</b> carne</p> to English: The plain text translation is The Buddhists not eating meat. comer translates as eat. With an edit distance approach, eat will match meat more closely than eating. To address such cases, we mix in a second criterion: the words should start with the same letter. So this also illustrates that the algorithm should have language-specific modules.

Still, there are cases that cannot be solved by the algorithm mentioned above. Consider the following example.

Translating <p>Bees <b>cannot</b> swim</p>: The plain text translation to Spanish is Las Abejas no pueden nadar, and the phrase cannot translates as Puede no. Here we need to match Puede no and no pueden, which of course won't match with the approach we have explained so far.

To address this case, we do not treat the subsequence as a single string, but as an n-gram where n = the number of words in the subsequence. The fuzzy matching is done per word in the n-gram, not on the entire string. That is, Puede is fuzzy matched with no and pueden, and no is fuzzy matched with no and pueden, left to right, until a match is found. This takes care of word order changes as well as inflections. A simplified sketch of this matching follows.
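
The sketch below combines the Levenshtein distance with the same-first-letter criterion, applied word by word over candidate n-grams. The threshold value and helper names are illustrative; per-language thresholds would live in language modules:

// Standard Levenshtein edit distance between two strings.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 },
    (_, i) => [i, ...new Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,      // deletion
        d[i][j - 1] + 1,      // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Two words match if they are close in edit distance AND start with the
// same letter (the extra criterion from the comer/eat/meat example).
function wordsMatch(a, b, maxDistance) {
  a = a.toLowerCase();
  b = b.toLowerCase();
  return a[0] === b[0] && levenshtein(a, b) <= maxDistance;
}

// Slide a window of n words over the full translation; every word of the
// translated subsequence must match some word in the window.
function findSubsequence(textWords, subWords, maxDistance) {
  const n = subWords.length;
  for (let start = 0; start + n <= textWords.length; start++) {
    const window = textWords.slice(start, start + n);
    if (subWords.every(sw => window.some(tw => wordsMatch(tw, sw, maxDistance)))) {
      return { start, length: n };
    }
  }
  return null; // annotation will be dropped: a "missing annotation"
}

console.log(findSubsequence(
  'Las Abejas no pueden nadar'.split(' '),
  'Puede no'.split(' '),
  2
)); // -> { start: 2, length: 2 }, i.e. the n-gram 'no pueden'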

Revisiting the four types of errors that can happen in annotation transfer: with the algorithm explained so far, we see that in the worst case we will miss annotations. There is no case of corrupted markup.

As ContentTranslation adds more language support, language-specific customization of the above approach will be required.

You can see the algorithm in action by watching the video linked above. And here is a screenshot:

Translation of a paragraph from the Palak Paneer article of the Spanish Wikipedia to Catalan. Note the links, bold text etc. applied in the correct positions in the translation on the right side.

If anybody is interested in the code, see https://github.com/wikimedia/mediawiki-services-cxserver/tree/master/mt – it is a JavaScript module in a Node.js server which powers ContentTranslation.

Credits: David Chan, my colleague at Wikimedia, for extensive help in providing lots of example sentences of varying complexity to fine-tune the algorithm. The LinearDoc model that makes the whole algorithm work was written by him. David also wrote an algorithm to handle HTML translation using an upper-casing technique; you can read about it here. The approximation-based algorithm explained above replaced it.

Making of Keraleeyam font: From ASCII to Unicode

Keraleeyam is a new Unicode Malayalam font designed for titles. It was originally designed in 2005 for ‘Keraleeyam’, a magazine supporting environmental movements in Kerala, with ASCII encoding, and was distributed along with the Rachana editor software.

Unicode font feature tables for Malayalam are complex; they include diverse rules for ligature formation and glyph positioning. Keraleeyam, being originally ASCII encoded, contained no such rules. It would have been a herculean task to manually add the rules for each glyph: Keraleeyam has 792 glyphs. Also, rules needed to be duplicated to support both the latest and the old OpenType specifications, which ensures the font is rendered correctly by all applications on new and reasonably old operating systems.

Happy to say that font featuring was done without as much difficulty as one would expect, thanks to the existing Unicode font Rachana, with few known bugs and an extensive glyph set of 1083 glyphs, and thanks to Hussain K. H., who designed and named every glyph with the same name as its corresponding glyph in Rachana. Rajeesh K. V. imported the feature tables of Rachana and applied them to Keraleeyam in a semi-automated manner.

Then remained the optimization tasks of kerning and positioning; I contributed to this fine tuning. The beta version of the Keraleeyam font was released as part of the 13th anniversary celebrations of Swathanthra Malayalam Computing by Murali Thummarukudi at Vylopilli Samskrithi Bhavan on 16 December 2014.

The project is hosted here. Comments and feedback are welcome ahead of the stable version release, coming soon.


New handwriting style font for Malayalam: Chilanka

A new handwriting-style font for Malayalam is in development. The font is named “Chilanka” (ചിലങ്ക).

This is an alpha version release. The following is a sample rendering.

More samples here.

You may try the font using this editable page: http://smc.org.in/downloads/fonts/chilanka/tests/ – it has the font embedded.

Download the latest version: http://smc.org.in/downloads/fonts/chilanka/Chilanka.ttf

Chilanka/ചിലങ്ക is a musical anklet

A brief note on the workflow I used for font development follows.

  1. Prepared a template SVG in Inkscape with all the guidelines and grid set up.
  2. Drew the glyphs. This is the hardest part. For this font, I used the bezier tool of Inkscape. The SVG is saved with the stroke alone; I did not prepare the outline in Inkscape, which let me easily rework the drawing several times. To visualize how the stroke would look in the outlined version, I set the stroke width to 130, with rounded end points. All SVGs are version tracked, and saved as Inkscape SVGs so that I retain my guidelines and grids.
  3. In FontForge, imported these SVGs and created the outlines using expand stroke, with stroke width 130, stroke height 130, pen angle 45 degrees, and line cap and line join set to round.
  4. Simplified the glyphs, automatically and manually, to reduce the impact of the conversion from cubic to quadratic beziers.
  5. Metrics tuning. Set both left and right bearings to 100 units (in general, there is glyph-specific tuning).
  6. The OpenType tables are the complex part. But for this font it did not take much time, since I used SMC’s already existing, well maintained feature tables. I could just focus on the design part.
  7. Tested using test scripts.

Some more details:

  • Design: Santhosh Thottingal
  • Technology: Santhosh Thottingal and Kavya Manohar
  • Total number of glyphs: 676, including basic Latin glyphs.
  • Project started on September 15, 2014
  • Number of SVGs prepared: 271
  • Em size: 2048. Ascent: 1434. Descent: 614.
  • 242 commits so far.
  • Latest version: 1.0.0-alpha.20141027
  • All drawing was done in Inkscape. No paper involved, no tracing.

Thanks to all my friends who are helping me test the font, and for their encouragement.
Stay tuned for the first version announcement :)

(Cross posted from http://blog.smc.org.in/new-handwriting-style-font-for-malayalam-chilanka/ )

HOWTO: Wacom Bamboo CTH301K in Debian

This is a short documentation on getting the Wacom Bamboo CTH301K working in Debian. I use Debian Sid with Linux kernel 3.16 at the time of writing, but this should work with the latest Ubuntu (14.04 or 14.10) and newer kernels.

The Wacom Bamboo CTH301K is an entry-level touch pad with a stylus: you can use it as a mouse, or as a drawing pad with the stylus. It has multitouch features like pinch zoom. I got all of this working.

Even though Wacom has drivers for many of their models in the Linux kernel, this particular model, with device ID 056a:0318, does not have a driver in the kernel. When you connect it, you will see it listed in the lsusb output as
Bus 003 Device 016: ID 056a:0318 Wacom Co., Ltd

But touch and stylus won't work because of the missing driver. The first step to get the stylus working is adding usbhid.quirks=0x056a:0x0318:0x40000000 to the grub boot command line. For this, edit /etc/default/grub and append the above string to GRUB_CMDLINE_LINUX_DEFAULT. On my system it looks as follows:

GRUB_CMDLINE_LINUX_DEFAULT="quiet init=/bin/systemd usbhid.quirks=0x056a:0x0318:0x40000000"

You need to save this file and run the update-grub command to get this updated in grub. There are alternate ways to pass this string to modprobe, but this method makes sure it works on every system restart. Once done, you will see the stylus getting detected and working. Touch will still not work; this is because the default wacom driver that gets picked up does not know about this device.

To get touch working, open /usr/share/X11/xorg.conf.d/50-wacom.conf and add MatchIsTablet "on" to the first section of that file. On my machine it looks like:

Section "InputClass"
        Identifier "Wacom USB device class"
        MatchUSBID "056a:*"
        MatchIsTablet "on"
        MatchDevicePath "/dev/input/event*"
        Driver "wacom"
EndSection

With this, the “evdev” driver will manage the device’s touch part. Restart your X session – for example, restart KDM or GDM – or just restart the machine.

You will see the stylus and touch working now. You may need to use the xsetwacom command to adjust preferences, but you can find documentation for that elsewhere.

The above method also works with the wireless model; just replace the device ID 0x056a:0x0318 with 0x056a:0x0319.

Update

  • 24/04/2015: Bamboo Pad pen support accepted into Linus’ repository on the “master” branch (commit 61e9e7e). Expected release: Linux 4.0.
  • Bamboo Pad touch support accepted into Jiri’s HID repository on the “for-4.1/wacom” branch (commit 8c97a76). Expected release: Linux 4.1.

Updated Swanalekha JavaScript Library

Six years back I wrote a JavaScript version of the popular Swanalekha input method. Friends were requesting a web-based version that they could use on Windows platforms too. Swanalekha was initially written as a SCIM input method and later as an m17n input method, readily available on GNU/Linux platforms. I had provided a bookmarklet version too. Later, Rajiv Nair and Nishan Nasir wrote Chrome and Firefox extensions.

Yesterday I noticed that the updated Marunadan Malayali news portal is using my JavaScript library to power Malayalam input in their search boxes. I don’t track who else uses these libraries and tools, or how many do. But I quickly realized how bad the code is; or, in other words, how much better I could write it today. Six years is a long span of time, anyway.

So I did some cleanup and rewriting, added documentation and an example, and here it is: http://thottingal.in/projects/swanalekha/swanalekha-ml.html

Code: https://github.com/smc/input-methods/tree/master/swanalekha-js

Enjoy.

Video of our presentation from the 7th Multilingual Workshop by W3C

Video of our presentation from the 7th Multilingual Workshop by W3C, Madrid, Spain, May 7-8.


Best Practices on the Design of Translation – Pau Giner, David Chan and Santhosh Thottingal.

Abstract: Wikipedia is one of the most multilingual projects on the web today. In order to provide access to knowledge to everyone, Wikipedia is available in more than 280 languages. However, the coverage of topics and detail varies from language to language. The Language Engineering team from the Wikimedia Foundation is building open source tools to facilitate the translation of content when creating new articles, helping quality content spread across languages. The translation process in Wikipedia presents many different challenges. Translation tools are aimed at making the translation process more fluent by integrating different aids such as translation services, dictionaries, and information from semantic databases such as Wikidata.org. In addition to the technical challenges, ensuring content quality is one of the most important aspects considered during the design of the tool, since any translation that does not read naturally is not acceptable for a community focused on content quality. This talk will cover the design (from both technical and user experience perspectives) of the translation tools, and their expected impact on Wikipedia and the Web as a whole.