POS Tagging: A review of BIS POS tagset and ILCI-II Malayalam Text Corpus
The Bureau of Indian Standards(BIS) had published a Part of Speech(POS) tagset for Indian languages. POS is the process of assigning a part of speech marker to each word in a given text. In this article, I am reviewing the tag set defined in it. While developing mlmorph project I had explored a candidate POS tagging schema for Malayalam. I did not choose BIS tagset for the reasons I am going to explian in this article.
[Read More]