Abstract: We introduce the contemporary Amharic corpus, which is automatically tagged for morpho-syntactic information. Texts were collected from 25,199 documents from different domains, and about 24 million orthographic words were tokenized. Since it is partly a web corpus, we performed some automatic spelling error correction. We also modified the existing morphological analyzer, HornMorpho, to use it for the automatic tagging.
* Published in Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing at COLING 2018, pp. 65-70.