Jeroen De Gussem

Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning

Aug 03, 2017

Mike Kestemont, Jeroen De Gussem

Abstract: In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic, yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all the potential lemma-tag pairs for a token, and next, a context-aware PoS-tagger is used to select the most appropriate lemma-tag pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility of solving these tasks elegantly with a single, integrated approach. To this end, we use a layered neural network architecture from the field of deep representation learning.
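
The traditional cascaded, lexicon-dependent pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or any existing tool: the toy lexicon, the tag set, the transition scores, and the function names (`candidates`, `select`, `cascade`) are all assumptions made up for illustration. It only shows why an orthographic variant such as "michi" (for "mihi") falls outside the lexicon and leaves the next step without context.

```python
# Hypothetical toy lexicon: token -> candidate (lemma, tag) pairs.
LEXICON = {
    "amor": [("amor", "NOUN"), ("amo", "VERB")],
    "dei": [("deus", "NOUN")],
    "est": [("sum", "VERB")],
    "mihi": [("ego", "PRON")],
}

# Hypothetical stand-in for a context-aware tagger:
# a score for each (previous tag, candidate tag) pair.
TRANSITION = {
    (None, "NOUN"): 0.6,
    (None, "VERB"): 0.4,
    ("NOUN", "NOUN"): 0.5,
    ("NOUN", "VERB"): 0.5,
}


def candidates(token):
    """Step 1: the lexicon proposes every lemma-tag pair for a token."""
    return LEXICON.get(token.lower(), [])


def select(prev_tag, cands):
    """Step 2: pick the candidate whose tag scores best after prev_tag."""
    return max(cands, key=lambda pair: TRANSITION.get((prev_tag, pair[1]), 0.0))


def cascade(tokens):
    """Run lexicon lookup, then contextual selection, token by token."""
    output, prev_tag = [], None
    for tok in tokens:
        cands = candidates(tok)
        if not cands:
            # Out-of-lexicon item: nothing to select from, and the
            # following token loses its left context as well.
            output.append((tok, None, None))
            prev_tag = None
            continue
        lemma, tag = select(prev_tag, cands)
        output.append((tok, lemma, tag))
        prev_tag = tag
    return output


if __name__ == "__main__":
    print(cascade(["amor", "dei", "est"]))
    print(cascade(["michi"]))  # spelling variant missing from the lexicon
```

In this sketch, a failure in the first step (no lexicon entry) cannot be repaired by the second step, which is the kind of error percolation the abstract attributes to cascaded approaches.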

* Journal of Data Mining & Digital Humanities, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages, Towards a Digital Ecosystem: NLP. Corpus infrastructure. Methods for Retrieving Texts and Computing Text Similarities (August 6, 2017) jdmdh:3835
