Álvaro Pérez Pozo

ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis

Jul 03, 2023

Javier de la Rosa, Álvaro Pérez Pozo, Salvador Ros, Elena González-Blanco

Abstract: The computational analysis of poetry is limited by the scarcity of tools to automatically analyze and scan poems. In multilingual settings, the problem is exacerbated because scansion and rhyme systems exist only for individual languages, making comparative studies very challenging and time-consuming. In this work, we present Alberti, the first multilingual pre-trained large language model for poetry. Through domain-specific pre-training (DSP), we further trained multilingual BERT on a corpus of over 12 million verses from 12 languages. We evaluated its performance on two structural poetry tasks: Spanish stanza type classification, and metrical pattern prediction for Spanish, English and German. In both cases, Alberti outperforms multilingual BERT and other transformer-based models of similar size, and even achieves state-of-the-art results for German when compared to rule-based systems, demonstrating the feasibility and effectiveness of DSP in the poetry domain.

* Accepted for publication at SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing
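
The abstract describes domain-specific pre-training as further training multilingual BERT on verses. As a rough illustration only, the sketch below shows the masking step of a BERT-style masked-language-modelling objective applied to a single verse. It assumes that DSP here means continued masked-language-model training, and it uses the masking proportions from the original BERT recipe (15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). The function name `mask_tokens`, the whitespace tokenization, and the toy vocabulary are hypothetical; the listing above gives no implementation details, and multilingual BERT itself uses WordPiece subwords.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style masking to a list of tokens (hypothetical helper).

    Returns (inputs, labels): labels hold the original token at selected
    positions and None elsewhere, so a loss would only be computed on the
    selected positions.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)              # the model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)         # 80%: replace with the mask symbol
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)          # 10%: keep the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)             # position ignored by the loss
    return inputs, labels

# Whitespace tokenization is an assumption made for readability only.
verse = "en tanto que de rosa y azucena".split()
toy_vocab = sorted(set(verse))
masked, targets = mask_tokens(verse, toy_vocab, rng=random.Random(0))
print(masked)
print(targets)
```

In a real setup this step is normally handled by a library data collator over subword IDs rather than written by hand; the sketch only makes the objective concrete.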
