Title: Modeling Orthographic Variation in Occitan's Dialects

Apr 30, 2024

Zachary William Hopton, Noëmi Aepli

Abstract: Effectively normalizing textual data poses a considerable challenge, especially for low-resource languages that lack standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon covering four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened the model's representations. When the model was further fine-tuned for part-of-speech tagging and Universal Dependencies parsing, its performance was robust to dialectal variation, even when it was trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.

* Accepted at VarDial 2024: The Eleventh Workshop on NLP for Similar Languages, Varieties and Dialects
