Maria Khvalchik

El Departamento de Nosotros: How Machine Translated Corpora Affects Language Models in MRC Tasks

Jul 03, 2020

Maria Khvalchik, Mikhail Galkin

Abstract: Pre-training large-scale language models (LMs) requires huge amounts of text. LMs for English enjoy ever-growing corpora of diverse language resources. However, less-resourced languages and their mono- and multilingual LMs often struggle to obtain bigger datasets. A typical approach in this case is to machine-translate English corpora into the target language. In this work, we study the caveats of applying directly translated corpora for fine-tuning LMs for downstream natural language processing tasks and demonstrate that careful curation along with post-processing leads to improved performance and overall LM robustness. In the empirical evaluation, we compare directly translated and curated Spanish SQuAD datasets at both the user and system levels. Further experimental results on the XQuAD and MLQA transfer-learning question answering evaluation tasks show that presumably multilingual LMs are more resilient to machine translation artifacts in terms of the exact match score.
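For readers unfamiliar with the exact match score mentioned in the abstract, the following is a minimal sketch of a SQuAD-style computation, not the authors' evaluation code. The function names and the `articles` parameter are assumptions made for illustration. The default article list is English; a Spanish list would have to be supplied by the caller, and Spanish inverted punctuation such as "¿" is not covered by `string.punctuation`.

```python
import re
import string


def normalize_answer(text, articles=("a", "an", "the")):
    """Lowercase, strip ASCII punctuation, drop articles, collapse whitespace.

    The article list is an assumption: English by default, caller-supplied
    for other languages.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    if articles:
        pattern = r"\b(" + "|".join(re.escape(a) for a in articles) + r")\b"
        text = re.sub(pattern, " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers, articles=("a", "an", "the")):
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction, articles)
    return float(any(pred == normalize_answer(g, articles) for g in gold_answers))


def exact_match_score(predictions, references, articles=("a", "an", "the")):
    """Average exact match over a dataset, as a percentage.

    predictions: list of predicted answer strings.
    references:  list of lists of acceptable gold answers, aligned by index.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = sum(
        exact_match(p, golds, articles) for p, golds in zip(predictions, references)
    )
    return 100.0 * total / len(predictions)
```

Because the score is all-or-nothing per question, a single stray token left behind by translation (for example, a mistranslated article or a shifted answer span) turns a correct answer into a miss, which is why the metric is sensitive to translation artifacts.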

Orchestrating NLP Services for the Legal Domain

Mar 28, 2020

Julián Moreno-Schneider, Georg Rehm, Elena Montiel-Ponsoda, Víctor Rodriguez-Doncel, Artem Revenko, Sotirios Karampatakis, Maria Khvalchik, Christian Sageder, Jorge Gracia, Filippo Maganza

Abstract: Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.

* Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). To appear
