Karin Becker

UFRGS Participation on the WMT Biomedical Translation Shared Task

May 06, 2019

Felipe Soares, Karin Becker

Abstract: This paper describes the machine translation systems developed by the Universidade Federal do Rio Grande do Sul (UFRGS) team for the biomedical translation shared task. Our systems are based on statistical machine translation and neural machine translation, using the Moses and OpenNMT toolkits, respectively. We participated in four translation directions for the English/Spanish and English/Portuguese language pairs. To create our training data, we concatenated several parallel corpora from both in-domain and out-of-domain sources, as well as terminological resources from UMLS. Our systems achieved the best BLEU scores according to the official shared task evaluation.

* Published in the Third Conference on Machine Translation (WMT18)

A Large Parallel Corpus of Full-Text Scientific Articles

May 06, 2019

Felipe Soares, Viviane Pereira Moreira, Karin Becker

Abstract: The Scielo database is an important source of scientific information in Latin America, containing articles from several research domains. A striking characteristic of Scielo is that many of its full-text contents are available in more than one language, making it a potential source of parallel corpora. In this article, we present the development of a parallel corpus from Scielo in three languages: English, Portuguese, and Spanish. Sentences were automatically aligned using the Hunalign algorithm for all language pairs, as well as for a subset of trilingual articles. We demonstrate the capabilities of our corpus by training a Statistical Machine Translation system (Moses) for each language pair, which outperformed related work on scientific articles. Sentence alignment was also manually evaluated, with an average of 98.8% of sentences correctly aligned across all languages. Our parallel corpus is freely available in the TMX format, with complementary article metadata.

* Published in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
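
The abstract above says the corpus is distributed in the TMX format. As a rough illustration of what reading such a file could look like, here is a minimal sketch using only the Python standard library. The function name `read_tmx_pairs` and the sample document are hypothetical and are not taken from the corpus itself; the sketch assumes the usual TMX layout of `tu` (translation unit) elements containing one `tuv` per language, each with a `seg` holding the sentence.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) sentence pairs from a TMX document string.

    Hypothetical helper: translation units that lack either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segs[lang] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


# Illustrative sample only; the sentences are made up, not drawn from the corpus.
SAMPLE = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The patient recovered.</seg></tuv>
      <tuv xml:lang="pt"><seg>O paciente se recuperou.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>No translation here.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for src, tgt in read_tmx_pairs(SAMPLE, "en", "pt"):
        print(src, "|||", tgt)
```

For a file on disk, the same loop could be driven by `ET.parse(path).getroot()` instead of `ET.fromstring`; for very large corpora, an incremental parser such as `ET.iterparse` would likely be a better fit than loading the whole tree.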
