
Eetu Sjöblom

Grammatical Error Generation Based on Translated Fragments

Apr 20, 2021

Eetu Sjöblom, Mathias Creutz, Teemu Vahtola


Abstract: We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims to simulate mistakes made by second-language learners, and it produces a wider range of non-native-style language than state-of-the-art synthetic data creation methods. In addition to purely grammatical errors, our approach generates other types of errors, such as lexical errors. We perform grammatical error correction experiments using neural sequence-to-sequence models and carry out both quantitative and qualitative evaluation. A model trained on data created with our proposed method is shown to outperform a baseline model on test data with a high proportion of errors.

* Accepted for NoDaLiDa 2021


Paraphrase Detection on Noisy Subtitles in Six Languages

Sep 21, 2018

Eetu Sjöblom, Mathias Creutz, Mikko Aulamo


Abstract: We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus, which comprises six European languages: German, English, Finnish, French, Russian, and Swedish. We train two types of supervised sentence embedding models: a word-averaging (WA) model and a gated recurrent averaging network (GRAN) model. We find that GRAN outperforms WA and is more robust to noisy training data. Better results are obtained with more, noisier data than with less, cleaner data. Additionally, we experiment on other datasets, without reaching the same level of performance because of domain mismatch between training and test data.

* To appear in Proceedings of W-NUT at EMNLP 2018, Brussels, Belgium, 1 November 2018
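To make the word-averaging (WA) idea from the abstract above concrete, here is a minimal sketch, assuming a sentence is embedded as the plain mean of its word vectors and a pair is scored by cosine similarity against a threshold. This is a generic illustration, not the authors' implementation: the supervised training step is omitted, and the toy vectors, the helper names (`average_embedding`, `cosine`, `is_paraphrase`) and the 0.8 threshold are all hypothetical.

```python
import math

def average_embedding(tokens, vectors):
    """Embed a sentence as the mean of the vectors of its known tokens (WA-style)."""
    known = [vectors[t] for t in tokens if t in vectors]  # skip out-of-vocabulary tokens
    if not known:
        raise ValueError("no known tokens in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def is_paraphrase(s1, s2, vectors, threshold=0.8):
    """Hypothetical decision rule: call a pair paraphrases if similarity >= threshold."""
    e1 = average_embedding(s1.lower().split(), vectors)
    e2 = average_embedding(s2.lower().split(), vectors)
    return cosine(e1, e2) >= threshold

# Hypothetical 2-dimensional toy vectors; a trained model would learn these.
toy_vectors = {
    "hello": [1.0, 0.0],
    "hi": [0.9, 0.1],
    "there": [0.0, 1.0],
    "goodbye": [-1.0, 0.0],
}

print(is_paraphrase("hello there", "hi there", toy_vectors))       # True
print(is_paraphrase("hello there", "goodbye there", toy_vectors))  # False
```

Averaging ignores word order entirely, which is one motivation for recurrent variants such as the GRAN model the abstract compares against.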
