Abstract: Natural language generation models reproduce and often amplify the biases present in their training data. Previous research explored using sequence-to-sequence rewriting models to transform biased model outputs (or original texts) into more gender-fair language by creating pseudo training data through linguistic rules. However, this approach is not practical for languages with more complex morphology than English. We hypothesise that creating training data in the reverse direction, i.e. starting from gender-fair text, is easier for morphologically complex languages, and show that it matches the performance of state-of-the-art rewriting models for English. To eliminate the rule-based nature of data creation, we instead propose using machine translation models to create gender-biased text from real gender-fair text via round-trip translation. Our approach allows us to train a rewriting model for German without the need for elaborate handcrafted rules. A human evaluation study shows that the outputs of this model increase gender-fairness.
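The round-trip translation step can be illustrated with off-the-shelf components. Below is a minimal sketch, not the authors' actual pipeline: it assumes the Hugging Face transformers library, the publicly available Helsinki-NLP Opus-MT German-English models, and English as pivot language (all of these choices are assumptions for illustration). Translating gender-fair German into English and back tends to yield generic-masculine German, producing a (biased, fair) pseudo-parallel training pair.

```python
# Minimal sketch of round-trip translation for pseudo-data creation.
# Assumptions: Hugging Face `transformers`, Helsinki-NLP Opus-MT models,
# English as pivot language; this is NOT the authors' exact setup.
from transformers import MarianMTModel, MarianTokenizer

def load(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

de_en_tok, de_en = load("Helsinki-NLP/opus-mt-de-en")
en_de_tok, en_de = load("Helsinki-NLP/opus-mt-en-de")

def translate(text, tok, model):
    batch = tok([text], return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return tok.decode(out[0], skip_special_tokens=True)

fair = "Die Mitarbeiterinnen und Mitarbeiter treffen sich morgen."
pivot = translate(fair, de_en_tok, de_en)    # e.g. "The employees meet tomorrow."
biased = translate(pivot, en_de_tok, en_de)  # likely generic masculine: "Die Mitarbeiter ..."

# (biased, fair) now serves as a pseudo-parallel pair for training
# a biased -> gender-fair rewriting model.
print(biased, "->", fair)
```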
Abstract: Widely used computer-aided translation (CAT) tools divide documents into segments such as sentences and arrange them in a side-by-side, spreadsheet-like view. We present the first controlled evaluation of these design choices on translator performance, measuring speed and accuracy in three experimental text processing tasks. We find significant evidence that sentence-by-sentence presentation enables faster text reproduction and within-sentence error identification compared to unsegmented text, and that a top-and-bottom arrangement of source and target sentences enables faster text reproduction compared to a side-by-side arrangement. For revision, on the other hand, our results suggest that presenting unsegmented text results in the highest accuracy and time efficiency. Our findings have direct implications for best practices in designing CAT tools.
Abstract: Machine translation (MT) has been shown to produce a number of errors that require human post-editing, but the extent to which professional human translation (HT) contains such errors has not yet been compared to that of MT. We compile pre-translated documents in which MT and HT are interleaved, and ask professional translators to flag errors and post-edit these documents in a blind evaluation. We find that the post-editing effort for MT segments is only higher in two out of three language pairs, and that the number of segments with wrong terminology, omissions, and typographical problems is similar in HT.
Abstract: The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design, which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
Abstract: Neural machine translation (NMT) has set new quality standards in automatic translation, yet its effect on post-editing productivity has yet to be investigated thoroughly. We empirically test how the inclusion of NMT, in addition to domain-specific translation memories and termbases, impacts speed and quality in professional translation of financial texts. We find that even with language pairs that have received little attention in research settings and small amounts of in-domain data for system adaptation, NMT post-editing allows for substantial time savings and leads to equal or slightly better quality.
Abstract: Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese--English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence level become decisive in discriminating the quality of different translation outputs.
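As a worked illustration of the analysis behind such a pairwise ranking experiment, one can test whether the raters' preference for human translation deviates from chance with a two-sided binomial sign test. The counts below are invented placeholders, not the study's data, and the test choice is an assumption for illustration.

```python
# Illustrative sign test over pairwise preferences (HT vs. MT).
# Counts are invented placeholders, not the study's actual data.
from scipy.stats import binomtest

prefer_ht = 140  # ratings preferring the human translation
prefer_mt = 100  # ratings preferring the machine translation (ties excluded)

result = binomtest(prefer_ht, prefer_ht + prefer_mt, p=0.5, alternative="two-sided")
print(f"HT preferred in {prefer_ht / (prefer_ht + prefer_mt):.1%} of pairs, "
      f"p = {result.pvalue:.4f}")
```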
Abstract: We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.
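For context, decoding with a trained Nematus model is typically done through the toolkit's translation script. The sketch below, wrapped in Python for illustration, follows the flags documented in the repository README at the time; the exact interface differs between the Theano and TensorFlow versions of the toolkit, so treat the flags as assumptions rather than an authoritative reference.

```python
# Hedged sketch: invoking Nematus translation from Python via subprocess.
# Flags follow the repository README (assumption; they vary across the
# Theano and TensorFlow versions of the toolkit).
import subprocess

subprocess.run([
    "python", "nematus/translate.py",
    "-m", "model.npz",      # trained model
    "-i", "input.bpe.src",  # BPE-encoded source text
    "-o", "output.txt",     # translations are written here
    "-k", "12",             # beam size
    "-n",                   # length normalisation
], check=True)
```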
Abstract: We describe a machine-learning-based method to identify incorrect entries in translation memories. It extends previous work by Barbu (2015) by incorporating recall-based machine translation and part-of-speech-tagging features. Our system ranked first in the Binary Classification (II) task for two out of three language pairs: English-Italian and English-Spanish.
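A minimal sketch of the general approach, not the submitted system: each translation-memory entry is mapped to numeric features, such as how much of an MT hypothesis is recalled in the target, and a binary classifier flags incorrect entries. The feature definitions, toy data, and use of scikit-learn below are all assumptions for illustration.

```python
# Minimal sketch of the general approach (NOT the submitted system):
# represent each translation-memory entry with numeric features and
# train a binary classifier. Feature extraction here is a placeholder.
from sklearn.ensemble import RandomForestClassifier

def features(source, target, mt_hypothesis):
    """Toy features; the paper's system uses recall-based MT and
    POS-tagging features, which would be computed here instead."""
    overlap = len(set(target.split()) & set(mt_hypothesis.split()))
    return [
        overlap / max(len(mt_hypothesis.split()), 1),    # recall of MT words in target
        abs(len(source.split()) - len(target.split())),  # length mismatch
    ]

# entries: (source, target, MT hypothesis for the source); labels: 1 = incorrect entry
entries = [("the red car", "das rote Auto", "das rote Auto"),
           ("the red car", "bitte hier klicken", "das rote Auto")]
labels = [0, 1]

X = [features(s, t, h) for s, t, h in entries]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict([features("a blue door", "eine blaue Tür", "eine blaue Tür")]))
```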