Koel Dutta Chowdhury

When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation

Jan 28, 2026

David Tan, Pinzhen Chen, Josef van Genabith, Koel Dutta Chowdhury

Abstract:Large language models (LLMs) can be benchmark-contaminated, resulting in inflated scores that mask memorization as generalization, and in multilingual settings, this memorization can even transfer to "uncontaminated" languages. Using the FLORES-200 translation benchmark as a diagnostic, we study two 7-8B instruction-tuned multilingual LLMs: Bloomz, which was trained on FLORES, and Llama as an uncontaminated control. We confirm Bloomz's FLORES contamination and demonstrate that machine translation contamination can be cross-directional, artificially boosting performance in unseen translation directions due to target-side memorization. Further analysis shows that recall of memorized references often persists despite various source-side perturbation efforts like paraphrasing and named entity replacement. However, replacing named entities leads to a consistent decrease in BLEU, suggesting an effective probing method for memorization in contaminated models.

* 5 pages of content, 15 total. 5 figures, 12 tables total. Accepted to EACL 2026 main conference. Code can be found here: github.com/Mr-Ao-25/cross-ling-contamination

Translating away Translationese without Parallel Data

Oct 28, 2023

Rricha Jalota, Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith

Abstract:Translated texts exhibit systematic linguistic differences compared to original texts in the same language, and these differences are referred to as translationese. Translationese has effects on various cross-lingual natural language processing tasks, potentially leading to biased results. In this paper, we explore a novel approach to reduce translationese in translated texts: translation-based style transfer. As there are no parallel human-translated and original data in the same language, we use a self-supervised approach that can learn from comparable (rather than parallel) mono-lingual original and translated data. However, even this self-supervised approach requires some parallel data for validation. We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss. This unsupervised loss leverages the original language model loss over the style-transferred output and a semantic similarity loss between the input and style-transferred output. We evaluate our approach in terms of original vs. translationese binary classification in addition to measuring content preservation and target-style fluency. The results show that our approach is able to reduce translationese classifier accuracy to a level of a random classifier after style transfer while adequately preserving the content and fluency in the target original style.

* Accepted at EMNLP 2023, Main Conference

Towards Debiasing Translation Artifacts

May 16, 2022

Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, Josef van Genabith

Abstract:Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets. However, compared to original texts in the same language, translations possess distinct qualities referred to as translationese. Previous research has shown that these translation artifacts influence the performance of a variety of cross-lingual tasks. In this work, we propose a novel approach to reducing translationese by extending an established bias-removal technique. We use the Iterative Null-space Projection (INLP) algorithm, and show by measuring classification accuracy before and after debiasing, that translationese is reduced at both sentence and word level. We evaluate the utility of debiasing translationese on a natural language inference (NLI) task, and show that by reducing this bias, NLI accuracy improves. To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.

* Accepted to NAACL 2022, Main Conference
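
The iterative projection idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression classifier, the synthetic 3-dimensional data, and all function names (`fit_linear_classifier`, `project_out`, `inlp`) are assumptions made for illustration only. Each round fits a linear classifier for the label, then projects the data onto the null space of that classifier's weight vector, so the direction it relied on is removed.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_linear_classifier(X, y, steps=200, lr=0.5):
    """Logistic regression (no bias) trained by gradient descent from zero."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, label in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-dot(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - label) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    return sum((dot(w, x) > 0) == bool(label) for x, label in zip(X, y)) / len(X)


def project_out(X, w):
    """Project every row of X onto the null space of the direction w."""
    norm_sq = dot(w, w)
    projected = []
    for x in X:
        scale = dot(x, w) / norm_sq
        projected.append([xj - scale * wj for xj, wj in zip(x, w)])
    return projected


def inlp(X, y, iterations=2):
    """Repeat: fit a classifier, then remove the direction it used."""
    directions = []
    for _ in range(iterations):
        w = fit_linear_classifier(X, y)
        if dot(w, w) < 1e-12:  # nothing left for the classifier to use
            break
        directions.append(w)
        X = project_out(X, w)
    return X, directions


# Synthetic stand-in for embeddings: dims 0 and 1 carry the label signal
# (e.g. "translated" vs "original"), dim 2 is unrelated noise.
rng = random.Random(0)
X, y = [], []
for i in range(200):
    label = i % 2
    sign = 1.0 if label else -1.0
    X.append([sign * 1.0 + rng.gauss(0, 0.3),
              sign * 0.5 + rng.gauss(0, 0.3),
              rng.gauss(0, 1.0)])
    y.append(label)

acc_before = accuracy(fit_linear_classifier(X, y), X, y)
cleaned, directions = inlp(X, y, iterations=2)
acc_after = accuracy(fit_linear_classifier(cleaned, y), cleaned, y)
print("accuracy before:", acc_before, "after:", acc_after)
```

Comparing `acc_before` with `acc_after` mirrors, in toy form, the before-and-after classification accuracy measurement the abstract describes.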

EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT

Sep 29, 2021

Svetlana Tchistiakova, Jesujoba Alabi, Koel Dutta Chowdhury, Sourav Dutta, Dana Ruiter

Abstract:We describe the EdinSaar submission to the shared task of Multilingual Low-Resource Translation for North Germanic Languages at the Sixth Conference on Machine Translation (WMT2021). We submit multilingual translation models for translations to/from Icelandic (is), Norwegian-Bokmal (nb), and Swedish (sv). We employ various experimental approaches, including multilingual pre-training, back-translation, fine-tuning, and ensembling. In most translation directions, our models outperform other submitted systems.

* To be published at WMT2021

Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification

Sep 15, 2021

Daria Pylypenko, Kwabena Amponsah-Kaakyire, Koel Dutta Chowdhury, Josef van Genabith, Cristina España-Bonet

Abstract:Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts. By contrast, to date, neural architectures without manual feature engineering have been less explored for this task. In this work, we (i) compare the traditional feature-engineering-based approach to the feature-learning-based one and (ii) analyse the neural architectures in order to investigate how well the hand-crafted features explain the variance in the neural models' predictions. We use pre-trained neural word embeddings, as well as several end-to-end neural architectures in both monolingual and multilingual settings and compare them to feature-engineering-based SVM classifiers. We show that (i) neural architectures outperform other approaches by more than 20 accuracy points, with the BERT-based model performing the best in both the monolingual and multilingual settings; (ii) while many individual hand-crafted translationese features correlate with neural model predictions, feature importance analysis shows that the most important features for neural and classical architectures differ; and (iii) our multilingual experiments provide empirical evidence for translationese universals across languages.

* 9 pages, 5 pages appendix, 2 figures, 7 tables. The first 3 authors contributed equally. Accepted to EMNLP 2021, Main Conference

The RGNLP Machine Translation Systems for WAT 2018

Dec 03, 2018

Atul Kr. Ojha, Koel Dutta Chowdhury, Chao-Hong Liu, Karan Saxena

Abstract:This paper presents the system description of the Machine Translation (MT) system(s) submitted to the Indic Languages Multilingual Task of the 2018 edition of the WAT Shared Task. In our experiments, we (the RGNLP team) explore both statistical and neural methods across all language pairs. (We further present an extensive comparison of language-related problems for both approaches in low-resource settings.) Our PBSMT models achieved the highest scores on all automatic evaluation metrics in the English into Telugu, Hindi, Bengali, and Tamil portions of the shared task.

* Short paper at the WAT 2018 Shared Task, in Proceedings of the 5th Workshop on Asian Translation (WAT2018), Hong Kong, China, December
