Stephan Peitz

Overcoming Vocabulary Constraints with Pixel-level Fallback

Apr 02, 2025

Jonas F. Lotz, Hendra Setiawan, Stephan Peitz, Yova Kementchedjhieva

Abstract: Subword tokenization requires balancing computational efficiency and vocabulary coverage, which often leads to suboptimal performance on languages and scripts not prioritized during training. We propose to augment pretrained language models with a vocabulary-free encoder that generates input embeddings from text rendered as pixels. Through experiments on English-centric language models, we demonstrate that our approach substantially improves machine translation performance and facilitates effective cross-lingual transfer, outperforming tokenizer-based methods. Furthermore, we find that pixel-based representations outperform byte-level approaches and standard vocabulary expansion. Our approach enhances the multilingual capabilities of monolingual language models without extensive retraining and reduces decoding latency via input compression.

Learning Language-Specific Layers for Multilingual Machine Translation

May 04, 2023

Telmo Pessoa Pires, Robin M. Schmidt, Yi-Hsiu Liao, Stephan Peitz

Abstract: Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice), and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers using a neural architecture search inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.

* Accepted at ACL 2023
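
As a rough illustration of the idea described in this abstract (more total capacity, constant per-pass cost), the sketch below keeps one weight matrix per language and selects exactly one of them in each forward pass. It is a toy, pure-Python stand-in and not the paper's implementation: the class name `LanguageSpecificLayer`, its methods, the language codes, and the random initialisation are all hypothetical.

```python
import random


class LanguageSpecificLayer:
    """Toy linear layer with one weight matrix per language (hypothetical)."""

    def __init__(self, languages, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One dim x dim matrix per language; the init range is an arbitrary choice.
        self.weights = {
            lang: [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
            for lang in languages
        }

    def total_params(self):
        # Stored parameters grow linearly with the number of languages.
        return len(self.weights) * self.dim * self.dim

    def active_params(self):
        # Only one language's matrix is touched in a forward pass.
        return self.dim * self.dim

    def forward(self, x, lang):
        # Select the weights by language, then apply a plain matrix-vector product.
        w = self.weights[lang]
        return [sum(w[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]


layer = LanguageSpecificLayer(["de", "fr", "pt"], dim=4)
x = [1.0, 0.5, -0.5, 2.0]
y_de = layer.forward(x, "de")  # uses only the "de" weights
y_fr = layer.forward(x, "fr")  # same cost, different weights
```

Adding a fourth language in this toy would raise `total_params()` but leave `active_params()` unchanged, which is the property the abstract highlights.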

State Spaces Aren't Enough: Machine Translation Needs Attention

Apr 25, 2023

Ali Vardasbi, Telmo Pessoa Pires, Robin M. Schmidt, Stephan Peitz

Abstract: Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g. vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state, and is able to capture long range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT), and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with the success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points, and that it counter-intuitively struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and show that we can close the gap by introducing an attention mechanism.
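
To make "compresses its input to a single hidden state" concrete, here is a minimal sketch of a generic discrete linear state-space recurrence, x_k = A x_{k-1} + B u_k and y_k = C x_k. This is only the textbook recurrence under assumed toy matrices; it does not reproduce S4's specific parameterization or training, and the function name `ssm_scan` is hypothetical.

```python
def ssm_scan(A, B, C, inputs):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k over a scalar input sequence."""
    n = len(B)
    state = [0.0] * n  # the single hidden state that must summarize all past inputs
    outputs = []
    for u in inputs:
        # Update every state dimension from the previous state and the new input.
        state = [
            sum(A[i][j] * state[j] for j in range(n)) + B[i] * u
            for i in range(n)
        ]
        # Read out a scalar output from the current state.
        outputs.append(sum(C[i] * state[i] for i in range(n)))
    return outputs, state


# Toy diagonal system (values chosen arbitrarily for illustration).
A = [[0.5, 0.0], [0.0, 0.9]]
B = [1.0, 1.0]
C = [1.0, 1.0]
outputs, final_state = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

However long the input sequence is, `final_state` keeps the same fixed size, which is the bottleneck the abstract identifies for summarizing a full source sentence.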

Non-Autoregressive Neural Machine Translation: A Call for Clarity

May 21, 2022

Robin M. Schmidt, Telmo Pires, Stephan Peitz, Jonas Lööf

Abstract: Non-autoregressive approaches aim to improve the inference speed of translation models by only requiring a single forward pass to generate the output sequence instead of iteratively producing each predicted token. Consequently, their translation quality still tends to be inferior to their autoregressive counterparts due to several issues involving output token interdependence. In this work, we take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models and compare their combined translation quality and speed implications under third-party testing environments. We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants and contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks, which crucially have been missing as inconsistencies in the use of tokenized BLEU lead to deviations of up to 1.7 BLEU points. Our open-sourced code is integrated into fairseq for reproducibility.
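
For readers unfamiliar with the CTC-based variants mentioned in this abstract, the sketch below shows the standard CTC collapse rule applied to per-position predictions from a single forward pass: merge consecutive repeats, then drop blanks. The function name `ctc_collapse`, the blank symbol, and the example tokens are assumptions for illustration and are not taken from the paper or from fairseq.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical blank symbol


def ctc_collapse(position_tokens, blank=BLANK):
    """Collapse per-position predictions into an output sequence."""
    # Step 1: merge runs of identical consecutive tokens into one token.
    merged = [token for token, _ in groupby(position_tokens)]
    # Step 2: remove blank symbols.
    return [token for token in merged if token != blank]


# All positions are predicted in one pass, then collapsed.
predicted = ["das", "das", BLANK, "ist", BLANK, BLANK, "gut", "gut"]
translation = ctc_collapse(predicted)  # ['das', 'ist', 'gut']

# A blank between two identical tokens keeps both of them.
repeated = ctc_collapse(["sehr", BLANK, "sehr"])  # ['sehr', 'sehr']
```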

Jointly Learning to Align and Translate with Transformer Models

Sep 04, 2019

Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, Matthias Paulik

Abstract: The state of the art in machine translation (MT) is governed by neural approaches, which typically provide superior translation accuracy over statistical approaches. However, on the closely related task of word alignment, traditional statistical word alignment models often remain the go-to solution. In this paper, we present an approach to train a Transformer model to produce both accurate translations and alignments. We extract discrete alignments from the attention probabilities learnt during regular neural machine translation model training and leverage them in a multi-task framework to optimize towards translation and alignment objectives. We demonstrate that our approach produces competitive results compared to GIZA++ trained IBM alignment models without sacrificing translation accuracy and outperforms previous attempts on Transformer model based word alignment. Finally, by incorporating IBM model alignments into our multi-task training, we report significantly better alignment accuracies compared to GIZA++ on three publicly available data sets.

* 10 pages, 2 figures. To appear at EMNLP 2019
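
One simple way to turn attention probabilities into discrete alignments, as the abstract describes at a high level, is to link each target token to the source position with the highest attention weight. The sketch below assumes that argmax rule and a hand-written attention matrix; the paper's actual extraction procedure (which layers and heads are used, any thresholds) is not specified here, and `extract_alignments` is a hypothetical helper.

```python
def extract_alignments(attention):
    """Return (source_index, target_index) links from an attention matrix.

    attention[t][s] is the probability that target token t attends to
    source token s. Each target token is linked to its argmax source token
    (an assumed rule, used here only for illustration).
    """
    links = []
    for t, row in enumerate(attention):
        best_source = max(range(len(row)), key=row.__getitem__)
        links.append((best_source, t))
    return links


# Toy attention matrix: 3 target tokens (rows) over 3 source tokens (columns).
attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
]
links = extract_alignments(attention)  # [(0, 0), (2, 1), (1, 2)]
```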
