Julius Cheng

Early-Exit and Instant Confidence Translation Quality Estimation

Feb 20, 2025

Vilém Zouhar, Maike Züfle, Beni Egressy, Julius Cheng, Jan Niehues

Abstract: Quality estimation is omnipresent in machine translation, for both evaluation and generation. Unfortunately, quality estimation models are often opaque and computationally expensive, making them impractical as part of large-scale pipelines. In this work, we tackle two connected challenges: (1) reducing the cost of quality estimation at scale, and (2) developing an inexpensive uncertainty estimation method for quality estimation. To address the latter, we introduce Instant Confidence COMET, an uncertainty-aware quality estimation model that matches the performance of previous approaches at a fraction of their cost. We extend this to Early-Exit COMET, a quality estimation model that can compute quality scores and associated confidences at early model layers, allowing us to exit computation early and reduce evaluation costs. We also apply our model to machine translation reranking: we combine Early-Exit COMET with an upper confidence bound bandit algorithm to find the best candidate from a large pool without running the full evaluation model on all candidates. In both cases (evaluation and reranking), our methods reduce the required compute by 50% with very little degradation in performance.
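
To make the early-exit and bandit ideas in this abstract concrete, here is a minimal Python sketch. It assumes, hypothetically, that each candidate's per-layer outputs are available as a list of (score, standard deviation) pairs, one per layer; in a real system each entry would come from running one more layer of the quality estimation model. The names `early_exit`, `ucb_rerank`, `max_std`, and `kappa` are illustrative, not the paper's API, and the stopping rule is one simple UCB variant rather than the authors' exact algorithm.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-layer output of a QE model: (predicted score, predicted std).
LayerOutput = Tuple[float, float]


def early_exit(layers: Sequence[LayerOutput], max_std: float) -> Tuple[float, int]:
    """Return (score, layers used), stopping at the first layer whose
    predicted std is at most max_std; otherwise fall back to the last layer."""
    for depth, (score, std) in enumerate(layers, start=1):
        if std <= max_std:
            return score, depth
    return layers[-1][0], len(layers)


def ucb_rerank(candidates: List[Sequence[LayerOutput]], kappa: float = 1.0) -> Tuple[int, int]:
    """Pick the best candidate by repeatedly deepening the one with the highest
    upper confidence bound. Returns (chosen index, total layers computed)."""
    depth = [1] * len(candidates)  # every candidate gets its first layer
    cost = len(candidates)

    def upper_bound(i: int) -> float:
        score, std = candidates[i][depth[i] - 1]
        if depth[i] == len(candidates[i]):
            return score  # fully evaluated: treat the final score as exact
        return score + kappa * std

    while True:
        best = max(range(len(candidates)), key=upper_bound)
        if depth[best] == len(candidates[best]):
            # The top upper bound belongs to a fully evaluated candidate.
            return best, cost
        depth[best] += 1  # run one more layer on the most promising candidate
        cost += 1
```

With `c0 = [(0.5, 0.3), (0.6, 0.2), (0.62, 0.0)]` and `c1 = [(0.3, 0.05), (0.31, 0.03), (0.32, 0.0)]`, `ucb_rerank([c0, c1])` returns `(0, 4)`: candidate 1 is never deepened because its upper bound stays below candidate 0's, so 4 layer evaluations replace the 6 needed for full evaluation.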

A Bayesian Optimization Approach to Machine Translation Reranking

Nov 14, 2024

Julius Cheng, Maike Züfle, Vilém Zouhar, Andreas Vlachos

Abstract: Reranking a list of candidates from a machine translation system with an external scoring model and returning the highest-scoring candidate remains a simple and effective method for improving the overall output quality. Translation scoring models continue to grow in size, with the best models being comparable to generation models. Thus, reranking can add substantial computational cost to the translation pipeline. In this work, we pose reranking as a Bayesian optimization (BayesOpt) problem. By strategically selecting candidates to score based on a balance of exploration and exploitation, we show that it is possible to find top-scoring candidates when scoring only a fraction of the candidate list. For instance, our method achieves the same CometKiwi score using only 70 scoring evaluations, compared with 180 for a baseline system. We present a multi-fidelity setting for BayesOpt, in which candidates are first scored with a cheaper but noisier proxy scoring model; this further improves the cost-performance tradeoff when using smaller but well-trained distilled proxy scorers.

* v1: Preprint version
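
A minimal Python sketch of reranking posed as BayesOpt, under stated assumptions: each candidate is represented by a feature vector (for example, a sentence embedding), a Gaussian process with an RBF kernel and a constant prior mean models the scores, and the next candidate to score is the unscored one with the highest upper confidence bound. `score_fn` stands in for the expensive scorer (such as CometKiwi). The function names, kernel, initialization, and `beta` are illustrative choices, not the paper's configuration, and the multi-fidelity proxy stage is omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


def rbf(a: Vec, b: Vec, lengthscale: float = 1.0) -> float:
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * lengthscale ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(X_obs: List[Vec], y_obs: List[float], X_query: List[Vec],
                 lengthscale: float, noise: float = 1e-6) -> List[Tuple[float, float]]:
    """GP posterior (mean, variance) at each query point, constant prior mean."""
    n = len(X_obs)
    K = [[rbf(X_obs[i], X_obs[j], lengthscale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    prior_mean = sum(y_obs) / n
    alpha = solve(K, [y - prior_mean for y in y_obs])
    out = []
    for x in X_query:
        k = [rbf(a, x, lengthscale) for a in X_obs]
        mu = prior_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve(K, k)
        var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)  # rbf(x, x) == 1
        out.append((mu, var))
    return out


def bayesopt_rerank(features: List[Vec], score_fn: Callable[[int], float], budget: int,
                    n_init: int = 2, beta: float = 2.0,
                    lengthscale: float = 1.0) -> Tuple[int, int]:
    """Score at most `budget` (>= 1) candidates; return (best index, number scored)."""
    budget = min(budget, len(features))
    scored = {i: score_fn(i) for i in range(min(max(n_init, 1), budget))}
    while len(scored) < budget:
        idx = list(scored)
        pool = [j for j in range(len(features)) if j not in scored]
        post = gp_posterior([features[i] for i in idx], [scored[i] for i in idx],
                            [features[j] for j in pool], lengthscale)
        # UCB acquisition: trade off predicted score (exploit) and uncertainty (explore).
        ucb = [mu + beta * math.sqrt(var) for mu, var in post]
        nxt = pool[max(range(len(pool)), key=ucb.__getitem__)]
        scored[nxt] = score_fn(nxt)
    return max(scored, key=scored.get), len(scored)
```

When `budget` equals the list length, every candidate is scored and the procedure reduces to exhaustive reranking; the savings come from setting `budget` well below the list size and letting the acquisition function decide which candidates are worth scoring.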

Faster Minimum Bayes Risk Decoding with Confidence-based Pruning

Nov 25, 2023

Julius Cheng, Andreas Vlachos

Abstract: Minimum Bayes risk (MBR) decoding outputs the hypothesis with the highest expected utility over the model distribution for some utility function. It has been shown to improve accuracy over beam search in conditional language generation problems and especially neural machine translation, in both human and automatic evaluations. However, the standard sampling-based algorithm for MBR is substantially more computationally expensive than beam search, requiring a large number of samples as well as a quadratic number of calls to the utility function, limiting its applicability. We describe an algorithm for MBR which gradually grows the number of samples used to estimate the utility while pruning hypotheses that are unlikely to have the highest utility according to confidence estimates obtained with bootstrap sampling. Our method requires fewer samples and drastically reduces the number of calls to the utility function compared to standard MBR while being statistically indistinguishable in terms of accuracy. We demonstrate the effectiveness of our approach in experiments on three language pairs, using chrF++ and COMET as utility/evaluation metrics.

* Updated from EMNLP 2023 version: typo fix, minor math notation change, updated citation
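
The following Python sketch shows one way to implement the confidence-based pruning for MBR described in the abstract above, under assumptions: hypotheses and pseudo-references are given up front, `schedule` is an increasing, non-empty list of pseudo-reference counts, and a hypothesis survives a round only if it matches or beats the current best in at least a `min_win_rate` fraction of bootstrap resamples. The function and parameter names are hypothetical, and the paper's exact pruning criterion and schedule may differ. Utility values are cached so each (hypothesis, pseudo-reference) pair is scored at most once.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mbr_pruned(
    hyps: Sequence[T],
    refs: Sequence[T],
    utility: Callable[[T, T], float],
    schedule: Sequence[int],
    min_win_rate: float = 0.01,
    n_boot: int = 200,
    seed: int = 0,
) -> Tuple[int, int]:
    """MBR with bootstrap-based pruning. Returns (selected hypothesis index,
    number of utility calls)."""
    rng = random.Random(seed)
    alive: List[int] = list(range(len(hyps)))
    cache: Dict[Tuple[int, int], float] = {}
    m = 0
    for step in schedule:
        if len(alive) == 1:
            break  # a single survivor needs no further utility calls
        m = min(step, len(refs))
        # Score surviving hypotheses against the first m pseudo-references only.
        for h in alive:
            for r in range(m):
                if (h, r) not in cache:
                    cache[(h, r)] = utility(hyps[h], refs[r])
        means = {h: sum(cache[(h, r)] for r in range(m)) / m for h in alive}
        best = max(alive, key=means.__getitem__)
        wins = dict.fromkeys(alive, 0)
        for _ in range(n_boot):
            sample = [rng.randrange(m) for _ in range(m)]  # resample pseudo-references
            best_total = sum(cache[(best, r)] for r in sample)
            for h in alive:
                if sum(cache[(h, r)] for r in sample) >= best_total:
                    wins[h] += 1
        # Prune hypotheses that rarely match the current best under resampling.
        alive = [h for h in alive if wins[h] / n_boot >= min_win_rate]
    if len(alive) == 1:
        return alive[0], len(cache)
    means = {h: sum(cache[(h, r)] for r in range(m)) / m for h in alive}
    return max(alive, key=means.__getitem__), len(cache)
```

With toy numeric hypotheses `[0.0, 5.0, 10.0]`, pseudo-references `[4.9, 5.1, 5.0, 4.8, 5.2, 5.05, 4.95, 5.0]`, `utility = lambda h, r: -abs(h - r)`, and `schedule = [4, 8]`, the two outlying hypotheses are pruned after the first round, so the call returns `(1, 12)`: 12 utility calls instead of the 24 that standard MBR would make with all eight pseudo-references.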
