Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ondřej Bojar

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Dec 24, 2024

Sara Papi, Peter Polak, Ondřej Bojar, Dominik Macháček

Figure 1 for How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Figure 2 for How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Figure 3 for How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Figure 4 for How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Abstract:Simultaneous speech-to-text translation (SimulST) translates source-language speech into target-language text concurrently with the speaker's speech, ensuring low latency for better user comprehension. Despite its intended application to unbounded speech, most research has focused on human pre-segmented speech, simplifying the task and overlooking significant challenges. This narrow focus, coupled with widespread terminological inconsistencies, is limiting the applicability of research outcomes to real-world applications, ultimately hindering progress in the field. Our extensive literature review of 110 papers not only reveals these critical issues in current research but also serves as the foundation for our key contributions. We 1) define the steps and core components of a SimulST system, proposing a standardized terminology and taxonomy; 2) conduct a thorough analysis of community trends, and 3) offer concrete recommendations and future directions to bridge the gaps in existing literature, from evaluation frameworks to system architectures, for advancing the field towards more realistic and effective SimulST solutions.

* Accepted at TACL

Via

Access Paper or Ask Questions

Findings of the IWSLT 2024 Evaluation Campaign

Nov 07, 2024

Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico(+35 more)

Abstract:This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

* IWSLT 2024; 59 pages

Via

Access Paper or Ask Questions

Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Oct 17, 2024

Patrik Zavoral, Dušan Variš, Ondřej Bojar

Figure 1 for Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Figure 2 for Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Figure 3 for Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Figure 4 for Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Abstract:The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important. We hypothesize that the models learn algorithmic aspects of the tasks simultaneously with structural aspects but adhering to the structural aspects is unfortunately often preferred by Transformer when they come into conflict.

* 9 pages, 8 figures, 2 tables; to be published

Via

Access Paper or Ask Questions

Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Jun 06, 2024

Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh(+1 more)

Figure 1 for Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Figure 2 for Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Figure 3 for Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Figure 4 for Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Abstract:Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on the topic of human evaluation for speech translation, which adds additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation of the results of several shared tasks from the last International Workshop on Spoken Language Translation (IWSLT 2023). We propose an effective evaluation strategy based on automatic resegmentation and direct assessment with segment context. Our analysis revealed that: 1) the proposed evaluation strategy is robust and scores well-correlated with other types of human judgements; 2) automatic metrics are usually, but not always, well-correlated with direct assessment scores; and 3) COMET as a slightly stronger automatic metric than chrF, despite the segmentation noise introduced by the resegmentation step systems. We release the collected human-annotated data in order to encourage further investigation.

* Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
* LREC-COLING2024 publication (with corrections for Table 3)

Via

Access Paper or Ask Questions

Understanding the role of FFNs in driving multilingual behaviour in LLMs

Apr 22, 2024

Sunit Bhattacharya, Ondřej Bojar

Figure 1 for Understanding the role of FFNs in driving multilingual behaviour in LLMs

Figure 2 for Understanding the role of FFNs in driving multilingual behaviour in LLMs

Figure 3 for Understanding the role of FFNs in driving multilingual behaviour in LLMs

Figure 4 for Understanding the role of FFNs in driving multilingual behaviour in LLMs

Abstract:Multilingualism in Large Language Models (LLMs) is an yet under-explored area. In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of a Large Language Model, examining its architecture, activation patterns, and processing mechanisms across languages. We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing. Our findings reveal different patterns of multilinugal processing in the sublayers of Feed-Forward Networks of the models. Furthermore, we uncover the phenomenon of "over-layerization" in certain model configurations, where increasing layer depth without corresponding adjustments to other parameters may degrade model performance. Through comparisons within and across languages, we demonstrate the interplay between model architecture, layer depth, and multilingual processing capabilities of LLMs trained on multiple languages.

* 10 pages

Via

Access Paper or Ask Questions

On Difficulties of Attention Factorization through Shared Memory

Mar 31, 2024

Uladzislau Yorsh, Martin Holeňa, Ondřej Bojar, David Herel

Abstract:Transformers have revolutionized deep learning in numerous fields, including natural language processing, computer vision, and audio processing. Their strength lies in their attention mechanism, which allows for the discovering of complex input relationships. However, this mechanism's quadratic time and memory complexity pose challenges for larger inputs. Researchers are now investigating models like Linear Unified Nested Attention (Luna) or Memory Augmented Transformer, which leverage external learnable memory to either reduce the attention computation complexity down to linear, or to propagate information between chunks in chunk-wise processing. Our findings challenge the conventional thinking on these models, revealing that interfacing with the memory directly through an attention operation is suboptimal, and that the performance may be considerably improved by filtering the input signal before communicating with memory.

* 2 pages of main content, 8 pages in total, published as a Tiny Paper at ICLR 2024

Via

Access Paper or Ask Questions

Quality and Quantity of Machine Translation References for Automated Metrics

Jan 08, 2024

Vilém Zouhar, Ondřej Bojar

Abstract:Automatic machine translation metrics often use human translations to determine the quality of system translations. Common wisdom in the field dictates that the human references should be of very high quality. However, there are no cost-benefit analyses that could be used to guide practitioners who plan to collect references for machine translation evaluation. We find that higher-quality references lead to better metric correlations with humans at the segment-level. Having up to 7 references per segment and taking their average helps all metrics. Interestingly, the references from vendors of different qualities can be mixed together and improve metric success. Higher quality references, however, cost more to create and we frame this as an optimization problem: given a specific budget, what references should be collected to maximize metric success. These findings can be used by evaluators of shared tasks when references need to be created under a certain budget.

Via

Access Paper or Ask Questions

Evaluating Optimal Reference Translations

Nov 28, 2023

Vilém Zouhar, Věra Kloudová, Martin Popel, Ondřej Bojar

Figure 1 for Evaluating Optimal Reference Translations

Figure 2 for Evaluating Optimal Reference Translations

Figure 3 for Evaluating Optimal Reference Translations

Figure 4 for Evaluating Optimal Reference Translations

Abstract:The overall translation quality reached by current machine translation (MT) systems for high-resourced language pairs is remarkably good. Standard methods of evaluation are not suitable nor intended to uncover the many translation errors and quality deficiencies that still persist. Furthermore, the quality of standard reference translations is commonly questioned and comparable quality levels have been reached by MT alone in several language pairs. Navigating further research in these high-resource settings is thus difficult. In this article, we propose a methodology for creating more reliable document-level human reference translations, called "optimal reference translations," with the simple aim to raise the bar of what should be deemed "human translation quality." We evaluate the obtained document-level optimal reference translations in comparison with "standard" ones, confirming a significant quality increase and also documenting the relationship between evaluation and translation editing.

* To appear in Natural Language Engineering 2024

Via

Access Paper or Ask Questions

Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Oct 22, 2023

Ivana Kvapilíková, Ondřej Bojar

Figure 1 for Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Figure 2 for Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Figure 3 for Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Figure 4 for Boosting Unsupervised Machine Translation with Pseudo-Parallel Data

Abstract:Even with the latest developments in deep learning and large-scale language modeling, the task of machine translation (MT) of low-resource languages remains a challenge. Neural MT systems can be trained in an unsupervised way without any translation resources but the quality lags behind, especially in truly low-resource conditions. We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora in addition to synthetic sentence pairs back-translated from monolingual corpora. We experiment with different training schedules and reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only.

* Ivana Kvapil\'ikov\'a, Ond\v{r}ej Bojar (2023): Boosting Unsupervised Machine Translation with Pseudo-Parallel Data. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 135-147, AAMT, Kyoto, Japan
* MT Summit 2023

Via

Access Paper or Ask Questions

Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

Sep 20, 2023

Peter Polák, Ondřej Bojar

Abstract:Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations. However, the segmentation into sentences is not available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency for quality. In this paper, we propose a novel segmentation approach for a low-latency end-to-end speech translation. We leverage the existing speech translation encoder-decoder architecture with ST CTC and show that it can perform the segmentation task without supervision or additional parameters. To the best of our knowledge, our method is the first that allows an actual end-to-end simultaneous speech translation, as the same model is used for translation and segmentation at the same time. On a diverse set of language pairs and in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions