Dominik Macháček

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Dec 24, 2024

Sara Papi, Peter Polák, Ondřej Bojar, Dominik Macháček

Abstract: Simultaneous speech-to-text translation (SimulST) translates source-language speech into target-language text concurrently with the speaker's speech, ensuring low latency for better user comprehension. Despite its intended application to unbounded speech, most research has focused on human pre-segmented speech, simplifying the task and overlooking significant challenges. This narrow focus, coupled with widespread terminological inconsistencies, limits the applicability of research outcomes to real-world settings, ultimately hindering progress in the field. Our extensive literature review of 110 papers not only reveals these critical issues in current research but also serves as the foundation for our key contributions. We 1) define the steps and core components of a SimulST system, proposing a standardized terminology and taxonomy; 2) conduct a thorough analysis of community trends; and 3) offer concrete recommendations and future directions to bridge the gaps in existing literature, from evaluation frameworks to system architectures, for advancing the field towards more realistic and effective SimulST solutions.

* Accepted at TACL

Teaching LLMs at Charles University: Assignments and Activities

Jul 29, 2024

Jindřich Helcl, Zdeněk Kasner, Ondřej Dušek, Tomasz Limisiewicz, Dominik Macháček, Tomáš Musil, Jindřich Libovický

Abstract: This paper presents teaching materials, particularly assignments and ideas for classroom activities, from a new course on large language models (LLMs) taught at Charles University. The assignments include experiments with LLM inference for weather report generation and machine translation. The classroom activities include class quizzes, focused research on downstream tasks and datasets, and an interactive "best paper" session aimed at reading and comprehension of research papers.

* 6th TeachNLP workshop at ACL 2024

Turning Whisper into Real-Time Transcription System

Jul 27, 2023

Dominik Macháček, Raj Dabre, Ondřej Bojar

Abstract: Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription. In this paper, we build on top of Whisper and create Whisper-Streaming, an implementation of real-time speech transcription and translation for Whisper-like models. Whisper-Streaming uses a local agreement policy with self-adaptive latency to enable streaming transcription. We show that Whisper-Streaming achieves high quality and 3.3 seconds of latency on an unsegmented long-form speech transcription test set, and we demonstrate its robustness and practical usability as a component of a live transcription service at a multilingual conference.

* system demonstration pre-print
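
The local agreement policy named above is commonly formulated as LocalAgreement-n. The system repeatedly re-decodes a growing audio buffer and commits only those words on which the last n consecutive hypotheses agree as a common prefix. Below is a minimal Python sketch that assumes word-level hypotheses that always cover the buffer from its start, with n=2. The class and method names are hypothetical and are not the Whisper-Streaming API. The self-adaptive latency, prompting and buffer trimming are omitted.

```python
from typing import List


def longest_common_prefix(hyps: List[List[str]]) -> List[str]:
    """Return the longest word sequence that every hypothesis starts with."""
    if not hyps:
        return []
    prefix: List[str] = []
    # zip stops at the shortest hypothesis, so we never index past its end
    for words in zip(*hyps):
        if all(w == words[0] for w in words):
            prefix.append(words[0])
        else:
            break
    return prefix


class LocalAgreement:
    """Hypothetical sketch of a LocalAgreement-n commit policy (n=2 assumed)."""

    def __init__(self, n: int = 2):
        self.n = n
        self.history: List[List[str]] = []  # last n hypotheses
        self.committed: List[str] = []      # words already shown to the user

    def update(self, hypothesis: List[str]) -> List[str]:
        """Feed the latest full hypothesis; return newly committed words."""
        self.history.append(hypothesis)
        self.history = self.history[-self.n:]
        if len(self.history) < self.n:
            return []  # not enough consecutive updates to agree yet
        agreed = longest_common_prefix(self.history)
        # Committed words are never retracted; if the agreed prefix
        # contradicts them, wait for further updates instead of committing.
        if agreed[:len(self.committed)] != self.committed:
            return []
        new_words = agreed[len(self.committed):]
        self.committed.extend(new_words)
        return new_words
```

Feeding the hypotheses `["hello"]`, `["hello", "world"]` and `["hello", "world", "again"]` one after another commits nothing, then `hello`, then `world`. The trailing word `again` waits until a later hypothesis confirms it. The delay of one extra update is the price of never revising text the user has already seen.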

Robustness of Multi-Source MT to Transcription Errors

May 26, 2023

Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre

Abstract: Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling. In this paper, we hypothesize that leveraging multiple sources will improve translation quality if the sources complement one another in terms of the correct information they contain. To this end, we first show that on the 10-hour ESIC corpus, the ASR errors in the original English speech and its simultaneous interpreting into German and Czech are mutually independent. We then use two sources, English and German, in a multi-source setting for translation into Czech to establish its robustness to ASR errors. Furthermore, we observe this robustness when translating both noisy sources together in a simultaneous translation setting. Our results show that multi-source neural machine translation has the potential to be useful in a real-time simultaneous translation setting, thereby motivating further investigation in this area.

* ACL 2023 Findings

MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation

Nov 16, 2022

Dominik Macháček, Ondřej Bojar, Raj Dabre

Abstract: There have been several studies on the correlation between human ratings and metrics such as BLEU, chrF2, and COMET in machine translation. Most, if not all, consider full-sentence translation. It is unclear whether human ratings of simultaneous speech translation, namely Continuous Rating (CR), correlate with these metrics or not. Therefore, we conduct an extensive correlation analysis of CR and the aforementioned automatic metrics on evaluations of candidate systems in the English-German simultaneous speech translation task at IWSLT 2022. Our studies reveal that the offline MT metrics correlate with CR and can be reliably used for evaluating machine translation in the simultaneous mode, with some limitations on the test set size. This implies that automatic metrics can be used as proxies for CR, thereby alleviating the need for human evaluation.

* Technical Report

Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation

Mar 04, 2022

Dávid Javorský, Dominik Macháček, Ondřej Bojar

Abstract: In simultaneous speech translation, one can vary the size of the output window, the system latency, and sometimes the allowed level of rewriting. The effect of these properties on readability and comprehensibility has not been tested with modern neural translation systems. In this work, we propose an evaluation method and investigate the effects on comprehension and user preferences. It is a pilot study with 14 users on 2 hours of German documentaries or speeches with online translations into Czech. We collect continuous feedback and answers to factual questions. Our results show that the subtitling layout and flicker have little effect on comprehension, in contrast to the machine translation itself and individual competence. Other results show that users with limited knowledge of the source language have different preferences regarding stability and latency than users with zero knowledge. The results are not statistically significant; however, we show that our method works and can be reproduced at a larger scale.

The Reality of Multi-Lingual Machine Translation

Feb 25, 2022

Tom Kocmi, Dominik Macháček, Ondřej Bojar

Abstract: Our book "The Reality of Multi-Lingual Machine Translation" discusses the benefits and perils of using more than two languages in machine translation systems. While focused on the particular task of sequence-to-sequence processing and multi-task learning, the book reaches somewhat beyond the area of natural language processing. Machine translation is for us a prime example of a deep learning application where human skills and learning capabilities are taken as a benchmark that many try to match and surpass. We document that some of the gains observed in multi-lingual translation may result from simpler effects than the assumed cross-lingual transfer of knowledge. The first, rather general part of the book leads you from the motivation for multi-linguality and the versatility of deep neural networks, especially in sequence-to-sequence tasks, to the complications of this learning. We conclude the general part with warnings against overly optimistic and unjustified explanations of the gains that neural networks demonstrate. In the second part, we fully delve into multi-lingual models, with a particularly careful examination of transfer learning as one of the more straightforward approaches to utilizing additional languages. Recent multi-lingual techniques, including massive models, are surveyed, and practical aspects of deploying systems for many languages are discussed. The conclusion highlights the open problem of machine understanding and reminds the reader of two ethical aspects of building large-scale models: the inclusivity of research and its ecological footprint.

* ISBN 978-80-88132-11-0. arXiv admin note: substantial text overlap with arXiv:2001.01622

Lost in Interpreting: Speech Translation from Source or Interpreter?

Jun 17, 2021

Dominik Macháček, Matúš Žilinec, Ondřej Bojar

Abstract: Interpreters facilitate multi-lingual meetings, but the set of languages that can be afforded is often smaller than what is needed. Automatic simultaneous speech translation can extend the set of provided languages. We investigate whether such an automatic system should follow the original speaker, or rather an interpreter to achieve better translation quality at the cost of increased delay. To answer this question, we release the Europarl Simultaneous Interpreting Corpus (ESIC), 10 hours of recordings and transcripts of European Parliament speeches in English, with simultaneous interpreting into Czech and German. We evaluate the quality and latency of speaker-based and interpreter-based spoken translation systems from English to Czech. We study the differences in implicit simplification and summarization between the human interpreter and a machine translation system trained to shorten the output to some extent. Finally, we perform a human evaluation to measure the information loss of each of these approaches.

* to be published at INTERSPEECH 2021

Presenting Simultaneous Translation in Limited Space

Sep 18, 2020

Dominik Macháček, Ondřej Bojar

Abstract: Some methods of automatic simultaneous translation of long-form speech allow revisions of outputs, trading accuracy for low latency. Deploying these systems for users faces the problem of presenting subtitles in a limited space, such as two lines on a television screen. The subtitles must be shown promptly, incrementally, and with adequate time for reading. We provide an algorithm for subtitling. Furthermore, we propose a way to estimate the overall usability of the combination of automatic translation and subtitling by measuring the quality, latency, and stability on a test set, and we propose an improved measure of translation latency.

* ITAT WAFNL 2020

ELITR Non-Native Speech Translation at IWSLT 2020

Jun 05, 2020

Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao

Abstract: This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020. We describe systems for offline ASR, real-time ASR, and our cascaded approach to offline SLT and real-time SLT. We select our primary candidates from a pool of pre-existing systems, and we develop a new end-to-end general ASR system and a hybrid ASR trained on non-native speech. The small validation set provided prevents us from carrying out a complex validation, but we submit all the unselected candidates for contrastive evaluation on the test set.

* IWSLT 2020
