Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jindřich Helcl

Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes

Apr 16, 2025

Raúl Vázquez, Timothee Mickus, Elaine Zosa, Teemu Vahtola, Jörg Tiedemann, Aman Sinha, Vincent Segonne, Fernando Sánchez-Vega, Alessandro Raganato, Jindřich Libovický(+8 more)

Abstract:We present the Mu-SHROOM shared task which is focused on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models (LLMs). Mu-SHROOM addresses general-purpose LLMs in 14 languages, and frames the hallucination detection problem as a span-labeling task. We received 2,618 submissions from 43 participating teams employing diverse methodologies. The large number of submissions underscores the interest of the community in hallucination detection. We present the results of the participating systems and conduct an empirical analysis to identify key factors contributing to strong performance in this task. We also emphasize relevant current challenges, notably the varying degree of hallucinations across languages and the high annotator disagreement when labeling hallucination spans.

* Mu-SHROOM is part of SemEval-2025 (Task 3). TBP: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Via

Access Paper or Ask Questions

Atyaephyra at SemEval-2025 Task 4: Low-Rank NPO

Mar 17, 2025

Jan Bronec, Jindřich Helcl

Abstract:We present a submission to the SemEval 2025 shared task on unlearning sensitive content from LLMs. Our approach employs negative preference optimization using low-rank adaptation. We show that we can utilize this combination to cheaply compute additional regularization terms, which help with unlearning stabilization. The results of our approach significantly exceed the shared task baselines.

* 5 pages, 1 figure, 1 table, submitted to SemEval proceedings for ACL Anthology

Via

Access Paper or Ask Questions

Teaching LLMs at Charles University: Assignments and Activities

Jul 29, 2024

Jindřich Helcl, Zdeněk Kasner, Ondřej Dušek, Tomasz Limisiewicz, Dominik Macháček, Tomáš Musil, Jindřich Libovický

Abstract:This paper presents teaching materials, particularly assignments and ideas for classroom activities, from a new course on large language models (LLMs) taught at Charles University. The assignments include experiments with LLM inference for weather report generation and machine translation. The classroom activities include class quizzes, focused research on downstream tasks and datasets, and an interactive "best paper" session aimed at reading and comprehension of research papers.

* 6th TeachNLP workshop at ACL 2024

Via

Access Paper or Ask Questions

Lexically Grounded Subword Segmentation

Jun 19, 2024

Jindřich Libovický, Jindřich Helcl

Figure 1 for Lexically Grounded Subword Segmentation

Figure 2 for Lexically Grounded Subword Segmentation

Figure 3 for Lexically Grounded Subword Segmentation

Figure 4 for Lexically Grounded Subword Segmentation

Abstract:We present three innovations in tokenization and subword segmentation. First, we propose to use unsupervised morphological analysis with Morfessor as pre-tokenization. Second, we present an algebraic method for obtaining subword embeddings grounded in a word embedding space. Based on that, we design a novel subword segmentation algorithm that uses the embeddings, ensuring that the procedure considers lexical meaning. Third, we introduce an efficient segmentation algorithm based on a subword bigram model that can be initialized with the lexically aware segmentation method to avoid using Morfessor and large embedding tables at inference time. We evaluate the proposed approaches using two intrinsic metrics and measure their performance on two downstream tasks: part-of-speech tagging and machine translation. Our experiments show significant improvements in the morphological plausibility of the segmentation when evaluated using segmentation precision on morpheme boundaries and improved R\'enyi efficiency in 8 languages. Although the proposed tokenization methods do not have a large impact on automatic translation quality, we observe consistent performance gains in the arguably more morphological task of part-of-speech tagging.

* 8 pages (+ 8 pages appendix), 2 figures

Via

Access Paper or Ask Questions

Charles Translator: A Machine Translation System between Ukrainian and Czech

Apr 10, 2024

Martin Popel, Lucie Poláková, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomáš Krabač, Jaroslava Hlaváčová, Mariia Anisimova, Tereza Chlaňová

Figure 1 for Charles Translator: A Machine Translation System between Ukrainian and Czech

Figure 2 for Charles Translator: A Machine Translation System between Ukrainian and Czech

Figure 3 for Charles Translator: A Machine Translation System between Ukrainian and Czech

Figure 4 for Charles Translator: A Machine Translation System between Ukrainian and Czech

Abstract:We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society. The system was developed in the spring of 2022 with the help of many language data providers in order to quickly meet the demand for such a service, which was not available at the time in the required quality. The translator was later implemented as an online web interface and as an Android app with speech input, both featuring Cyrillic-Latin script transliteration. The system translates directly, compared to other available systems that use English as a pivot, and thus take advantage of the typological similarity of the two languages. It uses the block back-translation method, which allows for efficient use of monolingual training data. The paper describes the development process, including data collection and implementation, evaluation, mentions several use cases, and outlines possibilities for the further development of the system for educational purposes.

Via

Access Paper or Ask Questions

OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models

Nov 24, 2023

Nikolay Bogoychev, Jelmer van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindřich Helcl, Mikko Aulamo

Abstract:Developing high quality machine translation systems is a labour intensive, challenging and confusing process for newcomers to the field. We present a pair of tools OpusCleaner and OpusTrainer that aim to simplify the process, reduce the amount of work and lower the entry barrier for newcomers. OpusCleaner is a data downloading, cleaning, and proprocessing toolkit. It is designed to allow researchers to quickly download, visualise and preprocess bilingual (or monolingual) data that comes from many different sources, each of them with different quality, issues, and unique filtering/preprocessing requirements. OpusTrainer is a data scheduling and data augmenting tool aimed at building large scale, robust machine translation systems and large language models. It features deterministic data mixing from many different sources, on-the-fly data augmentation and more. Using these tools, we showcase how we can use it to create high quality machine translation model robust to noisy user input; multilingual models and terminology aware models.

* Code on Github: https://github.com/hplt-project/OpusCleaner and https://github.com/hplt-project/OpusTrainer

Via

Access Paper or Ask Questions

CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval

Oct 25, 2023

Jindřich Helcl, Jindřich Libovický

Abstract:We present the Charles University system for the MRL~2023 Shared Task on Multi-lingual Multi-task Information Retrieval. The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages. Our solutions to both subtasks rely on the translate-test approach. We first translate the unlabeled examples into English using a multilingual machine translation model. Then, we run inference on the translated data using a strong task-specific model. Finally, we project the labeled data back into the original language. To keep the inferred tags on the correct positions in the original language, we propose a method based on scoring the candidate positions using a label-sensitive translation model. In both settings, we experiment with finetuning the classification models on the translated data. However, due to a domain mismatch between the development data and the shared task validation and test sets, the finetuned models could not outperform our baselines.

* 8 pages, 2 figures; System description paper at the MRL 2023 workshop at EMNLP 2023

Via

Access Paper or Ask Questions

CUNI Systems for the WMT22 Czech-Ukrainian Translation Task

Dec 01, 2022

Martin Popel, Jindřich Libovický, Jindřich Helcl

Abstract:We present Charles University submissions to the WMT22 General Translation Shared Task on Czech-Ukrainian and Ukrainian-Czech machine translation. We present two constrained submissions based on block back-translation and tagged back-translation and experiment with rule-based romanization of Ukrainian. Our results show that the romanization only has a minor effect on the translation quality. Further, we describe Charles Translator, a system that was developed in March 2022 as a response to the migration from Ukraine to the Czech Republic. Compared to our constrained systems, it did not use the romanization and used some proprietary data sources.

* 6 pages; System description paper at WMT22

Via

Access Paper or Ask Questions

CUNI Non-Autoregressive System for the WMT 22 Efficient Translation Shared Task

Dec 01, 2022

Jindřich Helcl

Abstract:We present a non-autoregressive system submission to the WMT 22 Efficient Translation Shared Task. Our system was used by Helcl et al. (2022) in an attempt to provide fair comparison between non-autoregressive and autoregressive models. This submission is an effort to establish solid baselines along with sound evaluation methodology, particularly in terms of measuring the decoding speed. The model itself is a 12-layer Transformer model trained with connectionist temporal classification on knowledge-distilled dataset by a strong autoregressive teacher model.

Via

Access Paper or Ask Questions

Non-Autoregressive Machine Translation: It's Not as Fast as it Seems

May 04, 2022

Jindřich Helcl, Barry Haddow, Alexandra Birch

Figure 1 for Non-Autoregressive Machine Translation: It's Not as Fast as it Seems

Figure 2 for Non-Autoregressive Machine Translation: It's Not as Fast as it Seems

Figure 3 for Non-Autoregressive Machine Translation: It's Not as Fast as it Seems

Figure 4 for Non-Autoregressive Machine Translation: It's Not as Fast as it Seems

Abstract:Efficient machine translation models are commercially important as they can increase inference speeds, and reduce costs and carbon emissions. Recently, there has been much interest in non-autoregressive (NAR) models, which promise faster translation. In parallel to the research on NAR models, there have been successful attempts to create optimized autoregressive models as part of the WMT shared task on efficient translation. In this paper, we point out flaws in the evaluation methodology present in the literature on NAR models and we provide a fair comparison between a state-of-the-art NAR model and the autoregressive submissions to the shared task. We make the case for consistent evaluation of NAR models, and also for the importance of comparing NAR models with other widely used methods for improving efficiency. We run experiments with a connectionist-temporal-classification-based (CTC) NAR model implemented in C++ and compare it with AR models using wall clock times. Our results show that, although NAR models are faster on GPUs, with small batch sizes, they are almost always slower under more realistic usage conditions. We call for more realistic and extensive evaluation of NAR models in future work.

* NAACL 2022, Camera-ready

Via

Access Paper or Ask Questions