Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniil Moskovskiy

<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Sep 10, 2025

Sergey Pletenev, Daniil Moskovskiy, Alexander Panchenko

Abstract:Modern Large Language Models (LLMs) are excellent at generating synthetic data. However, their performance in sensitive domains such as text detoxification has not received proper attention from the scientific community. This paper explores the possibility of using LLM-generated synthetic toxic data as an alternative to human-generated data for training models for detoxification. Using Llama 3 and Qwen activation-patched models, we generated synthetic toxic counterparts for neutral texts from ParaDetox and SST-2 datasets. Our experiments show that models fine-tuned on synthetic data consistently perform worse than those trained on human data, with a drop in performance of up to 30% in joint metrics. The root cause is identified as a critical lexical diversity gap: LLMs generate toxic content using a small, repetitive vocabulary of insults that fails to capture the nuances and variety of human toxicity. These findings highlight the limitations of current LLMs in this domain and emphasize the continued importance of diverse, human-annotated data for building robust detoxification systems.

Via

Access Paper or Ask Questions

Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models

May 23, 2025

Viktoriia Chekalina, Daniil Moskovskiy, Daria Cherniuk, Maxim Kurkin, Andrey Kuznetsov, Evgeny Frolov

Abstract:The Fisher information is a fundamental concept for characterizing the sensitivity of parameters in neural networks. However, leveraging the full observed Fisher information is too expensive for large models, so most methods rely on simple diagonal approximations. While efficient, this approach ignores parameter correlations, often resulting in reduced performance on downstream tasks. In this work, we mitigate these limitations and propose Generalized Fisher-Weighted SVD (GFWSVD), a post-training LLM compression technique that accounts for both diagonal and off-diagonal elements of the Fisher information matrix, providing a more accurate reflection of parameter importance. To make the method tractable, we introduce a scalable adaptation of the Kronecker-factored approximation algorithm for the observed Fisher information. We demonstrate the effectiveness of our method on LLM compression, showing improvements over existing compression baselines. For example, at a 20 compression rate on the MMLU benchmark, our method outperforms FWSVD, which is based on a diagonal approximation of the Fisher information, by 5 percent, SVD-LLM by 3 percent, and ASVD by 6 percent compression rate.

Via

Access Paper or Ask Questions

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Feb 20, 2025

Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhail Salnikov

Figure 1 for How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Figure 2 for How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Figure 3 for How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Figure 4 for How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Abstract:The performance of Large Language Models (LLMs) on many tasks is greatly limited by the knowledge learned during pre-training and stored in the model's parameters. Low-rank adaptation (LoRA) is a popular and efficient training technique for updating or domain-specific adaptation of LLMs. In this study, we investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge. We fine-tuned Llama-3.1-8B-instruct using LoRA with varying amounts of new knowledge. Our experiments have shown that the best results are obtained when the training data contains a mixture of known and new facts. However, this approach is still potentially harmful because the model's performance on external question-answering benchmarks declines after such fine-tuning. When the training data is biased towards certain entities, the model tends to regress to few overrepresented answers. In addition, we found that the model becomes more confident and refuses to provide an answer in only few cases. These findings highlight the potential pitfalls of LoRA-based LLM updates and underscore the importance of training data composition and tuning parameters to balance new knowledge integration and general model capabilities.

Via

Access Paper or Ask Questions

SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

Feb 10, 2025

Daniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Elena Tutubalina, Alexander Panchenko

Abstract:Existing approaches to multilingual text detoxification are hampered by the scarcity of parallel multilingual datasets. In this work, we introduce a pipeline for the generation of multilingual parallel detoxification data. We also introduce SynthDetoxM, a manually collected and synthetically generated multilingual parallel text detoxification dataset comprising 16,000 high-quality detoxification sentence pairs across German, French, Spanish and Russian. The data was sourced from different toxicity evaluation datasets and then rewritten with nine modern open-source LLMs in few-shot setting. Our experiments demonstrate that models trained on the produced synthetic datasets have superior performance to those trained on the human-annotated MultiParaDetox dataset even in data limited setting. Models trained on SynthDetoxM outperform all evaluated LLMs in few-shot setting. We release our dataset and code to help further research in multilingual text detoxification.

* Accepted to NAACL 2025 Main Conference

Via

Access Paper or Ask Questions

Multilingual and Explainable Text Detoxification with Parallel Corpora

Dec 16, 2024

Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Moskovskiy, Elisei Stakovskii(+4 more)

Abstract:Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022, digital abusive speech remains a significant issue. One potential approach to address this challenge is automatic text detoxification, a text style transfer (TST) approach that transforms toxic language into a more neutral or non-toxic form. To date, the availability of parallel corpora for the text detoxification task (Logachevavet al., 2022; Atwell et al., 2022; Dementievavet al., 2024a) has proven to be crucial for state-of-the-art approaches. With this work, we extend parallel text detoxification corpus to new languages -- German, Chinese, Arabic, Hindi, and Amharic -- testing in the extensive multilingual setup TST baselines. Next, we conduct the first of its kind an automated, explainable analysis of the descriptive features of both toxic and non-toxic sentences, diving deeply into the nuances, similarities, and differences of toxicity and detoxification across 9 languages. Finally, based on the obtained insights, we experiment with a novel text detoxification method inspired by the Chain-of-Thoughts reasoning approach, enhancing the prompting process through clustering on relevant descriptive attributes.

* COLING 2025, main conference, long

Via

Access Paper or Ask Questions

Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

Jun 27, 2024

Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann(+1 more)

Abstract:Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation scoring abusive speech based on four aspect -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale -- and suggesting more options of actions like detoxification, counter speech generation, blocking, or, as a final measure, human intervention. Through a thorough analysis of abusive speech regulations across diverse jurisdictions, platforms, and research papers we highlight the gap in preventing measures and advocate for tailored proactive steps to combat its multifaceted manifestations. Our work aims to inform future strategies for effectively addressing abusive speech online.

Via

Access Paper or Ask Questions

MERA: A Comprehensive LLM Evaluation in Russian

Jan 12, 2024

Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev(+9 more)

Figure 1 for MERA: A Comprehensive LLM Evaluation in Russian

Figure 2 for MERA: A Comprehensive LLM Evaluation in Russian

Figure 3 for MERA: A Comprehensive LLM Evaluation in Russian

Figure 4 for MERA: A Comprehensive LLM Evaluation in Russian

Abstract:Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.

* The paper version comparable with the release code v.1.1.0 of the benchmark. The links and scores are updated

Via

Access Paper or Ask Questions

Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Nov 23, 2023

Daryna Dementieva, Daniil Moskovskiy, David Dale, Alexander Panchenko

Figure 1 for Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Figure 2 for Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Figure 3 for Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Figure 4 for Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

Abstract:Text detoxification is the task of transferring the style of text from toxic to neutral. While here are approaches yielding promising results in monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification -- given a parallel detoxification corpus for one language; the goal is to transfer detoxification ability to another language for which we do not have such a corpus. Moreover, we are the first to explore a new task where text translation and detoxification are performed simultaneously, providing several strong baselines for this task. Finally, we introduce new automatic detoxification evaluation metrics with higher correlations with human judgments than previous benchmarks. We assess the most promising approaches also with manual markup, determining the answer for the best strategy to transfer the knowledge of text detoxification between languages.

* AACL 2023, main conference, long paper

Via

Access Paper or Ask Questions

Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Jun 05, 2022

Daniil Moskovskiy, Daryna Dementieva, Alexander Panchenko

Figure 1 for Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Figure 2 for Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Figure 3 for Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Figure 4 for Exploring Cross-lingual Textual Style Transfer with Large Multilingual Language Models

Abstract:Detoxification is a task of generating text in polite style while preserving meaning and fluency of the original toxic text. Existing detoxification methods are designed to work in one exact language. This work investigates multilingual and cross-lingual detoxification and the behavior of large multilingual models like in this setting. Unlike previous works we aim to make large language models able to perform detoxification without direct fine-tuning in given language. Experiments show that multilingual models are capable of performing multilingual style transfer. However, models are not able to perform cross-lingual detoxification and direct fine-tuning on exact language is inevitable.

Via

Access Paper or Ask Questions

Methods for Detoxification of Texts for the Russian Language

May 19, 2021

Daryna Dementieva, Daniil Moskovskiy, Varvara Logacheva, David Dale, Olga Kozlova, Nikita Semenov, Alexander Panchenko

Figure 1 for Methods for Detoxification of Texts for the Russian Language

Figure 2 for Methods for Detoxification of Texts for the Russian Language

Figure 3 for Methods for Detoxification of Texts for the Russian Language

Figure 4 for Methods for Detoxification of Texts for the Russian Language

Abstract:We introduce the first study of automatic detoxification of Russian texts to combat offensive language. Such a kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done for the English language in this field, it has never been solved for the Russian language yet. We test two types of models - unsupervised approach based on BERT architecture that performs local corrections and supervised approach based on pretrained language GPT-2 model - and compare them with several baselines. In addition, we describe evaluation setup providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.

Via

Access Paper or Ask Questions