Natalia Loukachevitch

Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models

Jun 07, 2025

Mikhail Salnikov, Dmitrii Korzh, Ivan Lazichny, Elvir Karimov, Artyom Iudin, Ivan Oseledets, Oleg Y. Rogov, Alexander Panchenko, Natalia Loukachevitch, Elena Tutubalina

Abstract:This paper evaluates geopolitical biases in LLMs with respect to various countries through an analysis of their interpretation of historical events with conflicting national perspectives (USA, UK, USSR, and China). We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring specific national narratives. Additionally, simple debiasing prompts had a limited effect in reducing these biases. Experiments with manipulated participant labels reveal models' sensitivity to attribution, sometimes amplifying biases or recognizing inconsistencies, especially with swapped labels. This work highlights national narrative biases in LLMs, challenges the effectiveness of simple debiasing methods, and offers a framework and dataset for future geopolitical bias research.

Methods for Recognizing Nested Terms

Apr 22, 2025

Igor Rozhkov, Natalia Loukachevitch

Abstract:In this paper, we describe our participation in the RuTermEval competition devoted to extracting nested terms. We apply the Binder model, which was previously successfully applied to the recognition of nested named entities, to extract nested terms. We obtained the best results of term recognition in all three tracks of the RuTermEval competition. In addition, we study the new task of recognizing nested terms from flat training data, that is, data annotated with terms but without nestedness. We conclude that several of the approaches proposed in this work can retrieve nested terms effectively without nested annotation.

* To be published in Computational Linguistics and Intellectual Technologies proceedings

Building Russian Benchmark for Evaluation of Information Retrieval Models

Apr 17, 2025

Grigory Kovalev, Mikhail Tikhomirov, Evgeny Kozhevnikov, Max Kornilov, Natalia Loukachevitch

Abstract:We introduce RusBEIR, a comprehensive benchmark designed for zero-shot evaluation of information retrieval (IR) models in the Russian language. Comprising 17 datasets from various domains, it integrates adapted, translated, and newly created datasets, enabling systematic comparison of lexical and neural models. Our study highlights the importance of preprocessing for lexical models in morphologically rich languages and confirms BM25 as a strong baseline for full-document retrieval. Neural models, such as mE5-large and BGE-M3, demonstrate superior performance on most datasets, but face challenges with long-document retrieval due to input size constraints. RusBEIR offers a unified, open-source framework that promotes research in Russian-language information retrieval.

RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts

Apr 09, 2025

Natalia Loukachevitch, Natalia Tkachenko, Anna Lapanitsyna, Mikhail Tikhomirov, Nicolay Rusnachenko

Abstract:In this paper, we introduce the Dialogue Evaluation shared task on extraction of structured opinions from Russian news texts. The task of the contest is to extract opinion tuples for a given sentence; the tuples are composed of a sentiment holder, its target, an expression and sentiment from the holder to the target. In total, the task received more than 100 submissions. The participants experimented mainly with large language models in zero-shot, few-shot and fine-tuning formats. The best result on the test set was obtained with fine-tuning of a large language model. We also compared 30 prompts and 11 open source language models with 3-32 billion parameters in the 1-shot and 10-shot settings and found the best models and prompts.

* RuOpinionNE-2024 is a continuation of RuSentNE-2023. It adds the extraction and evaluation of factual statements that support the assigned sentiment
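
The opinion tuple described in the abstract above (holder, target, expression, sentiment) can be pictured as a small record type. The following is only a minimal sketch: the class name, field names, polarity label, and the English example sentence are all hypothetical and are not taken from the shared task's actual data format.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OpinionTuple:
    """One structured opinion extracted from a single sentence.

    Field names are illustrative assumptions, not the task's official schema.
    """
    holder: str      # who expresses the sentiment
    target: str      # whom or what the sentiment is about
    expression: str  # the words that convey the sentiment
    polarity: str    # sentiment from the holder to the target


# Hypothetical example sentence, invented for illustration only.
sentence = "The minister praised the new reform."
example = OpinionTuple(
    holder="The minister",
    target="the new reform",
    expression="praised",
    polarity="positive",  # assumed label name
)

# Every span in the tuple should be recoverable from the sentence text.
spans_found = all(
    span in sentence
    for span in (example.holder, example.target, example.expression)
)
print(asdict(example), spans_found)
```

A sentence may carry several such tuples, so a system's output for one sentence would naturally be a list of these records.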

Large Language Models in Targeted Sentiment Analysis

Apr 18, 2024

Nicolay Rusnachenko, Anton Golubev, Natalia Loukachevitch

Abstract:In this paper we investigate the use of decoder-based generative transformers for extracting sentiment towards the named entities in Russian news articles. We study sentiment analysis capabilities of instruction-tuned large language models (LLMs). We consider the dataset of RuSentNE-2023 in our study. The first group of experiments was aimed at the evaluation of zero-shot capabilities of both closed and open LLMs. The second covers the fine-tuning of Flan-T5 using the "chain-of-thought" (CoT) three-hop reasoning framework (THoR). We found that the results of the zero-shot approaches are similar to the results achieved by baseline fine-tuned encoder-based transformers (BERT-base). With THoR, the fine-tuned base-size Flan-T5 model improves on the results of the zero-shot experiment by at least 5%. The best results of sentiment analysis on RuSentNE-2023 were achieved by fine-tuned Flan-T5-xl, which surpassed the results of previous state-of-the-art transformer-based classifiers. Our CoT application framework is publicly available: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework

* Fine-tuned Flan-T5-xl outperforms the top-ranked transformer-based classifier results in the RuSentNE-2023 competition. To appear in Lobachevskii Journal of Mathematics No.8/2024 proceedings

Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language Models

Jan 09, 2024

Mikhail Tikhomirov, Natalia Loukachevitch

Abstract:This article investigates a zero-shot approach to hypernymy prediction using large language models (LLMs). The study employs a method based on text probability calculation, applying it to various generated prompts. The experiments demonstrate a strong correlation between the effectiveness of language model prompts and classic patterns, indicating that preliminary prompt selection can be carried out using smaller models before moving to larger ones. We also explore prompts for predicting co-hyponyms and improving hypernymy predictions by augmenting prompts with additional information through automatically identified co-hyponyms. An iterative approach is developed for predicting higher-level concepts, which further improves the quality on the BLESS dataset (MAP = 0.8).

RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts

May 28, 2023

Anton Golubev, Nicolay Rusnachenko, Natalia Loukachevitch

Abstract:The paper describes the RuSentNE-2023 evaluation devoted to targeted sentiment analysis in Russian news texts. The task is to predict sentiment towards a named entity in a single sentence. The dataset for the RuSentNE-2023 evaluation is based on the Russian news corpus RuSentNE, which has rich sentiment-related annotation. The corpus is annotated with named entities and sentiments towards these entities, along with related effects and emotional states. The evaluation was organized using the CodaLab competition framework. The main evaluation measure was the macro-averaged F-measure over the positive and negative classes. The best result achieved was a 66% macro F-measure (Positive+Negative classes). We also tested ChatGPT on the test set from our evaluation and found that the zero-shot answers provided by ChatGPT reached an F-measure of 60%, which corresponds to 4th place in the evaluation and can be considered quite high for a zero-shot application. ChatGPT also provided detailed explanations of its conclusions.

* 12 pages, 5 tables, 3 figures
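
The evaluation measure named in the abstract above, a macro-averaged F-measure restricted to the positive and negative classes, can be sketched as follows. This is a minimal illustration under stated assumptions: the label strings and function names are hypothetical, and the official scoring script of the competition may differ in details.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score for one class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions for this class: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_pn(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average the per-class F1 over positive and negative only.

    The neutral class still affects the score indirectly, because neutral
    items misclassified as positive or negative count as false positives.
    The label names are assumptions made for this sketch.
    """
    return (f1_for_class(gold, pred, "positive")
            + f1_for_class(gold, pred, "negative")) / 2


# Toy labels invented for illustration.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "negative"]
score = macro_f1_pn(gold, pred)
print(round(score, 4))
```

Leaving the neutral class out of the average keeps a system from scoring well simply by predicting the neutral label for every entity.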

RuArg-2022: Argument Mining Evaluation

Jun 18, 2022

Evgeny Kotelnikov, Natalia Loukachevitch, Irina Nikishina, Alexander Panchenko

Abstract:Argumentation analysis is a field of computational linguistics that studies methods for extracting arguments from texts and the relationships between them, as well as building the argumentation structure of texts. This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts within the framework of the Dialogue conference. During the competition, the participants were offered two tasks: stance detection and argument classification. A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic (vaccination, quarantine, and wearing masks) was prepared, annotated, and used for training and testing. The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture, automatic translation into English to apply a specialized BERT model, retrained on Twitter posts discussing COVID-19, as well as additional masking of target entities. This system showed the following results: an F1-score of 0.6968 for the stance detection task and an F1-score of 0.7404 for the argument classification task. We hope that the prepared dataset and baselines will help to foster further research on argument mining for the Russian language.

* Accepted by Dialogue-2022 conference

RuNNE-2022 Shared Task: Recognizing Nested Named Entities

May 23, 2022

Ekaterina Artemova, Maxim Zmeev, Natalia Loukachevitch, Igor Rozhkov, Tatiana Batura, Vladimir Ivanov, Elena Tutubalina

Abstract:The RuNNE Shared Task approaches the problem of nested named entity recognition. The annotation schema is designed in such a way that an entity may partially overlap or even be nested into another entity. This way, the named entity "The Yermolova Theatre" of type "organization" houses another entity "Yermolova" of type "person". We adopt the Russian NEREL dataset for the RuNNE Shared Task. NEREL comprises news texts written in the Russian language and collected from the Wikinews portal. The annotation schema includes 29 entity types. The nestedness of named entities in NEREL reaches up to six levels. The RuNNE Shared Task explores two setups. (i) In the general setup all entities occur more or less with the same frequency. (ii) In the few-shot setup the majority of entity types occur often in the training set. However, some of the entity types have lower frequency and are thus challenging to recognize. In the test set the frequency of all entity types is even. This paper reports on the results of the RuNNE Shared Task. Overall, the shared task received 156 submissions from nine teams. Half of the submissions outperform a straightforward BERT-based baseline in both setups. This paper overviews the shared task setup and discusses the submitted systems, discovering meaningful insights for the problem of nested NER. The links to the evaluation platform and the data from the shared task are available in our GitHub repository: https://github.com/dialogue-evaluation/RuNNE.

* To appear in Dialogue 2022
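
The nesting example in the abstract above ("Yermolova" inside "The Yermolova Theatre") can be expressed with character-offset spans. This is a minimal sketch only: the class, the function, and the label strings are hypothetical and do not reproduce the NEREL annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # entity type; label strings here are placeholders


def is_nested_in(inner: Entity, outer: Entity) -> bool:
    """True if `inner` lies entirely within `outer` and is a different entity."""
    return inner != outer and outer.start <= inner.start and inner.end <= outer.end


text = "The Yermolova Theatre"
theatre = Entity(0, len(text), "ORGANIZATION")
person = Entity(4, 13, "PERSON")

print(text[person.start:person.end], is_nested_in(person, theatre))
```

A flat tagging scheme assigns each token at most one label, so it could keep only one of these two entities; span-based representations like the one sketched here can hold both.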

Taxonomy Enrichment with Text and Graph Vector Representations

Jan 21, 2022

Irina Nikishina, Mikhail Tikhomirov, Varvara Logacheva, Yuriy Nazarov, Alexander Panchenko, Natalia Loukachevitch

Abstract:Knowledge graphs such as DBpedia, Freebase or Wikidata always contain a taxonomic backbone that allows the arrangement and structuring of various concepts in accordance with the hypo-hypernym ("class-subclass") relationship. With the rapid growth of lexical resources for specific domains, the problem of automatic extension of the existing knowledge bases with new words is becoming more and more widespread. In this paper, we address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy. We present a new method that allows achieving high results on this task with little effort. It uses the resources which exist for the majority of languages, making the method universal. We extend our method by incorporating deep representations of graph structures like node2vec, Poincaré embeddings, GCN, etc. that have recently demonstrated promising results on various NLP tasks. Furthermore, combining these representations with word embeddings allows us to beat the state of the art. We conduct a comprehensive study of the existing approaches to taxonomy enrichment based on word and graph vector representations and their fusion approaches. We also explore the ways of using deep learning architectures to extend the taxonomic backbones of knowledge graphs. We create a number of datasets for taxonomy extension for English and Russian. We achieve state-of-the-art results across different datasets and provide an in-depth error analysis.
