Ivan Smirnov

RLBenchNet: The Right Network for the Right Reinforcement Learning Task

May 21, 2025

Ivan Smirnov, Shangding Gu

Abstract: Reinforcement learning (RL) has seen significant advancements through the application of various neural network architectures. In this study, we systematically investigate the performance of several neural networks in RL tasks, including Long Short-Term Memory (LSTM), Multi-Layer Perceptron (MLP), Mamba/Mamba-2, Transformer-XL, Gated Transformer-XL, and Gated Recurrent Unit (GRU). Through comprehensive evaluation across continuous control, discrete decision-making, and memory-based environments, we identify architecture-specific strengths and limitations. Our results reveal that: (1) MLPs excel in fully observable continuous control tasks, providing an optimal balance of performance and efficiency; (2) recurrent architectures like LSTM and GRU offer robust performance in partially observable environments with moderate memory requirements; (3) Mamba models achieve a 4.5x higher throughput compared to LSTM and a 3.9x increase over GRU, all while maintaining comparable performance; and (4) only Transformer-XL, Gated Transformer-XL, and Mamba-2 successfully solve the most challenging memory-intensive tasks, with Mamba-2 requiring 8x less memory than Transformer-XL. These findings provide insights for researchers and practitioners, enabling more informed architecture selection based on specific task characteristics and computational constraints. Code is available at: https://github.com/SafeRL-Lab/RLBenchNet

Mental Disorders Detection in the Era of Large Language Models

Oct 09, 2024

Gleb Kuzmin, Petr Strepetov, Maksim Stankevich, Ivan Smirnov, Artem Shelmanov

Abstract: This paper compares the effectiveness of traditional machine learning methods, encoder-based models, and large language models (LLMs) on the task of detecting depression and anxiety. Five datasets were considered, each differing in format and the method used to define the target pathology class. We tested AutoML models based on linguistic features, several variations of encoder-based Transformers such as BERT, and state-of-the-art LLMs as pathology classification models. The results demonstrated that LLMs outperform traditional methods, particularly on noisy and small datasets where training examples vary significantly in text length and genre. However, psycholinguistic features and encoder-based models can achieve performance comparable to language models when trained on texts from individuals with clinically confirmed depression, highlighting their potential effectiveness in targeted clinical applications.

Inference-Time Selective Debiasing

Jul 27, 2024

Gleb Kuzmin, Nemeesh Yadav, Ivan Smirnov, Timothy Baldwin, Artem Shelmanov

Abstract: We propose selective debiasing -- an inference-time safety mechanism that aims to increase the overall quality of models in terms of prediction performance and fairness in the situation when re-training a model is prohibitive. The method is inspired by selective prediction, where some predictions that are considered low quality are discarded at inference time. In our approach, we identify the potentially biased model predictions and, instead of discarding them, we debias them using LEACE -- a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard UQ methods. Experiments with text classification datasets demonstrate that selective debiasing helps to close the performance gap between post-processing methods and at-training and pre-processing debiasing techniques.
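
The selection step described in this abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: it assumes the bias score is the KL divergence between the original and the debiased predicted class distributions, and that a fixed fraction of the highest-scoring predictions is replaced by their debiased versions. The function names, the direction of the divergence, and the `fraction` parameter are all assumptions made for illustration, and the debiased distributions are taken as given rather than computed with LEACE.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def selective_debias(original, debiased, fraction):
    """Replace the `fraction` of predictions whose original and debiased
    distributions differ most (highest KL score) with the debiased ones.

    Hypothetical helper: the scoring direction and the fixed-fraction
    selection rule are assumptions, not taken from the paper.
    """
    scores = [kl_divergence(o, d) for o, d in zip(original, debiased)]
    n_replace = int(round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    replaced = set(ranked[:n_replace])
    final = [debiased[i] if i in replaced else original[i] for i in range(len(original))]
    return final, sorted(replaced)


# Toy class distributions, invented for illustration.
original = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
debiased = [[0.6, 0.4], [0.5, 0.5], [0.25, 0.75]]
final, replaced = selective_debias(original, debiased, fraction=0.34)
```

Only the first prediction changes noticeably under debiasing here, so it is the one selected for replacement; the other two keep the original model's output.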

A Language Model for Grammatical Error Correction in L2 Russian

Jul 04, 2023

Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov, Anastasia Vyrenkova

Abstract: Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.

Light Coreference Resolution for Russian with Hierarchical Discourse Features

Jun 02, 2023

Elena Chistova, Ivan Smirnov

Abstract: Coreference resolution is the task of identifying and grouping mentions referring to the same real-world entity. Previous neural models have mainly focused on learning span representations and pairwise scores for coreference decisions. However, current methods do not explicitly capture the referential choice in the hierarchical discourse, an important factor in coreference resolution. In this study, we propose a new approach that incorporates rhetorical information into neural coreference resolution models. We collect rhetorical features from automated discourse parses and examine their impact. As a base model, we implement an end-to-end span-based coreference resolver using a partially fine-tuned multilingual entity-aware language model LUKE. We evaluate our method on the RuCoCo-23 Shared Task for coreference resolution in Russian. Our best model employing rhetorical distance between mentions ranked 1st on the development set (74.6% F1) and 2nd on the test set (73.3% F1) of the Shared Task. We hope that our work will inspire further research on incorporating discourse information in neural coreference resolution models.

* Accepted at Dialogue-2023 conference

Toxic comments reduce the activity of volunteer editors on Wikipedia

Apr 26, 2023

Ivan Smirnov, Camelia Oprea, Markus Strohmaier

Abstract: Wikipedia is one of the most successful collaborative projects in history. It is the largest encyclopedia ever created, with millions of users worldwide relying on it as the first source of information as well as for fact-checking and in-depth research. As Wikipedia relies solely on the efforts of its volunteer editors, its success might be particularly affected by toxic speech. In this paper, we analyze all 57 million comments made on user talk pages of 8.5 million editors across the six most active language editions of Wikipedia to study the potential impact of toxicity on editors' behaviour. We find that toxic comments consistently reduce the activity of editors, leading to an estimated loss of 0.5-2 active days per user in the short term. This amounts to multiple human-years of lost productivity when considering the number of active contributors to Wikipedia. The effects of toxic comments are even greater in the long term, as they significantly increase the risk of editors leaving the project altogether. Using an agent-based model, we demonstrate that toxicity attacks on Wikipedia have the potential to impede the progress of the entire project. Our results underscore the importance of mitigating toxic speech on collaborative platforms such as Wikipedia to ensure their continued success.

The FairCeptron: A Framework for Measuring Human Perceptions of Algorithmic Fairness

Feb 08, 2021

Georg Ahnert, Ivan Smirnov, Florian Lemmerich, Claudia Wagner, Markus Strohmaier

Abstract: Measures of algorithmic fairness often do not account for human perceptions of fairness that can substantially vary between different sociodemographics and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making such as in ranking or classification. It supports (i) studying human perceptions of fairness and (ii) comparing these human perceptions with measures of algorithmic fairness. The framework includes fairness scenario generation, fairness perception elicitation and fairness perception analysis. We demonstrate the FairCeptron framework by applying it to a hypothetical university admission context where we collect human perceptions of fairness in the presence of minorities. An implementation of the FairCeptron framework is openly available, and it can easily be adapted to study perceptions of algorithmic fairness in other application contexts. We hope our work paves the way towards elevating the role of studies of human fairness perceptions in the process of designing algorithmic decision making systems.

* For source code of the implementation, see https://github.com/cssh-rwth/fairceptron

A better method to enforce monotonic constraints in regression and classification trees

Nov 02, 2020

Charles Auguste, Sean Malory, Ivan Smirnov

Abstract: In this report, we present two new ways of enforcing monotone constraints in regression and classification trees. The first yields better results than the current LightGBM implementation at a similar computation time. The second yields even better results but is much slower than the current LightGBM implementation. We also propose a heuristic that accounts for the fact that greedily splitting a tree by choosing a monotone split based on its immediate gain is far from optimal. We then compare the results with the current implementation of the constraints in the LightGBM library, using the well-known Adult public dataset. Throughout the report, we mostly focus on the implementation of our methods that we made for the LightGBM library, even though they are general and could be implemented in any regression or classification tree. The best method we propose (a smarter way to split the tree coupled with a penalization of monotone splits) consistently beats the current implementation of LightGBM. With small or average trees, the loss reduction can be as high as 1% in the early stages of training and decreases to around 0.1% at the loss peak for the Adult dataset. The results would be even better with larger trees. In our experiments, we did little tuning of the regularization parameters, and we would not be surprised if further tuning increased the performance of our methods on test sets.
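
For readers unfamiliar with monotone constraints in trees, the sketch below shows the basic idea only: when searching for a split on a constrained feature, any split whose leaf values would violate the required ordering is rejected. This is a generic illustration on a single regression stump, not either of the two methods proposed in the report and not LightGBM code; the function names and the toy data are invented.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_monotone_split(xs, ys, direction=1):
    """Best single split on one feature under a monotone constraint.

    direction=+1 requires right-leaf mean >= left-leaf mean (increasing),
    direction=-1 requires the opposite. Returns a tuple
    (gain, threshold, left_mean, right_mean), or None if every
    candidate split violates the constraint.
    """
    pairs = sorted(zip(xs, ys))
    total = sse([y for _, y in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        if direction * (right_mean - left_mean) < 0:
            continue  # split would break the monotone constraint
        gain = total - sse(left) - sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or gain > best[0]:
            best = (gain, threshold, left_mean, right_mean)
    return best


xs = [1, 2, 3, 4]
ys = [1.0, 1.0, 3.0, 3.0]
increasing = best_monotone_split(xs, ys, direction=1)
decreasing = best_monotone_split(xs, ys, direction=-1)
```

With this toy data the increasing constraint admits the split at 2.5, while the decreasing constraint rejects every candidate. Choosing each split greedily on its immediate gain, as this stump does, is exactly the behaviour the report argues is far from optimal once a tree has many constrained splits.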

Quota-based debiasing can decrease representation of already underrepresented groups

Jun 13, 2020

Ivan Smirnov, Florian Lemmerich, Markus Strohmaier

Abstract: Many important decisions in societies such as school admissions, hiring, or elections are based on the selection of top-ranking individuals from a larger pool of candidates. This process is often subject to biases, which typically manifest as an under-representation of certain groups among the selected or accepted individuals. The most common approach to this issue is debiasing, for example via the introduction of quotas that ensure proportional representation of groups with respect to a certain, often binary attribute. Cases include quotas for women on corporate boards or ethnic quotas in elections. This, however, has the potential to induce changes in representation with respect to other attributes. For the case of two correlated binary attributes we show that quota-based debiasing based on a single attribute can worsen the representation of already underrepresented groups and decrease overall fairness of selection. We use several data sets from a broad range of domains from recidivism risk assessments to scientific citations to assess this effect in real-world settings. Our results demonstrate the importance of including all relevant attributes in debiasing procedures and that more efforts need to be put into eliminating the root causes of inequalities as purely numerical solutions such as quota-based debiasing might lead to unintended consequences.
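
The effect described in this abstract can be reproduced on a toy pool. The sketch below is an invented example, not data or code from the paper: eight hypothetical candidates carry a score and two binary attributes, and a proportional quota on the first attribute (`"m"`/`"f"`) removes the only selected member of an already underrepresented group on the second attribute (`"y"`).

```python
# Each candidate: (name, score, attribute A, attribute B). Invented toy data.
candidates = [
    ("M1", 9, "m", "x"), ("M2", 8, "m", "x"), ("M3", 7, "m", "y"), ("M4", 3, "m", "x"),
    ("W1", 6, "f", "x"), ("W2", 5, "f", "x"), ("W3", 2, "f", "y"), ("W4", 1, "f", "y"),
]


def select_top(pool, k):
    """Pick the k highest-scoring candidates."""
    return sorted(pool, key=lambda c: c[1], reverse=True)[:k]


def select_with_quota(pool, attr_index, quotas):
    """Pick the top scorers separately within each group of one attribute.

    `quotas` maps an attribute value to the number of seats reserved for it.
    """
    chosen = []
    for value, seats in quotas.items():
        group = [c for c in pool if c[attr_index] == value]
        chosen.extend(select_top(group, seats))
    return chosen


def share(selected, attr_index, value):
    """Fraction of the selected candidates that have the given attribute value."""
    return sum(1 for c in selected if c[attr_index] == value) / len(selected)


plain = select_top(candidates, 4)
quota = select_with_quota(candidates, 2, {"m": 2, "f": 2})

pool_share_y = share(candidates, 3, "y")   # 0.375 in the full pool
plain_share_y = share(plain, 3, "y")       # 0.25 without a quota
quota_share_y = share(quota, 3, "y")       # 0.0 with a quota on attribute A
```

The quota fixes the share of `"f"` at one half, matching the pool, but the share of `"y"` falls from 0.25 to 0, further below its pool share of 0.375.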

Generalizable prediction of academic performance from short texts on social media

Dec 01, 2019

Ivan Smirnov

Abstract: It has already been established that digital traces can be used to predict various human attributes. In most cases, however, predictive models rely on features that are specific to a particular source of digital trace data. In contrast, short texts written by users (tweets, posts, or comments) are ubiquitous across multiple platforms. In this paper, we explore the predictive power of short texts with respect to the academic performance of their authors. We use data from a representative panel of Russian students that includes information about their educational outcomes and activity on a popular networking site, VK. We build a model to predict academic performance from users' posts on VK and then apply it to a different context. In particular, we show that the model could reproduce rankings of schools and universities from the posts of their students on social media. We also find that the same model could predict academic performance from tweets as well as from VK posts. The generalizability of a model trained on a relatively small data set could be explained by the use of continuous word representations trained on a much larger corpus of social media posts. This also allows for greater interpretability of model predictions.
