Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Branislav Pecher

Revisiting Algorithmic Audits of TikTok: Poor Reproducibility and Short-term Validity of Findings

Apr 25, 2025

Matej Mosnar, Adam Skurla, Branislav Pecher, Matus Tibensky, Jan Jakubcik, Adrian Bindas, Peter Sakalik, Ivan Srba

Abstract:Social media platforms are constantly shifting towards algorithmically curated content based on implicit or explicit user feedback. Regulators, as well as researchers, are calling for systematic social media algorithmic audits as this shift leads to enclosing users in filter bubbles and leading them to more problematic content. An important aspect of such audits is the reproducibility and generalisability of their findings, as it allows to draw verifiable conclusions and audit potential changes in algorithms over time. In this work, we study the reproducibility of the existing sockpuppeting audits of TikTok recommender systems, and the generalizability of their findings. In our efforts to reproduce the previous works, we find multiple challenges stemming from social media platform changes and content evolution, but also the research works themselves. These drawbacks limit the audit reproducibility and require an extensive effort altogether with inevitable adjustments to the auditing methodology. Our experiments also reveal that these one-shot audit findings often hold only in the short term, implying that the reproducibility and generalizability of the audits heavily depend on the methodological choices and the state of algorithms and content on the platform. This highlights the importance of reproducible audits that allow us to determine how the situation changes in time.

* ACM SIGIR 2025. 10 pages

Via

Access Paper or Ask Questions

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Oct 14, 2024

Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky

Figure 1 for Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Figure 2 for Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Figure 3 for Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Figure 4 for Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Abstract:The generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage the few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the samples are mostly selected randomly and a comprehensive overview of the effects of other (more ``informed'') sample selection strategies is lacking. In this work, we compare sample selection strategies existing in few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate this on in-distribution and out-of-distribution classifier performance. Results indicate, that while some ``informed'' selection strategies increase the performance of models, especially for out-of-distribution data, it happens only seldom and with marginal performance increases. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners.

Via

Access Paper or Ask Questions

Fighting Randomness with Randomness: Mitigating Optimisation Instability of Fine-Tuning using Delayed Ensemble and Noisy Interpolation

Jun 18, 2024

Branislav Pecher, Jan Cegin, Robert Belanec, Jakub Simko, Ivan Srba, Maria Bielikova

Abstract:While fine-tuning of pre-trained language models generally helps to overcome the lack of labelled training samples, it also displays model performance instability. This instability mainly originates from randomness in initialisation or data shuffling. To address this, researchers either modify the training process or augment the available samples, which typically results in increased computational costs. We propose a new mitigation strategy, called Delayed Ensemble with Noisy Interpolation (DENI), that leverages the strengths of ensembling, noise regularisation and model interpolation, while retaining computational efficiency. We compare DENI with 9 representative mitigation strategies across 3 models, 4 tuning strategies and 7 text classification datasets. We show that: 1) DENI outperforms the best performing mitigation strategy (Ensemble), while using only a fraction of its cost; 2) the mitigation strategies are beneficial for parameter-efficient fine-tuning (PEFT) methods, outperforming full fine-tuning in specific cases; and 3) combining DENI with data augmentation often leads to even more effective instability mitigation.

Via

Access Paper or Ask Questions

Fine-Tuning, Prompting, In-Context Learning and Instruction-Tuning: How Many Labelled Samples Do We Need?

Feb 20, 2024

Branislav Pecher, Ivan Srba, Maria Bielikova

Abstract:When solving a task with limited labelled data, researchers can either use a general large language model without further update, or use the few examples to tune a specialised smaller model. When enough labels are available, the specialised models outperform the general ones on many NLP tasks. In this work, we aim to investigate how many labelled samples are required for the specialised models to achieve this superior performance, while taking the results variance into consideration. Observing the behaviour of prompting, in-context learning, fine-tuning and instruction-tuning, identifying their break-even points when increasing number of labelled training samples across three tasks of varying complexity, we find that the specialised models often need only few samples ($100-1000$) to be on par or better than the general ones. At the same time, the amount of required labelled data strongly depends on the task complexity and results variance.

Via

Access Paper or Ask Questions

On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Feb 20, 2024

Branislav Pecher, Ivan Srba, Maria Bielikova

Figure 1 for On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Figure 2 for On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Figure 3 for On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Figure 4 for On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Abstract:While learning with limited labelled data can improve performance when the labels are lacking, it is also sensitive to the effects of uncontrolled randomness introduced by so-called randomness factors (e.g., varying order of data). We propose a method to systematically investigate the effects of randomness factors while taking the interactions between them into consideration. To measure the true effects of an individual randomness factor, our method mitigates the effects of other factors and observes how the performance varies across multiple runs. Applying our method to multiple randomness factors across in-context learning and fine-tuning approaches on 7 representative text classification tasks and meta-learning on 3 tasks, we show that: 1) disregarding interactions between randomness factors in existing works caused inconsistent findings due to incorrect attribution of the effects of randomness factors, such as disproving the consistent sensitivity of in-context learning to sample order even with random sample selection; and 2) besides mutual interactions, the effects of randomness factors, especially sample order, are also dependent on more systematic choices unexplored in existing works, such as number of classes, samples per class or choice of prompt format.

Via

Access Paper or Ask Questions

Automatic Combination of Sample Selection Strategies for Few-Shot Learning

Feb 05, 2024

Branislav Pecher, Ivan Srba, Maria Bielikova, Joaquin Vanschoren

Abstract:In few-shot learning, such as meta-learning, few-shot fine-tuning or in-context learning, the limited number of samples used to train a model have a significant impact on the overall success. Although a large number of sample selection strategies exist, their impact on the performance of few-shot learning is not extensively known, as most of them have been so far evaluated in typical supervised settings only. In this paper, we thoroughly investigate the impact of 20 sample selection strategies on the performance of 5 few-shot learning approaches over 8 image and 6 text datasets. In addition, we propose a new method for automatic combination of sample selection strategies (ACSESS) that leverages the strengths and complementary information of the individual strategies. The experimental results show that our method consistently outperforms the individual selection strategies, as well as the recently proposed method for selecting support examples for in-context learning. We also show a strong modality, dataset and approach dependence for the majority of strategies as well as their dependence on the number of shots - demonstrating that the sample selection strategies play a significant role for lower number of shots, but regresses to random selection at higher number of shots.

Via

Access Paper or Ask Questions

Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Jan 12, 2024

Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky

Figure 1 for Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Figure 2 for Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Figure 3 for Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Figure 4 for Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Abstract:The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune the model. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on generated texts' lexical diversity and downstream model performance. We compare the effects over 5 different LLMs and 6 datasets. We show that diversity is most increased by taboo words, while downstream model performance is highest when previously created paraphrases are used as hints.

* 18 pages, 37 figures

Via

Access Paper or Ask Questions

On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

Dec 02, 2023

Branislav Pecher, Ivan Srba, Maria Bielikova

Figure 1 for On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

Figure 2 for On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

Figure 3 for On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

Figure 4 for On the Effects of Randomness on Stability of Learning with Limited Labelled Data: A Systematic Literature Review

Abstract:Learning with limited labelled data, such as few-shot learning, meta-learning or transfer learning, aims to effectively train a model using only small amount of labelled samples. However, these approaches were observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in the training process. The randomness negatively affects the stability of the models, leading to large variance in results across training runs. When such instability is disregarded, it can unintentionally, but unfortunately also intentionally, create an imaginary perception of research progress. Recently, this area started to attract a research attention and the number of relevant studies is continuously growing. In this survey, we provide a comprehensive overview of 134 papers addressing the effects of randomness on the stability of learning with limited labelled data. We distinguish between four main tasks addressed in the papers (investigate/evaluate; determine; mitigate; benchmark/compare/report randomness effects), providing findings for each one. Furthermore, we identify and discuss seven challenges and open problems together with possible directions to facilitate further research. The ultimate goal of this survey is to emphasise the importance of this growing research area, which so far has not received appropriate level of attention.

Via

Access Paper or Ask Questions

KInITVeraAI at SemEval-2023 Task 3: Simple yet Powerful Multilingual Fine-Tuning for Persuasion Techniques Detection

Apr 24, 2023

Timo Hromadka, Timotej Smolen, Tomas Remis, Branislav Pecher, Ivan Srba

Abstract:This paper presents the best-performing solution to the SemEval 2023 Task 3 on the subtask 3 dedicated to persuasion techniques detection. Due to a high multilingual character of the input data and a large number of 23 predicted labels (causing a lack of labelled data for some language-label combinations), we opted for fine-tuning pre-trained transformer-based language models. Conducting multiple experiments, we find the best configuration, which consists of large multilingual model (XLM-RoBERTa large) trained jointly on all input data, with carefully calibrated confidence thresholds for seen and surprise languages separately. Our final system performed the best on 6 out of 9 languages (including two surprise languages) and achieved highly competitive results on the remaining three languages.

* System paper within SemEval 2023 Task 3 on the subtask 3

Via

Access Paper or Ask Questions

Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Oct 18, 2022

Ivan Srba, Robert Moro, Matus Tomlein, Branislav Pecher, Jakub Simko, Elena Stefancova, Michal Kompan, Andrea Hrckova, Juraj Podrouzek, Adrian Gavornik(+1 more)

Figure 1 for Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Figure 2 for Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Figure 3 for Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Figure 4 for Auditing YouTube's Recommendation Algorithm for Misinformation Filter Bubbles

Abstract:In this paper, we present results of an auditing study performed over YouTube aimed at investigating how fast a user can get into a misinformation filter bubble, but also what it takes to "burst the bubble", i.e., revert the bubble enclosure. We employ a sock puppet audit methodology, in which pre-programmed agents (acting as YouTube users) delve into misinformation filter bubbles by watching misinformation promoting content. Then they try to burst the bubbles and reach more balanced recommendations by watching misinformation debunking content. We record search results, home page results, and recommendations for the watched videos. Overall, we recorded 17,405 unique videos, out of which we manually annotated 2,914 for the presence of misinformation. The labeled data was used to train a machine learning model classifying videos into three classes (promoting, debunking, neutral) with the accuracy of 0.82. We use the trained model to classify the remaining videos that would not be feasible to annotate manually. Using both the manually and automatically annotated data, we observe the misinformation bubble dynamics for a range of audited topics. Our key finding is that even though filter bubbles do not appear in some situations, when they do, it is possible to burst them by watching misinformation debunking content (albeit it manifests differently from topic to topic). We also observe a sudden decrease of misinformation filter bubble effect when misinformation debunking videos are watched after misinformation promoting videos, suggesting a strong contextuality of recommendations. Finally, when comparing our results with a previous similar study, we do not observe significant improvements in the overall quantity of recommended misinformation content.

* Just accepted to ACM Transactions on Recommender Systems (ACM TORS). arXiv admin note: substantial text overlap with arXiv:2203.13769

Via

Access Paper or Ask Questions