Abstract:This paper describes our system for SemEval 2025 Task 7: Previously Fact-Checked Claim Retrieval. The task requires retrieving relevant fact-checks for a given input claim from the extensive, multilingual MultiClaim dataset, which comprises social media posts and fact-checks in several languages. To address this challenge, we first evaluated zero-shot performance using state-of-the-art English and multilingual retrieval models and then fine-tuned the most promising systems, leveraging machine translation to enhance crosslingual retrieval. Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
Abstract:Sexism in online content is a pervasive issue that necessitates effective classification techniques to mitigate its harmful impact. Online platforms often have sexist comments and posts that create a hostile environment, especially for women and minority groups. This content not only spreads harmful stereotypes but also causes emotional harm. Reliable methods are essential to find and remove sexist content, making online spaces safer and more welcoming. Therefore, the sEXism Identification in Social neTworks (EXIST) challenge addresses this issue at CLEF 2024. This study aims to improve sexism identification in bilingual contexts (English and Spanish) by leveraging natural language processing models. The tasks are to determine whether a text is sexist and what the source intention behind it is. We fine-tuned the XLM-RoBERTa model and separately used GPT-3.5 with few-shot learning prompts to classify sexist content. The XLM-RoBERTa model exhibited robust performance in handling complex linguistic structures, while GPT-3.5's few-shot learning capability allowed for rapid adaptation to new data with minimal labeled examples. Our approach using XLM-RoBERTa achieved 4th place in the soft-soft evaluation of Task 1 (sexism identification). For Task 2 (source intention), we achieved 2nd place in the soft-soft evaluation.