Elena Shushkevich

SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels

Sep 21, 2023

Elena Shushkevich, Long Mai, Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya

Abstract: With the proliferation of news media outlets, intelligent systems that detect redundant information in news articles have become especially prevalent as a way to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: simple heuristics, such as whether both articles in a pair are about politics, can provide strong but deceptive downstream performance. Segmenting news similarity datasets into topics improves the training of these models by forcing them to learn how to distinguish salient characteristics within narrower domains. However, this requires the existence of topic-specific datasets, which are currently lacking. In this article, we propose a new dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. Furthermore, we present four distinct approaches for generating news pairs, which are used in the creation of datasets specifically designed for the news similarity detection task. We benchmarked the created datasets using MinHash, BERT, SBERT, and SimCSE models.
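To make the MinHash baseline named in the abstract more concrete, the following is a minimal sketch of how MinHash can estimate the similarity of two news texts. It is not the authors' implementation: the word-shingle size, the number of hash functions, the use of salted BLAKE2b hashes, and all function names (`shingles`, `minhash_signature`, `estimate_similarity`) are assumptions chosen for illustration.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k=3 is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts fall back to a single shingle of all their words.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=128):
    """Build a MinHash signature: the minimum hash value per salted hash function."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Illustrative usage with made-up headlines (not taken from the SPICED dataset).
article_a = "The city council approved the new budget for public transport on Monday"
article_b = "The city council approved the new budget for public transport on Tuesday"
article_c = "Local team wins the championship after a dramatic penalty shootout"

sig_a = minhash_signature(shingles(article_a))
sig_b = minhash_signature(shingles(article_b))
sig_c = minhash_signature(shingles(article_c))

near_duplicate_score = estimate_similarity(sig_a, sig_b)
unrelated_score = estimate_similarity(sig_a, sig_c)
```

Because MinHash only compares surface-level shingles, it serves as a lexical baseline; the embedding models listed alongside it (BERT, SBERT, SimCSE) compare texts in a learned vector space instead.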

TUDublin team at Constraint@AAAI2021 -- COVID19 Fake News Detection

Jan 14, 2021

Elena Shushkevich, John Cardiff

Abstract: The paper is devoted to the participation of the TUDublin team in the Constraint@AAAI2021 - COVID19 Fake News Detection Challenge. Today, the problem of fake news detection is more acute than ever in connection with the pandemic. The volume of fake news is increasing rapidly, and AI tools that can identify and prevent the spread of false information about COVID-19 are urgently needed. The main goal of the work was to create a model that would carry out a binary classification of messages from social media as real or fake news in the context of COVID-19. Our team constructed an ensemble consisting of Bidirectional Long Short Term Memory, Support Vector Machine, Logistic Regression, Naive Bayes and a combination of Logistic Regression and Naive Bayes. The model allowed us to achieve an F1-score of 0.94, which is within 5% of the best result.
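The abstract does not state how the five component models are combined, so the following is only a sketch of one common option, hard majority voting over per-model labels. The function name `majority_vote`, the label strings, and the example predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models by hard majority voting.

    per_model_predictions: a list with one inner list per model, each holding
    that model's predicted label for every message, in the same order.
    """
    if not per_model_predictions:
        raise ValueError("at least one model's predictions are required")
    n_messages = len(per_model_predictions[0])
    if any(len(preds) != n_messages for preds in per_model_predictions):
        raise ValueError("every model must predict a label for every message")

    combined = []
    for labels in zip(*per_model_predictions):
        # most_common(1) returns the label with the highest vote count;
        # with five voters and two classes a tie cannot occur.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of five classifiers on three messages.
predictions = [
    ["fake", "real", "fake"],  # e.g. a BiLSTM
    ["fake", "fake", "real"],  # e.g. an SVM
    ["real", "real", "fake"],  # e.g. logistic regression
    ["fake", "real", "real"],  # e.g. naive Bayes
    ["real", "fake", "fake"],  # e.g. a combined LR + NB model
]

ensemble_labels = majority_vote(predictions)
```

An odd number of voters avoids ties in binary classification, which is one reason a five-member ensemble pairs naturally with this scheme; weighted or probability-averaging schemes are equally plausible alternatives.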

* 8 pages
