Karina Shyrokykh

Short text classification with machine learning in the social sciences: The case of climate change on Twitter

Oct 03, 2023

Karina Shyrokykh, Maksym Girnyk, Lisa Dellmuth

Abstract: To analyse large numbers of texts, social science researchers are increasingly confronting the challenge of text classification. When manual labeling is not possible and researchers have to find automatized ways to classify texts, computer science provides a useful toolbox of machine-learning methods whose performance remains understudied in the social sciences. In this article, we compare the performance of the most widely used text classifiers by applying them to a typical research scenario in social science research: a relatively small labeled dataset with infrequent occurrence of categories of interest, which is a part of a large unlabeled dataset. As an example case, we look at Twitter communication regarding climate change, a topic of increasing scholarly interest in interdisciplinary social science research. Using a novel dataset including 5,750 tweets from various international organizations regarding the highly ambiguous concept of climate change, we evaluate the performance of methods in automatically classifying tweets based on whether they are about climate change or not. In this context, we highlight two main findings. First, supervised machine-learning methods perform better than state-of-the-art lexicons, in particular as class balance increases. Second, traditional machine-learning methods, such as logistic regression and random forest, perform similarly to sophisticated deep-learning methods, whilst requiring much less training time and computational resources. The results have important implications for the analysis of short texts in social science research.

* PLoS ONE 18(9): e0290762 (2023)
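
The comparison described in the abstract (a keyword lexicon versus a supervised classifier, scored on a rare positive class) can be illustrated with a minimal sketch. This is not the authors' code or data: the toy tweets, the lexicon, the function names, and the hyperparameters below are all assumptions made up for illustration, and the logistic regression is a plain bag-of-words version trained by stochastic gradient descent using only the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical toy tweets, invented for illustration (NOT the paper's dataset).
# Label 1 = about climate change, 0 = not. Positives are deliberately rare.
TRAIN = [
    ("Cutting carbon emissions is key to limiting global warming", 1),
    ("Rising sea levels threaten coastal communities", 1),
    ("New vaccine campaign launched in three regions", 0),
    ("Trade ministers meet to discuss tariffs", 0),
    ("Join our webinar on digital education tomorrow", 0),
    ("Annual report on labour markets published today", 0),
    ("The investment climate for small firms is improving", 0),
]

# Hypothetical lexicon; real lexicons are far larger.
LEXICON = {"climate", "warming", "emissions"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_predict(text, lexicon):
    """Return 1 if any token of the text appears in the lexicon, else 0."""
    return int(any(tok in lexicon for tok in tokenize(text)))


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression with plain SGD (no regularization)."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = Counter(tokenize(text))
            z = bias + sum(weights[t] * c for t, c in feats.items())
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # d(log-loss)/dz
            bias -= lr * grad
            for t, c in feats.items():
                weights[t] -= lr * grad * c
    return weights, bias


def logreg_predict(text, weights, bias):
    """Return 1 if the linear score is positive (probability above 0.5)."""
    feats = Counter(tokenize(text))
    return int(bias + sum(weights[t] * c for t, c in feats.items()) > 0)


def f1_score(y_true, y_pred):
    """F1 for the positive class; a more informative score than accuracy when positives are rare."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    texts = [t for t, _ in TRAIN]
    labels = [y for _, y in TRAIN]
    w, b = train_logreg(texts, labels)
    # Scored on the training tweets only to keep the sketch short;
    # a real comparison needs held-out data.
    print("lexicon F1:", f1_score(labels, [lexicon_predict(t, LEXICON) for t in texts]))
    print("logreg  F1:", f1_score(labels, [logreg_predict(t, w, b) for t in texts]))
```

The toy tweet about the "investment climate" hints at why a lexicon can struggle with an ambiguous term: a keyword match fires regardless of context, whereas a supervised model can in principle learn to discount it from labeled examples. Nothing about relative performance should be read off data this small.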
