Abstract: Stigma toward people who use substances (PWUS) is a leading barrier to seeking treatment. Further, those in treatment are more likely to drop out if they experience higher levels of stigmatization. While related concepts of hate speech and toxicity, including those targeted toward vulnerable populations, have been the focus of automatic content moderation research, stigma and, in particular, stigma toward people who use substances have not. This paper explores stigma toward PWUS using a data set of roughly 5,000 public Reddit posts. We performed a crowd-sourced annotation task in which workers were asked to annotate each post for the presence of stigma toward PWUS and to answer a series of questions about their experiences with substance use. Results show that workers who use substances or know someone with a substance use disorder are more likely to rate a post as stigmatizing. Building on this, we use a supervised machine learning framework that centers workers with lived substance use experience to label each Reddit post as stigmatizing. Modeling person-level demographics in addition to comment-level language results in a classification performance (as measured by AUC) of 0.69, a 17% increase over modeling language alone. Finally, we explore the linguistic cues that distinguish stigmatizing content: PWUS and those who do not use substances agree that language around othering ("people", "they") and terms like "addict" are stigmatizing, while PWUS (as opposed to those who do not use substances) find discussions around specific substances more stigmatizing. Our findings offer insights into the nature of perceived stigma in substance use. Additionally, these results further establish the subjective nature of such machine learning tasks, highlighting the need for understanding their social contexts.
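As a concrete reading of the evaluation metric named above, the following is a minimal sketch of how AUC can be computed from binary labels and model scores, using the pairwise (Mann-Whitney) formulation. The function name `auc` and the toy labels and scores are illustrative assumptions; they are not the authors' code or data.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = post rated stigmatizing, 0 = not stigmatizing.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.3]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to a perfect ranking, which is the scale on which the reported 0.69 should be read.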
Abstract: Black Lives Matter (BLM) is a grassroots movement protesting violence towards Black individuals and communities, with a focus on police brutality. The movement has gained significant media and political attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd and the shooting of Jacob Blake in 2020. Due to the movement's decentralized nature, the #BlackLivesMatter social media hashtag has both come to represent the movement and been used as a call to action. Similar hashtags have appeared to counter the BLM movement, such as #AllLivesMatter and #BlueLivesMatter. We introduce a data set of 41.8 million tweets from 10 million users, each of which contains one of the following keywords: BlackLivesMatter, AllLivesMatter, and BlueLivesMatter. This data set contains all currently available tweets from the beginning of the BLM movement in 2013 to June 2020. We summarize the data set and show temporal trends in the use of both the BlackLivesMatter keyword and keywords associated with counter movements. Similarly themed BLM data sets, though much smaller in scope, have previously been used for studying discourse in protest and counter-protest movements, predicting retweets, examining the role of social media in protest movements, and exploring narrative agency. This paper open-sources a large-scale data set to facilitate research in the areas of computational social science, communications, political science, natural language processing, and machine learning.
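The keyword-based inclusion criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how matching was performed, so case-insensitive substring matching is assumed here, and the helper name `matched_keywords` and the example texts are hypothetical.

```python
import re

# The three keywords named in the abstract.
KEYWORDS = ("BlackLivesMatter", "AllLivesMatter", "BlueLivesMatter")

# Assumption: matching ignores case and does not require a leading '#'.
_PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)
_CANONICAL = {k.lower(): k for k in KEYWORDS}


def matched_keywords(text):
    """Return the set of keywords (in canonical spelling) found in `text`."""
    return {_CANONICAL[m.group(0).lower()] for m in _PATTERN.finditer(text)}


# Hypothetical example texts, not drawn from the data set.
examples = [
    "Marching downtown today #blacklivesmatter",
    "#AllLivesMatter and #BlueLivesMatter trending again",
    "Nothing relevant here",
]
results = [matched_keywords(t) for t in examples]
```

Under this reading, a tweet belongs in the data set when the returned set is non-empty, and the per-keyword sets can then be counted by date to produce the kind of temporal trends the abstract mentions.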