Abstract: Online abuse and threats towards politicians have become a significant concern in the Netherlands, as in many other countries across the world. This paper analyses gender differences in the abuse received by Dutch politicians on Twitter, while taking into account the possible additional impact of ethnic minority status. All tweets directed at party leaders throughout the entire year of 2022 were collected. The effects of gender and ethnic minority status were estimated for six different linguistic measures of abuse, namely toxicity, severe toxicity, identity attacks, profanity, insults, and threats. Contrary to expectations, male politicians received higher levels of all forms of abuse, with the exception of threats, for which no significant gender difference was found. Significant interaction effects between gender and ethnic minority status were found for several abuse measures. In the case of severe toxicity, identity attacks, and profanity, female ethnic minority politicians were more severely affected than their ethnic majority female colleagues, but not more severely than male politicians. Finally, female ethnic minority politicians received the highest levels of threats of all groups. Given that online abuse and threats are reported to have a negative effect on political participation and retention, these results are particularly worrying.
Abstract: Besides far-reaching public health consequences, the COVID-19 pandemic had a significant psychological impact on people around the world. To gain further insight into this matter, we introduce the Real World Worry Waves Dataset (RW3D). The dataset combines rich open-ended free-text responses with survey data on emotions, significant life events, and psychological stressors in a repeated-measures design in the UK over three years (2020: n=2441, 2021: n=1716, and 2022: n=1152). This paper provides background information on the data collection procedure, the recorded variables, participants' demographics, and the higher-order psychological and text-derived variables that emerged from the data. The RW3D is a unique primary data resource that could inspire new research questions on the psychological impact of the pandemic, especially those that connect modalities (here: text data, psychological survey variables and demographics) over time.
Abstract: The introduction of COVID-19 lockdown measures and the prospect of a return to normality represent challenging societal changes. Among the most pressing questions is how individuals adjust to the pandemic. This paper examines emotional responses to the pandemic in a repeated-measures design. Data (n=1698) were collected in April 2020 (during strict lockdown measures) and in April 2021 (when vaccination programmes gained traction). We asked participants to report their emotions and express these in text. Statistical tests revealed an average trend towards better adjustment to the pandemic. However, clustering analyses suggested a more complex, heterogeneous pattern, with a well-coping and a resigning subgroup of participants. Computational linguistic analyses showed that topics and n-gram frequencies shifted towards the vaccination programme and away from general worrying. Implications for public mental health efforts to identify people at heightened risk are discussed. The dataset is made publicly available.
Abstract: This paper introduces the Grievance Dictionary, a psycholinguistic dictionary that can be used to automatically understand language use in the context of grievance-fuelled violence threat assessment. We describe the development of the dictionary, which was informed by suggestions from experienced threat assessment practitioners. These suggestions and subsequent human and computational word list generation resulted in a dictionary of 20,502 words annotated by 2,318 participants. The dictionary was validated by applying it to texts written by violent and non-violent individuals, showing strong evidence for a difference between the two populations in several dictionary categories. Further classification tasks showed promising performance, but future improvements are still needed. Finally, we provide instructions and suggestions for the use of the Grievance Dictionary by security professionals and (violence) researchers.
Abstract: The problem of online threats and abuse could potentially be mitigated with a computational approach, in which sources of abuse are better understood or identified through author profiling. However, abusive language constitutes a specific domain of language for which it has not yet been tested whether differences emerge based on a text author's personality, age, or gender. This study examines statistical relationships between author characteristics and abusive vs. normal language, and performs prediction experiments for personality, age, and gender. Although some statistical relationships were established between author characteristics and language use, these patterns did not translate into high prediction performance. Personality traits were predicted within 15% of their actual value, age was predicted with an error margin of 10 years, and gender was classified correctly in 70% of cases. These results are poor compared to previous research on author profiling; we therefore urge caution in applying author profiling in the context of abusive language and threat assessment.
Abstract: The COVID-19 pandemic is having a dramatic impact on societies and economies around the world. With various lockdown and social distancing measures in place, it is important to understand emotional responses on a large scale. In this paper, we present the first ground truth dataset of emotional responses to COVID-19. We asked participants to indicate their emotions and express these in text. This resulted in the Real World Worry Dataset of 5,000 texts (2,500 short + 2,500 long texts). Our analyses suggest that emotional responses correlate with linguistic measures. Topic modeling further revealed that people in the UK worry about their family and the economic situation. Tweet-sized texts functioned as a call for solidarity, while longer texts shed light on worries and concerns. Using predictive modeling approaches, we were able to approximate participants' emotional responses from text to within 14% of their actual value. We encourage others to use the dataset and to improve how automated methods can be used to learn about emotional responses and worries about an urgent problem.
Abstract: Among the critical challenges around the COVID-19 pandemic is dealing with potentially detrimental effects on people's mental health. Designing appropriate interventions and identifying the concerns of those most at risk requires methods that can extract worries, concerns and emotional responses from text data. We examine gender differences and the effect of document length on worries about the ongoing COVID-19 situation. Our findings suggest that i) shorter texts do not offer as adequate an insight into psychological processes as longer texts do. We further find ii) marked gender differences in topics concerning emotional responses. Women worried more about their loved ones and severe health concerns, while men were more occupied with effects on the economy and society. The findings align with general gender differences in language found elsewhere, but the current unique circumstances likely amplified these effects. We close this paper with a call for more high-quality datasets, given the limitations of Tweet-sized data.
Abstract: The media frequently describes the 2017 Charlottesville 'Unite the Right' rally as a turning point for the alt-right and white supremacist movements. Related research into social movements also suggests that the media attention and public discourse concerning the rally may have influenced the alt-right. Empirical evidence for these claims is largely lacking. The current study investigates potential effects of the rally by examining a dataset of 7,142 YouTube video transcripts from alt-right and progressive channels. We examine sentiment surrounding the ten most frequent keywords (single words and word pairs) in transcripts from each group, from eight weeks before to eight weeks after the rally. In the majority of cases, no significant differences in sentiment were found within or between the alt-right and progressive groups, either pre- or post-Charlottesville. However, we did observe more negative sentiment trends surrounding 'Bernie Sanders' and 'black people' in the alt-right and progressive groups, respectively. We also observed more negative sentiment after the rally regarding 'Democratic Party' in the alt-right videos compared to the progressive videos. We suggest that the observed results potentially reflect minor changes in political sentiment before and after the rally, as well as differences in political sentiment between the alt-right and progressive groups in general.
Abstract: Vlogs provide a rich public source of data in a novel setting. This paper examines the continuous sentiment styles employed in 27,333 vlogs using a dynamic intra-textual approach to sentiment analysis. Using unsupervised clustering, we identified seven distinct continuous sentiment trajectories characterized by fluctuations of sentiment throughout a vlog's narrative time. We provide a taxonomy of these seven continuous sentiment styles and find that vlogs whose sentiment builds up towards a positive ending are the most prevalent in our sample. Gender was associated with preferences for different continuous sentiment trajectories. This paper discusses the findings with respect to previous work and concludes with an outlook on possible uses of the corpus, method and findings for related areas of research.