Abstract:In this project, we describe a method of modeling semantic leadership across a set of communities associated with the #BlackLivesMatter movement, which has been informed by qualitative research on the structure of social media and Black Twitter in particular. We describe our bespoke approaches to time-binning, community clustering, and connecting communities over time, as well as our adaptation of state-of-the-art approaches to semantic change detection and semantic leadership induction. We find substantial evidence of the leadership role of BLM activists and progressives, as well as Black celebrities. We also find evidence of the sustained engagement of the conservative community with this discourse, suggesting an alternative explanation for how we arrived at the present moment, in which "anti-woke" and "anti-CRT" bills are being enacted nationwide.
Abstract:This paper presents a set of intersectional feminist principles for conducting equitable, ethical, and sustainable AI research. In Data Feminism (2020), we offered seven principles for examining and challenging unequal power in data science. Here, we present a rationale for why feminism remains deeply relevant for AI research, rearticulate the original principles of data feminism with respect to AI, and introduce two potential new principles related to environmental impact and consent. Together, these principles help to 1) account for the unequal, undemocratic, extractive, and exclusionary forces at work in AI research, development, and deployment; 2) identify and mitigate predictable harms in advance of unsafe, discriminatory, or otherwise oppressive systems being released into the world; and 3) inspire creative, joyful, and collective ways to work towards a more equitable, sustainable world in which all of us can thrive.
Abstract:Large language models' (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage is under-scrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-descriptions of website creators, and extract information about who they are and where they are from: their topical interests, social roles, and geographic affiliations. Then, we conduct the first study investigating how ten "quality" and English language identification (langID) filters affect webpages that vary along these social dimensions. Our experiments illuminate a range of implicit preferences in data curation: we show that some quality classifiers act like topical domain filters, and langID can overlook English content from some regions of the world. Overall, we hope that our work will encourage a new line of research on pretraining data curation practices and its social implications.
Abstract:Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants' affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions.
Abstract:The abolitionist movement of the nineteenth-century United States remains among the most significant social and political movements in US history. Abolitionist newspapers played a crucial role in spreading information and shaping public opinion around a range of issues relating to the abolition of slavery. These newspapers also serve as a primary source of information about the movement for scholars today, resulting in powerful new accounts of the movement and its leaders. This paper supplements recent qualitative work on the role of women in abolition's vanguard, as well as the role of the Black press, with a quantitative text modeling approach. Using diachronic word embeddings, we identify which newspapers tended to lead lexical semantic innovations -- the introduction of new usages of specific words -- and which newspapers tended to follow. We then aggregate the evidence across hundreds of changes into a weighted network with the newspapers as nodes; directed edge weights represent the frequency with which each newspaper led the other in the adoption of a lexical semantic change. Analysis of this network reveals pathways of lexical semantic influence, distinguishing leaders from followers, as well as others who stood apart from the semantic changes that swept through this period. More specifically, we find that two newspapers edited by women -- THE PROVINCIAL FREEMAN and THE LILY -- led a large number of semantic changes in our corpus, lending additional credence to the argument that a multiracial coalition of women led the abolitionist movement in terms of both thought and action. It also contributes additional complexity to the scholarship that has sought to tease apart the relation of the abolitionist movement to the women's suffrage movement, and the vexed racial politics that characterized their relation.