Brendan O'Connor

Multilingualism, Transnationality, and K-pop in the Online #StopAsianHate Movement

Mar 04, 2025

Tessa Masis, Zhangqi Duan, Weiai Wayne Xu, Ethan Zuckerman, Jane Yeahin Pyo, Brendan O'Connor

Abstract: The #StopAsianHate (SAH) movement is a broad social movement against violence targeting Asians and Asian Americans, beginning in 2021 in response to racial discrimination related to COVID-19 and sparking worldwide conversation about anti-Asian hate. However, research on the online SAH movement has focused on English-speaking participants, so the spread of the movement outside of the United States is largely unknown. In addition, there have been no long-term studies of SAH, so the extent to which it has been successfully sustained over time is not well understood. We present an analysis of 6.5 million "#StopAsianHate" tweets from 2.2 million users all over the globe, spanning 60 different languages, constituting the first study of the non-English and transnational component of the online SAH movement. Using a combination of topic modeling, user modeling, and hand annotation, we identify and characterize the dominant discussions and users participating in the movement and compare English and non-English topics and users. We find clear differences in the events driving topics: spikes in English tweets are driven by violent crimes in the US, whereas spikes in non-English tweets are driven by transnational incidents of anti-Asian sentiment towards symbolic representatives of Asian nations. We also find that global K-pop fans were quick to adopt the SAH movement and, in fact, sustained it for longer than any other user group. Our work contributes to understanding the transnationality and evolution of the SAH movement, and more generally to exploring upward scale shift and public attention in large-scale multilingual online activism.

* WebSci'25

A Semantic Parsing Algorithm to Solve Linear Ordering Problems

Feb 12, 2025

Maha Alkhairy, Vincent Homer, Brendan O'Connor

Abstract: We develop an algorithm to semantically parse linear ordering problems, which require a model to arrange entities using deductive reasoning. Our method takes as input a number of premises and candidate statements, parses them into a first-order logic of an ordering domain, and then uses constraint logic programming to infer the truth of the proposed statements about the ordering. Our semantic parser transforms Heim and Kratzer's syntax-based compositional formal semantic rules into a computational algorithm. This transformation involves introducing abstract types and templates based on their rules, and adds a dynamic component to interpret entities within a contextual framework. Our symbolic system, the Formal Semantic Logic Inferer (FSLI), is applied to the multiple-choice questions of BIG-bench's logical_deduction task, achieving perfect accuracy, compared to 67.06% for the best-performing LLM (GPT-4) and 87.63% for the hybrid system Logic-LM. These promising results demonstrate the benefit of developing a semantic parsing algorithm driven by first-order logic constructs.

* 3 figures, 9 pages main paper and 6 pages references and appendix
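
A minimal sketch of the final inference step only, assuming the premises have already been parsed into constraints over positions. This is not FSLI and involves no semantic parsing: in place of constraint logic programming it uses brute-force enumeration of orderings, which is feasible only for the handful of entities in puzzles like these. A candidate statement counts as true when it holds in every ordering consistent with the premises. The helper names (consistent_orderings, entailed, left_of, at_position) are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence

# An ordering maps each entity to its position (0 = first/leftmost).
Ordering = Dict[str, int]
Constraint = Callable[[Ordering], bool]


def consistent_orderings(entities: Sequence[str],
                         premises: List[Constraint]) -> List[Ordering]:
    """Keep every permutation of the entities that satisfies all premises
    (exhaustive search standing in for a constraint solver)."""
    models = []
    for perm in permutations(entities):
        pos = {entity: i for i, entity in enumerate(perm)}
        if all(premise(pos) for premise in premises):
            models.append(pos)
    return models


def entailed(statement: Constraint, models: List[Ordering]) -> bool:
    """True if the statement holds in every consistent ordering
    (and the premises are satisfiable at all)."""
    return bool(models) and all(statement(m) for m in models)


# Hypothetical constructors standing in for parsed logical forms.
def left_of(a: str, b: str) -> Constraint:
    return lambda pos: pos[a] < pos[b]


def at_position(a: str, i: int) -> Constraint:
    return lambda pos: pos[a] == i


# Toy puzzle: three books on a shelf.
books = ["red", "green", "blue"]
premises = [left_of("red", "green"), at_position("blue", 0)]
models = consistent_orderings(books, premises)

candidates = {
    "The red book is second": at_position("red", 1),
    "The green book is first": at_position("green", 0),
}
answers = [text for text, stmt in candidates.items() if entailed(stmt, models)]
print(answers)  # ['The red book is second']
```

Requiring truth in every consistent ordering is what makes the answer deductive rather than merely possible; a solver-based system reaches the same verdict while pruning the search instead of enumerating it.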

Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time

Aug 13, 2024

Marisa Hudspeth, Brendan O'Connor, Laure Thompson

Abstract: Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the heterogeneity of these treebanks must be carefully considered to build effective and reliable data. In this work, we review existing Latin treebanks to identify the texts they draw from, identify their overlap, and document their coverage across time and genre. We additionally design automated conversions of their morphological feature annotations into the conventions of standard Latin grammar. From this, we build new time-period data splits from the existing treebanks, which we use to perform a broad cross-time analysis of POS and morphological feature tagging. We find that BERT-based taggers outperform existing taggers while also being more robust to cross-domain shifts.

Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input

Apr 29, 2024

Tessa Masis, Brendan O'Connor

Abstract: Geo-entity linking is the task of linking a location mention to its real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available, and existing ones are often either rule-based, and thus break easily in social media settings, or LLM-based, and thus too expensive for large-scale datasets. We present a method that represents real-world locations as averaged embeddings of labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.

* NLP+CSS workshop at NAACL 2024
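
A minimal sketch of linking by averaged embeddings, under stated assumptions: each location is represented by the normalized mean of the embeddings of user-written names labeled with it, and a query is linked to the most cosine-similar location. The confidence score used here (the gap between the top two similarities) and the abstention threshold are assumptions for illustration, not the paper's score, and the toy 2-D vectors stand in for a real multilingual text encoder. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)


def build_location_index(
    labeled: Dict[str, List[np.ndarray]],
) -> Tuple[List[str], np.ndarray]:
    """Represent each location by the re-normalized mean of the unit-length
    embeddings of user-written names labeled with that location."""
    ids = sorted(labeled)
    centroids = np.stack(
        [unit(np.mean([unit(v) for v in labeled[loc]], axis=0)) for loc in ids]
    )
    return ids, centroids


def link(query: np.ndarray, ids: List[str], centroids: np.ndarray,
         threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Link a query embedding to the most cosine-similar location.
    Assumed confidence: margin between the top-2 similarities; abstain
    (return None) when it falls below the threshold."""
    sims = centroids @ unit(query)
    order = np.argsort(sims)[::-1]
    best = int(order[0])
    margin = float(sims[best] - sims[order[1]]) if len(ids) > 1 else float(sims[best])
    return (ids[best] if margin >= threshold else None), margin


# Toy 2-D "embeddings" of user-input strings.
labeled = {
    "Paris": [np.array([1.0, 0.1]), np.array([0.9, 0.0])],  # e.g. "paris", "París"
    "Tokyo": [np.array([0.0, 1.0]), np.array([0.1, 0.9])],  # e.g. "tokyo", "東京"
}
ids, centroids = build_location_index(labeled)
print(link(np.array([1.0, 0.05]), ids, centroids)[0])  # Paris
print(link(np.array([1.0, 1.0]), ids, centroids)[0])   # None (ambiguous)
```

Abstaining on low-margin queries is one simple form of selective prediction: it trades coverage for precision, which matters when many user-entered locations are vague or jokes.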

A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction

May 24, 2023

Erica Cai, Brendan O'Connor

Abstract: We consider dyadic zero-shot event extraction (EE) to identify actions between pairs of actors. The zero-shot setting allows social scientists or other non-computational researchers to extract any customized, user-specified set of events without training, resulting in a dyadic event database that offers insight into sociopolitical relational dynamics among actors and the higher-level organizations or countries they represent. Unfortunately, we find that current zero-shot EE methods perform poorly on this task, with issues including word sense ambiguity, modality mismatch, and efficiency. Straightforward large language model prompting typically performs even worse. We address these challenges with a new fine-grained, multi-stage generative question-answering method, using a Monte Carlo approach to exploit and overcome the randomness of generative outputs. It performs 90% fewer queries than a previous approach, with strong performance on the widely used Automatic Content Extraction dataset. Finally, we extend our method to extract affiliations of actor arguments and demonstrate our method and findings on a dyadic international relations case study.
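
A hedged sketch of the general Monte Carlo idea, not the authors' pipeline or prompts: each question is sent to a sampled generator several times, and the most frequent normalized answer is kept along with its vote share as a rough confidence. The two-stage structure (first ask whether the event occurs, then ask for the actors only if it does) illustrates one way a multi-stage design can skip unnecessary queries. ask is a hypothetical stand-in for a temperature-sampled LLM call; here it is a deterministic stub that cycles through noisy outputs, and the prompts and thresholds are assumptions.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for one temperature-sampled LLM completion.
Sampler = Callable[[str], str]


def monte_carlo_answer(ask: Sampler, prompt: str, n: int = 10) -> Tuple[str, float]:
    """Sample n answers; return the most frequent normalized answer and
    its vote share (an empirical confidence)."""
    answers = [ask(prompt).strip().lower() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n


def extract_dyadic_event(ask: Sampler, sentence: str, event: str,
                         n: int = 10, min_share: float = 0.6
                         ) -> Optional[Tuple[str, str]]:
    """Stage 1 asks whether the event occurs; stage 2 (only if it does)
    asks for the source and target actors. Returns None on low confidence."""
    occurs, share = monte_carlo_answer(
        ask, f"Does this sentence describe '{event}'? {sentence} Answer yes or no.", n)
    if occurs != "yes" or share < min_share:
        return None
    source, s_share = monte_carlo_answer(ask, f"Who performs '{event}'? {sentence}", n)
    target, t_share = monte_carlo_answer(ask, f"Who is the target of '{event}'? {sentence}", n)
    if min(s_share, t_share) < min_share:
        return None
    return source, target


def make_fake_sampler() -> Sampler:
    """Deterministic noisy stub: each question type cycles through fixed
    outputs, mimicking variation across sampled generations."""
    streams = {
        "Does": cycle(["yes", "yes", "no", "yes"]),
        "Who performs": cycle(["Country A", "country a ", "Country B"]),
        "Who is the target": cycle(["Country B", "Country B", "country b", "Country A"]),
    }

    def ask(prompt: str) -> str:
        for prefix, stream in streams.items():
            if prompt.startswith(prefix):
                return next(stream)
        raise ValueError(f"unexpected prompt: {prompt}")

    return ask


print(extract_dyadic_event(make_fake_sampler(),
                           "Country A imposed sanctions on Country B.",
                           "impose sanctions"))
# ('country a', 'country b')
```

Aggregating repeated samples turns the generator's randomness into a usable signal: answers that survive many draws are kept, and unstable ones are filtered out by the vote-share threshold.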

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Feb 27, 2023

Brendan O'Connor, Simon Dixon

Abstract: Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well established among VC tasks, and show that it improves our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-spectrograms of singer recordings to produce singer-specific variance encodings using contrastive learning. We subsequently trained a well-known autoencoder framework (AutoVC) conditioned on these SIEs, and measured differences in SVC performance when using different latent regressor loss components. We found that computing this loss with respect to SIEs leads to better performance than computing it with respect to bottleneck embeddings: converted audio is more natural and more specific to the target singers. Including this loss component has the advantage of explicitly forcing the network to reconstruct with timbral similarity, and it also negates the effect of poor disentanglement in AutoVC's bottleneck embeddings. We observe notable discrepancies between computational and human evaluations of singer-converted audio clips, which highlights the necessity of both. We also propose a pitch-matching mechanism between source and target singers to ensure that these evaluations are not influenced by differences in pitch register.

* Submitted to the Sound and Music Computing Conference 2023

Examining Political Rhetoric with Epistemic Stance Detection

Jan 06, 2023

Ankita Gupta, Su Lin Blodgett, Justin H Gross, Brendan O'Connor

Abstract: Participants in political discourse employ rhetorical strategies (such as hedging, attributions, or denials) to display varying degrees of belief commitment to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance prediction that outperforms more complex state-of-the-art models. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders (respected allies and opposed bogeymen) across U.S. political ideologies.

* Forthcoming in Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at EMNLP 2022

ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution

Oct 13, 2022

Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O'Connor

Abstract: Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreference and have been collected via complex and lengthy guidelines curated for linguistic experts. These concerns have sparked growing interest among researchers in curating a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable-quality annotations were already achievable (>90% agreement between crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify linguistic cases that our annotators unanimously agree upon but that lack unified treatment in existing datasets (e.g., generic pronouns, appositives). We propose that the research community revisit these phenomena when curating future unified annotation guidelines.

* preprint (19 pages), code in https://github.com/gnkitaa/ezCoref
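
The abstract does not say how agreement is computed; coreference agreement is commonly measured with cluster-based metrics such as MUC, B-cubed, or CoNLL F1. As a simpler, hypothetical illustration, the sketch below scores pairwise link agreement: the fraction of mention pairs on which two annotations agree about whether the pair corefers. It assumes both annotations share the same mention ids and map each mention to a cluster id; pairwise_link_agreement is an illustrative name, not part of ezCoref.

```python
from itertools import combinations
from typing import Dict


def pairwise_link_agreement(crowd: Dict[str, int], expert: Dict[str, int]) -> float:
    """Fraction of mention pairs on which two annotations agree about
    whether the pair is coreferent. Each annotation maps a mention id to
    a cluster id; only mentions present in both are compared."""
    mentions = sorted(set(crowd) & set(expert))
    pairs = list(combinations(mentions, 2))
    if not pairs:
        return 1.0
    agree = sum(
        (crowd[a] == crowd[b]) == (expert[a] == expert[b]) for a, b in pairs
    )
    return agree / len(pairs)


# Toy passage: m1 "Mary", m2 "she", m3 "her", m4 "the doctor", m5 "the clinic".
expert = {"m1": 0, "m2": 0, "m3": 0, "m4": 1, "m5": 2}
crowd = {"m1": 0, "m2": 0, "m3": 1, "m4": 1, "m5": 2}  # "her" linked to "the doctor"
print(pairwise_link_agreement(crowd, expert))  # 0.7
```

Because most mention pairs in a long passage are non-coreferent, a pairwise score like this tends to be inflated relative to cluster-based metrics, so it should be read as a rough sanity check rather than a substitute for them.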

Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties

Sep 15, 2022

Tessa Masis, Anissa Neal, Lisa Green, Brendan O'Connor

Abstract: The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature (say, the zero copula construction) in a corpus and analyze the feature's distribution across speakers, topics, and other variables, either to gain a qualitative understanding of the feature's function or to systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.

* Field Matters Workshop at COLING 2022

Zero-shot Singing Technique Conversion

Nov 16, 2021

Brendan O'Connor, Simon Dixon, George Fazekas

Abstract: In this paper we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. These include using a pretrained singing technique encoder that extracts technique information, on which a decoder is conditioned during training. By swapping a source singer's technique information for that of the target during conversion, the input spectrogram is reconstructed with the target's technique. We document the beneficial effects of omitting the latent loss, the importance of sequential training, and our process for fine-tuning the bottleneck. We also conducted a listening study in which participants rated the specificity of technique-converted voices as well as their naturalness. From this we are able to conclude how effective the technique conversions are and how different conditions affect them, while assessing the model's ability to reconstruct its input data.

* In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021
