Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sarah Rajtmajer

The study of short texts in digital politics: Document aggregation for topic modeling

Mar 07, 2025

Nitheesha Nakka, Omer F. Yalcin, Bruce A. Desmarais, Sarah Rajtmajer, Burt Monroe

Abstract:Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.

Via

Access Paper or Ask Questions

The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Oct 25, 2024

Xinyu Wang, Wenbo Zhang, Sai Koneru, Hangzhi Guo, Bonam Mingole, S. Shyam Sundar, Sarah Rajtmajer, Amulya Yadav

Figure 1 for The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Figure 2 for The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Figure 3 for The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Figure 4 for The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Abstract:With the rise of AI-generated content spewed at scale from large language models (LLMs), genuine concerns about the spread of fake news have intensified. The perceived ability of LLMs to produce convincing fake news at scale poses new challenges for both human and automated fake news detection systems. To address this gap, this work presents the findings from a university-level competition which aimed to explore how LLMs can be used by humans to create fake news, and to assess the ability of human annotators and AI models to detect it. A total of 110 participants used LLMs to create 252 unique fake news stories, and 84 annotators participated in the detection tasks. Our findings indicate that LLMs are ~68% more effective at detecting real news than humans. However, for fake news detection, the performance of LLMs and humans remains comparable (~60% accuracy). Additionally, we examine the impact of visual elements (e.g., pictures) in news on the accuracy of detecting fake news stories. Finally, we also examine various strategies used by fake news creators to enhance the credibility of their AI-generated content. This work highlights the increasing complexity of detecting AI-generated fake news, particularly in collaborative human-AI settings.

Via

Access Paper or Ask Questions

Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Oct 24, 2024

Xinyu Wang, Wenbo Zhang, Sarah Rajtmajer

Figure 1 for Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Figure 2 for Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Figure 3 for Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Figure 4 for Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Abstract:In today's global digital landscape, misinformation transcends linguistic boundaries, posing a significant challenge for moderation systems. While significant advances have been made in misinformation detection, the focus remains largely on monolingual high-resource contexts, with low-resource languages often overlooked. This survey aims to bridge that gap by providing a comprehensive overview of the current research on low-resource language misinformation detection in both monolingual and multilingual settings. We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to: data resources, model development, cultural and linguistic context, real-world applications, and research efforts. We also examine emerging approaches, such as language-agnostic models and multi-modal techniques, while emphasizing the need for improved data collection practices, interdisciplinary collaboration, and stronger incentives for socially responsible AI research. Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.

Via

Access Paper or Ask Questions

The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content

May 17, 2024

Xinyu Wang, Sai Koneru, Pranav Narayanan Venkit, Brett Frischmann, Sarah Rajtmajer

Abstract:As social media has become a predominant mode of communication globally, the rise of abusive content threatens to undermine civil discourse. Recognizing the critical nature of this issue, a significant body of research has been dedicated to developing language models that can detect various types of online abuse, e.g., hate speech, cyberbullying. However, there exists a notable disconnect between platform policies, which often consider the author's intention as a criterion for content moderation, and the current capabilities of detection models, which typically lack efforts to capture intent. This paper examines the role of intent in content moderation systems. We review state of the art detection models and benchmark training datasets for online abuse to assess their awareness and ability to capture intent. We propose strategic changes to the design and development of automated detection and moderation systems to improve alignment with ethical and policy conceptualizations of abuse.

Via

Access Paper or Ask Questions

Inside the echo chamber: Linguistic underpinnings of misinformation on Twitter

Apr 24, 2024

Xinyu Wang, Jiayi Li, Sarah Rajtmajer

Abstract:Social media users drive the spread of misinformation online by sharing posts that include erroneous information or commenting on controversial topics with unsubstantiated arguments often in earnest. Work on echo chambers has suggested that users' perspectives are reinforced through repeated interactions with like-minded peers, promoted by homophily and bias in information diffusion. Building on long-standing interest in the social bases of language and linguistic underpinnings of social behavior, this work explores how conversations around misinformation are mediated through language use. We compare a number of linguistic measures, e.g., in-/out-group cues, readability, and discourse connectives, within and across topics of conversation and user communities. Our findings reveal increased presence of group identity signals and processing fluency within echo chambers during discussions of misinformation. We discuss the specific character of these broader trends across topics and examine contextual influences.

Via

Access Paper or Ask Questions

"Confidently Nonsensical?'': A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Apr 11, 2024

Pranav Narayanan Venkit, Tatiana Chakravorti, Vipul Gupta, Heidi Biggs, Mukund Srinath, Koustava Goswami, Sarah Rajtmajer, Shomir Wilson

Figure 1 for "Confidently Nonsensical?'': A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Figure 2 for "Confidently Nonsensical?'': A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Figure 3 for "Confidently Nonsensical?'': A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Figure 4 for "Confidently Nonsensical?'': A Critical Survey on the Perspectives and Challenges of 'Hallucinations' in NLP

Abstract:We investigate how hallucination in large language models (LLM) is characterized in peer-reviewed literature using a critical examination of 103 publications across NLP research. Through a comprehensive review of sociological and technological literature, we identify a lack of agreement with the term `hallucination.' Additionally, we conduct a survey with 171 practitioners from the field of NLP and AI to capture varying perspectives on hallucination. Our analysis underscores the necessity for explicit definitions and frameworks outlining hallucination within NLP, highlighting potential challenges, and our survey inputs provide a thematic understanding of the influence and ramifications of hallucination in society.

Via

Access Paper or Ask Questions

Exploring Trust and Risk during Online Bartering Interactions

Nov 27, 2023

Kalyani Lakkanige, Lamar Cooley-Russ, Alan R. Wagner, Sarah Rajtmajer

Figure 1 for Exploring Trust and Risk during Online Bartering Interactions

Figure 2 for Exploring Trust and Risk during Online Bartering Interactions

Figure 3 for Exploring Trust and Risk during Online Bartering Interactions

Figure 4 for Exploring Trust and Risk during Online Bartering Interactions

Abstract:This paper investigates how risk influences the way people barter. We used Minecraft to create an experimental environment in which people bartered to earn a monetary bonus. Our findings reveal that subjects exhibit risk-aversion to competitive bartering environments and deliberate over their trades longer when compared to cooperative environments. These initial experiments lay groundwork for development of agents capable of strategically trading with human counterparts in different environments.

* Paper accepted into Multittrust 2.0 @ HAI 2023 (https://multittrust.github.io/2ed/)

Via

Access Paper or Ask Questions

Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Sep 07, 2023

Sai Koneru, Jian Wu, Sarah Rajtmajer

Figure 1 for Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Figure 2 for Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Figure 3 for Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Figure 4 for Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

Abstract:Hypothesis formulation and testing are central to empirical research. A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature. However, with exponential increase in the number of scientific articles published annually, manual aggregation and synthesis of evidence related to a given hypothesis is a challenge. Our work explores the ability of current large language models (LLMs) to discern evidence in support or refute of specific hypotheses based on the text of scientific abstracts. We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences. We compare the performance of LLMs to several state-of-the-art benchmarks and highlight opportunities for future research in this area. The dataset is available at https://github.com/Sai90000/ScientificHypothesisEvidencing.git

Via

Access Paper or Ask Questions

A Study on Reproducibility and Replicability of Table Structure Recognition Methods

Apr 20, 2023

Kehinde Ajayi, Muntabir Hasan Choudhury, Sarah Rajtmajer, Jian Wu

Abstract:Concerns about reproducibility in artificial intelligence (AI) have emerged, as researchers have reported unsuccessful attempts to directly reproduce published findings in the field. Replicability, the ability to affirm a finding using the same procedures on new data, has not been well studied. In this paper, we examine both reproducibility and replicability of a corpus of 16 papers on table structure recognition (TSR), an AI task aimed at identifying cell locations of tables in digital documents. We attempt to reproduce published results using codes and datasets provided by the original authors. We then examine replicability using a dataset similar to the original as well as a new dataset, GenTSR, consisting of 386 annotated tables extracted from scientific papers. Out of 16 papers studied, we reproduce results consistent with the original in only four. Two of the four papers are identified as replicable using the similar dataset under certain IoU values. No paper is identified as replicable using the new dataset. We offer observations on the causes of irreproducibility and irreplicability. All code and data are available on Codeocean at https://codeocean.com/capsule/6680116/tree.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

A prototype hybrid prediction market for estimating replicability of published work

Mar 01, 2023

Tatiana Chakravorti, Robert Fraleigh, Timothy Fritton, Michael McLaughlin, Vaibhav Singh, Christopher Griffin, Anthony Kwasnica, David Pennock, C. Lee Giles, Sarah Rajtmajer

Figure 1 for A prototype hybrid prediction market for estimating replicability of published work

Figure 2 for A prototype hybrid prediction market for estimating replicability of published work

Figure 3 for A prototype hybrid prediction market for estimating replicability of published work

Abstract:We present a prototype hybrid prediction market and demonstrate the avenue it represents for meaningful human-AI collaboration. We build on prior work proposing artificial prediction markets as a novel machine-learning algorithm. In an artificial prediction market, trained AI agents buy and sell outcomes of future events. Classification decisions can be framed as outcomes of future events, and accordingly, the price of an asset corresponding to a given classification outcome can be taken as a proxy for the confidence of the system in that decision. By embedding human participants in these markets alongside bot traders, we can bring together insights from both. In this paper, we detail pilot studies with prototype hybrid markets for the prediction of replication study outcomes. We highlight challenges and opportunities, share insights from semi-structured interviews with hybrid market participants, and outline a vision for ongoing and future work.

Via

Access Paper or Ask Questions