David West Brown

Do LLMs write like humans? Variation in grammatical and rhetorical styles

Oct 21, 2024

Alex Reinhart, David West Brown, Ben Markey, Michael Laudenbach, Kachatad Pantusen, Ronald Yurko, Gordon Weinberg

Abstract: Large language models (LLMs) are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in surface features such as word choice and punctuation, and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones, and are larger for instruction-tuned models than base models. This demonstrates that despite their advanced abilities, LLMs struggle to match human styles, and hence more advanced linguistic features can detect patterns in their behavior not previously recognized.

* 29 pages, 4 figures, 11 tables

Identity Construction in a Misogynist Incels Forum

Jul 09, 2023

Michael Miller Yoder, Chloe Perry, David West Brown, Kathleen M. Carley, Meredith L. Pruden

Abstract: Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels-dot-is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.

* Workshop on Online Abuse and Harms (WOAH) 2023; Minor edits to author names and abstracts in most recent version

A Weakly Supervised Classifier and Dataset of White Supremacist Language

Jun 27, 2023

Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M. Carley

Abstract: We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.

* ACL 2023 short

How Hate Speech Varies by Target Identity: A Computational Analysis

Oct 19, 2022

Michael Miller Yoder, Lynnette Hui Xian Ng, David West Brown, Kathleen M. Carley

Abstract: This paper investigates how hate speech varies in systematic ways according to the identities it targets. Across multiple hate speech datasets annotated for targeted identities, we find that classifiers trained on hate speech targeting specific identity groups struggle to generalize to other targeted identities. This provides empirical evidence for differences in hate speech by target identity; we then investigate which patterns structure this variation. We find that the targeted demographic category (e.g., gender/sexuality or race/ethnicity) appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group. We also find that words associated with hate speech targeting specific identities often relate to stereotypes, histories of oppression, current social movements, and other social contexts specific to identities. These experiments suggest the importance of considering targeted identity, as well as the social contexts associated with these identities, in automated hate speech classification.

* CoNLL 2022 camera-ready