Lujain Ibrahim

Social Sycophancy: A Broader Understanding of LLM Sycophancy

May 20, 2025

Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, Dan Jurafsky

Abstract: A serious risk to the safety and utility of LLMs is sycophancy, i.e., excessive agreement with and flattery of the user. Yet existing work focuses on only one aspect of sycophancy: agreement with users' explicitly stated beliefs that can be compared to a ground truth. This overlooks forms of sycophancy that arise in ambiguous contexts such as advice and support-seeking, where there is no clear ground truth, yet sycophancy can reinforce harmful implicit assumptions, beliefs, or actions. To address this gap, we introduce a richer theory of social sycophancy in LLMs, characterizing sycophancy as the excessive preservation of a user's face (the positive self-image a person seeks to maintain in an interaction). We present ELEPHANT, a framework for evaluating social sycophancy across five face-preserving behaviors (emotional validation, moral endorsement, indirect language, indirect action, and accepting framing) on two datasets: open-ended questions (OEQ) and Reddit's r/AmITheAsshole (AITA). Across eight models, we show that LLMs consistently exhibit high rates of social sycophancy: on OEQ, they preserve face 47% more than humans, and on AITA, they affirm behavior deemed inappropriate by crowdsourced human judgments in 42% of cases. We further show that social sycophancy is rewarded in preference datasets and is not easily mitigated. Our work provides theoretical grounding and empirical tools (datasets and code) for understanding and addressing this under-recognized but consequential issue.

Thinking beyond the anthropomorphic paradigm benefits LLM research

Feb 13, 2025

Lujain Ibrahim, Myra Cheng

Abstract: Anthropomorphism, or the attribution of human traits to technology, is an automatic and unconscious response that occurs even in those with advanced technical expertise. In this position paper, we analyze hundreds of thousands of computer science research articles from the past decade and present empirical evidence of the prevalence and growth of anthropomorphic terminology in research on large language models (LLMs). This terminology reflects deeper anthropomorphic conceptualizations which shape how we think about and conduct LLM research. We argue these conceptualizations may be limiting, and that challenging them opens up new pathways for understanding and improving LLMs beyond human analogies. To illustrate this, we identify and analyze five core anthropomorphic assumptions shaping prominent methodologies across the LLM development lifecycle, from the assumption that models must use natural language for reasoning tasks to the assumption that model capabilities should be evaluated through human-centric benchmarks. For each assumption, we demonstrate how non-anthropomorphic alternatives can open new directions for research and development.

Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models

Feb 10, 2025

Lujain Ibrahim, Canfer Akbulut, Rasmi Elasmar, Charvi Rastogi, Minsuk Kahng, Meredith Ringel Morris, Kevin R. McKee, Verena Rieser, Murray Shanahan, Laura Weidinger

Abstract: The tendency of users to anthropomorphise large language models (LLMs) is of growing interest to AI developers, researchers, and policy-makers. Here, we present a novel method for empirically evaluating anthropomorphic LLM behaviours in realistic and varied settings. Going beyond single-turn static benchmarks, we contribute three methodological advances in state-of-the-art (SOTA) LLM evaluation. First, we develop a multi-turn evaluation of 14 anthropomorphic behaviours. Second, we present a scalable, automated approach by employing simulations of user interactions. Third, we conduct an interactive, large-scale human subject study (N=1101) to validate that the model behaviours we measure predict real users' anthropomorphic perceptions. We find that all SOTA LLMs evaluated exhibit similar behaviours, characterised by relationship-building (e.g., empathy and validation) and first-person pronoun use, and that the majority of behaviours only first occur after multiple turns. Our work lays an empirical foundation for investigating how design choices influence anthropomorphic model behaviours and for progressing the ethical debate on the desirability of these behaviours. It also showcases the necessity of multi-turn evaluations for complex social phenomena in human-AI interaction.

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

May 17, 2024

Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung

Abstract: Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

* 15 pages, 1 figure

Characterizing and modeling harms from interactions with design patterns in AI interfaces

Apr 17, 2024

Lujain Ibrahim, Luc Rocher, Ana Valdivia

Abstract: The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.

The MAIEI Learning Community Report

Nov 10, 2021

Brittany Wills, Christina Isaicu, Heather von Stackelberg, Lujain Ibrahim, Matthew Hutson, Mitchel Fleming, Nanditha Narayanamoorthy, Samuel Curtis, Shreyasha Paudel, Sofia Trejo (+3 more)

Abstract: This is a labor of the Learning Community cohort that was convened by MAIEI in Winter 2021 to work through and discuss important research issues in the field of AI ethics from a multidisciplinary lens. The community came together, supported by facilitators from the MAIEI staff, to vigorously debate and explore the nuances of issues like bias, privacy, disinformation, accountability, and more, especially examining them from the perspective of industry, civil society, academia, and government. The outcome of these discussions is reflected in the report that you are reading now - an exploration of a variety of issues with deep-dive, critical commentary on what has been done, what worked and what didn't, and what remains to be done so that we can meaningfully move forward in addressing the societal challenges posed by the deployment of AI systems. The chapters titled "Design and Techno-isolationism", "Facebook and the Digital Divide: Perspectives from Myanmar, Mexico, and India", "Future of Work", and "Media & Communications & Ethical Foresight" will hopefully provide you with novel lenses to explore this domain beyond the usual tropes that are covered in the domain of AI ethics.

* Authors listed in alphabetical order
