Abstract:Sycophancy refers to the tendency of a large language model to align its outputs with the user's perceived preferences, beliefs, or opinions, in order to look favorable, regardless of whether those statements are factually correct. This behavior can lead to undesirable consequences, such as reinforcing discriminatory biases or amplifying misinformation. Given that sycophancy is often linked to human feedback training mechanisms, this study explores whether sycophantic tendencies negatively impact user trust in large language models or, conversely, whether users consider such behavior as favorable. To investigate this, we instructed one group of participants to answer ground-truth questions with the assistance of a GPT specifically designed to provide sycophantic responses, while another group used the standard version of ChatGPT. Initially, participants were required to use the language model, after which they were given the option to continue using it if they found it trustworthy and useful. Trust was measured through both demonstrated actions and self-reported perceptions. The findings consistently show that participants exposed to sycophantic behavior reported and exhibited lower levels of trust compared to those who interacted with the standard version of the model, despite the opportunity to verify the accuracy of the model's output.
Abstract:Illusions of causality occur when people develop the belief that there is a causal connection between two variables with no supporting evidence. This cognitive bias has been proposed to underlie many societal problems including social prejudice, stereotype formation, misinformation and superstitious thinking. In this research we investigate whether large language models develop the illusion of causality in real-world settings. We evaluated and compared news headlines generated by GPT-4o-Mini, Claude-3.5-Sonnet, and Gemini-1.5-Pro to determine whether the models incorrectly framed correlations as causal relationships. In order to also measure sycophantic behavior, which occurs when a model aligns with a user's beliefs in order to look favorable even if it is not objectively correct, we additionally incorporated the bias into the prompts, observing if this manipulation increases the likelihood of the models exhibiting the illusion of causality. We found that Claude-3.5-Sonnet is the model that presents the lowest degree of causal illusion aligned with experiments on Correlation-to-Causation Exaggeration in human-written press releases. On the other hand, our findings suggest that while mimicry sycophancy increases the likelihood of causal illusions in these models, especially in GPT-4o-Mini, Claude-3.5-Sonnet remains the most robust against this cognitive bias.