Title: CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

Oct 28, 2024

Lize Alberts, Benjamin Ellis, Andrei Lupu, Jakob Foerster

Abstract: We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (each with 337 use cases) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising user preferences above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of personalised thinking. We find that prompting LLMs to consider safety-critical context significantly improves performance, unlike a generic 'harmless and helpful' instruction. Based on these findings, we propose research directions for embedding self-reflection capabilities, online user modelling, and dynamic risk assessment in AI assistants. Our work emphasises the need for nuanced, context-aware approaches to alignment in systems designed for persistent human interaction, aiding the development of safe and considerate AI assistants.

* Submitted to ICLR 2025 on 01/10/2024
