Title: Annotation alignment: Comparing LLM and human annotations of conversational safety

Jun 10, 2024

Rajiv Movva, Pang Wei Koh, Emma Pierson

Abstract: To what extent do LLMs align with human perceptions of safety? We study this question via *annotation alignment*, the extent to which LLMs and humans agree when annotating the safety of user-chatbot conversations. We leverage the recent DICES dataset (Aroyo et al., 2023), in which 350 conversations are each rated for safety by 112 annotators spanning 10 race-gender groups. GPT-4 achieves a Pearson correlation of $r = 0.59$ with the average annotator rating, higher than the median annotator's correlation with the average ($r = 0.51$). We show that larger datasets are needed to resolve whether GPT-4 exhibits disparities in how well it correlates with demographic groups. There is also substantial idiosyncratic variation in correlation *within* groups, suggesting that race and gender do not fully capture differences in alignment. Finally, we find that GPT-4 cannot predict when one demographic group finds a conversation more unsafe than another.
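The comparison described in the abstract (a model's correlation with the average annotator rating versus the median annotator's correlation with that same average) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy ratings, and the choice to include each annotator in the average they are compared against are all assumptions made for illustration; the paper may compute these quantities differently.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def annotation_alignment(human_ratings, model_ratings):
    """Return (model r, median annotator r) against the average rating.

    human_ratings[i][j] is annotator i's rating of conversation j
    (hypothetical layout; higher means "more unsafe").
    Assumption: each annotator is included in the average they are
    compared with; a leave-one-out average is another reasonable choice.
    """
    n_conv = len(model_ratings)
    avg = [mean(row[j] for row in human_ratings) for j in range(n_conv)]
    model_r = pearson(model_ratings, avg)
    annotator_rs = [pearson(row, avg) for row in human_ratings]
    return model_r, median(annotator_rs)


# Toy data, invented for illustration: 3 annotators, 4 conversations.
toy_human = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
toy_model = [0.1, 0.9, 0.6, 0.2]

model_r, median_r = annotation_alignment(toy_human, toy_model)
print(f"model r = {model_r:.2f}, median annotator r = {median_r:.2f}")
```

In this framing, a model "beats" the median annotator when its correlation with the average rating exceeds the median of the per-annotator correlations, which is the kind of comparison the abstract reports for GPT-4.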

* Working draft, short paper. 5 pages, 1 figure
