Title: Evaluating the Prompt Steerability of Large Language Models

Nov 19, 2024

Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu

Abstract: Building pluralistic AI requires designing models that can be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model's joint behavioral distribution can be shifted from its baseline behavior. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited, due to both a skew in their baseline behavior and an asymmetry in their steerability across many persona dimensions. We release an implementation of our benchmark at https://github.com/IBM/prompt-steering.
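To make the idea of a steerability index concrete, here is a minimal Python sketch. It is not the paper's definition and does not use the API of the released benchmark. It assumes, purely for illustration, that behavior along one persona dimension is summarized by a single score in [0, 1], and that the index is the fraction of the available headroom covered when steering toward one end of the dimension. The names `steerability_index` and `steerability_curve` are hypothetical.

```python
from typing import List


def steerability_index(baseline: float, steered: float, direction: int) -> float:
    """Hypothetical index: share of the available headroom covered by steering.

    baseline  -- score in [0, 1] with no steering prompt (assumed summary statistic)
    steered   -- score in [0, 1] after steering
    direction -- +1 to steer toward 1.0, -1 to steer toward 0.0
    """
    if not (0.0 <= baseline <= 1.0 and 0.0 <= steered <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    # Headroom is the distance from the baseline to the target end of the scale.
    headroom = 1.0 - baseline if direction == 1 else baseline
    if headroom == 0.0:
        return 0.0  # baseline already sits at the target; nothing left to gain
    # Signed shift: positive when the model moved in the requested direction.
    shift = (steered - baseline) * direction
    return shift / headroom


def steerability_curve(baseline: float, steered_scores: List[float], direction: int) -> List[float]:
    """Index at each level of steering effort (e.g. number of steering statements)."""
    return [steerability_index(baseline, s, direction) for s in steered_scores]
```

Under this toy normalization, a skewed baseline changes how much room there is to move in each direction, and comparing the index for `direction=+1` against `direction=-1` gives one simple way to see an asymmetry of the kind the abstract describes.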
