Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching

Add code
Nov 25, 2023

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: