Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?

Nov 27, 2023