
Miles Turpin

Looking Inward: Language Models Can Learn About Themselves by Introspection

Oct 17, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Mar 08, 2024

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

May 07, 2023