Picture for Jan Brauner

Jan Brauner

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Add code
Jan 17, 2024
Viaarxiv icon

Thousands of AI Authors on the Future of AI

Add code
Jan 05, 2024
Viaarxiv icon

Managing AI Risks in an Era of Rapid Progress

Add code
Oct 26, 2023
Viaarxiv icon

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Add code
Sep 26, 2023
Viaarxiv icon

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Add code
Jul 25, 2023
Figure 1 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 2 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 3 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 4 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Viaarxiv icon

Measuring Faithfulness in Chain-of-Thought Reasoning

Add code
Jul 17, 2023
Figure 1 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 2 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 3 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 4 for Measuring Faithfulness in Chain-of-Thought Reasoning
Viaarxiv icon

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

Add code
Jun 16, 2022
Figure 1 for Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Figure 2 for Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Figure 3 for Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Figure 4 for Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Viaarxiv icon

Mapping global dynamics of benchmark creation and saturation in artificial intelligence

Add code
Mar 09, 2022
Figure 1 for Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Figure 2 for Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Figure 3 for Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Figure 4 for Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Viaarxiv icon

Prioritized training on points that are learnable, worth learning, and not yet learned

Add code
Jul 06, 2021
Figure 1 for Prioritized training on points that are learnable, worth learning, and not yet learned
Figure 2 for Prioritized training on points that are learnable, worth learning, and not yet learned
Figure 3 for Prioritized training on points that are learnable, worth learning, and not yet learned
Figure 4 for Prioritized training on points that are learnable, worth learning, and not yet learned
Viaarxiv icon