Picture for Jared Kaplan

Jared Kaplan

Shammie

Alignment faking in large language models

Add code
Dec 18, 2024
Viaarxiv icon

Clio: Privacy-Preserving Insights into Real-World AI Use

Add code
Dec 18, 2024
Viaarxiv icon

Sabotage Evaluations for Frontier Models

Add code
Oct 28, 2024
Figure 1 for Sabotage Evaluations for Frontier Models
Figure 2 for Sabotage Evaluations for Frontier Models
Figure 3 for Sabotage Evaluations for Frontier Models
Figure 4 for Sabotage Evaluations for Frontier Models
Viaarxiv icon

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Add code
Jun 17, 2024
Viaarxiv icon

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Add code
Jan 17, 2024
Viaarxiv icon

Evaluating and Mitigating Discrimination in Language Model Decisions

Add code
Dec 06, 2023
Viaarxiv icon

Specific versus General Principles for Constitutional AI

Add code
Oct 20, 2023
Figure 1 for Specific versus General Principles for Constitutional AI
Figure 2 for Specific versus General Principles for Constitutional AI
Figure 3 for Specific versus General Principles for Constitutional AI
Figure 4 for Specific versus General Principles for Constitutional AI
Viaarxiv icon

Studying Large Language Model Generalization with Influence Functions

Add code
Aug 07, 2023
Figure 1 for Studying Large Language Model Generalization with Influence Functions
Figure 2 for Studying Large Language Model Generalization with Influence Functions
Figure 3 for Studying Large Language Model Generalization with Influence Functions
Figure 4 for Studying Large Language Model Generalization with Influence Functions
Viaarxiv icon

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

Add code
Jul 25, 2023
Figure 1 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 2 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 3 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Figure 4 for Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Viaarxiv icon

Measuring Faithfulness in Chain-of-Thought Reasoning

Add code
Jul 17, 2023
Figure 1 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 2 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 3 for Measuring Faithfulness in Chain-of-Thought Reasoning
Figure 4 for Measuring Faithfulness in Chain-of-Thought Reasoning
Viaarxiv icon