Erik Jenner

Obfuscated Activations Bypass LLM Latent-Space Defenses

Dec 12, 2024
Via arXiv

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Jun 02, 2024
Via arXiv

Diffusion On Syntax Trees For Program Synthesis

May 30, 2024
Via arXiv

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Via arXiv

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

Mar 03, 2024
Via arXiv

STARC: A General Framework For Quantifying Differences Between Reward Functions

Sep 26, 2023
Via arXiv

imitation: Clean Imitation Learning Implementations

Nov 22, 2022
Via arXiv

Calculus on MDPs: Potential Shaping as a Gradient

Aug 20, 2022
Via arXiv

Preprocessing Reward Functions for Interpretability

Mar 25, 2022
Via arXiv

Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice

Oct 05, 2021
Via arXiv