Stuart Russell

Berkeley

Asking for Help Enables Safety Guarantees Without Sacrificing Effectiveness

Feb 19, 2025
Via arXiv

How Do LLMs Perform Two-Hop Reasoning in Context?

Feb 19, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Observation Interference in Partially Observable Assistance Games

Dec 23, 2024
Via arXiv

Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts

Dec 05, 2024
Via arXiv

Will an AI with Private Information Allow Itself to Be Switched Off?

Nov 25, 2024
Via arXiv

RL, but don't do anything I wouldn't do

Oct 08, 2024
Via arXiv

BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping

Sep 09, 2024
Via arXiv

Monitoring Latent World States in Language Models with Propositional Probes

Jun 27, 2024
Via arXiv

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Jun 02, 2024
Via arXiv