Rohan Subramani

Will an AI with Private Information Allow Itself to Be Switched Off?

Nov 25, 2024
Via arXiv

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

Nov 19, 2023
Via arXiv

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

Oct 18, 2023
Via arXiv