Charlie Griffin

Subversion Strategy Eval: Evaluating AI's stateless strategic capabilities against control protocols

Dec 17, 2024 (via arXiv)

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols

Sep 12, 2024 (via arXiv)

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

Nov 07, 2023 (via arXiv)

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

Oct 18, 2023 (via arXiv)

Goodhart's Law in Reinforcement Learning

Oct 13, 2023 (via arXiv)

Lexicographic Multi-Objective Reinforcement Learning

Dec 28, 2022 (via arXiv)

Privacy-preserving Object Detection

Mar 11, 2021 (via arXiv)