Fabien Roger

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

Feb 10, 2025
via arXiv

Alignment faking in large language models

Dec 18, 2024
via arXiv

Do Unlearning Methods Remove Information from Language Model Weights?

Oct 11, 2024
via arXiv

Stress-Testing Capability Elicitation With Password-Locked Models

May 29, 2024
via arXiv

AI Control: Improving Safety Despite Intentional Subversion

Dec 14, 2023
via arXiv

Preventing Language Models From Hiding Their Reasoning

Oct 31, 2023
via arXiv

Benchmarks for Detecting Measurement Tampering

Sep 07, 2023
via arXiv

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

Jun 16, 2023
via arXiv

Language models are better than humans at next-token prediction

Dec 21, 2022
via arXiv