Rusheb Shah

Imperial College London

Frontier Models are Capable of In-context Scheming

Dec 06, 2024
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv

Structured World Representations in Maze-Solving Transformers

Dec 05, 2023
Via arXiv

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

Nov 06, 2023
Via arXiv

A Configurable Library for Generating and Manipulating Maze Datasets

Sep 19, 2023
Via arXiv