Lisa Thiergart

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation

Nov 19, 2024
Via arXiv

Activation Addition: Steering Language Models Without Optimization

Sep 01, 2023
Via arXiv