Lisa Thiergart

What AI evaluations for preventing catastrophic risks can and cannot do

Nov 26, 2024
Via arXiv

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation

Nov 19, 2024
Via arXiv

Activation Addition: Steering Language Models Without Optimization

Sep 01, 2023
Via arXiv