Louis Thomson

Evaluating Language Model Character Traits

Oct 05, 2024
Via arXiv

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols

Sep 12, 2024
Via arXiv

Towards shutdownable agents via stochastic choice

Jun 30, 2024
Via arXiv