Joshua Clymer

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

May 11, 2024
Safety Cases: How to Justify the Safety of Advanced AI Systems

Mar 18, 2024
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

Nov 19, 2023