Janos Kramar

The Hydra Effect: Emergent Self-repair in Language Model Computations

Jul 28, 2023
Via arXiv

Power-seeking can be probable and predictive for trained agents

Apr 13, 2023
Via arXiv

Guidelines for Artificial Intelligence Containment

Jul 24, 2017
Via arXiv

The AGI Containment Problem

Jul 13, 2016
Via arXiv