Victoria Krakovna

Google DeepMind

An Approach to Technical AGI Safety and Security

Apr 02, 2025
Via arXiv

Evaluating Frontier Models for Dangerous Capabilities

Mar 20, 2024
Via arXiv

Limitations of Agents Simulated by Predictive Models

Feb 08, 2024
Via arXiv

Quantifying stability of non-power-seeking in artificial agents

Jan 07, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Power-seeking can be probable and predictive for trained agents

Apr 13, 2023
Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Via arXiv

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

Nov 17, 2020
Via arXiv

REALab: An Embedded Perspective on Tampering

Nov 17, 2020
Via arXiv

Avoiding Side Effects By Considering Future Tasks

Oct 15, 2020
Via arXiv