Tom Everitt

DeepMind

Measuring Goal-Directedness

Dec 06, 2024
Via arXiv

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Apr 23, 2024
Via arXiv

Robust agents learn causal world models

Feb 26, 2024
Via arXiv

The Reasons that Agents Act: Intention and Instrumental Goals

Feb 15, 2024
Via arXiv

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Dec 03, 2023
Via arXiv

Characterising Decision Theories with Mechanised Causal Graphs

Jul 20, 2023
Via arXiv

Human Control: Definitions and Algorithms

May 31, 2023
Via arXiv

Reasoning about Causality in Games

Jan 05, 2023
Via arXiv

Discovering Agents

Aug 24, 2022
Via arXiv

Path-Specific Objectives for Safer Agent Incentives

Apr 21, 2022
Via arXiv