
Tom Everitt

DeepMind

Measuring Goal-Directedness

Dec 06, 2024

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Apr 23, 2024

Robust agents learn causal world models

Feb 26, 2024

The Reasons that Agents Act: Intention and Instrumental Goals

Feb 15, 2024

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Dec 03, 2023

Characterising Decision Theories with Mechanised Causal Graphs

Jul 20, 2023

Human Control: Definitions and Algorithms

May 31, 2023

Reasoning about Causality in Games

Jan 05, 2023

Discovering Agents

Aug 24, 2022

Path-Specific Objectives for Safer Agent Incentives

Apr 21, 2022