Miljan Martic

Causal Analysis of Agent Behavior for AI Safety

Mar 05, 2021

Algorithms for Causal Reasoning in Probability Trees

Nov 12, 2020

Meta-trained agents implement Bayes-optimal agents

Oct 21, 2020

Avoiding Side Effects By Considering Future Tasks

Oct 15, 2020

Scaling shared model governance via model splitting

Dec 14, 2018

Scalable agent alignment via reward modeling: a research direction

Nov 19, 2018

Measuring and avoiding side effects using relative reachability

Jun 04, 2018

AI Safety Gridworlds

Nov 28, 2017

Deep reinforcement learning from human preferences

Jul 13, 2017