Tom Bewley

Interpreting Language Reward Models via Contrastive Explanations

Nov 25, 2024

Counterfactual Metarules for Local and Global Recourse

May 29, 2024

Conservative World Models

Sep 26, 2023

Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

May 26, 2023

Reward Learning with Trees: Methods and Evaluation

Oct 03, 2022

Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning

May 30, 2022

Summarising and Comparing Agent Dynamics with Contrastive Spatiotemporal Abstraction

Jan 17, 2022

Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

Dec 20, 2021

TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments

Sep 21, 2020

Am I Building a White Box Agent or Interpreting a Black Box Agent?

Jul 08, 2020