Picture for Mark Rowland

Mark Rowland

Foundations of Multivariate Distributional Reinforcement Learning

Add code
Aug 31, 2024
Figure 1 for Foundations of Multivariate Distributional Reinforcement Learning
Figure 2 for Foundations of Multivariate Distributional Reinforcement Learning
Figure 3 for Foundations of Multivariate Distributional Reinforcement Learning
Figure 4 for Foundations of Multivariate Distributional Reinforcement Learning
Viaarxiv icon

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Add code
Jun 04, 2024
Viaarxiv icon

Human Alignment of Large Language Models through Online Preference Optimisation

Add code
Mar 13, 2024
Viaarxiv icon

A Distributional Analogue to the Successor Representation

Add code
Feb 13, 2024
Viaarxiv icon

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Add code
Feb 12, 2024
Figure 1 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 2 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 3 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 4 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Viaarxiv icon

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Add code
Feb 08, 2024
Viaarxiv icon

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Add code
Feb 08, 2024
Viaarxiv icon

Distributional Bellman Operators over Mean Embeddings

Add code
Dec 09, 2023
Viaarxiv icon

Nash Learning from Human Feedback

Add code
Dec 06, 2023
Viaarxiv icon

A General Theoretical Paradigm to Understand Learning from Human Preferences

Add code
Oct 18, 2023
Figure 1 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Figure 2 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Viaarxiv icon