Daniel Guo

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023
Via arXiv

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

Apr 30, 2020
Via arXiv

Agent57: Outperforming the Atari Human Benchmark

Mar 30, 2020
Via arXiv

Never Give Up: Learning Directed Exploration Strategies

Feb 14, 2020
Via arXiv