Picture for Rémi Munos

Rémi Munos

INRIA Lille - Nord Europe

Multi-turn Reinforcement Learning from Preference Human Feedback

Add code
May 23, 2024
Figure 1 for Multi-turn Reinforcement Learning from Preference Human Feedback
Figure 2 for Multi-turn Reinforcement Learning from Preference Human Feedback
Figure 3 for Multi-turn Reinforcement Learning from Preference Human Feedback
Figure 4 for Multi-turn Reinforcement Learning from Preference Human Feedback
Viaarxiv icon

Understanding the performance gap between online and offline alignment algorithms

Add code
May 14, 2024
Viaarxiv icon

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Add code
Feb 12, 2024
Figure 1 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 2 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 3 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Figure 4 for Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
Viaarxiv icon

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Add code
Feb 08, 2024
Viaarxiv icon

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Add code
Feb 08, 2024
Viaarxiv icon

Nash Learning from Human Feedback

Add code
Dec 06, 2023
Figure 1 for Nash Learning from Human Feedback
Figure 2 for Nash Learning from Human Feedback
Figure 3 for Nash Learning from Human Feedback
Figure 4 for Nash Learning from Human Feedback
Viaarxiv icon

A General Theoretical Paradigm to Understand Learning from Human Preferences

Add code
Oct 18, 2023
Figure 1 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Figure 2 for A General Theoretical Paradigm to Understand Learning from Human Preferences
Viaarxiv icon

Local and adaptive mirror descents in extensive-form games

Add code
Sep 01, 2023
Figure 1 for Local and adaptive mirror descents in extensive-form games
Viaarxiv icon

VA-learning as a more efficient alternative to Q-learning

Add code
May 29, 2023
Viaarxiv icon

Towards a Better Understanding of Representation Dynamics under TD-learning

Add code
May 29, 2023
Viaarxiv icon