
Rémi Munos

INRIA Lille - Nord Europe

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Mar 25, 2025
Via arXiv

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning

Mar 25, 2025
Via arXiv

Learning to chain-of-thought with Jensen's evidence lower bound

Mar 25, 2025
Via arXiv

Temporal Difference Flows

Mar 12, 2025
Via arXiv

Multi-turn Reinforcement Learning from Preference Human Feedback

May 23, 2024
Via arXiv

Understanding the performance gap between online and offline alignment algorithms

May 14, 2024
Via arXiv

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Feb 12, 2024
Via arXiv

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling

Feb 08, 2024
Via arXiv

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Via arXiv

Nash Learning from Human Feedback

Dec 06, 2023
Via arXiv