Rémi Munos

INRIA Lille - Nord Europe

Safety Alignment of LMs via Non-cooperative Games

Dec 23, 2025 (via arXiv)

Positional Encoding via Token-Aware Phase Attention

Sep 16, 2025 (via arXiv)

On a few pitfalls in KL divergence gradient estimation for RL

Jun 11, 2025 (via arXiv)

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Mar 25, 2025 (via arXiv)

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning

Mar 25, 2025 (via arXiv)

Learning to chain-of-thought with Jensen's evidence lower bound

Mar 25, 2025 (via arXiv)

Temporal Difference Flows

Mar 12, 2025 (via arXiv)

Multi-turn Reinforcement Learning from Preference Human Feedback

May 23, 2024 (via arXiv)

Understanding the performance gap between online and offline alignment algorithms

May 14, 2024 (via arXiv)

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Feb 12, 2024 (via arXiv)