Remi Munos

INRIA Lille

Automatic Textbook Formalization

Apr 03, 2026
Via arXiv

Expanding the Capabilities of Reinforcement Learning via Text Feedback

Feb 02, 2026
Via arXiv

Outcome-based Exploration for LLM Reasoning

Sep 08, 2025
Via arXiv

Soft Policy Optimization: Online Off-Policy RL for Sequence Models

Mar 07, 2025
Via arXiv

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024
Via arXiv

Super-Exponential Regret for UCT, AlphaGo and Variants

May 07, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

Model-free Posterior Sampling via Learning Rate Randomization

Oct 27, 2023
Via arXiv

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

May 02, 2023
Via arXiv

Fast Rates for Maximum Entropy Exploration

Mar 14, 2023
Via arXiv