Yann Ollivier

FAIR

Likelihood-Based Reward Designs for General LLM Reasoning

Feb 03, 2026

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Jan 26, 2026

Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting

Dec 05, 2024

Simple Ingredients for Offline Reinforcement Learning

Mar 19, 2024

Does Zero-Shot Reinforcement Learning Exist?

Sep 29, 2022

Agnostic Physics-Driven Deep Learning

May 30, 2022

Unbiased Methods for Multi-Goal Reinforcement Learning

Jun 16, 2021

Learning One Representation to Optimize All Rewards

Mar 14, 2021

Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

Jan 18, 2021

Convergence of Online Adaptive and Recurrent Optimization Algorithms

May 12, 2020