Pierre Harvey Richemond

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Via arXiv

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022
Via arXiv