
Mohammad Gheshlaghi Azar

Radboud University

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

Jun 27, 2024

Averaging log-likelihoods in direct alignment

Jun 27, 2024

Self-Improving Robust Preference Optimization

Jun 03, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Nash Learning from Human Feedback

Dec 06, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences

Oct 18, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

May 22, 2023

An Analysis of Quantile Temporal-Difference Learning

Jan 11, 2023

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction

Jun 16, 2022