Bernardo Ávila Pires

Understanding the performance gap between online and offline alignment algorithms

May 14, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Feb 08, 2024

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

May 29, 2023

Understanding Self-Predictive Learning for Reinforcement Learning

Dec 06, 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

Jul 15, 2022

Multiclass Classification Calibration Functions

Sep 20, 2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models

Sep 20, 2016