Sina Ghiassian

Learning in complex action spaces without policy gradients
Oct 08, 2024

Soft Preference Optimization: Aligning Language Models to Expert Distributions
Apr 30, 2024

On the Importance of Uncertainty in Decision-Making with Large Language Models
Apr 03, 2024

In-context Exploration-Exploitation for Reinforcement Learning
Mar 11, 2024

Auxiliary task discovery through generate-and-test
Oct 25, 2022

Importance Sampling Placement in Off-Policy Temporal-Difference Methods
Mar 18, 2022

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment
Sep 10, 2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task
Jun 11, 2021

A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
Apr 28, 2021

Does Standard Backpropagation Forget Less Catastrophically Than Adam?
Feb 20, 2021