Anna Winnicki

Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

Feb 15, 2024
Via arXiv

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

Mar 17, 2023
Via arXiv

On The Convergence Of Policy Iteration-Based Reinforcement Learning With Monte Carlo Policy Evaluation

Jan 23, 2023
Via arXiv

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

Oct 13, 2022
Via arXiv

The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation

Sep 28, 2021
Via arXiv

Optimistic Policy Iteration for MDPs with Acyclic Transient State Structure

Feb 13, 2021
Via arXiv