Heyang Zhao

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

Feb 11, 2025
Via arXiv

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

Feb 09, 2025
Via arXiv

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Nov 07, 2024
Via arXiv

Feel-Good Thompson Sampling for Contextual Dueling Bandits

Apr 09, 2024
Via arXiv

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Nov 26, 2023
Via arXiv

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

Oct 02, 2023
Via arXiv

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Oct 02, 2023
Via arXiv

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Feb 21, 2023
Via arXiv

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

Dec 12, 2022
Via arXiv

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Feb 28, 2022
Via arXiv