Picture for Heyang Zhao

Heyang Zhao

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

Add code
Feb 11, 2025
Viaarxiv icon

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability

Add code
Feb 09, 2025
Figure 1 for Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Viaarxiv icon

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Add code
Nov 07, 2024
Figure 1 for Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Figure 2 for Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Viaarxiv icon

Feel-Good Thompson Sampling for Contextual Dueling Bandits

Add code
Apr 09, 2024
Viaarxiv icon

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Add code
Nov 26, 2023
Viaarxiv icon

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Add code
Oct 02, 2023
Figure 1 for Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
Viaarxiv icon

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

Add code
Oct 02, 2023
Figure 1 for Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
Viaarxiv icon

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Add code
Feb 21, 2023
Figure 1 for Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
Figure 2 for Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
Viaarxiv icon

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

Add code
Dec 12, 2022
Figure 1 for Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
Viaarxiv icon

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Add code
Feb 28, 2022
Figure 1 for Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds
Viaarxiv icon