Heyang Zhao

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Nov 07, 2024
Via arXiv

Feel-Good Thompson Sampling for Contextual Dueling Bandits

Apr 09, 2024
Via arXiv

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

Nov 26, 2023
Via arXiv

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

Oct 02, 2023
Via arXiv

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

Oct 02, 2023
Via arXiv

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Feb 21, 2023
Via arXiv

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

Dec 12, 2022
Via arXiv

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Feb 28, 2022
Via arXiv

Linear Contextual Bandits with Adversarial Corruptions

Oct 25, 2021
Via arXiv