Shengbo Wang

Fast Convergence of Policy Regret in Learning Stochastic Optimal Control

May 25, 2026

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence

Mar 03, 2026

Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values

Mar 03, 2026

Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle

Jan 29, 2026

Preference is More Than Comparisons: Rethinking Dueling Bandits with Augmented Human Feedback

Nov 12, 2025

Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain

Sep 17, 2025

Hardware-Adaptive and Superlinear-Capacity Memristor-based Associative Memory

May 19, 2025

Near-Optimal Sample Complexities of Divergence-based S-rectangular Distributionally Robust Reinforcement Learning

May 18, 2025

Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning

May 15, 2025

Optimal Parameter Adaptation for Safety-Critical Control via Safe Barrier Bayesian Optimization

Mar 25, 2025