Qining Zhang

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Sep 25, 2024
Via arXiv

Cost Aware Best Arm Identification

Feb 26, 2024
Via arXiv

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

Sep 01, 2023
Via arXiv