Qining Zhang

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Sep 25, 2024
Via arXiv

Cost Aware Best Arm Identification

Feb 26, 2024
Via arXiv

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

Sep 01, 2023
Via arXiv