Chenlu Ye

Self-rewarding correction for mathematical reasoning

Feb 26, 2025

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

Feb 11, 2025

Catoni Contextual Bandits are Robust to Heavy-tailed Rewards

Feb 04, 2025

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Nov 07, 2024

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Feb 15, 2024

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference

Feb 11, 2024

Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF

Dec 18, 2023

Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks

Nov 24, 2023

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

Oct 23, 2023

Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Sep 05, 2023