Jincheng Mei

Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

Oct 28, 2024
Via arXiv

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

May 31, 2024
Via arXiv

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

May 29, 2024
Via arXiv

Stochastic Gradient Succeeds for Bandits

Feb 27, 2024
Via arXiv

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Feb 05, 2024
Via arXiv

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

May 22, 2023
Via arXiv

The Role of Baselines in Policy Gradient Optimization

Jan 16, 2023
Via arXiv

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

May 27, 2022
Via arXiv

Understanding the Effect of Stochasticity in Policy Optimization

Oct 29, 2021
Via arXiv

Leveraging Non-uniformity in First-order Non-convex Optimization

May 13, 2021
Via arXiv