Shicong Cen

Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

Oct 28, 2024
Via arXiv

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

May 29, 2024
Via arXiv

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Feb 05, 2024
Via arXiv

Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning

Nov 01, 2023
Via arXiv

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Oct 08, 2023
Via arXiv

Asynchronous Gradient Play in Zero-Sum Multi-agent Games

Nov 16, 2022
Via arXiv

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Oct 04, 2022
Via arXiv

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

Apr 12, 2022
Via arXiv

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

May 31, 2021
Via arXiv

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

May 24, 2021
Via arXiv