Shicong Cen

Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment

Oct 28, 2024
Via arXiv

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

May 29, 2024
Via arXiv

Beyond Expectations: Learning with Stochastic Dominance Made Practical

Feb 05, 2024
Via arXiv

Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning

Nov 01, 2023
Via arXiv

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

Oct 08, 2023
Via arXiv

Asynchronous Gradient Play in Zero-Sum Multi-agent Games

Nov 16, 2022
Via arXiv

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Oct 04, 2022
Via arXiv

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

Apr 12, 2022
Via arXiv

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

May 31, 2021
Via arXiv

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence

May 24, 2021
Via arXiv