Qinbo Bai

Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

Feb 03, 2024
Via arXiv

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

Sep 05, 2023
Via arXiv

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

Jun 12, 2022
Via arXiv

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Sep 13, 2021
Via arXiv

Concave Utility Reinforcement Learning with Zero-Constraint Violations

Sep 12, 2021
Via arXiv

Markov Decision Processes with Long-Term Average Constraints

Jun 12, 2021
Via arXiv

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

May 28, 2021
Via arXiv

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

Jun 10, 2020
Via arXiv

Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

Mar 11, 2020
Via arXiv

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Oct 03, 2019
Via arXiv