Kaiqing Zhang

Provable Partially Observable Reinforcement Learning with Privileged Information

Dec 01, 2024

Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Sep 02, 2024

Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Apr 30, 2024

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Mar 25, 2024

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Dec 08, 2023

Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use

Oct 02, 2023

Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing

Aug 16, 2023

Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

Jul 28, 2023

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

Jul 13, 2023

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

Jun 20, 2023