Qingyuan Wu

A Unified Framework for Rethinking Policy Divergence Measures in GRPO

Feb 05, 2026

Directly Forecasting Belief for Reinforcement Learning with Delays

May 01, 2025

VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning

Feb 11, 2025

Inverse Delayed Reinforcement Learning

Dec 04, 2024

Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments

Oct 04, 2024

Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

Jun 12, 2024

Highway Value Iteration Networks

Jun 05, 2024

Highway Reinforcement Learning

May 28, 2024

Variational Delayed Policy Optimization

May 23, 2024

Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task

Feb 05, 2024