Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Weizhen Wang

Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch

Mar 28, 2025

Weizhen Wang, Jianping He, Xiaoming Duan

Abstract:Policy gradient methods are one of the most successful methods for solving challenging reinforcement learning problems. However, despite their empirical successes, many SOTA policy gradient algorithms for discounted problems deviate from the theoretical policy gradient theorem due to the existence of a distribution mismatch. In this work, we analyze the impact of this mismatch on the policy gradient methods. Specifically, we first show that in the case of tabular parameterizations, the methods under the mismatch remain globally optimal. Then, we extend this analysis to more general parameterizations by leveraging the theory of biased stochastic gradient descent. Our findings offer new insights into the robustness of policy gradient methods as well as the gap between theoretical foundations and practical implementations.

Via

Access Paper or Ask Questions

Embodied Scene Understanding for Vision Language Models via MetaVQA

Jan 15, 2025

Weizhen Wang, Chenda Duan, Zhenghao Peng, Yuxin Liu, Bolei Zhou

Figure 1 for Embodied Scene Understanding for Vision Language Models via MetaVQA

Figure 2 for Embodied Scene Understanding for Vision Language Models via MetaVQA

Figure 3 for Embodied Scene Understanding for Vision Language Models via MetaVQA

Figure 4 for Embodied Scene Understanding for Vision Language Models via MetaVQA

Abstract:Vision Language Models (VLMs) demonstrate significant potential as embodied AI agents for various mobility applications. However, a standardized, closed-loop benchmark for evaluating their spatial reasoning and sequential decision-making capabilities is lacking. To address this, we present MetaVQA: a comprehensive benchmark designed to assess and enhance VLMs' understanding of spatial relationships and scene dynamics through Visual Question Answering (VQA) and closed-loop simulations. MetaVQA leverages Set-of-Mark prompting and top-down view ground-truth annotations from nuScenes and Waymo datasets to automatically generate extensive question-answer pairs based on diverse real-world traffic scenarios, ensuring object-centric and context-rich instructions. Our experiments show that fine-tuning VLMs with the MetaVQA dataset significantly improves their spatial reasoning and embodied scene comprehension in safety-critical simulations, evident not only in improved VQA accuracies but also in emerging safety-aware driving maneuvers. In addition, the learning demonstrates strong transferability from simulation to real-world observation. Code and data will be publicly available at https://metadriverse.github.io/metavqa .

* for the project webpage, see https://metadriverse.github.io/metavqa

Via

Access Paper or Ask Questions