Zheng Pan

FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving

May 23, 2025

Shuang Zeng, Xinyuan Chang, Mengwei Xie, Xinran Liu, Yifan Bai, Zheng Pan, Mu Xu, Xing Wei

Abstract: Visual language models (VLMs) have attracted increasing interest in autonomous driving due to their powerful reasoning capabilities. However, existing VLMs typically utilize discrete text Chain-of-Thought (CoT) tailored to the current scenario, which essentially represents a highly abstract and symbolic compression of visual information, potentially leading to spatio-temporal relationship ambiguity and fine-grained information loss. Is autonomous driving better modeled on real-world simulation and imagination than on pure symbolic logic? In this paper, we propose a spatio-temporal CoT reasoning method that enables models to think visually. First, the VLM serves as a world model to generate a unified image frame for predicting future world states, where perception results (e.g., lane dividers and 3D detection) represent the future spatial relationships, and ordinary future frames represent the temporal evolution relationships. This spatio-temporal CoT then serves as intermediate reasoning steps, enabling the VLM to function as an inverse dynamics model for trajectory planning based on current observations and future predictions. To implement visual generation in VLMs, we propose a unified pretraining paradigm integrating visual generation and understanding, along with a progressive visual CoT enhancing autoregressive image generation. Extensive experimental results demonstrate the effectiveness of the proposed method, advancing autonomous driving towards visual reasoning.

Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map

Oct 31, 2024

Xinyuan Chang, Maixuan Xue, Xinran Liu, Zheng Pan, Xing Wei

Abstract: Ensuring adherence to traffic sign regulations is essential for both human and autonomous vehicle navigation. While current benchmark datasets concentrate on lane perception or basic traffic sign recognition, they often overlook the intricate task of integrating these regulations into lane operations. Addressing this gap, we introduce MapDR, a novel dataset designed for the extraction of Driving Rules from traffic signs and their association with vectorized, locally perceived HD Maps. MapDR features over 10,000 annotated video clips that capture the intricate correlation between traffic sign regulations and lanes. We define two pivotal sub-tasks: 1) Rule Extraction from Traffic Sign, which accurately deciphers regulatory instructions, and 2) Rule-Lane Correspondence Reasoning, which aligns these rules with their respective lanes. Built upon this benchmark, we provide a multimodal solution that offers a strong baseline for advancing autonomous driving technologies. It fills a critical gap in the integration of traffic sign rules, contributing to the development of reliable autonomous navigation systems.

* 27 pages, 13 figures

Driving with Prior Maps: Unified Vector Prior Encoding for Autonomous Vehicle Mapping

Sep 11, 2024

Shuang Zeng, Xinyuan Chang, Xinran Liu, Zheng Pan, Xing Wei

Abstract: High-Definition Maps (HD maps) are essential for the precise navigation and decision-making of autonomous vehicles, yet their creation and upkeep present significant cost and timeliness challenges. The online construction of HD maps using on-board sensors has emerged as a promising solution; however, these methods can be impeded by incomplete data due to occlusions and inclement weather. This paper proposes the PriorDrive framework to address these limitations by harnessing the power of prior maps, significantly enhancing the robustness and accuracy of online HD map construction. Our approach integrates a variety of prior maps, such as OpenStreetMap's Standard Definition Maps (SD maps), outdated HD maps from vendors, and locally constructed maps from historical vehicle data. To effectively encode this prior information into online mapping models, we introduce a Hybrid Prior Representation (HPQuery) that standardizes the representation of diverse map elements. At the core of PriorDrive is the Unified Vector Encoder (UVE), which employs a dual encoding mechanism to process vector data. The intra-vector encoder captures fine-grained local features, while the inter-vector encoder integrates global context. Furthermore, we propose a segment-level and point-level pre-training strategy that enables the UVE to learn the prior distribution of vector data, thereby improving the encoder's generalizability and performance. Through extensive testing on the nuScenes dataset, we demonstrate that PriorDrive is highly compatible with various online mapping models and substantially improves map prediction capabilities. The integration of prior maps through the PriorDrive framework offers a robust solution to the challenges of single-perception data, paving the way for more reliable autonomous vehicle navigation.

Adaptive Approximate Implicitization of Planar Parametric Curves via Weak Gradient Constraints

Feb 23, 2023

Minghao Guo, Yan Gao, Zheng Pan

Abstract: Converting a parametric curve into implicit form, which is called implicitization, has always been a popular but challenging problem in geometric modeling and related applications. However, existing methods mostly struggle with maintaining geometric features and choosing a reasonable implicit degree. The present paper has two contributions. We first introduce a new regularization constraint (called the weak gradient constraint) for both polynomial and non-polynomial curves, which efficiently preserves shape. We then propose two adaptive algorithms of approximate implicitization for polynomial and non-polynomial curves respectively, which find the "optimal" implicit degree based on the behavior of the weak gradient constraint. More precisely, the idea is to gradually increase the implicit degree until there is no obvious improvement in the weak gradient loss of the outputs. Experimental results have shown the effectiveness and high quality of our proposed methods.
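The stopping rule described in the abstract (raise the implicit degree until the loss no longer improves noticeably) can be sketched as a small loop. This is only an illustration under stated assumptions: `choose_degree`, `loss_at`, the relative tolerance `rel_tol`, and the toy loss values are all hypothetical and are not taken from the paper, which defines its own weak gradient loss and its own criterion for "no obvious improvement".

```python
from typing import Callable


def choose_degree(loss_at: Callable[[int], float],
                  min_degree: int = 1,
                  max_degree: int = 10,
                  rel_tol: float = 0.05) -> int:
    """Return the last degree whose loss still improved by at least rel_tol.

    loss_at(d) is a hypothetical callable that fits an implicit curve of
    degree d and returns its loss; the fitting itself is not shown here.
    """
    best_degree = min_degree
    prev_loss = loss_at(min_degree)
    for degree in range(min_degree + 1, max_degree + 1):
        loss = loss_at(degree)
        # Relative improvement over the previously accepted degree.
        improvement = (prev_loss - loss) / max(abs(prev_loss), 1e-12)
        if improvement < rel_tol:
            break  # no obvious improvement: keep the lower degree
        best_degree, prev_loss = degree, loss
    return best_degree


# Toy loss values standing in for real fits (purely illustrative).
toy_losses = {1: 1.0, 2: 0.4, 3: 0.05, 4: 0.049, 5: 0.048}
chosen = choose_degree(lambda d: toy_losses[d], max_degree=5)
```

With these toy values the loop accepts degrees 2 and 3 and stops at degree 4, where the relative improvement falls below the tolerance.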

Multi-Graph based Multi-Scenario Recommendation in Large-scale Online Video Services

May 05, 2022

Fan Zhang, Qiuying Peng, Yulin Wu, Zheng Pan, Rong Zeng, Da Lin, Yue Qi

Abstract: Recently, industrial recommendation services have been boosted by the continual upgrade of deep learning methods. However, they still face de-biasing challenges such as exposure bias and the cold-start problem, where repeated cycles of training on human interaction history lead algorithms to repeatedly suggest exposed items while ignoring less-active ones. Additional problems exist in multi-scenario platforms, e.g. appropriate data fusion from subsidiary scenarios, which we observe could be alleviated through graph-structured data integration via message passing. In this paper, we present a multi-graph structured multi-scenario recommendation solution, which encapsulates interaction data across scenarios with a multi-graph and obtains representations via graph learning. Extensive offline and online experiments on real-world datasets are conducted, where the proposed method demonstrates an increase of 0.63% and 0.71% in CTR and Video Views per capita on new users over the deployed set of baselines and outperforms the regular method in increasing the number of outer-scenario videos by 25% and video watches by 116%, validating its superiority in activating cold videos and enriching target recommendation.

* Accepted to WWW 2022 Graph Learning workshop

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Apr 05, 2019

Xin Li, Yiming Zhou, Zheng Pan, Jiashi Feng

Abstract: Achieving a good speed/accuracy trade-off on a target platform is very important in deploying deep neural networks in real-world scenarios. However, most existing automatic architecture search approaches only concentrate on high performance. In this work, we propose an algorithm that can offer a better speed/accuracy trade-off of searched networks, which is termed "Partial Order Pruning". It prunes the architecture search space with a partial order assumption to automatically search for the architectures with the best speed and accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several Dongfeng (DF) networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching decoder architectures, our DF-Seg real-time segmentation networks yield state-of-the-art speed/accuracy trade-off on both the target embedded device and the high-end GPU.
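The notion of a "best speed/accuracy trade-off" can be made concrete with a Pareto front over measured (latency, accuracy) pairs. The sketch below is not the paper's Partial Order Pruning algorithm; it only shows which candidates survive once every architecture that is both slower and no more accurate than another has been discarded. The function name `pareto_front` and the candidate numbers are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (name, latency in ms, accuracy)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated in (lower latency, higher accuracy)."""
    front: List[Candidate] = []
    best_acc = float("-inf")
    # Visit fastest first; among equal latencies, most accurate first.
    for name, lat, acc in sorted(candidates, key=lambda c: (c[1], -c[2])):
        # A slower candidate is only worth keeping if it is more accurate
        # than everything faster than (or as fast as) it.
        if acc > best_acc:
            front.append((name, lat, acc))
            best_acc = acc
    return front


# Made-up measurements for illustration only.
candidates = [
    ("a", 5.0, 0.70),
    ("b", 8.0, 0.68),   # slower and less accurate than "a": dominated
    ("c", 10.0, 0.75),
    ("d", 10.0, 0.74),  # same latency as "c" but less accurate: dominated
    ("e", 20.0, 0.80),
]
front_names = [name for name, _, _ in pareto_front(candidates)]
```

Latencies here would have to be profiled on the target platform, which is the point the abstract makes about using inference-speed profile information.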

* Accepted to CVPR 2019

Relaxed Sparse Eigenvalue Conditions for Sparse Estimation via Non-convex Regularized Regression

Feb 12, 2014

Zheng Pan, Changshui Zhang

Abstract: Non-convex regularizers usually improve the performance of sparse estimation in practice. To prove this fact, we study the conditions of sparse estimation for the sharp concave regularizers, which are a general family of non-convex regularizers including many existing regularizers. For the global solutions of the regularized regression, our sparse eigenvalue based conditions are weaker than those of L1-regularization for parameter estimation and sparseness estimation. For the approximate global and approximate stationary (AGAS) solutions, almost the same conditions are also enough. We show that the desired AGAS solutions can be obtained by coordinate descent (CD) based methods. Finally, we perform some experiments to show the performance of CD methods on giving AGAS solutions and the degree of weakness of the estimation conditions required by the sharp concave regularizers.
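As a rough illustration of coordinate descent on a non-convex regularized regression, the sketch below uses the minimax concave penalty (MCP), one well-known non-convex penalty. Treating MCP as representative of the paper's sharp concave family is an assumption made here, and the code is not the authors' method. It minimizes (1/2n)||y - Xb||^2 plus the MCP penalty, assuming each column of X is scaled so that (1/n) x_j^T x_j = 1 and gamma > 1; all names (`cd_mcp`, `mcp_update`, `soft_threshold`) and the toy data are hypothetical.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding operator used by L1-type updates."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_update(z: float, lam: float, gamma: float) -> float:
    """Minimizer of 0.5*(b - z)**2 + MCP(b; lam, gamma), for gamma > 1."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large coefficients are left unshrunk


def cd_mcp(X: List[List[float]], y: List[float], lam: float,
           gamma: float = 3.0, n_sweeps: int = 100) -> List[float]:
    """Cyclic coordinate descent; assumes (1/n) * x_j^T x_j == 1 per column."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Univariate least-squares target for coordinate j.
            z = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = mcp_update(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Toy orthogonal design: y = 2 * column0 + 0.1 * column2.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0]]
y = [2.1, 2.1, 1.9, 1.9]
beta_hat = cd_mcp(X, y, lam=0.5)
```

On this toy design the large coefficient is recovered without shrinkage while the small one is set to zero, which is the qualitative behavior that motivates non-convex penalties over L1.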