Yishen Ji

CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving

Mar 28, 2025

Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu

Abstract: Recent progress in driving video generation has shown significant potential for enhancing self-driving systems by providing scalable and controllable training data. Although pretrained state-of-the-art generation models, guided by 2D layout conditions (e.g., HD maps and bounding boxes), can produce photorealistic driving videos, achieving controllable multi-view videos with high 3D consistency remains a major challenge. To tackle this, we introduce a novel spatial adaptive generation framework, CoGen, which leverages advances in 3D generation to improve performance in two key aspects: (i) To ensure 3D consistency, we first generate high-quality, controllable 3D conditions that capture the geometry of driving scenes. By replacing coarse 2D conditions with these fine-grained 3D representations, our approach significantly enhances the spatial consistency of the generated videos. (ii) Additionally, we introduce a consistency adapter module to strengthen the robustness of the model to multi-condition control. The results demonstrate that this method excels in preserving geometric fidelity and visual realism, offering a reliable video generation solution for autonomous driving.

ImagineMap: Enhanced HD Map Construction with SD Maps

Dec 22, 2024

Yishen Ji, Zhiqi Li, Tong Lu

Abstract: Track Mapless demands models to process multi-view images and Standard-Definition (SD) maps, outputting lane and traffic element perceptions along with their topological relationships. We propose a novel architecture that integrates SD map priors to improve lane line and area detection performance. Inspired by TopoMLP, our model employs a two-stage structure: perception and reasoning. The downstream topology head uses the output of the upstream detection head, so accuracy improvements in detection significantly boost downstream performance.

* 4 pages, 1 figure, technical report

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Jun 11, 2024

Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu (+2 more)

Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/woxihuanjiangguo/Hydra-MDP

* The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge
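
The multi-teacher idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each decoder head predicts a score in [0, 1] for every trajectory candidate, that a rule-based teacher supplies a target score per candidate for the same metric, and that the per-head losses are binary cross-entropy terms added to a human-imitation loss. The function names, the metric names (`collision`, `comfort`), and the weighting scheme are all hypothetical.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """BCE between one predicted score and one teacher score, both in [0, 1]."""
    pred = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))


def multi_teacher_loss(head_preds, teacher_scores, imitation_loss, weights=None):
    """Combine a human-imitation loss with one distillation term per metric head.

    head_preds / teacher_scores: dict mapping a (hypothetical) metric name to a
    list of per-candidate scores in [0, 1]. This layout is an assumption.
    """
    total = imitation_loss
    for name, preds in head_preds.items():
        targets = teacher_scores[name]
        weight = 1.0 if weights is None else weights.get(name, 1.0)
        per_candidate = [binary_cross_entropy(p, t) for p, t in zip(preds, targets)]
        # Average over trajectory candidates so each head contributes equally.
        total += weight * sum(per_candidate) / len(per_candidate)
    return total


# Hypothetical example: two metric heads scoring three trajectory candidates.
teacher = {"collision": [1.0, 0.0, 1.0], "comfort": [1.0, 1.0, 0.0]}
student = {"collision": [0.9, 0.2, 0.8], "comfort": [0.7, 0.6, 0.1]}
loss = multi_teacher_loss(student, teacher, imitation_loss=0.5)
```

Because every term in this sketch is a smooth function of the head predictions, the rule-based signal can be learned through gradients, which is the contrast the abstract draws with non-differentiable post-processing.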
