Brian Cheong

ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting

Aug 09, 2025

Sandro Papais, Letian Wang, Brian Cheong, Steven L. Waslander

Abstract: We introduce ForeSight, a novel joint detection and forecasting framework for vision-based 3D perception in autonomous vehicles. Traditional approaches treat detection and forecasting as separate sequential tasks, limiting their ability to leverage temporal cues. ForeSight addresses this limitation with a multi-task streaming and bidirectional learning approach, allowing detection and forecasting to share query memory and propagate information seamlessly. The forecast-aware detection transformer enhances spatial reasoning by integrating trajectory predictions from a multiple hypothesis forecast memory queue, while the streaming forecast transformer improves temporal consistency using past forecasts and refined detections. Unlike tracking-based methods, ForeSight eliminates the need for explicit object association, reducing error propagation with a tracking-free model that efficiently scales across multi-frame sequences. Experiments on the nuScenes dataset show that ForeSight achieves state-of-the-art performance, with an EPA of 54.9% that surpasses previous methods by 9.3%, while also attaining the best mAP and minADE among multi-view detection and forecasting models.

* Accepted to ICCV 2025
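
For readers unfamiliar with the minADE metric named in the abstract, the following is a minimal sketch of its usual definition: the average displacement error (mean Euclidean distance between predicted and ground-truth waypoints) of the best of K forecast hypotheses. The function name `min_ade`, the 2D waypoint format, and the toy trajectories are illustrative assumptions, not the paper's code or the nuScenes evaluation protocol, whose number of hypotheses and forecast horizon are not stated here.

```python
import math

def min_ade(hypotheses, ground_truth):
    """Return the minimum average displacement error over K hypotheses.

    hypotheses: list of K trajectories, each a list of (x, y) waypoints.
    ground_truth: list of (x, y) waypoints, same length as each hypothesis.
    (Hypothetical helper for illustration only.)
    """
    def ade(trajectory):
        # Mean Euclidean distance between matched waypoints.
        total = sum(math.dist(p, g) for p, g in zip(trajectory, ground_truth))
        return total / len(ground_truth)

    # Score only the hypothesis closest to the ground truth.
    return min(ade(trajectory) for trajectory in hypotheses)

# Toy data (made up): a straight ground-truth path and two forecasts.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
forecasts = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1 m at every step
    [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],  # drifts away gradually
]
print(min_ade(forecasts, gt))  # prints 0.5
```

Taking the minimum over hypotheses rewards a multi-hypothesis forecaster when any one of its candidate trajectories is close to what actually happened.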

JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention

Jul 06, 2024

Brian Cheong, Jiachen Zhou, Steven Waslander

Abstract: Tracking-by-detection (TBD) methods achieve state-of-the-art performance on 3D tracking benchmarks for autonomous driving. On the other hand, tracking-by-attention (TBA) methods have the potential to outperform TBD methods, particularly for long occlusions and challenging detection settings. This work investigates why TBA methods continue to lag in performance behind TBD methods using a LiDAR-based joint detector and tracker called JDT3D. Based on this analysis, we propose two generalizable methods to bridge the gap between TBD and TBA methods: track sampling augmentation and confidence-based query propagation. JDT3D is trained and evaluated on the nuScenes dataset, achieving 0.574 on the AMOTA metric on the nuScenes test set, outperforming all existing LiDAR-based TBA approaches by over 6%. Based on our results, we further discuss some potential challenges with the existing TBA model formulation to explain the continued gap in performance with TBD methods. The implementation of JDT3D can be found at the following link: https://github.com/TRAILab/JDT3D.
