Peiliang Li

SIMPL: A Simple and Efficient Multi-agent Motion Prediction Baseline for Autonomous Driving

Feb 04, 2024

Lu Zhang, Peiliang Li, Sikang Liu, Shaojie Shen

Abstract:This paper presents a Simple and effIcient Motion Prediction baseLine (SIMPL) for autonomous vehicles. Unlike conventional agent-centric methods with high accuracy but repetitive computations and scene-centric methods with compromised accuracy and generalizability, SIMPL delivers real-time, accurate motion predictions for all relevant traffic participants. To achieve improvements in both accuracy and inference speed, we propose a compact and efficient global feature fusion module that performs directed message passing in a symmetric manner, enabling the network to forecast future motion for all road users in a single feed-forward pass and mitigating accuracy loss caused by viewpoint shifting. Additionally, we investigate the continuous trajectory parameterization using Bernstein basis polynomials in trajectory decoding, allowing evaluations of states and their higher-order derivatives at any desired time point, which is valuable for downstream planning tasks. As a strong baseline, SIMPL exhibits highly competitive performance on Argoverse 1 & 2 motion forecasting benchmarks compared with other state-of-the-art methods. Furthermore, its lightweight design and low inference latency make SIMPL highly extensible and promising for real-world onboard deployment. We open-source the code at https://github.com/HKUST-Aerial-Robotics/SIMPL.

* Code is available at https://github.com/HKUST-Aerial-Robotics/SIMPL
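The continuous trajectory parameterization mentioned in the abstract can be illustrated with a minimal sketch. This is not the SIMPL implementation: the function names, the 2D control points, and the 3-second horizon below are all assumptions chosen for illustration. It only shows the general property of a Bernstein (Bezier) representation, namely that positions and their derivatives can be evaluated at any time point from the same control points.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t) for normalized time t in [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(ctrl, t):
    """Evaluate a 2D Bezier curve at normalized time t from its control points."""
    n = len(ctrl) - 1
    x = sum(bernstein(n, i, t) * p[0] for i, p in enumerate(ctrl))
    y = sum(bernstein(n, i, t) * p[1] for i, p in enumerate(ctrl))
    return (x, y)


def derivative_ctrl(ctrl, horizon):
    """Control points of the time derivative of the curve.

    The derivative of a degree-n Bezier curve is a degree-(n-1) Bezier curve
    whose control points are n * (P[i+1] - P[i]). Dividing by the horizon
    (in seconds) converts from normalized time to real time.
    """
    n = len(ctrl) - 1
    return [
        (n * (q[0] - p[0]) / horizon, n * (q[1] - p[1]) / horizon)
        for p, q in zip(ctrl, ctrl[1:])
    ]


# Hypothetical control points (metres) and prediction horizon (seconds).
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 3.0)]
horizon = 3.0

vel_ctrl = derivative_ctrl(ctrl, horizon)      # velocity curve
acc_ctrl = derivative_ctrl(vel_ctrl, horizon)  # acceleration curve

t = 0.5  # halfway through the horizon
position = bezier_point(ctrl, t)
velocity = bezier_point(vel_ctrl, t)
acceleration = bezier_point(acc_ctrl, t)
```

Because the derivative of the curve is again a Bezier curve, a downstream planner could query velocity or acceleration at arbitrary time points without finite differencing a sampled trajectory.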

Are All Point Clouds Suitable for Completion? Weakly Supervised Quality Evaluation Network for Point Cloud Completion

Mar 03, 2023

Jieqi Shi, Peiliang Li, Xiaozhi Chen, Shaojie Shen

Abstract:In the practical application of point cloud completion tasks, real data quality is usually much worse than the CAD datasets used for training. A small amount of noisy data will usually significantly impact the overall system's accuracy. In this paper, we propose a quality evaluation network to score the point clouds and help judge the quality of the point cloud before applying the completion model. We believe our scoring method can help researchers select more appropriate point clouds for subsequent completion and reconstruction and avoid manual parameter adjustment. Moreover, our evaluation model is fast and straightforward and can be directly inserted into any model's training or use process to facilitate the automatic selection and post-processing of point clouds. We propose a complete dataset construction and model evaluation method based on ShapeNet. We verify our network using detection and flow estimation tasks on KITTI, a real-world dataset for autonomous driving. The experimental results show that our model can effectively distinguish the quality of point clouds and help in practical tasks.

* ICRA 2023

You Only Label Once: 3D Box Adaptation from Point Cloud to Image via Semi-Supervised Learning

Nov 17, 2022

Jieqi Shi, Peiliang Li, Xiaozhi Chen, Shaojie Shen

Abstract:The image-based 3D object detection task expects the predicted 3D bounding box to have a "tight" projection (also referred to as a cuboid) that fits the object contour well on the image while still preserving its geometric attributes in 3D space, e.g., physical dimensions and pairwise orthogonality. These requirements bring significant challenges to the annotation. Simply projecting the Lidar-labeled 3D boxes to the image leads to non-trivial misalignment, while directly drawing a cuboid on the image cannot access the original 3D information. In this work, we propose a learning-based 3D box adaptation approach that automatically adjusts a minimal set of parameters of the 360° Lidar 3D bounding box to perfectly fit the image appearance of panoramic cameras. With only a few 2D box annotations as guidance during the training phase, our network can produce accurate image-level cuboid annotations with 3D properties from Lidar boxes. We call our method "you only label once", which means labeling on the point cloud once and automatically adapting to all surrounding cameras. As far as we know, we are the first to focus on image-level cuboid refinement, which balances accuracy and efficiency well and dramatically reduces the labeling effort for accurate cuboid annotation. Extensive experiments on the public Waymo and nuScenes datasets show that our method can produce human-level cuboid annotations on the image without manual adjustment.
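The naive baseline that the abstract contrasts against, directly projecting a 3D box into the image, can be sketched as follows. This is not the paper's adaptation method. The coordinate convention (x right, y down, z forward), the box parameterization, and the pinhole intrinsics `fx`, `fy`, `u0`, `v0` are assumptions made for illustration, and the box is assumed to be already expressed in the camera frame with every corner in front of the camera.

```python
from math import cos, sin


def box_corners(center, size, yaw):
    """Return the eight corners of a 3D box in the camera frame.

    Assumed convention: x right, y down, z forward. `center` is the box
    centroid, `size` is (length, height, width), and `yaw` is a rotation
    about the y axis.
    """
    cx, cy, cz = center
    length, height, width = size
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (height / 2, -height / 2):
            for dz in (width / 2, -width / 2):
                # Rotate the horizontal offset about the y axis.
                x = cos(yaw) * dx + sin(yaw) * dz
                z = -sin(yaw) * dx + cos(yaw) * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project(point, fx, fy, u0, v0):
    """Pinhole projection of a camera-frame point with z > 0 to pixels."""
    x, y, z = point
    return (fx * x / z + u0, fy * y / z + v0)


def projected_rect(center, size, yaw, fx, fy, u0, v0):
    """Axis-aligned 2D rectangle enclosing the projected box corners."""
    pts = [project(p, fx, fy, u0, v0) for p in box_corners(center, size, yaw)]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical box 10 m ahead of the camera and hypothetical intrinsics.
rect = projected_rect(
    center=(0.0, 0.0, 10.0),
    size=(4.0, 2.0, 2.0),
    yaw=0.0,
    fx=1000.0, fy=1000.0, u0=500.0, v0=300.0,
)
```

Comparing such a projected rectangle with a 2D box drawn on the image is one simple way to see the kind of misalignment the abstract describes when calibration, synchronization, or labeling errors are present.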

MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

Mar 16, 2022

Qing Lian, Peiliang Li, Xiaozhi Chen

Abstract:Due to the inherent ill-posed nature of 2D-3D projection, monocular 3D object detection lacks accurate depth recovery ability. Although the deep neural network (DNN) enables monocular depth-sensing from high-level learned features, the pixel-level cues are usually omitted due to the deep convolution mechanism. To benefit from both the powerful feature representation in DNN and pixel-level geometric constraints, we reformulate the monocular object depth estimation as a progressive refinement problem and propose a joint semantic and geometric cost volume to model the depth error. Specifically, we first leverage neural networks to learn the object position, dimension, and dense normalized 3D object coordinates. Based on the object depth, the dense coordinates patch together with the corresponding object features is reprojected to the image space to build a cost volume in a joint semantic and geometric error manner. The final depth is obtained by feeding the cost volume to a refinement network, where the distribution of semantic and geometric error is regularized by direct depth supervision. Through effectively mitigating depth error by the refinement framework, we achieve state-of-the-art results on both the KITTI and Waymo datasets.

* Accepted to CVPR 2022

Temporal Point Cloud Completion with Pose Disturbance

Feb 07, 2022

Jieqi Shi, Lingyun Xu, Peiliang Li, Xiaozhi Chen, Shaojie Shen

Abstract:Point clouds collected by real-world sensors are always unaligned and sparse, which makes it hard to reconstruct the complete shape of an object from a single frame of data. In this work, we produce complete point clouds from sparse input under pose disturbance, i.e., limited translation and rotation. We also use temporal information to enhance the completion model, refining the output with a sequence of inputs. With the help of gated recurrent units (GRU) and attention mechanisms as temporal units, we propose a point cloud completion framework that accepts a sequence of unaligned and sparse inputs and outputs consistent and aligned point clouds. Our network runs in an online manner and presents a refined point cloud for each frame, which enables it to be integrated into any SLAM or reconstruction pipeline. As far as we know, our framework is the first to utilize temporal information and ensure temporal consistency with limited transformation. Through experiments on ShapeNet and KITTI, we show that our framework is effective on both synthetic and real-world datasets.

* 8 pages; Accepted by RAL with ICRA 2022

Trajectory Prediction with Graph-based Dual-scale Context Fusion

Nov 02, 2021

Lu Zhang, Peiliang Li, Jing Chen, Shaojie Shen

Abstract:Motion prediction for traffic participants is essential for a safe and robust automated driving system, especially in cluttered urban environments. However, it is highly challenging due to the complex road topology as well as the uncertain intentions of the other agents. In this paper, we present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP), which encodes both the static and dynamical driving context in a hierarchical manner. Different from methods based on a rasterized map or sparse lane graph, we consider the driving context as a graph with two layers, focusing on both geometrical and topological features. Graph neural networks (GNNs) are applied to extract features with different levels of granularity, and features are subsequently aggregated with attention-based inter-layer networks, realizing better local-global feature fusion. Following the recent goal-driven trajectory prediction pipeline, goal candidates with high likelihood for the target agent are extracted, and predicted trajectories are generated conditioned on these goals. Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories. We evaluate the proposed method on the large-scale Argoverse motion forecasting benchmark, and it achieves promising results, outperforming the recent state-of-the-art methods.

Tracking from Patterns: Learning Corresponding Patterns in Point Clouds for 3D Object Tracking

Oct 20, 2020

Jieqi Shi, Peiliang Li, Shaojie Shen

Abstract:A robust 3D object tracker that continuously tracks surrounding objects and estimates their trajectories is key for self-driving vehicles. Most existing tracking methods employ a tracking-by-detection strategy, which usually requires complex pair-wise similarity computation and neglects the nature of continuous object motion. In this paper, we propose to directly learn 3D object correspondences from temporal point cloud data and infer the motion information from correspondence patterns. We modify the standard 3D object detector to process two lidar frames at the same time and predict bounding box pairs for the association and motion estimation tasks. We also equip our pipeline with a simple yet effective velocity smoothing module to estimate consistent object motion. Benefiting from the learned correspondences and motion refinement, our method outperforms the existing 3D tracking methods on both the KITTI and the larger-scale nuScenes datasets.

* 4 pages, ECCV 2020 Workshop on Perception for Autonomous Driving (PAD 2020)

Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking

Apr 20, 2020

Peiliang Li, Jieqi Shi, Shaojie Shen

Abstract:Directly learning the motion of multiple 3D objects from sequential images is difficult, while geometric bundle adjustment lacks the ability to localize the invisible object centroid. To benefit from the powerful object understanding of deep neural networks while also modeling geometry precisely for consistent trajectory estimation, we propose a joint spatial-temporal optimization-based stereo 3D object tracking method. From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box. Dense object cues (local depth and local coordinates) associated with the object centroid are then predicted using a region-based network. Considering both the instant localization accuracy and motion consistency, our optimization models the relations between the object centroid and observed cues as a joint spatial-temporal error function. All historic cues are summarized to contribute to the current estimation by a per-frame marginalization strategy without repeated computation. Quantitative evaluation on the KITTI tracking dataset shows that our approach outperforms previous image-based 3D tracking methods by significant margins. We also report extensive results on multiple categories and larger datasets (KITTI raw and Argoverse Tracking) for future benchmarking.

* CVPR 2020

Multi-Sensor 3D Object Box Refinement for Autonomous Driving

Sep 11, 2019

Peiliang Li, Siqi Liu, Shaojie Shen

Abstract:We propose a 3D object detection system with multi-sensor refinement in the context of autonomous driving. In our framework, the monocular camera serves as the fundamental sensor for 2D object proposal and initial 3D bounding box prediction, while the stereo cameras and LiDAR are treated as adaptive plug-in sensors to refine the 3D box localization performance. For each observed element in the raw measurement domain (e.g., pixels for stereo, 3D points for LiDAR), we model the local geometry as an instance vector representation, which indicates the 3D coordinate of each element with respect to the object frame. Using this unified geometric representation, the 3D object location can be refined in a unified manner by either stereo photometric alignment or point cloud alignment. We demonstrate superior 3D detection and localization performance compared to state-of-the-art monocular and stereo methods, and competitive performance compared with the baseline LiDAR method, on the KITTI object benchmark.

Stereo R-CNN based 3D Object Detection for Autonomous Driving

Apr 10, 2019

Peiliang Li, Xiaozhi Chen, Shaojie Shen

Abstract:We propose a 3D object detection method for autonomous driving by fully exploiting the sparse and dense, semantic and geometry information in stereo imagery. Our method, called Stereo R-CNN, extends Faster R-CNN for stereo inputs to simultaneously detect and associate objects in left and right images. We add extra branches after the stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with 2D left-right boxes to calculate a coarse 3D object bounding box. We then recover the accurate 3D bounding box by a region-based photometric alignment using left and right RoIs. Our method requires neither depth input nor 3D position supervision, yet it outperforms all existing fully supervised image-based methods. Experiments on the challenging KITTI dataset show that our method outperforms the state-of-the-art stereo-based method by around 30% AP on both 3D detection and 3D localization tasks. Code has been released at https://github.com/HKUST-Aerial-Robotics/Stereo-RCNN.

* Accepted by CVPR 2019
