
Lingpeng Wang

3D Siamese Transformer Network for Single Object Tracking on Point Clouds

Jul 26, 2022

Le Hui, Lingpeng Wang, Linghua Tang, Kaihao Lan, Jin Xie, Jian Yang


Abstract: Siamese network based trackers formulate 3D single object tracking as cross-correlation learning between point features of a template and a search area. Because the appearance of the target varies greatly between the template and the search area during tracking, learning a robust cross correlation that identifies the potential target in the search area remains challenging. In this paper, we explicitly use a Transformer to form a 3D Siamese Transformer network that learns a robust cross correlation between the template and the search area of point clouds. Specifically, we develop a Siamese point Transformer network to learn shape context information of the target. Its encoder uses self-attention to capture non-local information of point clouds and characterize the shape of the object, and its decoder uses cross-attention to upsample discriminative point features. We then develop an iterative coarse-to-fine correlation network to learn the robust cross correlation between the template and the search area. It uses cross-feature augmentation to associate the template with the potential target in the search area via cross-attention. To further enhance the potential target, it employs ego-feature augmentation, which applies self-attention to the local k-NN graph of the feature space to aggregate target features. Experiments on the KITTI, nuScenes, and Waymo datasets show that our method achieves state-of-the-art performance on the 3D single object tracking task.
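
The cross-attention step that associates template features with search-area features can be illustrated with a minimal NumPy sketch. This is not the authors' network: it omits learned query/key/value projections, multiple heads, and the iterative coarse-to-fine scheme, and the function name `cross_attention` and the residual connection are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(search_feats, template_feats):
    """Hypothetical single-head cross-attention without learned projections.

    search_feats:   (N, C) point features of the search area (queries)
    template_feats: (M, C) point features of the template (keys and values)
    Returns the augmented search features (N, C) and attention weights (N, M).
    """
    channels = search_feats.shape[1]
    # Scaled dot-product similarity between every search and template point.
    scores = search_feats @ template_feats.T / np.sqrt(channels)
    weights = softmax(scores, axis=1)
    # Each search point aggregates template features by similarity.
    attended = weights @ template_feats
    # Residual connection (an assumption, common in Transformer blocks).
    return search_feats + attended, weights

rng = np.random.default_rng(0)
search = rng.normal(size=(5, 4))    # 5 search points, 4 channels
template = rng.normal(size=(3, 4))  # 3 template points, 4 channels
augmented, attn = cross_attention(search, template)
print(augmented.shape, attn.shape)
```

Each row of `attn` sums to 1, so every search point receives a convex combination of template features.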

* Accepted to ECCV'22


3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds

Nov 17, 2021

Le Hui, Lingpeng Wang, Mingmei Cheng, Jin Xie, Jian Yang


Abstract: 3D object tracking in point clouds is still a challenging problem due to the sparsity of LiDAR points in dynamic environments. In this work, we propose a Siamese voxel-to-BEV tracker, which can significantly improve the tracking performance in sparse 3D point clouds. Specifically, it consists of a Siamese shape-aware feature learning network and a voxel-to-BEV target localization network. The Siamese shape-aware feature learning network captures 3D shape information of the object to learn discriminative features, so that the potential target can be distinguished from the background in sparse point clouds. To this end, we first perform template feature embedding to embed the template's feature into the potential target and then generate a dense 3D shape to characterize the shape information of the potential target. To localize the tracked target, the voxel-to-BEV target localization network regresses the target's 2D center and the $z$-axis center from the dense bird's eye view (BEV) feature map in an anchor-free manner. Concretely, we compress the voxelized point cloud along the $z$-axis through max pooling to obtain a dense BEV feature map, where the regression of the 2D center and the $z$-axis center can be performed more effectively. Extensive evaluation on the KITTI and nuScenes datasets shows that our method outperforms the current state-of-the-art methods by a large margin.
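
The compression of a voxelized point cloud along the $z$-axis by max pooling can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function name `voxel_to_bev`, its parameters, the per-voxel max aggregation, and the assumption that features are non-negative (so empty voxels can be left at zero) are all hypothetical.

```python
import numpy as np

def voxel_to_bev(points, feats, grid_min, voxel_size, grid_shape):
    """Hypothetical voxelization followed by max pooling along z.

    points:     (N, 3) xyz coordinates
    feats:      (N, C) non-negative point features (assumption)
    grid_min:   (3,) minimum corner of the voxel grid
    voxel_size: edge length of a cubic voxel
    grid_shape: (nx, ny, nz) number of voxels per axis
    Returns a BEV feature map of shape (nx, ny, C).
    """
    nx, ny, nz = grid_shape
    # Integer voxel index of every point.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    # Discard points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, feats = idx[inside], feats[inside]
    voxels = np.zeros((nx, ny, nz, feats.shape[1]))
    # Element-wise max over all points that share a voxel.
    np.maximum.at(voxels, (idx[:, 0], idx[:, 1], idx[:, 2]), feats)
    # Max pooling along the z-axis collapses the grid to a BEV map.
    return voxels.max(axis=2)

points = np.array([[0.1, 0.1, 0.1], [0.1, 0.1, 0.9], [0.9, 0.9, 0.5]])
feats = np.array([[1.0], [3.0], [2.0]])
bev = voxel_to_bev(points, feats, np.zeros(3), 0.5, (2, 2, 2))
print(bev[..., 0])
```

In this toy input, the first two points share an $(x, y)$ cell at different heights, so the BEV cell keeps the larger of their two features.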

* Accepted by NeurIPS 2021
