Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuo Gu

UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Jun 04, 2024

Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan(+3 more)

Figure 1 for UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Figure 2 for UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Figure 3 for UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Figure 4 for UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Abstract:3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises owing to various factors during motion observation by cameras, especially occlusions and the small size of target objects, resulting in an inaccurate estimation of the object's position, label, and identity. To this end, we propose an Uncertainty-Aware 3D MOT framework, UA-Track, which tackles the uncertainty problem from multiple aspects. Specifically, we first introduce an Uncertainty-aware Probabilistic Decoder to capture the uncertainty in object prediction with probabilistic attention. Secondly, we propose an Uncertainty-guided Query Denoising strategy to further enhance the training process. We also utilize Uncertainty-reduced Query Initialization, which leverages predicted 2D object location and depth information to reduce query uncertainty. As a result, our UA-Track achieves state-of-the-art performance on the nuScenes benchmark, i.e., 66.3% AMOTA on the test split, surpassing the previous best end-to-end solution by a significant margin of 8.9% AMOTA.

Via

Access Paper or Ask Questions

Implicit Obstacle Map-driven Indoor Navigation Model for Robust Obstacle Avoidance

Aug 24, 2023

Wei Xie, Haobo Jiang, Shuo Gu, Jin Xie

Abstract:Robust obstacle avoidance is one of the critical steps for successful goal-driven indoor navigation tasks.Due to the obstacle missing in the visual image and the possible missed detection issue, visual image-based obstacle avoidance techniques still suffer from unsatisfactory robustness. To mitigate it, in this paper, we propose a novel implicit obstacle map-driven indoor navigation framework for robust obstacle avoidance, where an implicit obstacle map is learned based on the historical trial-and-error experience rather than the visual image. In order to further improve the navigation efficiency, a non-local target memory aggregation module is designed to leverage a non-local network to model the intrinsic relationship between the target semantic and the target orientation clues during the navigation process so as to mine the most target-correlated object clues for the navigation decision. Extensive experimental results on AI2-Thor and RoboTHOR benchmarks verify the excellent obstacle avoidance and navigation efficiency of our proposed method. The core source code is available at https://github.com/xwaiyy123/object-navigation.

* 9 pages, 7 figures, 43 references. This paper has been accepted for ACM MM 2023

Via

Access Paper or Ask Questions

Semantics-Guided Moving Object Segmentation with 3D LiDAR

May 06, 2022

Shuo Gu, Suling Yao, Jian Yang, Hui Kong

Figure 1 for Semantics-Guided Moving Object Segmentation with 3D LiDAR

Figure 2 for Semantics-Guided Moving Object Segmentation with 3D LiDAR

Figure 3 for Semantics-Guided Moving Object Segmentation with 3D LiDAR

Figure 4 for Semantics-Guided Moving Object Segmentation with 3D LiDAR

Abstract:Moving object segmentation (MOS) is a task to distinguish moving objects, e.g., moving vehicles and pedestrians, from the surrounding static environment. The segmentation accuracy of MOS can have an influence on odometry, map construction, and planning tasks. In this paper, we propose a semantics-guided convolutional neural network for moving object segmentation. The network takes sequential LiDAR range images as inputs. Instead of segmenting the moving objects directly, the network conducts single-scan-based semantic segmentation and multiple-scan-based moving object segmentation in turn. The semantic segmentation module provides semantic priors for the MOS module, where we propose an adjacent scan association (ASA) module to convert the semantic features of adjacent scans into the same coordinate system to fully exploit the cross-scan semantic features. Finally, by analyzing the difference between the transformed features, reliable MOS result can be obtained quickly. Experimental results on the SemanticKITTI MOS dataset proves the effectiveness of our work.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Transformationally Identical and Invariant Convolutional Neural Networks through Symmetric Element Operators

Jul 10, 2018

Shih Chung B. Lo, Matthew T. Freedman, Seong K. Mun, Shuo Gu

Figure 1 for Transformationally Identical and Invariant Convolutional Neural Networks through Symmetric Element Operators

Figure 2 for Transformationally Identical and Invariant Convolutional Neural Networks through Symmetric Element Operators

Abstract:Mathematically speaking, a transformationally invariant operator, such as a transformationally identical (TI) matrix kernel (i.e., K= T{K}), commutes with the transformation (T{.}) itself when they operate on the first operand matrix. We found that by consistently applying the same type of TI kernels in a convolutional neural networks (CNN) system, the commutative property holds throughout all layers of convolution processes with and without involving an activation function and/or a 1D convolution across channels within a layer. We further found that any CNN possessing the same TI kernel property for all convolution layers followed by a flatten layer with weight sharing among their transformation corresponding elements would output the same result for all transformation versions of the original input vector. In short, CNN[ Vi ] = CNN[ T{Vi} ] providing every K = T{K} in CNN, where Vi denotes input vector and CNN[.] represents the whole CNN process as a function of input vector that produces an output vector. With such a transformationally identical CNN (TI-CNN) system, each transformation, that is not associated with a predefined TI used in data augmentation, would inherently include all of its corresponding transformation versions of the input vector for the training. Hence the use of same TI property for every kernel in the CNN would serve as an orientation or a translation independent training guide in conjunction with the error-backpropagation during the training. This TI kernel property is desirable for applications requiring a highly consistent output result from corresponding transformation versions of an input. Several C programming routines are provided to facilitate interested parties of using the TI-CNN technique which is expected to produce a better generalization performance than its ordinary CNN counterpart.

* 20 pages, 2 figures

Via

Access Paper or Ask Questions