Zhihui Hao

GaussianAD: Gaussian-Centric End-to-End Autonomous Driving

Dec 13, 2024

Wenzhao Zheng, Junjie Wu, Yao Zheng, Sicheng Zuo, Zixun Xie, Longchao Yang, Yong Pan, Zhihui Hao, Peng Jia, Xianpeng Lang(+1 more)

Abstract: Vision-based autonomous driving shows great potential due to its satisfactory performance and low costs. Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making, which suffer from the trade-off between comprehensiveness and efficiency. This paper explores a Gaussian-centric end-to-end autonomous driving (GaussianAD) framework and exploits 3D semantic Gaussians to extensively yet sparsely describe the scene. We initialize the scene with uniform 3D Gaussians and use surrounding-view images to progressively refine them to obtain the 3D Gaussian scene representation. We then use sparse convolutions to efficiently perform 3D perception (e.g., 3D detection, semantic map construction). We predict 3D flows for the Gaussians with dynamic semantics and plan the ego trajectory accordingly with an objective of future scene forecasting. Our GaussianAD can be trained in an end-to-end manner with optional perception labels when available. Extensive experiments on the widely used nuScenes dataset verify the effectiveness of our end-to-end GaussianAD on various tasks including motion planning, 3D occupancy prediction, and 4D occupancy forecasting. Code: https://github.com/wzzheng/GaussianAD.

* Code is available at: https://github.com/wzzheng/GaussianAD
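
The abstract's first step, initializing the scene with uniform 3D Gaussians, can be pictured with a minimal sketch. This is not the authors' implementation: the `Gaussian3D` class, the `init_uniform_gaussians` helper, the grid bounds, and the choice of half a cell as the initial scale are all assumptions made for illustration, and the image-based refinement that follows in the paper is not modeled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Gaussian3D:
    """Hypothetical container for one semantic Gaussian (illustrative only)."""
    mean: tuple       # (x, y, z) center in meters
    scale: tuple      # per-axis extent in meters
    semantics: list   # per-class logits, all zero before refinement


def init_uniform_gaussians(bounds, counts, num_classes):
    """Place one Gaussian at the center of each cell of a regular 3D grid."""
    axes, steps = [], []
    for (lo, hi), n in zip(bounds, counts):
        step = (hi - lo) / n
        steps.append(step)
        axes.append([lo + (i + 0.5) * step for i in range(n)])
    # Assumption: start each Gaussian at half a cell wide along every axis.
    scale = tuple(s / 2 for s in steps)
    return [
        Gaussian3D(mean=m, scale=scale, semantics=[0.0] * num_classes)
        for m in product(*axes)
    ]


# Hypothetical scene range: 80 m x 80 m on the ground plane, 6 m in height.
gaussians = init_uniform_gaussians(
    bounds=((-40.0, 40.0), (-40.0, 40.0), (-1.0, 5.0)),
    counts=(4, 4, 2),
    num_classes=3,
)
```

A uniform start like this gives every region of the scene a Gaussian that later refinement could move, resize, or relabel, which is one way to read "extensively yet sparsely" in the abstract.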

UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

Jun 04, 2024

Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan(+3 more)

Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers detect and track objects simultaneously and have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises from various factors during motion observation by cameras, especially occlusions and the small size of target objects, and results in inaccurate estimates of an object's position, label, and identity. To this end, we propose an Uncertainty-Aware 3D MOT framework, UA-Track, which tackles the uncertainty problem from multiple aspects. First, we introduce an Uncertainty-aware Probabilistic Decoder to capture the uncertainty in object prediction with probabilistic attention. Second, we propose an Uncertainty-guided Query Denoising strategy to further enhance the training process. We also utilize Uncertainty-reduced Query Initialization, which leverages predicted 2D object location and depth information to reduce query uncertainty. As a result, our UA-Track achieves state-of-the-art performance on the nuScenes benchmark, i.e., 66.3% AMOTA on the test split, surpassing the previous best end-to-end solution by a significant margin of 8.9% AMOTA.
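
The abstract does not say how a predicted 2D location and depth become an initial 3D query position. One common way to combine the two is pinhole back-projection, sketched below as an assumption rather than as the paper's method; the function name `backproject` and the intrinsics values are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Standard pinhole model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. Lens distortion is ignored.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics: a detection at the image center, 20 m away.
center_point = backproject(640.0, 360.0, 20.0, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```

Under this model, a query seeded from `center_point` starts on the optical axis at the predicted depth instead of at an arbitrary location.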

Deep Scene Text Detection with Connected Component Proposals

Aug 17, 2017

Fan Jiang, Zhihui Hao, Xinran Liu

Abstract: The computer vision community has seen growing demand for natural-scene text detection, since text plays a significant role in scene understanding and image indexing. Deep neural networks are used for this task, as in other common vision problems, because of their strong capabilities in pixel-wise classification and word localization. In this paper, we present a novel two-task network that integrates bottom and top cues. The first task predicts a pixel-by-pixel labeling, from which word proposals are generated with a canonical connected component analysis. The second task outputs a bundle of character candidates that are later used to verify the word proposals. The two sub-networks share base convolutional features, and we present a new loss to strengthen the interaction between them. We evaluate the proposed network on public benchmark datasets and show that it can detect arbitrary-orientation scene text with a finer output boundary. On the ICDAR 2013 text localization task, we achieve state-of-the-art performance with an F-score of 0.919 and a much better recall of 0.915.

* 10 pages, 5 figures
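
The proposal step described in the abstract, connected component analysis over a pixel-wise labeling, can be sketched as follows. This is a simplified illustration and not the paper's pipeline: it assumes a binary text mask, 4-connectivity, and axis-aligned boxes, whereas the paper targets arbitrary orientations. The function name `connected_component_boxes` is hypothetical.

```python
from collections import deque


def connected_component_boxes(mask):
    """Return one (row_min, col_min, row_max, col_max) box per 4-connected
    component of nonzero pixels, in row-major order of discovery."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited text pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy mask with two separate text regions.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
proposals = connected_component_boxes(mask)
```

In the paper's setting, each such proposal would then be checked against the character candidates produced by the second task.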

Totally Corrective Multiclass Boosting with Binary Weak Learners

Sep 20, 2010

Zhihui Hao, Chunhua Shen, Nick Barnes, Bo Wang

Abstract: In this work, we propose a new optimization framework for multiclass boosting. In the literature, AdaBoost.MO and AdaBoost.ECC are two successful multiclass boosting algorithms that can use binary weak learners. We explicitly derive the Lagrange dual problems of these two algorithms from their regularized loss functions. We show that the Lagrange dual formulations enable us to design totally corrective multiclass algorithms using the primal-dual optimization technique. Experiments on benchmark data sets suggest that our multiclass boosting achieves generalization comparable to the state of the art while converging much faster than stage-wise gradient descent boosting. In other words, the new totally corrective algorithms can maximize the margin more aggressively.

* 11 pages
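
To illustrate how binary weak learners can yield a multiclass prediction, here is a minimal sketch of output-code decoding in the style of AdaBoost.ECC. It covers prediction only and does not implement the totally corrective training the abstract proposes. The function name `ecoc_predict`, the coding matrix, and the weights are hypothetical values chosen for the example.

```python
def ecoc_predict(weak_outputs, coding_matrix, weights):
    """Score each class as the weighted agreement between the binary weak
    learner outputs (+1/-1) and that class's code word, then pick the best.

    coding_matrix[k][t] is the +1/-1 label that weak learner t assigns to class k.
    """
    scores = [
        sum(w * code * h for w, code, h in zip(weights, row, weak_outputs))
        for row in coding_matrix
    ]
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores


# Three classes, three weak learners, one-vs-rest style code words.
coding_matrix = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]
weights = [0.5, 0.3, 0.2]     # hypothetical learner weights
weak_outputs = [-1, +1, -1]   # binary predictions for one sample
label, scores = ecoc_predict(weak_outputs, coding_matrix, weights)
```

In this framing, a stage-wise booster fixes each weight as its learner is added, while a totally corrective one re-optimizes all entries of `weights` at every round.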
