Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ningkai Mo

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Aug 21, 2024

Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, Naoto Yokoya

Abstract:We introduce GaussianOcc, a systematic method that investigates the two usages of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground truth 6D poses from sensors during training. To address this limitation, we propose Gaussian Splatting for Projection (GSP) module to provide accurate scale information for fully self-supervised training from adjacent view projection. Additionally, existing methods rely on volume rendering for final 3D voxel representation learning using 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, the proposed GaussianOcc method enables fully self-supervised (no ground truth pose) 3D occupancy estimation in competitive performance with low computational cost (2.7 times faster in training and 5 times faster in rendering).

* Project page: https://ganwanshui.github.io/GaussianOcc/

Via

Access Paper or Ask Questions

A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving

Apr 04, 2023

Wanshui Gan, Ningkai Mo, Hongbin Xu, Naoto Yokoya

Abstract:The task of estimating 3D occupancy from surrounding view images is an exciting development in the field of autonomous driving, following the success of Birds Eye View (BEV) perception.This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, there is still a lack of a baseline to define the task, such as network design, optimization, and evaluation. In this work, we present a simple attempt for 3D occupancy estimation, which is a CNN-based framework designed to reveal several key factors for 3D occupancy estimation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study on 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the metric for occupancy evaluation, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and Nuscenes datasets.The relevant code will be available in https://github.com/GANWANSHUI/SimpleOccupancy

Via

Access Paper or Ask Questions

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

Apr 03, 2022

Ningkai Mo, Wanshui Gan, Naoto Yokoya, Shifeng Chen

Figure 1 for ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

Figure 2 for ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

Figure 3 for ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

Figure 4 for ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

Abstract:In this paper, a computation efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. This framework is designed in a simple architecture that efficiently extracts point-wise features from RGB-D data using a fully convolutional network, called XYZNet, and directly regresses the 6D pose without any post refinement. In the case of symmetric object, one object has multiple ground-truth poses, and this one-to-many relationship may lead to estimation ambiguity. In order to solve this ambiguity problem, we design a symmetry-invariant pose distance metric, called average (maximum) grouped primitives distance or A(M)GPD. The proposed A(M)GPD loss can make the regression network converge to the correct state, i.e., all minima in the A(M)GPD loss surface are mapped to the correct poses. Extensive experiments on YCB-Video and T-LESS datasets demonstrate the proposed framework's substantially superior performance in top accuracy and low computational cost.

* Accepted by CVPR 2022

Via

Access Paper or Ask Questions