Junming Zhang

Hyperspherical Embedding for Point Cloud Completion

Jul 11, 2023

Junming Zhang, Haomeng Zhang, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: Most real-world 3D measurements from depth sensors are incomplete. To address this issue, the point cloud completion task aims to predict the complete shapes of objects from partial observations. Previous works often adopt an encoder-decoder architecture, where the encoder is trained to extract embeddings that the decoder uses as inputs to generate predictions. However, the learned embeddings have a sparse distribution in the feature space, which leads to worse generalization during testing. To address this problem, this paper proposes a hyperspherical module, which transforms and normalizes embeddings from the encoder to lie on a unit hypersphere. With the proposed module, the magnitude and direction of the output hyperspherical embedding are decoupled and only the directional information is optimized. We theoretically analyze the hyperspherical embedding and show that it enables more stable training with a wider range of learning rates and more compact embedding distributions. Experimental results show consistent improvement of point cloud completion in both single-task and multi-task learning, which demonstrates the effectiveness of the proposed method.
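To make the idea of decoupling magnitude from direction concrete, here is a minimal NumPy sketch of projecting embeddings onto a unit hypersphere. It is not the paper's implementation: the single linear transform, the names `weight` and `bias`, and the layer sizes are all assumptions chosen for illustration, since the abstract only says the module "transforms and normalizes" the encoder output.

```python
import numpy as np

def hyperspherical_embedding(features, weight, bias, eps=1e-12):
    """Linearly transform encoder features, then L2-normalize each row.

    The linear transform is an assumed stand-in for the paper's module.
    """
    transformed = features @ weight + bias
    norms = np.linalg.norm(transformed, axis=-1, keepdims=True)
    # Dividing by the norm discards magnitude and keeps only direction.
    return transformed / np.maximum(norms, eps)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))    # hypothetical encoder outputs
weight = rng.normal(size=(8, 16))     # hypothetical transform weights
bias = np.zeros(16)

embedding = hyperspherical_embedding(features, weight, bias)
# Rescaling the input changes magnitude only, so the output is unchanged
# (this holds here because the assumed bias is zero).
scaled = hyperspherical_embedding(10.0 * features, weight, bias)
```

Every row of `embedding` has unit length, so a decoder that consumes it can only respond to the direction of the embedding, which is the property the abstract describes.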

Learning Rotation-Invariant Representations of Point Clouds Using Aligned Edge Convolutional Neural Networks

Jan 02, 2021

Junming Zhang, Ming-Yuan Yu, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: Point cloud analysis is an area of increasing interest due to the development of 3D sensors that are able to rapidly measure the depth of scenes accurately. Unfortunately, applying deep learning techniques to perform point cloud analysis is non-trivial due to the inability of these methods to generalize to unseen rotations. To address this limitation, one usually has to augment the training data, which can lead to extra computation and require larger model complexity. This paper proposes a new neural network called the Aligned Edge Convolutional Neural Network (AECNN) that learns a feature representation of point clouds relative to Local Reference Frames (LRFs) to ensure invariance to rotation. In particular, features are learned locally and aligned with respect to the LRF of an automatically computed reference point. The proposed approach is evaluated on point cloud classification and part segmentation tasks. This paper illustrates that the proposed technique outperforms a variety of state-of-the-art approaches (even those trained on augmented datasets) in terms of robustness to rotation without requiring any additional data augmentation.
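The following NumPy sketch illustrates why coordinates expressed in a local reference frame are rotation invariant. It does not reproduce AECNN: the frame construction (Gram-Schmidt on the two points farthest from the centroid) and the helper names are assumptions made only to demonstrate the principle.

```python
import numpy as np

def local_reference_frame(a, b):
    """Build an orthonormal right-handed frame from two non-collinear vectors."""
    e1 = a / np.linalg.norm(a)
    b_orth = b - np.dot(b, e1) * e1          # Gram-Schmidt step
    e2 = b_orth / np.linalg.norm(b_orth)
    e3 = np.cross(e1, e2)
    return np.stack([e1, e2, e3])            # rows are the frame axes

def aligned_coordinates(points):
    """Express centered points in a frame derived from the points themselves."""
    centered = points - points.mean(axis=0)
    order = np.argsort(np.linalg.norm(centered, axis=1))
    # Assumed anchors: the two points farthest from the centroid.
    frame = local_reference_frame(centered[order[-1]], centered[order[-2]])
    return centered @ frame.T

def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation about the given axis."""
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

rng = np.random.default_rng(0)
points = rng.normal(size=(32, 3))
R = rotation_matrix(np.array([1.0, 2.0, 3.0]), 0.7)

original = aligned_coordinates(points)
rotated = aligned_coordinates(points @ R.T)   # same cloud, rotated
```

Because the frame rotates together with the cloud, `original` and `rotated` agree up to floating-point error, even though the raw coordinates differ.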

* 3D Vision Conference 2020

Point Set Voting for Partial Point Cloud Analysis

Jul 09, 2020

Junming Zhang, Weijia Chen, Yuping Wang, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: The continual improvement of 3D sensors has driven the development of algorithms to perform point cloud analysis. In fact, techniques for point cloud classification and segmentation have in recent years achieved incredible performance, driven in part by leveraging large synthetic datasets. Unfortunately, these same state-of-the-art approaches perform poorly when applied to incomplete point clouds. This limitation of existing algorithms is particularly concerning since point clouds generated by 3D sensors in the real world are usually incomplete due to perspective view or occlusion by other objects. This paper proposes a general model for partial point cloud analysis wherein the latent feature encoding a complete point cloud is inferred by applying a local point set voting strategy. In particular, each local point set constructs a vote that corresponds to a distribution in the latent space, and the optimal latent feature is the one with the highest probability. This approach ensures that any subsequent point cloud analysis is robust to partial observation while simultaneously guaranteeing that the proposed model is able to output multiple possible results. This paper illustrates that this proposed method achieves state-of-the-art performance on shape classification, part segmentation and point cloud completion.
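As a rough illustration of the voting step, the sketch below fuses several votes into the single most probable latent feature. It rests on assumptions the abstract does not state: each vote is modeled as an independent diagonal Gaussian, in which case the highest-probability point of their product is the precision-weighted mean. The function name `fuse_votes` is hypothetical.

```python
import numpy as np

def fuse_votes(means, variances):
    """Return the mode of a product of independent diagonal Gaussians.

    means, variances: arrays of shape (num_votes, latent_dim).
    """
    precisions = 1.0 / variances
    # Confident votes (small variance) pull the result toward their mean.
    return (precisions * means).sum(axis=0) / precisions.sum(axis=0)

# Two hypothetical votes over a 2-D latent space.
means = np.array([[0.0, 0.0],
                  [2.0, 4.0]])
variances = np.array([[1.0, 1.0],
                      [1.0, 3.0]])

latent = fuse_votes(means, variances)
print(latent)  # [1. 1.]
```

In the second dimension the second vote is three times less certain, so the fused value sits closer to the first vote's mean than a plain average would.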

LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery

May 07, 2019

Junming Zhang, Manikandasriram Srinivasan Ramanagopalg, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: An accurate depth map of the environment is critical to the safe operation of autonomous robots and vehicles. Currently, either light detection and ranging (LIDAR) or stereo matching algorithms are used to acquire such depth information. However, a high-resolution LIDAR is expensive and produces sparse depth maps at long range; stereo matching algorithms are able to generate denser depth maps but are typically less accurate than LIDAR at long range. This paper combines these approaches to generate high-quality dense depth maps. Unlike previous approaches that are trained using ground-truth labels, the proposed model adopts a self-supervised training process. Experiments show that the proposed method is able to generate high-quality dense depth maps and performs robustly even with low-resolution inputs. This shows the potential to reduce cost by using LIDARs with lower resolution in concert with stereo systems while maintaining high resolution.

* 14 pages, 3 figures, 5 tables

DispSegNet: Leveraging Semantics for End-to-End Learning of Disparity Estimation from Stereo Imagery

Sep 13, 2018

Junming Zhang, Katherine A. Skinner, Ram Vasudevan, Matthew Johnson-Roberson

Abstract: Recent work has shown that convolutional neural networks (CNNs) can be applied successfully in disparity estimation, but these methods still suffer from errors in low-texture regions, occlusions and reflections. Concurrently, deep learning for semantic segmentation has shown great progress in recent years. In this paper, we design a CNN architecture that combines these two tasks to improve the quality and accuracy of disparity estimation with the help of semantic segmentation. Specifically, we propose a network structure in which these two tasks are highly coupled. One key novelty of this approach is the two-stage refinement process. Initial disparity estimates are refined with an embedding learned from the semantic segmentation branch of the network. The proposed model is trained using an unsupervised approach, in which images from one half of the stereo pair are warped and compared against images from the other camera. Another key advantage of the proposed approach is that a single network is capable of outputting both disparity estimates and semantic labels. These outputs are of great use in autonomous vehicle operation; with real-time constraints being key, such performance improvements increase the viability of driving applications. Experiments on the KITTI and Cityscapes datasets show that our model can achieve state-of-the-art results and that leveraging the embedding learned from semantic segmentation improves the performance of disparity estimation.
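The unsupervised training signal described above, warping one image of the stereo pair and comparing it with the other, can be sketched in NumPy as follows. This is a generic photometric reconstruction loss under stated assumptions (rectified grayscale images, horizontal linear interpolation, an L1 penalty, and a left pixel at column x matching the right image at column x - d); it is not taken from the DispSegNet code, and the function names are hypothetical.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d."""
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_loss(left, reconstructed, valid):
    """Mean absolute difference over the pixels marked valid."""
    return np.abs(left - reconstructed)[valid].mean()

rng = np.random.default_rng(0)
right = rng.random((6, 12))
true_disparity = 2
left = np.roll(right, true_disparity, axis=1)   # synthetic left view

valid = np.zeros_like(left, dtype=bool)
valid[:, true_disparity:] = True                # skip wrapped-around columns

good = photometric_loss(
    left, warp_right_to_left(right, np.full(left.shape, 2.0)), valid)
bad = photometric_loss(
    left, warp_right_to_left(right, np.zeros(left.shape)), valid)
```

With the correct disparity the reconstruction matches the left image and `good` is zero, while the wrong disparity leaves a positive `bad`; a network trained to minimize this loss is pushed toward correct disparities without ground-truth labels.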

* 8 pages, 4 figures, 4 tables
