Ning-Hsu Wang

Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation

Jun 18, 2024

Ning-Hsu Wang, Yu-Lun Liu

Abstract: Accurately estimating depth in 360-degree imagery is crucial for virtual reality, autonomous navigation, and immersive media applications. Existing depth estimation methods designed for perspective-view imagery fail when applied to 360-degree images because of the different camera projections and distortions, whereas 360-degree methods underperform because labeled data pairs are scarce. We propose a new depth estimation framework that uses unlabeled 360-degree data effectively. Our approach uses state-of-the-art perspective depth estimation models as teacher models to generate pseudo labels through a six-face cube projection technique, enabling efficient labeling of depth in 360-degree images. This lets the method take advantage of the increasing availability of large datasets. Our approach includes two main stages: offline mask generation for invalid regions and an online semi-supervised joint training regime. We tested our approach on benchmark datasets such as Matterport3D and Stanford2D3D, showing significant improvements in depth estimation accuracy, particularly in zero-shot scenarios. Our proposed training pipeline can enhance any 360 monocular depth estimator and demonstrates effective knowledge transfer across different camera projections and data types.

* Project page: https://albert100121.github.io/Depth-Anywhere/
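
The six-face cube projection mentioned in the abstract relies on a mapping between cube-face pixels and equirectangular pixels. The following is a minimal sketch of that geometry only, not the authors' code. The axis convention (x right, y up, z forward), the face names, and the function `face_to_equirect` are assumptions made for illustration.

```python
import math

# Assumed convention (not from the paper): x right, y up, z forward.
# Each face is a 90-degree perspective view; u grows to the right and
# v grows upward, both in [-1, 1].
FACES = {
    "front": lambda u, v: (u, v, 1.0),
    "back": lambda u, v: (-u, v, -1.0),
    "right": lambda u, v: (1.0, v, -u),
    "left": lambda u, v: (-1.0, v, u),
    "up": lambda u, v: (u, 1.0, -v),
    "down": lambda u, v: (u, -1.0, v),
}


def face_to_equirect(face, u, v, width, height):
    """Map face coordinates (u, v) to equirectangular (column, row)."""
    x, y, z = FACES[face](u, v)
    lon = math.atan2(x, z)                  # longitude, 0 at the front center
    lat = math.atan2(y, math.hypot(x, z))   # latitude, positive upward
    col = (lon / (2 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row


# Center of the front face lands at the center of the panorama.
print(face_to_equirect("front", 0.0, 0.0, 2048, 1024))
```

Looping this lookup over every pixel of a face would sample one perspective view from the panorama, which a perspective teacher model could then label.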

Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision

Aug 24, 2021

Ning-Hsu Wang, Ren Wang, Yu-Lun Liu, Yu-Hao Huang, Yu-Lin Chang, Chia-Ping Chen, Kevin Jou

Abstract: Depth estimation is a long-standing and important task in computer vision. Most previous works estimate depth from input images and assume the images are all-in-focus (AiF), which is uncommon in real-world applications. A few works instead take defocus blur into account and treat it as another cue for depth estimation. In this paper, we propose a method that estimates both a depth map and an AiF image from a set of images with different focus positions (known as a focal stack). We design a shared architecture to exploit the relationship between depth and AiF estimation. As a result, the proposed method can be trained either in a supervised manner with ground-truth depth or in an unsupervised manner with AiF images as supervisory signals. We show in various experiments that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and is also more efficient at inference time.

* ICCV 2021. Project page: https://albert100121.github.io/AiFDepthNet/ Code: https://github.com/albert100121/AiFDepthNet
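
One way to picture how depth and AiF estimation can share computation is a per-pixel weighting over the focal stack: the same weights give a depth value (expected focus position) and an AiF intensity (weighted blend of the slices). The abstract does not spell out the architecture, so the sketch below is an assumed illustration for a single pixel, and `fuse_pixel` and its inputs are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pixel(scores, focus_positions, intensities):
    """Turn per-slice sharpness scores into a depth and an AiF value.

    scores: one score per focal slice (higher means sharper here).
    focus_positions: focus distance of each slice.
    intensities: this pixel's intensity in each slice.
    """
    weights = softmax(scores)
    depth = sum(w * p for w, p in zip(weights, focus_positions))
    aif = sum(w * c for w, c in zip(weights, intensities))
    return depth, aif


# A pixel that is much sharper in the second slice takes its depth
# and intensity almost entirely from that slice.
print(fuse_pixel([0.0, 8.0, 0.0], [0.5, 1.0, 2.0], [0.2, 0.9, 0.4]))
```

Under this assumed design, a loss on the AiF output alone still sends gradients through the shared weights, which is one reason AiF images could act as a supervisory signal when ground-truth depth is unavailable.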

Indoor Panorama Planar 3D Reconstruction via Divide and Conquer

Jun 27, 2021

Cheng Sun, Chi-Wei Hsiao, Ning-Hsu Wang, Min Sun, Hwann-Tzong Chen

Abstract: Indoor panoramas typically consist of human-made structures parallel or perpendicular to gravity. We leverage this phenomenon to approximate the scene in a 360-degree image with (H)orizontal-planes and (V)ertical-planes. To this end, we propose an effective divide-and-conquer strategy that divides pixels based on their estimated plane orientation; the succeeding instance segmentation module then conquers the task of plane clustering more easily within each plane orientation group. In addition, the parameters of V-planes depend on camera yaw rotation, but translation-invariant CNNs are less aware of yaw changes. We thus propose a yaw-invariant V-planar reparameterization for CNNs to learn. We create a benchmark for indoor panorama planar reconstruction by extending existing 360 depth datasets with ground-truth H&V-planes (referred to as the PanoH&V dataset) and adopt state-of-the-art planar reconstruction methods to predict H&V-planes as our baselines. Our method outperforms the baselines by a large margin on the proposed dataset.
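
To see why V-plane parameters depend on yaw, and how a reparameterization can remove that dependence, consider the azimuth of a vertical plane's normal. The abstract does not give the paper's exact formulation, so the following is only an assumed illustration: it expresses the normal azimuth relative to the viewing longitude of the pixel, and both function names are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def apply_yaw(azimuth, yaw):
    """Azimuth of a fixed world direction after the camera yaws by `yaw`."""
    return wrap(azimuth - yaw)


def relative_normal_angle(normal_azimuth, pixel_longitude):
    """Plane-normal azimuth measured relative to the pixel's longitude."""
    return wrap(normal_azimuth - pixel_longitude)


theta, lam, yaw = 1.0, 0.4, 2.5
before = relative_normal_angle(theta, lam)
after = relative_normal_angle(apply_yaw(theta, yaw), apply_yaw(lam, yaw))
# The absolute azimuth changes with yaw, the relative angle does not.
print(apply_yaw(theta, yaw), before, after)
```

A camera yaw shifts the equirectangular image horizontally and shifts every azimuth by the same amount, so a difference of two azimuths is unchanged. The plane's distance to the camera is also unaffected, since yaw is a rotation about the vertical axis through the camera center.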

360SD-Net: 360° Stereo Depth Estimation with Learnable Cost Volume

Nov 11, 2019

Ning-Hsu Wang, Bolivar Solarte, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun

Abstract: Recently, end-to-end trainable deep neural networks have significantly improved stereo depth estimation for perspective images. However, 360° images captured under equirectangular projection cannot benefit from directly adopting existing methods because of the distortion the projection introduces (i.e., lines in 3D are not projected onto lines in 2D). To tackle this issue, we present a novel architecture specifically designed for spherical disparity using the setting of top-bottom 360° camera pairs. Moreover, we propose to mitigate the distortion issue with (1) an additional input branch capturing the position and relation of each pixel in spherical coordinates, and (2) a cost volume built upon a learnable shifting filter. Due to the lack of 360° stereo data, we collect two 360° stereo datasets from Matterport3D and Stanford3D for training and evaluation. Extensive experiments and an ablation study are provided to validate our method against existing algorithms. Finally, we show promising results in real-world environments, capturing images with two consumer-level cameras.

* Project page and code are at https://albert100121.github.io/360SD-Net-Project-Page
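
In a top-bottom 360° pair, a scene point appears at different polar angles in the two cameras, and that angular disparity determines its distance by triangulation. The sketch below applies the law of sines to the triangle formed by the two camera centers and the point. It is a geometric illustration under assumed conventions (polar angles measured from the upward vertical axis, top camera directly above the bottom one), not the formulation used in the paper, and `ranges_from_top_bottom` is a hypothetical name.

```python
import math


def ranges_from_top_bottom(phi_top, phi_bottom, baseline):
    """Triangulate distances to a point from a top-bottom camera pair.

    phi_top, phi_bottom: polar angles (radians, measured from straight up)
        of the same point as seen from the top and bottom cameras.
    baseline: vertical distance between the two camera centers.
    Returns (distance from top camera, distance from bottom camera).
    """
    disparity = phi_top - phi_bottom  # angle subtended at the scene point
    if disparity <= 0:
        raise ValueError("expected phi_top > phi_bottom for a finite point")
    r_top = baseline * math.sin(phi_bottom) / math.sin(disparity)
    r_bottom = baseline * math.sin(phi_top) / math.sin(disparity)
    return r_top, r_bottom


# Point 2 m away horizontally and 1 m above the bottom camera,
# with the top camera 0.5 m above the bottom one.
phi_b = math.atan2(2.0, 1.0)
phi_t = math.atan2(2.0, 1.0 - 0.5)
print(ranges_from_top_bottom(phi_t, phi_b, 0.5))
```

Because the disparity is an angle along the vertical direction of the equirectangular image, matching is done along image columns rather than along rows as in a left-right perspective pair.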
