Sing Bing Kang

SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

Jun 27, 2024

John Lambert, Yuguang Li, Ivaylo Boyadzhiev, Lambert Wixson, Manjunath Narayana, Will Hutchcroft, James Hays, Frank Dellaert, Sing Bing Kang

Abstract: We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360$^\circ$ panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized using GTSAM. Once the room poses are computed, room layouts are inferred using HorizonNet, and the floorplan is constructed by stitching the most confident layout boundaries. We validate our system qualitatively and quantitatively as well as through ablation studies, showing that it outperforms state-of-the-art SfM systems in completeness by over 200%, without sacrificing accuracy. Our results point to the significance of our work: poses of 81% of panoramas are localized in the first 2 connected components (CCs), and 89% in the first 3 CCs. Code and models are publicly available at https://github.com/zillow/salve.

* Accepted at ECCV 2022
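
Illustrative note: the "localized in the first k connected components" figures in the abstract can be read as a simple graph statistic. The sketch below is not taken from the SALVe code base; it assumes panoramas are nodes, verified pairwise alignments are undirected edges, and the function names (`component_sizes`, `fraction_localized`) are hypothetical.

```python
from collections import Counter


def component_sizes(num_nodes, edges):
    """Return connected-component sizes, largest first, using union-find."""
    parent = list(range(num_nodes))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(i) for i in range(num_nodes))
    return sorted(sizes.values(), reverse=True)


def fraction_localized(num_nodes, edges, k):
    """Fraction of nodes that fall inside the k largest components."""
    sizes = component_sizes(num_nodes, edges)
    return sum(sizes[:k]) / num_nodes


# Toy example (made-up graph): 10 panoramas, 6 accepted alignments.
toy_edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (7, 8)]
toy_sizes = component_sizes(10, toy_edges)
```

In this toy graph the components have sizes 4, 3, 2, and 1, so the two largest cover 70% of the nodes and the three largest cover 90%.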

iBARLE: imBalance-Aware Room Layout Estimation

Aug 29, 2023

Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang

Abstract: Room layout estimation predicts layouts from a single panorama. It requires datasets with large-scale and diverse room shapes to train the models. However, there are significant imbalances in real-world datasets including the dimensions of layout complexity, camera locations, and variation in scene appearance. These issues considerably influence the model training performance. In this work, we propose the imBalance-Aware Room Layout Estimation (iBARLE) framework to address these issues. iBARLE consists of (1) an Appearance Variation Generation (AVG) module, which promotes visual appearance domain generalization, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t. room structure, and (3) a gradient-based layout objective function, which allows more effective accounting for occlusions in complex layouts. All modules are jointly trained and help each other to achieve the best performance. Experiments and ablation studies on the ZInD dataset illustrate that iBARLE has state-of-the-art performance compared with other layout estimation baselines.

Graph-CoVis: GNN-based Multi-view Panorama Global Pose Estimation

Apr 26, 2023

Negar Nejatishahidin, Will Hutchcroft, Manjunath Narayana, Ivaylo Boyadzhiev, Yuguang Li, Naji Khosravan, Jana Kosecka, Sing Bing Kang

Abstract: In this paper, we address the problem of wide-baseline camera pose estimation from a group of 360$^\circ$ panoramas under the upright-camera assumption. Recent work has demonstrated the merit of deep learning for end-to-end direct relative pose regression in 360$^\circ$ panorama pairs [11]. To exploit the benefits of multi-view logic in a learning-based framework, we introduce Graph-CoVis, which non-trivially extends CoVisPose [11] from relative two-view to global multi-view spherical camera pose estimation. Graph-CoVis is a novel Graph Neural Network based architecture that jointly learns the co-visible structure and global motion in an end-to-end and fully-supervised approach. Using the ZInD [4] dataset, which features real homes presenting wide baselines, occlusion, and limited visual overlap, we show that our model performs competitively with state-of-the-art approaches.

U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Apr 17, 2023

Pooya Fayyazsanavi, Zhiqiang Wan, Will Hutchcroft, Ivaylo Boyadzhiev, Yuguang Li, Jana Kosecka, Sing Bing Kang

Abstract: While existing deep learning-based room layout estimation techniques demonstrate good overall accuracy, they are less effective for distant floor-wall boundaries. To tackle this problem, we propose a novel uncertainty-guided approach for layout boundary estimation, introducing a new two-stage CNN architecture termed U2RLE. The initial stage predicts both the floor-wall boundary and its uncertainty and is followed by the refinement of boundaries with high positional uncertainty using a different, distance-aware loss. Finally, outputs from the two stages are merged to produce the room layout. Experiments using the ZInD and Structured3D datasets show that U2RLE improves over the current state of the art, being able to handle both near and far walls better. In particular, U2RLE outperforms current state-of-the-art techniques for the most distant walls.

* To appear at CVPR 2023
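
Illustrative note: the abstract says the outputs of the two stages are merged, with the second stage refining boundaries that have high positional uncertainty. The abstract does not specify the merge rule, so the sketch below assumes the simplest possible one: a per-column hard threshold on the predicted uncertainty. The function name `merge_boundaries` and the `threshold` parameter are hypothetical and are not claimed to match the paper.

```python
import numpy as np


def merge_boundaries(coarse, refined, uncertainty, threshold=0.5):
    """Per-column merge of two boundary predictions.

    coarse, refined: floor-wall boundary positions per image column.
    uncertainty:     stage-one uncertainty per column (same length).
    Columns whose uncertainty exceeds `threshold` take the refined value;
    all other columns keep the coarse value. This rule is an assumption.
    """
    coarse = np.asarray(coarse, dtype=float)
    refined = np.asarray(refined, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    if not (coarse.shape == refined.shape == uncertainty.shape):
        raise ValueError("all inputs must have the same shape")
    use_refined = uncertainty > threshold
    return np.where(use_refined, refined, coarse)


# Toy example with made-up numbers: only the middle column is uncertain.
merged = merge_boundaries([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [0.1, 0.9, 0.5])
```

A soft blend weighted by uncertainty would be an equally plausible reading; the hard threshold is used here only because it is the easiest to inspect.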

LASER: LAtent SpacE Rendering for 2D Visual Localization

Apr 01, 2022

Zhixiang Min, Naji Khosravan, Zachary Bessinger, Manjunath Narayana, Sing Bing Kang, Enrique Dunn, Ivaylo Boyadzhiev

Abstract: We present LASER, an image-based Monte Carlo Localization (MCL) framework for 2D floor maps. LASER introduces the concept of latent space rendering, where 2D pose hypotheses on the floor map are directly rendered into a geometrically-structured latent space by aggregating viewing ray features. Through a tightly coupled rendering codebook scheme, the viewing ray features are dynamically determined at rendering time based on their geometries (i.e., length and incident angle), endowing our representation with view-dependent fine-grain variability. Our codebook scheme effectively disentangles feature encoding from rendering, allowing the latent space rendering to run at speeds above 10 kHz. Moreover, through metric learning, our geometrically-structured latent space is common to both pose hypotheses and query images with arbitrary fields of view. As a result, LASER achieves state-of-the-art performance on large-scale indoor localization datasets (i.e., ZInD and Structured3D) for both panorama and perspective image queries, while significantly outperforming existing learning-based methods in speed.

* CVPR 2022 (Oral)

PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation

Mar 30, 2022

Haiyan Wang, Will Hutchcroft, Yuguang Li, Zhiqiang Wan, Ivaylo Boyadzhiev, Yingli Tian, Sing Bing Kang

Abstract: In this paper, we propose a new deep learning-based method for estimating room layout given a pair of 360$^\circ$ panoramas. Our system, called Position-aware Stereo Merging Network or PSMNet, is an end-to-end joint layout-pose estimator. PSMNet consists of a Stereo Pano Pose (SP2) transformer and a novel Cross-Perspective Projection (CP2) layer. The stereo-view SP2 transformer is used to implicitly infer correspondences between views, and can handle noisy poses. The pose-aware CP2 layer is designed to render features from the adjacent view to the anchor (reference) view, in order to perform view fusion and estimate the visible layout. Our experiments and analysis validate our method, which significantly outperforms the state-of-the-art layout estimators, especially for large and complex room spaces.

* Accepted at CVPR 2022

DepthTransfer: Depth Extraction from Video Using Non-parametric Sampling

Dec 24, 2019

Kevin Karsch, Ce Liu, Sing Bing Kang

Abstract: We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling. We demonstrate our technique in cases where past methods fail (non-translating cameras and dynamic scenes). Our technique is applicable to single images as well as videos. For videos, we use local motion cues to improve the inferred depth maps, while optical flow is used to ensure temporal depth consistency. For training and evaluation, we use a Kinect-based system to collect a large dataset containing stereoscopic videos with known depths. We show that our depth estimation technique outperforms the state-of-the-art on benchmark databases. Our technique can be used to automatically convert a monoscopic video into stereo for 3D visualization, and we demonstrate this through a variety of visually pleasing results for indoor and outdoor scenes, including results from the feature film Charade.

* IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2144-2158 (2014)

Depth Extraction from Video Using Non-parametric Sampling

Dec 24, 2019

Kevin Karsch, Ce Liu, Sing Bing Kang

* Computer Vision - ECCV 2012, Lecture Notes in Computer Science, vol. 7576, pp. 775-788
* arXiv admin note: text overlap with arXiv:2001.00987

Privacy-Preserving Action Recognition using Coded Aperture Videos

Apr 16, 2019

Zihao W. Wang, Vibhav Vineet, Francesco Pittaluga, Sudipta Sinha, Oliver Cossairt, Sing Bing Kang

Abstract: The risk of unauthorized remote access of streaming video from networked cameras underlines the need for stronger privacy safeguards. We propose a lens-free coded aperture camera system for human action recognition that is privacy-preserving. While coded aperture systems exist, we believe ours is the first system designed for action recognition without the need for image restoration as an intermediate step. Action recognition is done using a deep network that takes as input non-invertible motion features between pairs of frames, computed using phase correlation and log-polar transformation. Phase correlation encodes translation while the log-polar transformation encodes in-plane rotation and scaling. We show that the translation features are independent of the coded aperture design, as long as its spectral response within the bandwidth has no zeros. Stacking motion features computed on frames at multiple different strides in the video can improve accuracy. Preliminary results on simulated data based on a subset of the UCF and NTU datasets are promising. We also describe our prototype lens-free coded aperture camera system, and results for real captured videos are mixed.

* CVCOPS2019
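
Illustrative note: the abstract states that phase correlation encodes translation and that the translation features do not depend on the coded aperture design when its spectral response has no zeros. The sketch below is a generic NumPy phase-correlation routine, not the paper's pipeline. It assumes circular (wrap-around) shifts and models the aperture as a circular convolution with a fixed mask; `phase_correlation_shift` and `apply_mask` are hypothetical names.

```python
import numpy as np


def phase_correlation_shift(frame_a, frame_b, eps=1e-12):
    """Estimate the integer (dy, dx) circular shift taking frame_a to frame_b."""
    spec_a = np.fft.fft2(frame_a)
    spec_b = np.fft.fft2(frame_b)
    # Normalised cross-power spectrum: only the phase difference remains.
    cross_power = spec_b * np.conj(spec_a)
    cross_power = cross_power / (np.abs(cross_power) + eps)
    correlation = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Map peak indices in the upper half of each axis to negative shifts.
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )


def apply_mask(frame, mask):
    """Circularly convolve a frame with a mask (assumed aperture model)."""
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * np.fft.fft2(mask)))


# Synthetic data: a random frame, a shifted copy, and a random mask.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
shifted = np.roll(frame, shift=(3, -5), axis=(0, 1))
mask = rng.random((64, 64))

shift_raw = phase_correlation_shift(frame, shifted)
shift_coded = phase_correlation_shift(apply_mask(frame, mask),
                                      apply_mask(shifted, mask))
```

Because the same mask multiplies both spectra, its magnitude cancels in the normalised cross-power spectrum, so under these assumptions the estimated shift is the same with and without the mask.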

Revealing Scenes by Inverting Structure from Motion Reconstructions

Apr 05, 2019

Francesco Pittaluga, Sanjeev J. Koppal, Sing Bing Kang, Sudipta N. Sinha

Abstract: Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and optionally color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature inversion methods, we deal with highly sparse and irregular 2D point distributions and inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor image source, and the 3D point visibility. We evaluate our attack algorithm on public datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.

* 10 pages, 8 figures, to be published in IEEE Conference on Computer Vision and Pattern Recognition 2019
