Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hung-Kuo Chu

CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes

Feb 08, 2024

Yi-Ting Pan, Chai-Rong Lee, Shu-Ho Fan, Jheng-Wei Su, Jia-Bin Huang, Yung-Yu Chuang, Hung-Kuo Chu

Abstract:The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guided Conditional Texture Generator (CTGAN), producing high-quality textures for 3D shapes that are consistent with the viewing angle while respecting shape semantics. CTGAN utilizes the disentangled nature of StyleGAN to finely manipulate the input latent codes, enabling explicit control over both the style and structure of the generated textures. A coarse-to-fine encoder architecture is introduced to enhance control over the structure of the resulting textures via input segmentation. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art performance on texture generation in both conditional and unconditional settings.

Via

Access Paper or Ask Questions

Layout-guided Indoor Panorama Inpainting with Plane-aware Normalization

Jan 13, 2023

Chao-Chen Gao, Cheng-Hsiu Chen, Jheng-Wei Su, Hung-Kuo Chu

Abstract:We present an end-to-end deep learning framework for indoor panoramic image inpainting. Although previous inpainting methods have shown impressive performance on natural perspective images, most fail to handle panoramic images, particularly indoor scenes, which usually contain complex structure and texture content. To achieve better inpainting quality, we propose to exploit both the global and local context of indoor panorama during the inpainting process. Specifically, we take the low-level layout edges estimated from the input panorama as a prior to guide the inpainting model for recovering the global indoor structure. A plane-aware normalization module is employed to embed plane-wise style features derived from the layout into the generator, encouraging local texture restoration from adjacent room structures (i.e., ceiling, floor, and walls). Experimental results show that our work outperforms the current state-of-the-art methods on a public panoramic dataset in both qualitative and quantitative evaluations. Our code is available at https://ericsujw.github.io/LGPN-net/

* Accepted by ACCV 2022

Via

Access Paper or Ask Questions

Sampling Neural Radiance Fields for Refractive Objects

Nov 27, 2022

Jen-I Pan, Jheng-Wei Su, Kai-Wen Hsiao, Ting-Yu Yen, Hung-Kuo Chu

$Figure 1 for Sampling Neural Radiance Fields for Refractive Objects$

$Figure 2 for Sampling Neural Radiance Fields for Refractive Objects$

$Figure 3 for Sampling Neural Radiance Fields for Refractive Objects$

$Figure 4 for Sampling Neural Radiance Fields for Refractive Objects$

Abstract:Recently, differentiable volume rendering in neural radiance fields (NeRF) has gained a lot of popularity, and its variants have attained many impressive results. However, existing methods usually assume the scene is a homogeneous volume so that a ray is cast along the straight path. In this work, the scene is instead a heterogeneous volume with a piecewise-constant refractive index, where the path will be curved if it intersects the different refractive indices. For novel view synthesis of refractive objects, our NeRF-based framework aims to optimize the radiance fields of bounded volume and boundary from multi-view posed images with refractive object silhouettes. To tackle this challenging problem, the refractive index of a scene is reconstructed from silhouettes. Given the refractive index, we extend the stratified and hierarchical sampling techniques in NeRF to allow drawing samples along a curved path tracked by the Eikonal equation. The results indicate that our framework outperforms the state-of-the-art method both quantitatively and qualitatively, demonstrating better performance on the perceptual similarity metric and an apparent improvement in the rendering quality on several synthetic and real scenes.

* SIGGRAPH Asia 2022 Technical Communications. 4 pages, 4 figures, 1 table. Project: https://alexkeroro86.github.io/SampleNeRFRO/ Code: https://github.com/alexkeroro86/SampleNeRFRO

Via

Access Paper or Ask Questions

GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

Oct 20, 2022

Jheng-Wei Su, Chi-Han Peng, Peter Wonka, Hung-Kuo Chu

Figure 1 for GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

Figure 2 for GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

Figure 3 for GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

Figure 4 for GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network

Abstract:Reconstructing 3D layouts from multiple $360^{\circ}$ panoramas has received increasing attention recently as estimating a complete layout of a large-scale and complex room from a single panorama is very difficult. The state-of-the-art method, called PSMNet, introduces the first learning-based framework that jointly estimates the room layout and registration given a pair of panoramas. However, PSMNet relies on an approximate (i.e., "noisy") registration as input. Obtaining this input requires a solution for wide baseline registration which is a challenging problem. In this work, we present a complete multi-view panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas without relying on a pose prior. The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network or GPR-Net that effectively tackles the wide baseline registration problem by exploiting the layout geometry and computing fine-grained correspondences on the layout boundaries, instead of the global pixel-space. Our architecture consists of two parts. First, given two panoramas, we adopt a vision transformer to learn a set of 1D horizon features sampled on the panorama. These 1D horizon features encode the depths of individual layout boundary samples and the correspondence and covisibility maps between layout boundaries. We then exploit a non-linear registration module to convert these 1D horizon features into a set of corresponding 2D boundary points on the layout. Finally, we estimate the final relative camera pose via RANSAC and obtain the complete layout simply by taking the union of registered layouts. Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on a large-scale indoor panorama dataset ZInD.

Via

Access Paper or Ask Questions

Instance-aware Image Colorization

May 21, 2020

Jheng-Wei Su, Hung-Kuo Chu, Jia-Bin Huang

Figure 1 for Instance-aware Image Colorization

Figure 2 for Instance-aware Image Colorization

Figure 3 for Instance-aware Image Colorization

Figure 4 for Instance-aware Image Colorization

Abstract:Image colorization is inherently an ill-posed problem with multi-modal uncertainty. Previous methods leverage the deep neural network to map input grayscale images to plausible color outputs directly. Although these learning-based methods have shown impressive performance, they usually fail on the input images that contain multiple objects. The leading cause is that existing models perform learning and colorization on the entire image. In the absence of a clear figure-ground separation, these models cannot effectively locate and learn meaningful object-level semantics. In this paper, we propose a method for achieving instance-aware colorization. Our network architecture leverages an off-the-shelf object detector to obtain cropped object images and uses an instance colorization network to extract object-level features. We use a similar network to extract the full-image features and apply a fusion module to full object-level and image-level features to predict the final colors. Both colorization networks and fusion modules are learned from a large-scale dataset. Experimental results show that our work outperforms existing methods on different quality metrics and achieves state-of-the-art performance on image colorization.

* CVPR 2020. Project: https://ericsujw.github.io/InstColorization/ Code: https://github.com/ericsujw/InstColorization

Via

Access Paper or Ask Questions

Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

May 20, 2020

Peng Wang, Lingjie Liu, Nenglun Chen, Hung-Kuo Chu, Christian Theobalt, Wenping Wang

Figure 1 for Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

Figure 2 for Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

Figure 3 for Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

Figure 4 for Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

Abstract:Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs the geometry of complex 3D thin structures in high quality from a color video captured by a handheld camera. Specifically, we present a new curve-based approach to estimate accurate camera poses by establishing correspondences between featureless thin objects in the foreground in consecutive video frames, without requiring visual texture in the background scene to lock on. Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures on geometry, topology as well as self-occlusion handling for reconstructing 3D thin structures. Extensive validations on a variety of thin structures show that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods.

* Accepted by SIGGRAPH 2020

Via

Access Paper or Ask Questions

3D Manhattan Room Layout Reconstruction from a Single 360 Image

Oct 24, 2019

Chuhang Zou, Jheng-Wei Su, Chi-Han Peng, Alex Colburn, Qi Shan, Peter Wonka, Hung-Kuo Chu, Derek Hoiem

Figure 1 for 3D Manhattan Room Layout Reconstruction from a Single 360 Image

Figure 2 for 3D Manhattan Room Layout Reconstruction from a Single 360 Image

Figure 3 for 3D Manhattan Room Layout Reconstruction from a Single 360 Image

Figure 4 for 3D Manhattan Room Layout Reconstruction from a Single 360 Image

Abstract:Recent approaches for predicting layouts from 360 panoramas produce excellent results. These approaches build on a common framework consisting of three steps: a pre-processing step based on edge-based alignment, prediction of layout elements, and a post-processing step by fitting a 3D layout to the layout elements. Until now, it has been difficult to compare the methods due to multiple different design decisions, such as the encoding network (e.g. SegNet or ResNet), type of elements predicted (e.g. corners, wall/floor boundaries, or semantic segmentation), or method of fitting the 3D layout. To address this challenge, we summarize and describe the common framework, the variants, and the impact of the design decisions. For a complete evaluation, we also propose extended annotations for the Matterport3D dataset, and introduce two depth-based evaluation metrics.

Via

Access Paper or Ask Questions

DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

Nov 29, 2018

Shang-Ta Yang, Fu-En Wang, Chi-Han Peng, Peter Wonka, Min Sun, Hung-Kuo Chu

Figure 1 for DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

Figure 2 for DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

Figure 3 for DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

Figure 4 for DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama

Abstract:We present a deep learning framework, called DuLa-Net, to predict Manhattan-world 3D room layouts from a single RGB panorama. To achieve better prediction accuracy, our method leverages two projections of the panorama at once, namely the equirectangular panorama-view and the perspective ceiling-view, that each contains different clues about the room layouts. Our network architecture consists of two encoder-decoder branches for analyzing each of the two views. In addition, a novel feature fusion structure is proposed to connect the two branches, which are then jointly trained to predict the 2D floor plans and layout heights. To learn more complex room layouts, we introduce the Realtor360 dataset that contains panoramas of Manhattan-world room layouts with different numbers of corners. Experimental results show that our work outperforms recent state-of-the-art in prediction accuracy and performance, especially in the rooms with non-cuboid layouts.

Via

Access Paper or Ask Questions

Self-Supervised Learning of Depth and Camera Motion from 360° Videos

Nov 13, 2018

Fu-En Wang, Hou-Ning Hu, Hsien-Tzu Cheng, Juan-Ting Lin, Shang-Ta Yang, Meng-Li Shih, Hung-Kuo Chu, Min Sun

Figure 1 for Self-Supervised Learning of Depth and Camera Motion from 360° Videos

Figure 2 for Self-Supervised Learning of Depth and Camera Motion from 360° Videos

Figure 3 for Self-Supervised Learning of Depth and Camera Motion from 360° Videos

Figure 4 for Self-Supervised Learning of Depth and Camera Motion from 360° Videos

Abstract:As 360{\deg} cameras become prevalent in many autonomous systems (e.g., self-driving cars and drones), efficient 360{\deg} perception becomes more and more important. We propose a novel self-supervised learning approach for predicting the omnidirectional depth and camera motion from a 360{\deg} video. In particular, starting from the SfMLearner, which is designed for cameras with normal field-of-view, we introduce three key features to process 360{\deg} images efficiently. Firstly, we convert each image from equirectangular projection to cubic projection in order to avoid image distortion. In each network layer, we use Cube Padding (CP), which pads intermediate features from adjacent faces, to avoid image boundaries. Secondly, we propose a novel "spherical" photometric consistency constraint on the whole viewing sphere. In this way, no pixel will be projected outside the image boundary which typically happens in images with normal field-of-view. Finally, rather than naively estimating six independent camera motions (i.e., naively applying SfM-Learner to each face on a cube), we propose a novel camera pose consistency loss to ensure the estimated camera motions reaching consensus. To train and evaluate our approach, we collect a new PanoSUNCG dataset containing a large amount of 360{\deg} videos with groundtruth depth and camera motion. Our approach achieves state-of-the-art depth prediction and camera motion estimation on PanoSUNCG with faster inference speed comparing to equirectangular. In real-world indoor videos, our approach can also achieve qualitatively reasonable depth prediction by acquiring model pre-trained on PanoSUNCG.

* ACCV 2018 Oral

Via

Access Paper or Ask Questions

SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images

Mar 23, 2014

Yu-Shiang Wong, Hung-Kuo Chu, Niloy J. Mitra

Figure 1 for SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images

Figure 2 for SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images

Figure 3 for SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images

Figure 4 for SmartAnnotator: An Interactive Tool for Annotating RGBD Indoor Images

Abstract:RGBD images with high quality annotations in the form of geometric (i.e., segmentation) and structural (i.e., how do the segments are mutually related in 3D) information provide valuable priors to a large number of scene and image manipulation applications. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging especially in cluttered noisy environments. We present SmartAnnotator, an interactive system to facilitate annotating RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, inferring object interactions in 3D, and comes up with various hypotheses. The user simply has to flip through a list of suggestions for segment labels, finalize a selection, and the system updates the remaining hypotheses. As objects are finalized, the process speeds up with fewer ambiguities to resolve. Further, as more scenes are annotated, the system makes better suggestions based on structural and geometric priors learns from the previous annotation sessions. We test our system on a large number of database scenes and report significant improvements over naive low-level annotation tools.

* 10 pages

Via

Access Paper or Ask Questions