Abstract: Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation that leverages a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce the view discrepancy. An attention-guided multi-view sampling strategy is exploited to broadcast appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in the estimation of noise and generates inputs for subsequent denoising steps, as directed by the text prompt and the current texture map. Through extensive qualitative and quantitative evaluations, we demonstrate that our method produces significantly better texture quality for diverse 3D objects with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Our texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/
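The core of the view-consistent sampling described above is a shared texture map that is re-estimated after every denoising step. Below is a minimal illustrative sketch in PyTorch, not the authors' code: rendering and back-projection are reduced to identity copies with uniform visibility weights, and the noise resampling step is omitted.

```python
# Minimal sketch (not the authors' code) of view-consistent sampling with a
# shared RGB texture map that is updated after every denoising step.
# `backproject_average` and the denoiser are simplified stand-ins.
import torch

def backproject_average(view_rgbs, view_weights):
    """Fuse per-view RGB estimates into one texture by weighted averaging.

    view_rgbs:    (V, H, W, 3) per-view colors already warped into UV space
    view_weights: (V, H, W, 1) visibility / viewing-angle weights
    """
    weighted = (view_rgbs * view_weights).sum(dim=0)
    norm = view_weights.sum(dim=0).clamp(min=1e-6)
    return weighted / norm

def sampling_loop(denoise_step, num_steps=50, num_views=8, res=256):
    # texture map maintained in RGB space across denoising steps
    texture = torch.zeros(res, res, 3)
    for t in reversed(range(num_steps)):
        # 1) render (here: simply copy) the current texture into each view
        views = texture.unsqueeze(0).repeat(num_views, 1, 1, 1)
        # 2) one denoising step per view, conditioned on the text prompt
        views = denoise_step(views, t)
        # 3) back-project the denoised views and update the shared texture,
        #    which progressively reduces the discrepancy between views
        weights = torch.ones(num_views, res, res, 1)
        texture = backproject_average(views, weights)
    return texture

# toy denoiser so the sketch runs end-to-end
texture = sampling_loop(lambda v, t: v + 0.01 * torch.randn_like(v))
print(texture.shape)  # torch.Size([256, 256, 3])
```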
Abstract: This paper tackles spectral reflectance recovery (SRR) from RGB images. Since capturing ground-truth spectral reflectance and camera spectral sensitivity is challenging and costly, most existing approaches are trained on synthetic images and use the same parameters for all unseen testing images, which is suboptimal, especially when the trained models are tested on real images, because they never exploit the internal information of the testing images. To address this issue, we adopt a self-supervised meta-auxiliary learning (MAXL) strategy that fine-tunes the well-trained network parameters with each testing image to combine external and internal information. To the best of our knowledge, this is the first work that successfully adapts the MAXL strategy to this problem. Instead of relying on naive end-to-end training, we also propose a novel architecture that integrates the physical relationship between spectral reflectance and the corresponding RGB images into the network, based on our mathematical analysis. In addition, since the spectral reflectance of a scene is independent of its illumination while the corresponding RGB images are not, we recover the spectral reflectance of a scene from its RGB images captured under multiple illuminations to further reduce the unknowns. Qualitative and quantitative evaluations demonstrate the effectiveness of our proposed network and of the MAXL strategy. Our code and data are available at https://github.com/Dong-Huo/SRR-MAXL.
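To make the test-time adaptation idea concrete, here is a hedged sketch: a pretrained network is fine-tuned on a single test image with a self-supervised loss before producing the final prediction. The auxiliary loss used below, re-synthesizing the RGB input from the predicted reflectance with an assumed known camera sensitivity, and all shapes are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of test-time adaptation in the spirit of meta-auxiliary
# learning: fine-tune pretrained weights on each test image with a
# self-supervised loss, then predict. The auxiliary loss below is a placeholder.
import copy
import torch

def adapt_and_predict(model, rgb, sensitivity, steps=5, lr=1e-4):
    model = copy.deepcopy(model)          # keep the externally trained weights intact
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        reflectance = model(rgb)          # (1, 31, H, W) spectral estimate
        # project spectra back to RGB with the (assumed known) sensitivity
        rgb_hat = torch.einsum('bchw,rc->brhw', reflectance, sensitivity)
        loss = torch.nn.functional.l1_loss(rgb_hat, rgb)   # internal, self-supervised
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model(rgb).detach()

# toy setup so the sketch runs
model = torch.nn.Conv2d(3, 31, kernel_size=3, padding=1)
rgb = torch.rand(1, 3, 64, 64)
sensitivity = torch.rand(3, 31)           # rows: R, G, B response over 31 bands
print(adapt_and_predict(model, rgb, sensitivity).shape)
```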
Abstract: Structured light (SL) systems acquire high-fidelity 3D geometry with active illumination projection. Conventional systems exhibit challenges when working in environments with strong ambient illumination, global illumination, and cross-device interference. This paper proposes a general-purpose technique to improve the robustness of SL by projecting redundant optical signals in addition to the native SL patterns. In this way, projected signals become more distinguishable from errors, so the geometry information can be recovered more easily with simple signal processing and a "coding gain" in performance is obtained. We propose three applications of our redundancy codes: (1) self error-correction for SL imaging under strong ambient light, (2) error detection for adaptive reconstruction under global illumination, and (3) interference filtering with device-specific projection sequence encoding, especially for event-camera-based SL and light curtain devices. We systematically analyze the design rules and signal processing algorithms in these applications. Corresponding hardware prototypes are built for evaluation on real-world complex scenes. Experimental results on synthetic and real data demonstrate significant performance improvements in SL systems with our redundancy codes.
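As a toy illustration of the redundancy-code idea (not the paper's coding scheme), the sketch below appends a single parity bit to each per-pixel pattern code, so corrupted decodes under strong ambient light can at least be detected and flagged rather than silently producing wrong geometry.

```python
# Minimal sketch (not the authors' implementation) of redundancy coding for
# structured light: append a parity bit to the native per-pixel code so
# corrupted decodes can be detected and flagged for re-estimation.
import numpy as np

def encode(code_bits):
    """code_bits: (N, K) binary pattern codes. Append an even-parity bit."""
    parity = code_bits.sum(axis=1, keepdims=True) % 2
    return np.concatenate([code_bits, parity], axis=1)

def decode(received_bits):
    """Return (codes, valid_mask); invalid pixels fail the parity check."""
    codes, parity = received_bits[:, :-1], received_bits[:, -1]
    valid = (codes.sum(axis=1) % 2) == parity
    return codes, valid

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(1000, 8))
tx = encode(codes)
flip = rng.random(tx.shape) < 0.02          # simulate ambient-light decoding errors
rx = np.where(flip, 1 - tx, tx)
decoded, valid = decode(rx)
print(f"detected {np.count_nonzero(~valid)} corrupted pixels")
```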
Abstract: This paper proposes a new glass segmentation method utilizing paired RGB and thermal images. Because most glass is transparent to visible light but opaque to thermal energy, the transmission properties of the two modalities through glass differ greatly, and glass regions of a scene are more distinguishable with a pair of RGB and thermal images than with an RGB image alone. To exploit this unique property, we propose a neural network architecture that effectively combines an RGB-thermal image pair with a new attention-based multi-modal fusion module and integrates a CNN and a transformer to extract local features and long-range dependencies, respectively. We have also collected a new dataset containing 5551 RGB-thermal image pairs with ground-truth segmentation annotations. Qualitative and quantitative evaluations demonstrate the effectiveness of the proposed approach in fusing RGB and thermal data for glass segmentation. Our code and data are available at https://github.com/Dong-Huo/RGB-T-Glass-Segmentation.
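A hedged sketch of what an attention-based RGB-thermal fusion module might look like is given below; the channel-attention gating and layer sizes are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of attention-based RGB-thermal feature fusion (assumed
# layout, not the paper's module): a channel-attention gate computed from
# both modalities re-weights the RGB features before the streams are merged.
import torch
import torch.nn as nn

class RGBTFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, thermal_feat):
        both = torch.cat([rgb_feat, thermal_feat], dim=1)
        attn = self.gate(both)                      # (B, C, 1, 1) channel attention
        fused = self.merge(torch.cat([rgb_feat * attn, thermal_feat], dim=1))
        return fused

fusion = RGBTFusion(channels=64)
rgb_feat = torch.rand(2, 64, 32, 32)
thermal_feat = torch.rand(2, 64, 32, 32)
print(fusion(rgb_feat, thermal_feat).shape)  # torch.Size([2, 64, 32, 32])
```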
Abstract: Conventional deconvolution methods utilize hand-crafted image priors to constrain the optimization. While deep-learning-based methods have simplified the optimization through end-to-end training, they fail to generalize well to blurs unseen in the training dataset, so training image-specific models is important for better generalization. Deep image prior (DIP) provides an approach to optimize the weights of a randomly initialized network with a single degraded image by maximum a posteriori (MAP) estimation, which shows that the architecture of a network can serve as a hand-crafted image prior. Unlike conventional hand-crafted image priors, which are statistically obtained, a proper network architecture is hard to find because the relationship between images and their corresponding network architectures is unclear. As a result, the network architecture cannot provide enough constraint on the latent sharp image. This paper proposes a new variational deep image prior (VDIP) for blind image deconvolution, which exploits additive hand-crafted image priors on the latent sharp images and approximates a distribution for each pixel to avoid suboptimal solutions. Our mathematical analysis shows that the proposed method can better constrain the optimization. Experimental results further demonstrate that the generated images have better quality than those of the original DIP on benchmark datasets. The source code of our VDIP is available at https://github.com/Dong-Huo/VDIP-Deconvolution.
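The sketch below illustrates, under simplifying assumptions, the variational ingredient of VDIP: the network predicts a per-pixel mean and log-variance for the latent sharp image, a sample is drawn by reparameterization, and the loss combines a re-blurring data term with a KL term. The blur kernel is fixed here for brevity, whereas blind deconvolution would also estimate it.

```python
# Minimal sketch (simplified, assumed shapes) of a variational deep image
# prior step: predict per-pixel mean and log-variance, sample by
# reparameterization, and optimize a data term plus a KL regularizer.
import torch
import torch.nn.functional as F

def vdip_step(net, z, blurred, kernel, optimizer):
    out = net(z)                                  # (1, 6, H, W): mean and log-var
    mu, logvar = out[:, :3], out[:, 3:]
    latent = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterize
    # data term: re-blur the sampled latent image and compare with the input
    reblurred = F.conv2d(latent, kernel, padding=kernel.shape[-1] // 2, groups=3)
    data_loss = F.mse_loss(reblurred, blurred)
    # KL divergence of N(mu, sigma^2) from a standard normal prior
    kl = 0.5 * torch.mean(mu ** 2 + logvar.exp() - logvar - 1.0)
    loss = data_loss + 1e-3 * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# toy setup so the sketch runs: a tiny conv "generator" and a box blur kernel
net = torch.nn.Conv2d(8, 6, kernel_size=3, padding=1)
z = torch.randn(1, 8, 64, 64)                     # fixed random input, as in DIP
blurred = torch.rand(1, 3, 64, 64)
kernel = torch.full((3, 1, 5, 5), 1.0 / 25.0)     # depthwise 5x5 box blur
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(3):
    vdip_step(net, z, blurred, kernel, opt)
```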
Abstract: Many deep-learning-based methods are designed to remove non-uniform (spatially variant) motion blur caused by object motion and camera shake without knowing the blur kernel. Some methods directly output the latent sharp image in one stage, while others utilize a multi-stage strategy (e.g., multi-scale, multi-patch, or multi-temporal) to gradually restore the sharp image. However, these methods have two main issues: 1) the computational cost of multi-stage processing is high; 2) the same convolution kernel is applied to different regions, which is not an ideal choice for non-uniform blur. Hence, non-uniform motion deblurring remains a challenging and open problem. In this paper, we propose a new architecture that consists of multiple Atrous Spatial Pyramid Deformable Convolution (ASPDC) modules to deblur an image end-to-end with more flexibility. Multiple ASPDC modules implicitly learn pixel-specific motion with different dilation rates in the same layer to handle movements of different magnitudes. To improve training, we also propose a reblurring network that maps the deblurred output back to the blurred input, which constrains the solution space. Experimental results show that the proposed method outperforms state-of-the-art methods on benchmark datasets.
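As a simplified illustration (the deformable, pixel-specific offsets of the actual ASPDC module and the reblurring network are omitted), the sketch below applies parallel 3x3 convolutions with different dilation rates in the same layer, so that blur of different magnitudes is covered by different receptive fields.

```python
# Minimal sketch (a simplification, not the paper's ASPDC module): parallel
# 3x3 convolutions with different dilation rates in one layer, so regions
# blurred by motions of different magnitudes see different receptive fields.
import torch
import torch.nn as nn

class AtrousPyramid(nn.Module):
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.project = nn.Conv2d(len(rates) * channels, channels, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]   # same spatial size per branch
        return self.project(torch.cat(feats, dim=1))

block = AtrousPyramid(channels=32)
x = torch.rand(1, 32, 64, 64)
print(block(x).shape)  # torch.Size([1, 32, 64, 64])
```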
Abstract: Deep convolutional neural networks (CNNs) play a critical role in single image super-resolution (SISR) thanks to the rapid advances in high-performance computing. However, most super-resolution (SR) methods focus only on recovering from bicubic degradation. Reconstructing high-resolution (HR) images from randomly blurred and noisy low-resolution (LR) images remains a challenging problem. In this paper, we propose a novel Spatial Context Hallucination Network (SCHN) for blind super-resolution without knowing the degradation kernel. We find that when the blur kernel is unknown, performing deblurring and super-resolution separately limits the performance because errors accumulate. Thus, we integrate denoising, deblurring, and super-resolution within one framework to avoid this problem. We train our model on two high-quality datasets, DIV2K and Flickr2K. Our method outperforms state-of-the-art methods when input images are corrupted with random blur and noise.
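To show the kind of degradation this setting implies, here is a hedged sketch of a random blur-downsample-noise pipeline for synthesizing LR training inputs; the kernel size, sigma range, and noise levels are assumptions, not the paper's exact settings.

```python
# Minimal sketch (assumed parameters) of a random-degradation pipeline:
# an HR image is blurred with a random Gaussian kernel, downsampled, and
# corrupted with noise to form the LR training input.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=21, sigma=2.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing='ij')
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(hr, scale=4, sigma=None, noise_std=None):
    sigma = sigma if sigma is not None else float(torch.empty(1).uniform_(0.5, 4.0))
    noise_std = noise_std if noise_std is not None else float(torch.empty(1).uniform_(0.0, 0.05))
    k = gaussian_kernel(sigma=sigma).expand(3, 1, -1, -1)       # depthwise blur
    blurred = F.conv2d(hr, k, padding=k.shape[-1] // 2, groups=3)
    lr = F.interpolate(blurred, scale_factor=1 / scale, mode='bicubic', align_corners=False)
    return (lr + noise_std * torch.randn_like(lr)).clamp(0, 1)

hr = torch.rand(1, 3, 128, 128)
print(degrade(hr).shape)  # torch.Size([1, 3, 32, 32])
```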